Perform hierarchical clustering on coefficient data.
Usage
stat_hclust(
data,
formula = NULL,
method = "ward.D2",
dist_method = "euclidean",
center = TRUE,
scale = NULL,
k = NULL,
h = NULL,
...
)
Arguments
- data
A tibble with coefficient columns
- formula
A formula specifying predictors. Can be:
  - Missing: auto-detects single coe column
  - Bare column name: coe
  - Formula: ~ coe, ~ coe + size, ~ coe1 + coe2
- method
Character. Agglomeration method for hierarchical clustering. One of "ward.D2" (default), "single", "complete", "average", "mcquitty", "median", "centroid". See stats::hclust().
- dist_method
Character. Distance metric. Default is "euclidean". Other options: "manhattan", "maximum", "canberra", "binary", "minkowski". See stats::dist().
- center
Logical. Should data be centered? Default TRUE.
- scale
Logical or NULL. Should data be scaled to unit variance? If NULL (default), automatically determined based on predictor types.
- k
Integer. Optional. If provided, cuts tree to k clusters.
- h
Numeric. Optional. If provided, cuts tree at height h.
- ...
Additional arguments passed to stats::hclust()
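The k and h arguments mirror the two ways of cutting a tree in base R. The following is a minimal sketch using only stats functions, not the package's own code; USArrests stands in for a table of numeric predictors, and the height h_cut is chosen purely for illustration.

```r
# Base R sketch (assumption: stat_hclust() behaves like this stats pipeline).
x <- scale(USArrests, center = TRUE, scale = TRUE)
tree <- stats::hclust(stats::dist(x, method = "euclidean"), method = "ward.D2")

# Cut by number of clusters (analogous to k = 4).
by_k <- stats::cutree(tree, k = 4)

# Cut by height (analogous to h): pick a height halfway between the
# 3rd and 4th highest merges, which leaves four clusters.
h_cut <- mean(rev(tree$height)[3:4])
by_h <- stats::cutree(tree, h = h_cut)

table(by_k)            # cluster sizes
identical(by_k, by_h)  # both cuts should give the same assignments
```

Cutting by k is usually easier to reason about; cutting by h is useful when a dissimilarity threshold has a meaning in the units of the chosen distance.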
Value
An object of class c("stat_hclust", "momstats") containing:
- data: Original tibble (unchanged)
- model: The stats::hclust() object
- dist_matrix: Distance matrix used for clustering
- method: Agglomeration method
- dist_method: Distance metric used
- call: The function call
- formula: Formula used (if any)
- predictor_cols: All predictor column names
- center: Logical, was centering applied
- scale: Logical, was scaling applied
- k: Number of clusters (if tree was cut)
- h: Height threshold (if tree was cut by height)
- clusters: Cluster assignments (if tree was cut)
Details
stat_hclust() provides hierarchical clustering for morphometric data with
proper handling of coefficient columns and optional covariates.
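To see why scaling matters once a covariate joins the coefficients, consider the sketch below. It uses simulated data and base R only; the column names (A1, A2, A3, size) and the magnitudes are hypothetical, and it does not show how stat_hclust() itself decides on scaling when scale = NULL.

```r
set.seed(1)

# Hypothetical predictors: three small-valued coefficients plus a size covariate.
coe <- matrix(rnorm(20 * 3, sd = 0.05), ncol = 3,
              dimnames = list(NULL, c("A1", "A2", "A3")))
size <- rnorm(20, mean = 100, sd = 15)
x <- cbind(coe, size = size)

# Pairwise distances with centering only, and with centering plus scaling.
d_raw    <- stats::dist(scale(x, center = TRUE, scale = FALSE))
d_scaled <- stats::dist(scale(x, center = TRUE, scale = TRUE))
d_size   <- stats::dist(size)  # distances from the covariate alone

# How closely does each distance matrix track the size-only distances?
cor_raw    <- cor(as.vector(d_raw), as.vector(d_size))
cor_scaled <- cor(as.vector(d_scaled), as.vector(d_size))
c(raw = cor_raw, scaled = cor_scaled)
```

Without scaling, the covariate with the largest variance effectively determines the Euclidean distances; scaling to unit variance gives every column comparable weight.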
Agglomeration methods
- "ward.D2" (default): Ward's minimum variance method; typically best for morphometric data as it minimizes within-cluster variance
- "complete": Maximum distance between clusters
- "average": UPGMA, the average distance between clusters
- "single": Minimum distance (tends to chain)
Distance metrics
- "euclidean" (default): Standard L2 distance
- "manhattan": L1 distance, more robust to outliers
- "maximum": Chebyshev distance
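The options above can be compared directly in base R. This is a sketch that assumes stat_hclust() wraps stats::dist() and stats::hclust() as its argument descriptions suggest; the numeric columns of iris stand in for coefficient data.

```r
# Center and scale a numeric table (stand-in for coefficient columns).
x <- scale(iris[, 1:4], center = TRUE, scale = TRUE)

# Two of the distance metrics listed above.
d_euc <- stats::dist(x, method = "euclidean")
d_man <- stats::dist(x, method = "manhattan")

# One tree per agglomeration method, all on the Euclidean distances.
methods <- c("ward.D2", "complete", "average", "single")
trees <- lapply(methods, function(m) stats::hclust(d_euc, method = m))
names(trees) <- methods

# Cluster sizes at k = 3, largest first. Chaining under single linkage
# tends to show up as one dominant cluster plus very small ones.
sizes <- lapply(trees, function(tr) {
  sort(as.vector(table(stats::cutree(tr, k = 3))), decreasing = TRUE)
})
sizes

# The same call pattern with Manhattan distances and average linkage.
tree_man <- stats::hclust(d_man, method = "average")
```

Comparing cluster sizes across methods is a quick sanity check before settling on one; a strongly unbalanced partition often signals chaining or outliers.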
Cutting the tree
The tree can be cut during creation or later:
# Cut during creation
hc <- stat_hclust(data, k = 4)
# Cut later via collect
hc <- stat_hclust(data)
data_clustered <- collect(hc, k = 4)
Getting results
Use collect() to add cluster assignments:
hc <- boteft %>% stat_hclust()
boteft_clustered <- collect(hc, k = 4)
Use transduce() to reconstruct shapes; see Examples.
Examples
if (FALSE) { # \dontrun{
# Basic hierarchical clustering
hc1 <- boteft %>% stat_hclust()
# With specific method
hc2 <- boteft %>% stat_hclust(method = "average")
# Cut tree during creation
hc3 <- boteft %>% stat_hclust(k = 4)
# Different distance
hc4 <- boteft %>% stat_hclust(dist_method = "manhattan")
# Add cluster assignments
boteft_clustered <- collect(hc1, k = 4)
# Get cluster center shapes
centers <- transduce(hc3, tibble(cluster = 1:4))
# Get shapes at internal nodes
nodes <- transduce(hc1, tibble(node = c(45, 50)))
# Plot (requires ape package)
plot(hc1) # Unrooted phylogram (default)
plot(hc1, color = type) # Color by grouping
plot(hc1, type = "dendrogram") # Classic dendrogram
} # }
