variable_clustering
Hierarchical correlation clustering for variable reduction before logistic regression.
CorrVarClus
Bases: BaseEstimator
Hierarchical correlation clustering for variable reduction.
Groups features into clusters using average-linkage hierarchical clustering with a correlation distance metric. Ranks features within each cluster by absolute Gini so the most predictive representative can be selected.
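The grouping and ranking described above can be sketched with SciPy and pandas. This is a minimal illustration, not the class's actual implementation: the function names `abs_gini` and `cluster_and_rank`, the distance definition `1 - |r|`, the mapping of `max_correlation` to a cut height of `1 - max_correlation`, and the output column names are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def abs_gini(x, y):
    """Absolute Gini of one feature against a binary target: |2 * AUC - 1|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).astype(bool)
    n_pos, n_neg = y.sum(), (~y).sum()
    ranks = rankdata(x)  # average ranks, so ties are handled
    auc = (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return abs(2 * auc - 1)


def cluster_and_rank(X, y, max_correlation=0.5):
    """Hypothetical helper: cluster columns of X, then rank them per cluster."""
    # Zero-variance columns have undefined correlation, so drop them first.
    X = X.loc[:, X.std() > 0]

    # Assumed correlation distance: 1 - |r| (0 = perfectly correlated).
    dist = 1.0 - X.corr().abs().to_numpy()
    np.fill_diagonal(dist, 0.0)
    dist = np.clip(dist, 0.0, None)

    # Average-linkage clustering on the condensed distance matrix.
    Z = linkage(squareform(dist, checks=False), method="average")

    # Cut the dendrogram; fcluster returns 1-indexed labels.
    labels = fcluster(Z, t=1.0 - max_correlation, criterion="distance")

    table = pd.DataFrame({
        "feature": X.columns,
        "cluster": labels,
        "abs_gini": [abs_gini(X[c], y) for c in X.columns],
    })
    # Within each cluster, the strongest feature comes first.
    return table.sort_values(
        ["cluster", "abs_gini"], ascending=[True, False]
    ).reset_index(drop=True)
```

Taking the first row of each cluster in the returned table yields one representative per cluster, which is the usual way such a ranking is consumed before fitting a logistic regression.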
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_correlation` | `float` | Dendrogram cut height. Features correlated above this threshold end up in the same cluster. | `0.5` |
| `max_clusters` | `int \| None` | Hard cap on number of clusters. Overrides | `None` |
| `sample` | `int` | Subsample rows before clustering for speed on large datasets. | `0` |
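One plausible reading of how these parameters interact is sketched below. The helper names `cut_tree` and `subsample_rows` are hypothetical, and three behaviours are assumptions rather than documented facts: the cut height is taken as `1 - max_correlation`, `max_clusters` is applied only when the correlation cut produces too many clusters, and `sample=0` is treated as "use all rows".

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cut_tree(Z, max_correlation=0.5, max_clusters=None):
    """Hypothetical helper: cut a linkage matrix by correlation, then cap."""
    # Cut at the assumed correlation distance 1 - max_correlation.
    labels = fcluster(Z, t=1.0 - max_correlation, criterion="distance")
    # If that yields more clusters than allowed, re-cut with a hard cap.
    if max_clusters is not None and labels.max() > max_clusters:
        labels = fcluster(Z, t=max_clusters, criterion="maxclust")
    return labels


def subsample_rows(X: pd.DataFrame, sample=0, seed=0):
    """Hypothetical helper: keep at most `sample` rows (0 assumed to mean all)."""
    if sample and sample < len(X):
        return X.sample(n=sample, random_state=seed)
    return X
```

Subsampling only affects the rows used to estimate correlations, so the set of features being clustered stays the same.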
Attributes:
| Name | Type | Description |
|---|---|---|
| `features_` | | Column names after dropping zero-variance columns. |
| `labels_` | | Cluster label per feature (1-indexed). |
| `Z_` | | Linkage matrix from |
| `corr_line_` | | The correlation threshold used to cut the dendrogram. |
| `cluster_table_` | | DataFrame with columns