grouping
Stability-constrained optimal binning and Weight of Evidence (WOE) encoding.
StabilityGrouping
StabilityGrouping(
    max_bins: int = 10,
    stability_threshold: float = 0.1,
    min_leaf_share: float = 0.05,
    min_leaf_minority: int = 100,
    important_minorities: list[str] | None = None,
    must_have: list[str] | None = None,
)
Bases: BaseEstimator, TransformerMixin
Stability-constrained optimal binning with WOE encoding.
Finds optimal bins for each feature using LightGBM, then merges bins whose event rate shifts significantly across time periods (measured by RSI). Requires both a train and validation split plus a time column.
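For readers new to WOE, the encoding step can be illustrated with a small standalone sketch. It does not use this class and is not its implementation. The function name `woe_table`, the smoothing constant `eps`, and the sign convention (share of non-events over share of events) are assumptions made for illustration; some toolkits use the opposite sign.

```python
import numpy as np
import pandas as pd


def woe_table(bins, target, eps: float = 0.5) -> pd.DataFrame:
    """Per-bin WOE = ln(share of non-events / share of events).

    Hypothetical helper for illustration; not part of datasci_toolkit.
    """
    df = pd.DataFrame({"bin": list(bins), "y": list(target)})
    out = df.groupby("bin")["y"].agg(events="sum", total="count")
    out["non_events"] = out["total"] - out["events"]
    n_bins = len(out)
    # Additive smoothing keeps the logarithm finite when a bin has
    # no events or no non-events.
    ev_share = (out["events"] + eps) / (out["events"].sum() + eps * n_bins)
    ne_share = (out["non_events"] + eps) / (out["non_events"].sum() + eps * n_bins)
    out["woe"] = np.log(ne_share / ev_share)
    return out


# Bin "a" has a low event rate, bin "b" a high one.
table = woe_table(["a"] * 4 + ["b"] * 4, [1, 0, 0, 0, 1, 1, 1, 0])
```

Under this convention a bin with a below-average event rate gets a positive WOE and a bin with an above-average event rate gets a negative one.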
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_bins | int | Upper bound on number of bins per feature. | 10 |
| stability_threshold | float | Maximum RSI allowed per bin across time periods. Bins exceeding this are merged with a neighbour. | 0.1 |
| min_leaf_share | float | Minimum fraction of total records per bin leaf. | 0.05 |
| min_leaf_minority | int | Minimum records per bin for minority features. | 100 |
| important_minorities | list[str] \| None | Features where | None |
| must_have | list[str] \| None | Features that are never excluded even if unstable. | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| bin_specs_ | | Dict of bin definitions produced after fitting. |
| transformer_ | | Fitted |
| excluded_ | | Features that could not be grouped. |
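The idea of flagging bins whose event rate drifts over time can be sketched as follows. This page does not define RSI, so the metric below (largest absolute gap between a bin's per-period event rate and its overall event rate) is a stand-in chosen for illustration, not the library's measure. The names `event_rate_shift` and `unstable_bins` are hypothetical.

```python
import pandas as pd


def event_rate_shift(bins, periods, target) -> pd.Series:
    """Largest absolute gap between per-period and overall event rate, per bin.

    Stand-in stability metric for illustration; not the library's RSI.
    """
    df = pd.DataFrame({"bin": list(bins), "period": list(periods), "y": list(target)})
    overall = df.groupby("bin")["y"].mean()
    # Rows are bins, columns are time periods.
    by_period = df.groupby(["bin", "period"])["y"].mean().unstack("period")
    return by_period.sub(overall, axis=0).abs().max(axis=1)


def unstable_bins(shift: pd.Series, threshold: float = 0.1) -> list:
    """Bins whose shift exceeds the threshold, i.e. candidates for merging."""
    return shift[shift > threshold].index.tolist()


bins = ["lo"] * 8 + ["hi"] * 8
periods = ([1] * 4 + [2] * 4) * 2
target = [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
shift = event_rate_shift(bins, periods, target)
flagged = unstable_bins(shift, threshold=0.1)
```

Here the "lo" bin keeps the same event rate in both periods, while the "hi" bin drops from 1.0 to 0.25 and is flagged.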
WOETransformer
Bases: BaseEstimator, TransformerMixin
Encodes features as Weight of Evidence values using pre-computed bin specs.
Sklearn-compatible transformer (works in Pipeline, GridSearchCV).
Bin specs must be provided at construction; use StabilityGrouping or
BinEditor.accept() to produce them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bin_specs | dict[str, dict[str, Any]] \| None | Mapping of feature name to spec dict with keys | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| binners_ | | Dict of fitted |
| feature_names_in_ | | List of feature names seen during |
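Mapping raw values to WOE through a pre-computed spec can be sketched for a single numeric feature. The spec keys are not listed on this page, so the keys `"edges"` and `"woe"` below are hypothetical, as is the helper `apply_woe`. The sketch uses only NumPy and does not handle missing values or categorical features.

```python
import numpy as np


def apply_woe(values, spec) -> np.ndarray:
    """Look up the WOE value of the bin each numeric value falls into.

    Hypothetical spec layout: "edges" are ascending interior cut points,
    "woe" holds one value per bin (len(edges) + 1 entries).
    """
    edges = np.asarray(spec["edges"], dtype=float)
    woe = np.asarray(spec["woe"], dtype=float)
    # side="right" puts a value equal to an edge into the bin to its right,
    # so bins are (-inf, e0), [e0, e1), ..., [e_last, +inf).
    idx = np.searchsorted(edges, np.asarray(values, dtype=float), side="right")
    return woe[idx]


spec = {"edges": [10.0, 30.0], "woe": [-0.8, 0.1, 0.9]}
encoded = apply_woe([5, 10, 25, 40], spec)
```

A real spec would also need rules for missing values and unseen categories, which this sketch leaves out.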