grouping

Stability-constrained optimal binning and Weight of Evidence (WOE) encoding.

StabilityGrouping

```python
StabilityGrouping(
    max_bins: int = 10,
    stability_threshold: float = 0.1,
    min_leaf_share: float = 0.05,
    min_leaf_minority: int = 100,
    important_minorities: list[str] | None = None,
    must_have: list[str] | None = None,
)
```

Bases: BaseEstimator, TransformerMixin

Stability-constrained optimal binning with WOE encoding.

Finds optimal bins for each feature using LightGBM, then merges bins whose event rate shifts significantly across time periods (as measured by RSI). Fitting requires a training split, a validation split, and a time column.
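The stability step can be pictured with a small sketch. This page does not give the RSI formula, so the code uses a hypothetical stand-in: the spread of a bin's per-period event rate divided by its pooled event rate. It also starts from counts that are already binned and skips the LightGBM split search. BinCounts, instability, and merge_unstable_bins are illustrative names, not toolkit APIs. The sketch does not model must_have or feature exclusion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BinCounts:
    """Event and record counts for one bin, one entry per time period (hypothetical layout)."""
    events: list[int]
    totals: list[int]

    def merged(self, other: BinCounts) -> BinCounts:
        # Pool the two bins period by period.
        return BinCounts(
            [a + b for a, b in zip(self.events, other.events)],
            [a + b for a, b in zip(self.totals, other.totals)],
        )

    def event_rate(self) -> float:
        # Pooled event rate over all periods (assumes the bin is non-empty).
        return sum(self.events) / sum(self.totals)


def instability(b: BinCounts) -> float:
    """Stand-in for RSI: spread of per-period event rates relative to the pooled rate."""
    rates = [e / t for e, t in zip(b.events, b.totals) if t > 0]
    pooled = b.event_rate()
    return (max(rates) - min(rates)) / pooled if pooled > 0 else 0.0


def merge_unstable_bins(bins: list[BinCounts], threshold: float) -> list[BinCounts]:
    """Greedily merge the least stable bin into its closest adjacent neighbour."""
    bins = list(bins)
    while len(bins) > 1:
        scores = [instability(b) for b in bins]
        worst = max(range(len(bins)), key=lambda i: scores[i])
        if scores[worst] <= threshold:
            break  # every bin is within the stability threshold
        # Only adjacent bins are candidates, so ordered bins stay contiguous.
        neighbours = [i for i in (worst - 1, worst + 1) if 0 <= i < len(bins)]
        target = min(
            neighbours,
            key=lambda i: abs(bins[i].event_rate() - bins[worst].event_rate()),
        )
        lo, hi = sorted((worst, target))
        bins[lo : hi + 1] = [bins[lo].merged(bins[hi])]
    return bins


# Middle bin: 20% events in period 1 vs 30% in period 2, so it is unstable.
bins = [
    BinCounts(events=[90, 90], totals=[900, 900]),
    BinCounts(events=[20, 30], totals=[100, 100]),
    BinCounts(events=[60, 60], totals=[100, 100]),
]
result = merge_unstable_bins(bins, threshold=0.1)
print(len(result))  # 2
```

Restricting merges to adjacent bins keeps numeric bins as contiguous intervals, which matches "merged with a neighbour". Choosing the neighbour with the closest pooled event rate is an assumption made for this sketch.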

Parameters:

- max_bins (int, default 10): Upper bound on the number of bins per feature.
- stability_threshold (float, default 0.1): Maximum RSI allowed per bin across time periods. Bins exceeding this are merged with a neighbour.
- min_leaf_share (float, default 0.05): Minimum fraction of total records per bin leaf.
- min_leaf_minority (int, default 100): Minimum number of records per bin for the features listed in important_minorities.
- important_minorities (list[str] | None, default None): Features to which min_leaf_minority applies.
- must_have (list[str] | None, default None): Features that are never excluded, even if unstable.
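This page does not spell out how min_leaf_share and min_leaf_minority interact. One plausible reading is that most features must meet the relative floor, while features listed in important_minorities use the absolute floor instead. The sketch below encodes that reading as an assumption only. The helper bin_is_large_enough is hypothetical and not part of the toolkit.

```python
from __future__ import annotations


def bin_is_large_enough(
    feature: str,
    bin_count: int,
    n_records: int,
    min_leaf_share: float = 0.05,
    min_leaf_minority: int = 100,
    important_minorities: list[str] | None = None,
) -> bool:
    """Check one bin against the minimum-size rule (assumed semantics)."""
    if important_minorities is not None and feature in important_minorities:
        # Assumption: minority features use the absolute record floor.
        return bin_count >= min_leaf_minority
    # Otherwise the bin must hold at least min_leaf_share of all records.
    return bin_count >= min_leaf_share * n_records


print(bin_is_large_enough("income", 600, 10_000))  # True
print(bin_is_large_enough("rare_flag", 150, 10_000, important_minorities=["rare_flag"]))  # True
```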

Attributes:

- bin_specs_: Dict of bin definitions, available after fitting.
- transformer_: Fitted WOETransformer instance.
- excluded_: Features that could not be grouped.

Source code in datasci_toolkit/grouping.py

```python
def __init__(
    self,
    max_bins: int = 10,
    stability_threshold: float = 0.10,
    min_leaf_share: float = 0.05,
    min_leaf_minority: int = 100,
    important_minorities: list[str] | None = None,
    must_have: list[str] | None = None,
) -> None:
    self.max_bins = max_bins
    self.stability_threshold = stability_threshold
    self.min_leaf_share = min_leaf_share
    self.min_leaf_minority = min_leaf_minority
    self.important_minorities = important_minorities
    self.must_have = must_have
```

WOETransformer

```python
WOETransformer(
    bin_specs: dict[str, dict[str, Any]] | None = None,
)
```

Bases: BaseEstimator, TransformerMixin

Encodes features as Weight of Evidence values using pre-computed bin specs.
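The sketch below shows what the encoded values mean. It computes per-bin WOE from event and record counts using one common convention: WOE_i = ln(share of all non-events falling in bin i / share of all events falling in bin i). Some references flip the sign. This page does not state which convention WOETransformer uses (it holds OptimalBinning instances in binners_), so treat the sketch as illustrative. The function woe_per_bin and its smoothing pseudo-count, which avoids log(0) on empty cells, are hypothetical additions.

```python
import math


def woe_per_bin(events: list[int], totals: list[int], smoothing: float = 0.5) -> list[float]:
    """WOE_i = ln(non-event share of bin i / event share of bin i)."""
    non_events = [t - e for e, t in zip(events, totals)]
    total_events = sum(events)
    total_non_events = sum(non_events)
    k = len(events)
    woes = []
    for e, ne in zip(events, non_events):
        # Smoothed shares still sum to 1 across bins.
        p_event = (e + smoothing) / (total_events + smoothing * k)
        p_non = (ne + smoothing) / (total_non_events + smoothing * k)
        woes.append(math.log(p_non / p_event))
    return woes


w = woe_per_bin(events=[10, 30], totals=[100, 100], smoothing=0.0)
print([round(x, 3) for x in w])  # [0.811, -0.539]
```

Under this convention, a bin whose event rate is above the overall rate gets a negative WOE.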

A scikit-learn-compatible transformer that works inside Pipeline and GridSearchCV. Bin specs must be provided at construction; use StabilityGrouping or BinEditor.accept() to produce them.

Parameters:

- bin_specs (dict[str, dict[str, Any]] | None, default None): Mapping from feature name to a spec dict with two keys: dtype ("float" or "category") and bins (a list of cut points for float features, or a {category: group_index} dict for categorical features).
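The spec format can be illustrated with a hypothetical bin_specs dict and a helper that maps a raw value to its bin index. Only the dtype and bins keys come from this page. The feature names, cut points, and helper bin_index are assumptions, and so are the left-closed interval convention for cut points and returning None for unseen categories.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Any

# Hypothetical specs: feature names, cut points, and groups are made up;
# only the "dtype"/"bins" layout follows the parameter description.
bin_specs: dict[str, dict[str, Any]] = {
    "income": {"dtype": "float", "bins": [1000.0, 2500.0, 5000.0]},
    "region": {"dtype": "category", "bins": {"north": 0, "south": 0, "east": 1, "west": 2}},
}


def bin_index(spec: dict[str, Any], value: Any) -> int | None:
    """Return the bin index of value under spec (hypothetical helper)."""
    if spec["dtype"] == "float":
        # k cut points define k + 1 intervals; bisect_right places a value equal
        # to a cut point in the bin to its right, i.e. intervals are [c_i, c_{i+1}).
        return bisect_right(spec["bins"], value)
    if spec["dtype"] == "category":
        # Several categories may share one group index; unseen ones map to None.
        return spec["bins"].get(value)
    raise ValueError(f"unknown dtype: {spec['dtype']!r}")


print(bin_index(bin_specs["income"], 2500.0))  # 2
print(bin_index(bin_specs["region"], "south"))  # 0
```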

Attributes:

- binners_: Dict of fitted OptimalBinning instances, keyed by feature name.
- feature_names_in_: List of feature names seen during fit.

Source code in datasci_toolkit/grouping.py

```python
def __init__(self, bin_specs: dict[str, dict[str, Any]] | None = None) -> None:
    self.bin_specs = bin_specs
```