bin_editor

Headless state machine for editing bin boundaries, with an optional anywidget-based UI.

BinEditor

BinEditor(
    bin_specs: dict[str, dict[str, Any]],
    features: DataFrame,
    target: Series,
    time_periods: Series | None = None,
    weights: Series | None = None,
    stability_threshold: float = 0.1,
)

Headless state machine for editing bin boundaries.

Works identically in plain Python scripts, notebooks, and agents. All edits are logged per feature with undo support. Call accept() to export the final bin specs dict for use with WOETransformer.
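The per-feature edit log with undo described above can be pictured with a small stand-alone sketch. This is not BinEditor's implementation: the class `SplitLog` and its methods `add_split`, `remove_split`, `undo`, and `splits` are hypothetical names chosen for illustration. Only the idea of a per-feature history of `(action, value)` tuples is borrowed from the constructor source shown further down.

```python
from bisect import insort
from typing import Any


class SplitLog:
    """Toy per-feature split editor with an undo log (hypothetical, not BinEditor)."""

    def __init__(self, splits: dict[str, list[float]]) -> None:
        self._splits = {feat: sorted(vals) for feat, vals in splits.items()}
        # One history list per feature, holding (action, value) tuples.
        self._history: dict[str, list[tuple[str, Any]]] = {feat: [] for feat in splits}

    def add_split(self, feat: str, value: float) -> None:
        if value in self._splits[feat]:
            return  # already present, so there is nothing to log
        insort(self._splits[feat], value)  # keep the split list sorted
        self._history[feat].append(("add", value))

    def remove_split(self, feat: str, value: float) -> None:
        self._splits[feat].remove(value)  # raises ValueError if the split is absent
        self._history[feat].append(("remove", value))

    def undo(self, feat: str) -> bool:
        """Revert the most recent edit for one feature; return False if none is left."""
        if not self._history[feat]:
            return False
        action, value = self._history[feat].pop()
        if action == "add":
            self._splits[feat].remove(value)
        else:
            insort(self._splits[feat], value)
        return True

    def splits(self, feat: str) -> list[float]:
        return list(self._splits[feat])


log = SplitLog({"age": [30.0, 50.0]})
log.add_split("age", 40.0)
log.remove_split("age", 30.0)
after_edits = log.splits("age")  # 40.0 added, 30.0 removed
log.undo("age")                  # puts 30.0 back
log.undo("age")                  # takes 40.0 out again
after_undo = log.splits("age")   # back to the starting splits
```

Keeping the history per feature means that undoing an edit on one feature never disturbs edits made to another.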

Parameters:

bin_specs : dict[str, dict[str, Any]], required
    Initial bin specifications: a dict produced by StabilityGrouping.bin_specs_ or built manually.

features : DataFrame, required
    Polars DataFrame holding the feature columns named in bin_specs. Features listed in bin_specs but absent from the DataFrame are skipped.

target : Series, required
    Binary target series (0/1 or float). It is cast to Float64 internally.

time_periods : Series | None, default None
    Optional series of time-period labels used to compute temporal stability metrics.

weights : Series | None, default None
    Optional sample weight series. When omitted, every row gets weight 1.

stability_threshold : float, default 0.1
    RSI threshold used to flag unstable bins in the feature state. It does not block edits.

Note

All state is accessible via state(feat), which returns a FeatureState dataclass with attributes bins, n_bins, counts, event_rates, woe, iv, dtype, groups, and temporal.
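For readers unfamiliar with the event_rates, woe, and iv attributes, the sketch below shows one conventional way to derive them from per-bin weighted counts. It is an illustration only. The function name `woe_iv`, the sign convention (log of non-event share over event share), and the `eps` floor are assumptions, and the library may use a different convention or smoothing.

```python
import math


def woe_iv(events: list[float], non_events: list[float], eps: float = 1e-9):
    """Return (event_rates, woe, iv) from per-bin weighted event / non-event counts."""
    total_e = sum(events)
    total_n = sum(non_events)
    event_rates: list[float] = []
    woe: list[float] = []
    iv = 0.0
    for e, n in zip(events, non_events):
        # Event rate: share of events among all (weighted) rows in the bin.
        event_rates.append(e / (e + n) if e + n > 0 else float("nan"))
        # Each bin's share of all events and of all non-events, floored at eps
        # so that an empty cell does not produce log(0).
        share_e = max(e / total_e, eps)
        share_n = max(n / total_n, eps)
        w = math.log(share_n / share_e)  # assumed sign convention
        woe.append(w)
        # Information value sums (share difference) * WOE over bins.
        iv += (share_n - share_e) * w
    return event_rates, woe, iv


# Two bins: the first has a lower event rate than the second.
rates, woe, iv = woe_iv(events=[10.0, 30.0], non_events=[90.0, 70.0])
```

Under this convention a bin with a below-average event rate gets a positive WOE, and every bin contributes a non-negative amount to IV.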

Source code in datasci_toolkit/bin_editor.py
def __init__(
    self,
    bin_specs: dict[str, dict[str, Any]],
    features: pl.DataFrame,
    target: pl.Series,
    time_periods: pl.Series | None = None,
    weights: pl.Series | None = None,
    stability_threshold: float = 0.1,
) -> None:
    self._targets = target.cast(pl.Float64).to_numpy()
    self._weights = weights.cast(pl.Float64).to_numpy() if weights is not None else np.ones(len(self._targets))
    self._time: np.ndarray | None = time_periods.to_numpy() if time_periods is not None else None
    self._threshold = stability_threshold
    self._x: dict[str, np.ndarray] = {}
    self._splits: dict[str, list[float]] = {}
    self._cat_bins: dict[str, dict[str, int]] = {}
    self._history: dict[str, list[tuple[str, Any]]] = {}
    self._orig: dict[str, dict[str, Any]] = {}

    for feat, spec in bin_specs.items():
        if feat not in features.columns:
            continue
        self._orig[feat] = spec
        self._history[feat] = []
        if spec["dtype"] == FeatureDtype.NUMERIC:
            self._x[feat] = features[feat].cast(pl.Float64).to_numpy()
            self._splits[feat] = [float(s) for s in spec["bins"][1:-1] if np.isfinite(s)]
        else:
            self._x[feat] = features[feat].cast(pl.Utf8).to_numpy().astype(str)
            self._cat_bins[feat] = {str(k): int(v) for k, v in spec["bins"].items()}
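
The two branches of the loop above suggest the shape of a bin spec: numeric features carry a list of edges under "bins", of which only the finite interior values are kept as splits, while other features carry a mapping from category to an integer, which appears to be a group index. The stand-alone sketch below mirrors that parsing with the standard library only. The helper names, the example specs, and the string dtype values are assumptions; the real code compares against FeatureDtype.NUMERIC and reads Polars columns.

```python
import math


def interior_splits(edges: list[float]) -> list[float]:
    # Drop the first and last edge, then keep only finite interior cut points.
    return [float(s) for s in edges[1:-1] if math.isfinite(s)]


def category_map(bins: dict) -> dict[str, int]:
    # Normalise keys to str and values to int, as the constructor does.
    return {str(k): int(v) for k, v in bins.items()}


# Hypothetical specs; the dtype strings stand in for FeatureDtype members.
numeric_spec = {"dtype": "numeric", "bins": [-math.inf, 25.0, 40.0, 60.0, math.inf]}
categorical_spec = {"dtype": "categorical", "bins": {"A": 0, "B": 0, "C": 1}}

splits = interior_splits(numeric_spec["bins"])
groups = category_map(categorical_spec["bins"])
```

With these inputs, `splits` holds the three interior cut points and `groups` maps "A" and "B" to the same integer, so they would share a bin.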

BinEditorWidget

BinEditorWidget(*args: Any, **kwargs: Any)
Source code in datasci_toolkit/bin_editor_widget.py
def __init__(self, *args: Any, **kwargs: Any) -> None:
    raise ImportError("anywidget and matplotlib are required for BinEditorWidget")