model_selection

Gini-based stepwise logistic regression for scorecard-style feature selection.

AUCStepwiseLogit

AUCStepwiseLogit(
    initial_predictors: list[str] | None = None,
    all_predictors: list[str] | None = None,
    selection_method: str = "stepwise",
    max_iter: int = 1000,
    min_increase: float = 0.005,
    max_decrease: float = 0.0025,
    max_predictors: int = 0,
    max_correlation: float = 1.0,
    enforce_coef_sign: bool = False,
    penalty: str = "l2",
    C: float = 1000.0,
    correlation_sample: int = 10000,
    use_cv: bool = False,
    cv_folds: int = 5,
    cv_seed: int = 42,
    cv_stratify: bool = True,
)

Bases: BaseEstimator

Gini-based stepwise logistic regression.

Selects features by Gini improvement rather than p-values, with optional correlation filtering, sign enforcement, and cross-validated scoring.
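
As a point of reference, the Gini coefficient used in scorecard work is usually derived from the ROC AUC as `Gini = 2 * AUC - 1`. The sketch below assumes that definition (the page does not state how the class computes it) and uses two illustrative helpers, `auc` and `gini`, that are not part of the library.

```python
def auc(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A pair counts 1 when the positive outranks the negative, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, y_score):
    """Gini coefficient under the assumed definition 2 * AUC - 1."""
    return 2.0 * auc(y_true, y_score) - 1.0


y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_gini = gini(y, scores)  # 3 of 4 pairs ordered correctly: AUC 0.75
```

Under this definition a random ranking scores 0 and a perfect ranking scores 1, so a `min_increase` of `0.005` corresponds to an AUC gain of `0.0025`.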

Parameters:

Name	Type	Description	Default
`initial_predictors`	`list[str] \| None`	Features forced into the model at the start.	`None`
`all_predictors`	`list[str] \| None`	Candidate pool (defaults to all columns in `X`).	`None`
`selection_method`	`str`	`"forward"`, `"backward"`, or `"stepwise"`.	`'stepwise'`
`max_iter`	`int`	Maximum number of add/remove steps.	`1000`
`min_increase`	`float`	Minimum Gini gain required to add a feature.	`0.005`
`max_decrease`	`float`	A feature may be removed only if removing it lowers Gini by no more than this amount.	`0.0025`
`max_predictors`	`int`	Hard cap on model size (0 = unlimited).	`0`
`max_correlation`	`float`	Reject a candidate whose correlation with any already-selected feature exceeds this value; the default of `1.0` effectively disables the filter.	`1.0`
`enforce_coef_sign`	`bool`	Reject features that flip a coefficient sign.	`False`
`penalty`	`str`	Regularisation type passed to `LogisticRegression`.	`'l2'`
`C`	`float`	Inverse regularisation strength, as in `LogisticRegression`; larger values mean weaker regularisation.	`1000.0`
`correlation_sample`	`int`	Max rows used for the correlation check.	`10000`
`use_cv`	`bool`	Score via k-fold CV instead of a held-out validation set.	`False`
`cv_folds`	`int`	Number of CV folds.	`5`
`cv_seed`	`int`	Random seed for CV splits.	`42`
`cv_stratify`	`bool`	Use stratified folds.	`True`

Attributes:

Name	Type	Description
`predictors_`		Ordered list of selected feature names.
`coef_`		Coefficients for selected features.
`intercept_`		Model intercept.
`progress_`		DataFrame logging each add/remove step with Gini deltas.

Source code in datasci_toolkit/model_selection.py

def __init__(
    self,
    initial_predictors: list[str] | None = None,
    all_predictors: list[str] | None = None,
    selection_method: str = "stepwise",
    max_iter: int = 1000,
    min_increase: float = 0.005,
    max_decrease: float = 0.0025,
    max_predictors: int = 0,
    max_correlation: float = 1.0,
    enforce_coef_sign: bool = False,
    penalty: str = "l2",
    C: float = 1000.0,
    correlation_sample: int = 10000,
    use_cv: bool = False,
    cv_folds: int = 5,
    cv_seed: int = 42,
    cv_stratify: bool = True,
) -> None:
    self.initial_predictors = initial_predictors
    self.all_predictors = all_predictors
    self.selection_method = selection_method
    self.max_iter = max_iter
    self.min_increase = min_increase
    self.max_decrease = max_decrease
    self.max_predictors = max_predictors
    self.max_correlation = max_correlation
    self.enforce_coef_sign = enforce_coef_sign
    self.penalty = penalty
    self.C = C
    self.correlation_sample = correlation_sample
    self.use_cv = use_cv
    self.cv_folds = cv_folds
    self.cv_seed = cv_seed
    self.cv_stratify = cv_stratify
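
The constructor above stores every argument unchanged and performs no validation, which is the scikit-learn convention that lets `BaseEstimator.get_params`, `set_params`, and `clone` work through signature introspection. The sketch below shows that behaviour on a hypothetical stand-in class, `TinySelector`, with two of the parameters; it assumes scikit-learn is installed and says nothing about how `AUCStepwiseLogit` behaves beyond what `BaseEstimator` provides.

```python
from sklearn.base import BaseEstimator, clone


class TinySelector(BaseEstimator):
    """Hypothetical stand-in that mirrors the store-only constructor pattern."""

    def __init__(self, min_increase: float = 0.005, max_predictors: int = 0) -> None:
        # Store arguments verbatim; validation belongs in fit(), not here.
        self.min_increase = min_increase
        self.max_predictors = max_predictors


est = TinySelector(min_increase=0.01)
params = est.get_params()          # read back from the __init__ signature
copy = clone(est)                  # new unfitted instance with the same params
est.set_params(max_predictors=5)   # changes est only; copy is unaffected
```

Because parameters are recovered by name from the attributes, renaming or transforming an argument inside `__init__` would break cloning and hyperparameter search.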