label_imputation
Imputation strategies for missing labels — useful when part of the population has unknown outcomes (reject inference, holdout groups, etc.).
KNNLabelImputer
KNNLabelImputer(
    n_neighbors: int = 10,
    metric: str = "minkowski",
    method: str = "weighted",
    cutoff: float = 0.5,
    seed: int = 42,
)
Bases: BaseEstimator
KNN-based label imputation for missing-outcome records.
For each unlabeled record, finds the k nearest labeled neighbours in
feature space. The distance-weighted average of the neighbour labels
gives P(event). transform() converts these probabilities to training
rows via TargetImputer.
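The probability step described above can be sketched with plain NumPy. This is a minimal illustration, not the code in datasci_toolkit: the function name `knn_event_probability`, the `eps` parameter, the inverse-distance weighting scheme, and the fixed Euclidean distance (Minkowski with p = 2) are all assumptions made for the example.

```python
import numpy as np


def knn_event_probability(X_labeled, y_labeled, X_unlabeled, n_neighbors=10, eps=1e-12):
    """Distance-weighted mean of the k nearest labeled neighbours' labels.

    Hypothetical helper for illustration; not part of the documented API.
    """
    X_labeled = np.asarray(X_labeled, dtype=float)
    y_labeled = np.asarray(y_labeled, dtype=float)
    X_unlabeled = np.asarray(X_unlabeled, dtype=float)
    k = min(n_neighbors, len(X_labeled))

    # Pairwise Euclidean distances, shape (n_unlabeled, n_labeled).
    diffs = X_unlabeled[:, None, :] - X_labeled[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))

    # Indices and distances of the k closest labeled records per unlabeled record.
    idx = np.argsort(dist, axis=1)[:, :k]
    nearest_dist = np.take_along_axis(dist, idx, axis=1)

    # Inverse-distance weights (assumed scheme); eps avoids division by zero.
    w = 1.0 / (nearest_dist + eps)
    return (w * y_labeled[idx]).sum(axis=1) / w.sum(axis=1)


# Toy data: two clusters of labeled records and three unlabeled records.
X_lab = np.array([[0.0], [1.0], [10.0], [11.0]])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([[0.5], [10.5], [5.5]])
proba = knn_event_probability(X_lab, y_lab, X_unl, n_neighbors=2)
```

In the toy data, the record at 5.5 sits equally far from one non-event and one event neighbour, so its estimated P(event) lands halfway between the two labels, while the records inside each cluster inherit that cluster's label.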
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_neighbors | int | Number of nearest neighbours. | 10 |
| metric | str | Distance metric (any sklearn-compatible string). | 'minkowski' |
| method | str | Passed to | 'weighted' |
| cutoff | float | Threshold for | 0.5 |
| seed | int | Random seed for | 42 |
Attributes:
| Name | Type | Description |
|---|---|---|
| nn_ | | Fitted |
| y_ | | Labeled target array. |
| weights_ | | Sample weights for labeled records. |
Source code in datasci_toolkit/label_imputation.py
TargetImputer
Bases: BaseEstimator
Converts predicted probabilities into training rows.
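One plausible reading of "converting probabilities into training rows" is sketched below. Only the `'weighted'` method name, `cutoff`, and `seed` come from this page; the function `proba_to_rows`, the `"cutoff"` and `"random"` method names, and the exact behaviour of each branch are assumptions for illustration and may differ from what TargetImputer actually does.

```python
import numpy as np


def proba_to_rows(proba, method="weighted", cutoff=0.5, seed=42):
    """Turn P(event) values into (row_index, label, weight) arrays.

    Hypothetical helper; the branch semantics are assumed, not documented.
    """
    proba = np.asarray(proba, dtype=float)
    n = len(proba)
    index = np.arange(n)

    if method == "weighted":
        # Each record appears twice: as an event with weight p
        # and as a non-event with weight 1 - p.
        rows = np.concatenate([index, index])
        labels = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
        weights = np.concatenate([proba, 1.0 - proba])
        return rows, labels, weights

    if method == "cutoff":  # assumed method name
        # Hard assignment: event when p reaches the threshold.
        return index, (proba >= cutoff).astype(int), np.ones(n)

    if method == "random":  # assumed method name
        # Draw each label as a Bernoulli(p) sample, reproducible via seed.
        rng = np.random.default_rng(seed)
        return index, (rng.random(n) < proba).astype(int), np.ones(n)

    raise ValueError(f"unknown method: {method!r}")


rows, labels, weights = proba_to_rows([0.2, 0.9])
```

Under the weighted reading, every record contributes a total weight of 1 split across its two rows, so a downstream model that accepts sample weights sees the imputed label as a soft target rather than a hard guess.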
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | | 'weighted' |
| cutoff | float | Threshold used when | 0.5 |
| seed | int | Random seed for | 42 |
Attributes:
| Name | Type | Description |
|---|---|---|
| proba_ | | Probability array stored after |
| weights_ | | Sample weight array stored after |