# Interactions

Pairwise binned WOE interaction encoding for discovering and encoding non-linear feature interactions.
## BinnedInteractionEncoder

Creates pairwise feature interactions by binning two features independently, combining their bin indices into a joint index, and WOE-encoding the result. Captures non-linear interactions that individual features miss.
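The mechanics can be sketched in plain NumPy (an illustration only, not the library's implementation; this sketch uses the ln(event share / non-event share) WOE convention and a simple `eps` smoothing, both of which may differ from the library's choices):

```python
import numpy as np

def woe_interaction(a, b, y, n_bins=3, eps=0.5):
    """Toy sketch: independent quantile binning + joint-index WOE encoding."""
    # 1) Bin each feature independently on interior quantile edges
    edges_a = np.quantile(a, np.linspace(0, 1, n_bins + 1)[1:-1])
    edges_b = np.quantile(b, np.linspace(0, 1, n_bins + 1)[1:-1])
    bin_a = np.digitize(a, edges_a)  # 0 .. n_bins - 1
    bin_b = np.digitize(b, edges_b)
    # 2) Combine the two bin indices into one joint index
    joint = bin_a * n_bins + bin_b   # 0 .. n_bins**2 - 1
    # 3) WOE per joint bin: ln(event share / non-event share), smoothed
    woe = np.zeros_like(a, dtype=float)
    n_events, n_non = y.sum(), (1 - y).sum()
    for j in np.unique(joint):
        m = joint == j
        ev = (y[m].sum() + eps) / (n_events + eps)
        ne = ((1 - y[m]).sum() + eps) / (n_non + eps)
        woe[m] = np.log(ev / ne)
    return woe
```

The joint index is the key step: a bin pair like (old, low-income) gets its own WOE value, even when neither marginal bin is informative on its own.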
### When to use

- Credit scoring: applicant age and income individually predict default, but the combination matters more -- a 25-year-old earning $30k is a very different risk from a 55-year-old earning $30k. Binned interactions capture these joint effects as a single WOE feature.
- Fraud detection: transaction amount and time-of-day interact -- a $500 purchase at 3am is suspicious; at 3pm it's normal. Individual features can't express this.
- Marketing response: campaign channel and customer segment interact -- email works for segment A, SMS for segment B. The interaction feature encodes this directly.
- Any tabular model where you suspect pairwise non-linear effects and want interpretable WOE-encoded features rather than opaque tree splits.
### Visualizing why interactions matter

The plot below shows default rates across an age-income grid. Neither feature alone captures the risk -- it's their combination that matters.
```python
import numpy as np
import polars as pl
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 2000
df = pl.DataFrame({
    "age": rng.uniform(18, 70, n).tolist(),
    "income": rng.uniform(15000, 150000, n).tolist(),
    "tenure": rng.uniform(0, 30, n).tolist(),
})
y = pl.Series(((df["age"].to_numpy() > 40) & (df["income"].to_numpy() < 50000)).astype(float).tolist())

age_np = df["age"].to_numpy()
income_np = df["income"].to_numpy()
y_np = y.to_numpy()

# Event rate per cell of a 5x5 age-income grid
age_edges = np.linspace(18, 70, 6)
income_edges = np.linspace(15000, 150000, 6)
event_rate = np.full((5, 5), np.nan)
for i in range(5):
    for j in range(5):
        mask = (
            (age_np >= age_edges[i]) & (age_np < age_edges[i + 1])
            & (income_np >= income_edges[j]) & (income_np < income_edges[j + 1])
        )
        if mask.sum() > 0:
            event_rate[i, j] = y_np[mask].mean()

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(event_rate, origin="lower", aspect="auto", cmap="RdYlGn_r")
ax.set_xticks(range(5), [f"{int(e / 1000)}k" for e in income_edges[:-1]])
ax.set_yticks(range(5), [f"{int(e)}" for e in age_edges[:-1]])
ax.set_xlabel("Income")
ax.set_ylabel("Age")
ax.set_title("Event rate by Age x Income\n(high rate = top-left quadrant)")
plt.colorbar(im, ax=ax, label="Event rate")
plt.tight_layout()
plt.show()
```
The high-risk region (older, low-income) is clearly visible in the 2D grid but invisible to either feature alone.
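That "invisible to either feature alone" claim can be checked numerically. The self-contained snippet below reuses the same simulated setup and compares the marginal event rate per age bin against one joint cell of the grid (exact numbers vary slightly with the seed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(18, 70, n)
income = rng.uniform(15000, 150000, n)
y = ((age > 40) & (income < 50000)).astype(float)

# Marginal event rate across 5 age bins: varies only modestly
age_bins = np.digitize(age, np.linspace(18, 70, 6)[1:-1])
marg_age = np.array([y[age_bins == k].mean() for k in range(5)])

# Joint event rate in one "older, low-income" grid cell: extreme
joint_mask = (age >= 49.2) & (age < 59.6) & (income < 42000)
print(marg_age.round(2), y[joint_mask].mean())
```

The marginal rates hover near the base rate (older applicants are only slightly riskier on average), while the joint cell's rate is essentially 1.0.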
### Discover interactions

Find which feature pairs have the strongest interactions before encoding:

```python
from datasci_toolkit import BinnedInteractionEncoder

top_pairs = BinnedInteractionEncoder.discover(df, y, top_n=3)
for feat_a, feat_b, iv_score in top_pairs:
    print(f"{feat_a} x {feat_b}: IV = {iv_score:.4f}")
```
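`discover` ranks pairs by an interaction strength score. A plausible sketch of such scoring, using Information Value over joint quantile bins (illustrative only -- the library's exact metric, binning, and smoothing may differ):

```python
import numpy as np

def interaction_iv(a, b, y, n_bins=5, eps=0.5):
    """IV of the joint (a, b) binning:
    sum over bins of (event_share - nonevent_share) * WOE."""
    ea = np.quantile(a, np.linspace(0, 1, n_bins + 1)[1:-1])
    eb = np.quantile(b, np.linspace(0, 1, n_bins + 1)[1:-1])
    joint = np.digitize(a, ea) * n_bins + np.digitize(b, eb)
    n_ev, n_ne = y.sum(), (1 - y).sum()
    iv = 0.0
    for j in np.unique(joint):
        m = joint == j
        ev = (y[m].sum() + eps) / (n_ev + eps)
        ne = ((1 - y[m]).sum() + eps) / (n_ne + eps)
        iv += (ev - ne) * np.log(ev / ne)
    return iv
```

A pair whose joint bins separate events from non-events scores a higher IV than a pair where one feature is noise, which is exactly the ordering a discovery step needs.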
### Encode with default quantile binning

```python
pairs = [(a, b) for a, b, _ in top_pairs]
enc = BinnedInteractionEncoder(pairs=pairs).fit(df, y)
interactions = enc.transform(df)
print(interactions)
```
### Visualizing WOE-encoded interactions

After encoding, each observation gets a WOE value that reflects the log-odds of the event in its joint bin. Plotting these values shows how the encoder captures the interaction structure.

```python
woe_col = interactions["age__x__income"].to_numpy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sc = axes[0].scatter(age_np, income_np, c=y_np, cmap="RdYlGn_r", s=8, alpha=0.5)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("Raw target (0/1)")
plt.colorbar(sc, ax=axes[0], label="Event")

sc2 = axes[1].scatter(age_np, income_np, c=woe_col, cmap="RdBu_r", s=8, alpha=0.5)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("WOE-encoded interaction")
plt.colorbar(sc2, ax=axes[1], label="WOE")
plt.tight_layout()
plt.show()
```
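The "reflects the log-odds" statement above can be made concrete: under the ln(event share / non-event share) convention, a bin's WOE is its log-odds minus the sample's base log-odds, so adding back the base log-odds and applying the sigmoid recovers the bin's implied event rate. A small sketch (assuming that convention; sign conventions vary between libraries):

```python
import numpy as np

def woe_to_event_rate(woe, base_event_rate):
    """Implied event rate of a bin, given its WOE and the sample base rate.

    WOE_bin = ln(e_b/n_b) - ln(E/N), so bin log-odds = base log-odds + WOE,
    and the bin event rate is the sigmoid of that sum."""
    log_odds = np.log(base_event_rate / (1 - base_event_rate)) + woe
    return 1 / (1 + np.exp(-log_odds))
```

A WOE of 0 maps back to the base rate; positive WOE means a riskier-than-average bin, negative WOE a safer one.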
### Encode with optimal (supervised) binning

```python
from datasci_toolkit import OptimalBinner

enc = BinnedInteractionEncoder(
    binner=OptimalBinner(),
    pairs=[("age", "income")],
).fit(df, y)
interactions = enc.transform(df)
```
### Reuse StabilityGrouping bins

```python
from datasci_toolkit import PrecomputedBinner, StabilityGrouping

# sg = StabilityGrouping(...).fit(...)
# enc = BinnedInteractionEncoder(
#     binner=PrecomputedBinner(bin_specs=sg.bin_specs_),
#     pairs=[("age", "income")],
# ).fit(df, y)
```
### Binning strategies

| Binner | When to use |
|---|---|
| `QuantileBinner(n_bins=10)` | Default. Fast, unsupervised, no target leakage in the binning step. |
| `OptimalBinner()` | Supervised binning via OptimalBinning. Better bins but risk of overfitting. |
| `PrecomputedBinner(bin_specs=...)` | Reuse bins from StabilityGrouping. Ensures interaction bins match your scorecard bins. |
### Parameters

| Parameter | Default | Description |
|---|---|---|
| `binner` | `None` | Binning strategy instance. `None` defaults to `QuantileBinner(n_bins=10)`. |
| `pairs` | `None` | List of `(feat_a, feat_b)` tuples. Required for `fit`. |
## 2D Optimal Binning

Joint 2D binning that optimizes bin boundaries for both features simultaneously, capturing true interaction structure that independent binning misses.
### When to use 2D vs independent binning

- **Independent binning** (`QuantileBinner`, `OptimalBinner`): fast, simple, good for discovery and initial encoding
- **2D optimal binning** (`OptimalBinning2D`): captures joint structure, supports monotonicity constraints, better IV but slower (solver-based)
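To build intuition for what "jointly optimized" means, here is a toy brute-force version that scans candidate split pairs on both axes at once and keeps the pair maximizing IV. Real CP/MIP solvers search many splits under constraints; this hypothetical helper shows only the joint-search idea:

```python
import numpy as np

def best_joint_split(a, b, y, eps=0.5):
    """Scan (split_a, split_b) candidate pairs; return the pair whose
    2x2 partition of the (a, b) plane has the highest IV."""
    def iv_2x2(sa, sb):
        n_ev, n_ne = y.sum(), (1 - y).sum()
        iv = 0.0
        for ma in (a < sa, a >= sa):
            for mb in (b < sb, b >= sb):
                m = ma & mb
                ev = (y[m].sum() + eps) / (n_ev + eps)
                ne = ((1 - y[m]).sum() + eps) / (n_ne + eps)
                iv += (ev - ne) * np.log(ev / ne)
        return iv
    cand_a = np.quantile(a, np.linspace(0.1, 0.9, 9))
    cand_b = np.quantile(b, np.linspace(0.1, 0.9, 9))
    return max(((sa, sb) for sa in cand_a for sb in cand_b),
               key=lambda s: iv_2x2(*s))
```

Because both splits are scored together, the search can place an age boundary that is only optimal *given* the income boundary -- the joint structure that independent per-feature binning cannot see.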
### Visualizing independent vs 2D binning

Independent binning creates a regular grid (left), while 2D optimal binning finds irregularly shaped rectangular regions that better capture the interaction (right).

```python
from datasci_toolkit import OptimalBinning2D, QuantileBinner

# Independent binning: regular grid
qb_a = QuantileBinner(n_bins=5).fit(age_np, y_np)
qb_b = QuantileBinner(n_bins=5).fit(income_np, y_np)
edges_a = qb_a.edges_
edges_b = qb_b.edges_

# 2D optimal binning: solver-optimized regions
b2d = OptimalBinning2D().fit(df, y, feature_a="age", feature_b="income")
splits_x = b2d.splits_x_
splits_y = b2d.splits_y_

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: independent grid
axes[0].scatter(age_np, income_np, c=y_np, cmap="RdYlGn_r", s=8, alpha=0.4)
for e in edges_a[1:-1]:
    axes[0].axvline(e, color="black", linewidth=0.8, alpha=0.6)
for e in edges_b[1:-1]:
    axes[0].axhline(e, color="black", linewidth=0.8, alpha=0.6)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("Independent binning (QuantileBinner)")

# Right: 2D optimal splits
axes[1].scatter(age_np, income_np, c=y_np, cmap="RdYlGn_r", s=8, alpha=0.4)
for sx in splits_x:
    axes[1].axvline(sx, color="black", linewidth=0.8, alpha=0.6)
for sy in splits_y:
    axes[1].axhline(sy, color="black", linewidth=0.8, alpha=0.6)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("2D Optimal Binning (OptimalBinning2D)")
plt.tight_layout()
plt.show()
```
### Standalone usage (binary target)

```python
b = OptimalBinning2D().fit(df, y, feature_a="age", feature_b="income")
woe_values = b.transform(df)
print(woe_values)
```
### Visualizing 2D WOE surface

```python
woe_2d = b.transform(df, metric="woe").to_numpy()
event_rate_2d = b.transform(df, metric="event_rate").to_numpy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sc = axes[0].scatter(age_np, income_np, c=event_rate_2d, cmap="RdYlGn_r", s=8, alpha=0.5)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("2D Binning: Event rate per region")
plt.colorbar(sc, ax=axes[0], label="Event rate")

sc2 = axes[1].scatter(age_np, income_np, c=woe_2d, cmap="RdBu_r", s=8, alpha=0.5)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("2D Binning: WOE per region")
plt.colorbar(sc2, ax=axes[1], label="WOE")
plt.tight_layout()
plt.show()
```
### As encoder plugin

```python
from datasci_toolkit import BinnedInteractionEncoder, OptimalBinning2D

enc = BinnedInteractionEncoder(
    binner=OptimalBinning2D(),
    pairs=[("age", "income"), ("age", "tenure")],
).fit(df, y)
interactions = enc.transform(df)
```
### With monotonicity constraints

```python
b = OptimalBinning2D(
    monotonic_trend_x="ascending",
    monotonic_trend_y="ascending",
).fit(df, y, feature_a="age", feature_b="income")
```
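Monotone WOE along each axis is a common scorecard requirement (e.g. risk may only increase with the age band). A trivial check you could run against per-bin WOE values, assuming you can extract them along an axis from the fitted binner (how to do so depends on the library's API):

```python
import numpy as np

def is_monotonic_ascending(woe_by_bin):
    """True if per-bin WOE values never decrease along the bin order --
    the property that monotonic_trend_x / monotonic_trend_y enforce per axis."""
    w = np.asarray(woe_by_bin, dtype=float)
    return bool(np.all(np.diff(w) >= 0))
```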
### Continuous target

```python
from datasci_toolkit import ContinuousOptimalBinning2D

y_continuous = pl.Series((age_np * 0.3 + income_np * 0.001 + rng.normal(0, 1, n)).tolist())
b_cont = ContinuousOptimalBinning2D().fit(df, y_continuous, feature_a="age", feature_b="income")
mean_values = b_cont.transform(df)
```
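For continuous targets the per-region statistic is a mean rather than WOE. A minimal sketch of what the transform conceptually does (hypothetical helper, not the library's internals):

```python
import numpy as np

def mean_encode_2d(a, b, y, splits_a, splits_b):
    """Map each observation to the mean target of its (a, b) region,
    given the split points found for each axis."""
    ia = np.digitize(a, splits_a)
    ib = np.digitize(b, splits_b)
    joint = ia * (len(splits_b) + 1) + ib   # one index per 2D region
    out = np.empty_like(a, dtype=float)
    for j in np.unique(joint):
        m = joint == j
        out[m] = y[m].mean()
    return out
```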
### Visualizing continuous 2D binning

```python
mean_np = mean_values.to_numpy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sc = axes[0].scatter(age_np, income_np, c=y_continuous.to_numpy(), cmap="viridis", s=8, alpha=0.4)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("Raw continuous target")
plt.colorbar(sc, ax=axes[0], label="Target")

sc2 = axes[1].scatter(age_np, income_np, c=mean_np, cmap="viridis", s=8, alpha=0.5)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("ContinuousOptimalBinning2D: Mean per region")
plt.colorbar(sc2, ax=axes[1], label="Bin mean")
plt.tight_layout()
plt.show()
```
### Parameters

| Parameter | Default | Description |
|---|---|---|
| `solver` | `"cp"` | `"cp"` (constraint programming) or `"mip"` (mixed-integer programming) |
| `monotonic_trend_x` | `None` | `"ascending"`, `"descending"`, or `None` |
| `monotonic_trend_y` | `None` | `"ascending"`, `"descending"`, or `None` |
| `min_bin_size` | `None` | Minimum fraction of data per bin |
| `max_n_bins` | `None` | Maximum number of bins |