
Interactions

Pairwise binned WOE interaction encoding for discovering and encoding non-linear feature interactions.

BinnedInteractionEncoder

Creates pairwise feature interactions by binning two features independently, combining their bin indices into a joint index, and WOE-encoding the result. Captures non-linear interactions that individual features miss.
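
The mechanics can be sketched in plain NumPy. This illustrates the three steps (independent binning, joint index, WOE encoding); it is not the encoder's actual implementation -- the fixed 5x5 quantile grid and the smoothing constant `eps` are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(18, 70, 1000)
income = rng.uniform(15_000, 150_000, 1000)
y = ((age > 40) & (income < 50_000)).astype(float)

# 1. Bin each feature independently (here: 5 quantile bins per feature).
edges_age = np.quantile(age, np.linspace(0, 1, 6))
edges_inc = np.quantile(income, np.linspace(0, 1, 6))
bin_age = np.digitize(age, edges_age[1:-1])    # 0..4
bin_inc = np.digitize(income, edges_inc[1:-1])  # 0..4

# 2. Combine the two bin indices into one joint index (0..24).
joint = bin_age * 5 + bin_inc

# 3. WOE-encode the joint index: ln(P(bin | event) / P(bin | non-event)),
#    with additive smoothing (eps) so empty cells stay finite.
eps = 0.5
n_event, n_non = y.sum(), len(y) - y.sum()
woe = np.empty(25)
for k in range(25):
    in_bin = joint == k
    events = y[in_bin].sum()
    non_events = in_bin.sum() - events
    woe[k] = np.log(((events + eps) / (n_event + eps))
                    / ((non_events + eps) / (n_non + eps)))

encoded = woe[joint]  # one WOE value per observation
```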

When to use

  • Credit scoring: applicant age and income individually predict default, but the combination matters more -- a 25-year-old with 30k income is a very different risk than a 55-year-old with 30k income. Binned interactions capture these joint effects as a single WOE feature.
  • Fraud detection: transaction amount and time-of-day interact -- a $500 purchase at 3am is suspicious, at 3pm it's normal. Individual features can't express this.
  • Marketing response: campaign channel and customer segment interact -- email works for segment A, SMS for segment B. The interaction feature encodes this directly.
  • Any tabular model where you suspect pairwise non-linear effects and want interpretable WOE-encoded features rather than opaque tree splits.

Visualizing why interactions matter

The plot below shows default rates across an age-income grid. Neither feature alone captures the risk -- it's their combination that matters.

import numpy as np
import polars as pl
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 2000
df = pl.DataFrame({
    "age": rng.uniform(18, 70, n).tolist(),
    "income": rng.uniform(15000, 150000, n).tolist(),
    "tenure": rng.uniform(0, 30, n).tolist(),
})
y = pl.Series(((df["age"].to_numpy() > 40) & (df["income"].to_numpy() < 50000)).astype(float).tolist())

age_np = df["age"].to_numpy()
income_np = df["income"].to_numpy()
y_np = y.to_numpy()

age_edges = np.linspace(18, 70, 6)
income_edges = np.linspace(15000, 150000, 6)

event_rate = np.full((5, 5), np.nan)
for i in range(5):
    for j in range(5):
        mask = (
            (age_np >= age_edges[i]) & (age_np < age_edges[i + 1])
            & (income_np >= income_edges[j]) & (income_np < income_edges[j + 1])
        )
        if mask.sum() > 0:
            event_rate[i, j] = y_np[mask].mean()

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(event_rate, origin="lower", aspect="auto", cmap="RdYlGn_r")
ax.set_xticks(range(5), [f"{int(e/1000)}k" for e in income_edges[:-1]])
ax.set_yticks(range(5), [f"{int(e)}" for e in age_edges[:-1]])
ax.set_xlabel("Income")
ax.set_ylabel("Age")
ax.set_title("Event rate by Age x Income\n(high rate = top-left quadrant)")
plt.colorbar(im, ax=ax, label="Event rate")
plt.tight_layout()
plt.show()

The high-risk region (older, low-income) is clearly visible in the 2D grid but invisible to either feature alone.

Discover interactions

Find which feature pairs have the strongest interactions before encoding:

from datasci_toolkit import BinnedInteractionEncoder

top_pairs = BinnedInteractionEncoder.discover(df, y, top_n=3)
for feat_a, feat_b, iv_score in top_pairs:
    print(f"{feat_a} x {feat_b}: IV = {iv_score:.4f}")
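
A plausible reading of the IV score above: bin each pair jointly and sum the standard IV terms over the joint bins. The helper below is a hand-rolled sketch of that idea -- its quantile binning and `eps` smoothing are assumptions, not `discover`'s actual internals:

```python
import numpy as np

def pair_iv(a, b, y, n_bins=5, eps=0.5):
    """Information Value of the joint (a, b) quantile binning."""
    qa = np.quantile(a, np.linspace(0, 1, n_bins + 1))[1:-1]
    qb = np.quantile(b, np.linspace(0, 1, n_bins + 1))[1:-1]
    joint = np.digitize(a, qa) * n_bins + np.digitize(b, qb)
    n_event = y.sum()
    n_non = len(y) - n_event
    iv = 0.0
    for k in np.unique(joint):
        in_bin = joint == k
        events = y[in_bin].sum()
        pe = (events + eps) / (n_event + eps)                # P(bin | event)
        pn = (in_bin.sum() - events + eps) / (n_non + eps)   # P(bin | non-event)
        iv += (pe - pn) * np.log(pe / pn)
    return iv

rng = np.random.default_rng(0)
age = rng.uniform(18, 70, 2000)
income = rng.uniform(15_000, 150_000, 2000)
tenure = rng.uniform(0, 30, 2000)
y = ((age > 40) & (income < 50_000)).astype(float)

print(pair_iv(age, income, y))  # interacting pair: high IV
print(pair_iv(age, tenure, y))  # tenure is noise here: lower IV
```

The interacting pair scores much higher because its joint bins are nearly pure, while the age-tenure bins stay mixed.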

Encode with default quantile binning

pairs = [(a, b) for a, b, _ in top_pairs]
enc = BinnedInteractionEncoder(pairs=pairs).fit(df, y)
interactions = enc.transform(df)
print(interactions)

Visualizing WOE-encoded interactions

After encoding, each observation gets a WOE value that reflects the log-odds of the event in its joint bin relative to the overall event rate. Plotting these values shows how the encoder captures the interaction structure.

woe_col = interactions["age__x__income"].to_numpy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sc = axes[0].scatter(age_np, income_np, c=y_np, cmap="RdYlGn_r", s=8, alpha=0.5)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("Raw target (0/1)")
plt.colorbar(sc, ax=axes[0], label="Event")

sc2 = axes[1].scatter(age_np, income_np, c=woe_col, cmap="RdBu_r", s=8, alpha=0.5)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("WOE-encoded interaction")
plt.colorbar(sc2, ax=axes[1], label="WOE")

plt.tight_layout()
plt.show()

Encode with optimal (supervised) binning

from datasci_toolkit import OptimalBinner

enc = BinnedInteractionEncoder(
    binner=OptimalBinner(),
    pairs=[("age", "income")],
).fit(df, y)
interactions = enc.transform(df)

Reuse StabilityGrouping bins

from datasci_toolkit import PrecomputedBinner, StabilityGrouping

# sg = StabilityGrouping(...).fit(...)
# enc = BinnedInteractionEncoder(
#     binner=PrecomputedBinner(bin_specs=sg.bin_specs_),
#     pairs=[("age", "income")],
# ).fit(df, y)

Binning strategies

| Binner | When to use |
| --- | --- |
| `QuantileBinner(n_bins=10)` | Default. Fast, unsupervised, no target leakage in the binning step. |
| `OptimalBinner()` | Supervised binning via OptimalBinning. Better bins, but a risk of overfitting. |
| `PrecomputedBinner(bin_specs=...)` | Reuse bins from `StabilityGrouping`. Ensures interaction bins match your scorecard bins. |
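
For intuition on the default strategy: quantile binning is essentially `np.quantile` over an evenly spaced probability grid. A sketch (not `QuantileBinner`'s exact edge handling):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)  # heavily skewed feature

# n_bins=10 -> 11 edges at the 0%, 10%, ..., 100% quantiles.
edges = np.quantile(x, np.linspace(0, 1, 11))
bins = np.digitize(x, edges[1:-1])  # bin index 0..9 per observation

counts = np.bincount(bins, minlength=10)
print(counts)  # each bin holds ~10% of the data despite the skew
```

Equal-frequency bins are what make the strategy robust to skew: no cell is starved of observations, so the downstream WOE estimates stay stable.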

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `binner` | `None` | Binning strategy instance. `None` defaults to `QuantileBinner(n_bins=10)`. |
| `pairs` | `None` | List of `(feat_a, feat_b)` tuples. Required for `fit`. |

2D Optimal Binning

Joint 2D binning that optimizes bin boundaries for both features simultaneously, capturing true interaction structure that independent binning misses.

When to use 2D vs independent binning

  • Independent binning (QuantileBinner, OptimalBinner): fast, simple, good for discovery and initial encoding
  • 2D optimal binning (OptimalBinning2D): captures joint structure, supports monotonicity constraints, better IV but slower (solver-based)

Visualizing independent vs 2D binning

Independent binning creates a regular grid (left), while 2D optimal binning chooses rectangular regions of irregular size that better capture the interaction (right).

from datasci_toolkit import OptimalBinning2D, QuantileBinner

# Independent binning: regular grid
qb_a = QuantileBinner(n_bins=5).fit(age_np, y_np)
qb_b = QuantileBinner(n_bins=5).fit(income_np, y_np)
edges_a = qb_a.edges_
edges_b = qb_b.edges_

# 2D optimal binning: solver-optimized regions
b2d = OptimalBinning2D().fit(df, y, feature_a="age", feature_b="income")
splits_x = b2d.splits_x_
splits_y = b2d.splits_y_

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: independent grid
axes[0].scatter(age_np, income_np, c=y_np, cmap="RdYlGn_r", s=8, alpha=0.4)
for e in edges_a[1:-1]:
    axes[0].axvline(e, color="black", linewidth=0.8, alpha=0.6)
for e in edges_b[1:-1]:
    axes[0].axhline(e, color="black", linewidth=0.8, alpha=0.6)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("Independent binning (QuantileBinner)")

# Right: 2D optimal splits
axes[1].scatter(age_np, income_np, c=y_np, cmap="RdYlGn_r", s=8, alpha=0.4)
for sx in splits_x:
    axes[1].axvline(sx, color="black", linewidth=0.8, alpha=0.6)
for sy in splits_y:
    axes[1].axhline(sy, color="black", linewidth=0.8, alpha=0.6)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("2D Optimal Binning (OptimalBinning2D)")

plt.tight_layout()
plt.show()

Standalone usage (binary target)

b = OptimalBinning2D().fit(df, y, feature_a="age", feature_b="income")
woe_values = b.transform(df)
print(woe_values)

Visualizing 2D WOE surface

woe_2d = b.transform(df, metric="woe").to_numpy()
event_rate_2d = b.transform(df, metric="event_rate").to_numpy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sc = axes[0].scatter(age_np, income_np, c=event_rate_2d, cmap="RdYlGn_r", s=8, alpha=0.5)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("2D Binning: Event rate per region")
plt.colorbar(sc, ax=axes[0], label="Event rate")

sc2 = axes[1].scatter(age_np, income_np, c=woe_2d, cmap="RdBu_r", s=8, alpha=0.5)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("2D Binning: WOE per region")
plt.colorbar(sc2, ax=axes[1], label="WOE")

plt.tight_layout()
plt.show()

As encoder plugin

from datasci_toolkit import BinnedInteractionEncoder, OptimalBinning2D

enc = BinnedInteractionEncoder(
    binner=OptimalBinning2D(),
    pairs=[("age", "income"), ("age", "tenure")],
).fit(df, y)
interactions = enc.transform(df)

With monotonicity constraints

b = OptimalBinning2D(
    monotonic_trend_x="ascending",
    monotonic_trend_y="ascending",
).fit(df, y, feature_a="age", feature_b="income")
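
What the constraints mean: with `monotonic_trend_x="ascending"`, the per-region WOE must be non-decreasing as you move along the x feature's bins with the y bin held fixed (and symmetrically for y). On a grid of per-region WOE values the check reduces to row and column differences. A toy, purely illustrative grid:

```python
import numpy as np

# Toy 3x3 WOE grid: rows index the x-feature's bins, columns the y-feature's.
woe_grid = np.array([
    [-1.2, -0.8, -0.1],
    [-0.6,  0.0,  0.4],
    [ 0.1,  0.7,  1.3],
])

# Ascending in x: every column is non-decreasing down the rows.
mono_x = bool(np.all(np.diff(woe_grid, axis=0) >= 0))
# Ascending in y: every row is non-decreasing left to right.
mono_y = bool(np.all(np.diff(woe_grid, axis=1) >= 0))
print(mono_x, mono_y)  # True True
```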

Continuous target

from datasci_toolkit import ContinuousOptimalBinning2D

y_continuous = pl.Series((age_np * 0.3 + income_np * 0.001 + rng.normal(0, 1, n)).tolist())
b_cont = ContinuousOptimalBinning2D().fit(df, y_continuous, feature_a="age", feature_b="income")
mean_values = b_cont.transform(df)
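
The continuous variant replaces WOE with the mean target per joint region. A fixed-grid sketch of that encoding (the real binner optimizes the split locations; the 4x4 quantile grid here is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(18, 70, 2000)
income = rng.uniform(15_000, 150_000, 2000)
y = age * 0.3 + income * 0.001 + rng.normal(0, 1, 2000)

# 4 quantile bins per axis -> 16 joint regions.
splits_age = np.quantile(age, [0.25, 0.5, 0.75])
splits_inc = np.quantile(income, [0.25, 0.5, 0.75])
joint = np.digitize(age, splits_age) * 4 + np.digitize(income, splits_inc)

# Encode each observation as the mean target of its region.
region_mean = np.array([y[joint == k].mean() for k in range(16)])
encoded = region_mean[joint]
```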

Visualizing continuous 2D binning

mean_np = mean_values.to_numpy()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sc = axes[0].scatter(age_np, income_np, c=y_continuous.to_numpy(), cmap="viridis", s=8, alpha=0.4)
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Income")
axes[0].set_title("Raw continuous target")
plt.colorbar(sc, ax=axes[0], label="Target")

sc2 = axes[1].scatter(age_np, income_np, c=mean_np, cmap="viridis", s=8, alpha=0.5)
axes[1].set_xlabel("Age")
axes[1].set_ylabel("Income")
axes[1].set_title("ContinuousOptimalBinning2D: Mean per region")
plt.colorbar(sc2, ax=axes[1], label="Bin mean")

plt.tight_layout()
plt.show()

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `solver` | `"cp"` | `"cp"` (constraint programming) or `"mip"` (mixed-integer programming). |
| `monotonic_trend_x` | `None` | `"ascending"`, `"descending"`, or `None`. |
| `monotonic_trend_y` | `None` | `"ascending"`, `"descending"`, or `None`. |
| `min_bin_size` | `None` | Minimum fraction of observations per bin. |
| `max_n_bins` | `None` | Maximum number of bins. |