Temporal Feature Engineering

Generate aggregated features from longitudinal event data — any dataset with a grouping key (entity) and timestamps. Works with a single table or multiple joined tables.

Quick start — single table

import polars as pl
from datetime import date
from datasci_toolkit import TemporalFeatureEngineer

transactions = pl.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "date": [
        date(2023, 12, 22), date(2023, 12, 12), date(2023, 11, 17),
        date(2023, 12, 27), date(2023, 11,  2),
    ],
    "amount": [100.0, 200.0, 300.0, 150.0, 250.0],
    "status": ["paid", "paid", "unpaid", "paid", "unpaid"],
})

fe = (
    TemporalFeatureEngineer()
    .add_aggregation("amount", ["sum", "mean", "count"], windows=["30d", "90d", "inf"], table="transactions")
)
features = fe.fit_transform(
    {"transactions": transactions},
    entity_col="user_id",
    time_col="date",
    reference_date="2024-01-01",
    primary="transactions",
)
print(features)

Output: one row per user_id, with one column per function and window combination (SUM_AMOUNT_30d, MEAN_AMOUNT_30d, COUNT_AMOUNT_30d, SUM_AMOUNT_90d, and so on through COUNT_AMOUNT_inf).
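
To make the window semantics concrete, here is a minimal plain-Python sketch of what a 30-day sum, mean, and count amount to for the sample rows. It does not call the library: window_stats is a hypothetical helper, and the boundary rule (an event counts if it falls before reference_date and no more than N days back) is an assumption that this page does not confirm.

```python
from datetime import date, timedelta
from statistics import mean

# Same rows as the quick-start table: (user_id, date, amount).
rows = [
    (1, date(2023, 12, 22), 100.0),
    (1, date(2023, 12, 12), 200.0),
    (1, date(2023, 11, 17), 300.0),
    (2, date(2023, 12, 27), 150.0),
    (2, date(2023, 11, 2), 250.0),
]


def window_stats(rows, reference_date, days=None):
    """Hypothetical helper: per-user sum/mean/count in a look-back window.

    days=None plays the role of the "inf" window. The boundary rule
    (cutoff <= date < reference_date) is an assumption.
    """
    cutoff = None if days is None else reference_date - timedelta(days=days)
    grouped = {}
    for user_id, day, amount in rows:
        if day >= reference_date:
            continue  # ignore events on or after the reference date
        if cutoff is not None and day < cutoff:
            continue  # ignore events older than the window
        grouped.setdefault(user_id, []).append(amount)
    return {
        user_id: {"sum": sum(amounts), "mean": mean(amounts), "count": len(amounts)}
        for user_id, amounts in grouped.items()
    }


stats_30d = window_stats(rows, date(2024, 1, 1), days=30)
stats_inf = window_stats(rows, date(2024, 1, 1))
print(stats_30d)
print(stats_inf)
```

Under these assumptions the November events fall outside the 30-day window, so user 1 keeps two rows and user 2 keeps one, while the unbounded window keeps all five.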

Multi-table input

Pass a dict of DataFrames. Each table must contain entity_col. Secondary tables are left-joined onto the primary on entity_col.

user_info = pl.DataFrame({
    "user_id": [1, 2, 3],
    "tier": ["gold", "silver", "bronze"],
})

fe = (
    TemporalFeatureEngineer()
    .add_aggregation("amount", ["sum"], windows=["30d", "inf"], table="transactions")
)
features = fe.fit_transform(
    {"transactions": transactions, "user_info": user_info},
    entity_col="user_id",
    time_col="date",
    reference_date="2024-01-01",
    primary="transactions",
)

User 3 (present in user_info but not transactions) appears with null feature values — the entity set is derived from the primary table during fit.

Time-since features

Count the days, months, or hours elapsed since each entity's first or last event.

fe = (
    TemporalFeatureEngineer()
    .add_time_since("date", from_="last", unit="days",   table="transactions")
    .add_time_since("date", from_="first", unit="months", table="transactions")
)
features = fe.fit_transform(
    {"transactions": transactions},
    entity_col="user_id",
    time_col="date",
    reference_date="2024-01-01",
    primary="transactions",
)
# Columns: TIME_SINCE_LAST_DATE_days, TIME_SINCE_FIRST_DATE_months

Supported units: "days", "months", "hours".
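
As a rough illustration of the arithmetic, the sketch below computes days since the last event and months since the first event for the sample dates, using only the standard library. Measuring against reference_date and counting only completed calendar months are both assumptions; the library may round or truncate differently.

```python
from datetime import date

# Event dates per user, taken from the sample transactions table.
events = {
    1: [date(2023, 12, 22), date(2023, 12, 12), date(2023, 11, 17)],
    2: [date(2023, 12, 27), date(2023, 11, 2)],
}
reference_date = date(2024, 1, 1)


def days_between(start, end):
    """Whole days from start to end."""
    return (end - start).days


def whole_months_between(start, end):
    """Completed calendar months from start to end (assumed convention)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month has not been completed yet
    return months


time_since = {
    user_id: {
        "days_since_last": days_between(max(dates), reference_date),
        "months_since_first": whole_months_between(min(dates), reference_date),
    }
    for user_id, dates in events.items()
}
print(time_since)
```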

Query filters

Restrict rows before aggregating using a SQL WHERE clause string.

fe = (
    TemporalFeatureEngineer()
    .add_aggregation("amount", ["sum", "count"], windows=["30d", "inf"],
                     table="transactions", query="status = 'paid'")
    .add_time_since("date", from_="last", unit="days",
                    table="transactions", query="status = 'paid'")
)
features = fe.fit_transform(
    {"transactions": transactions},
    entity_col="user_id",
    time_col="date",
    reference_date="2024-01-01",
    primary="transactions",
)
# Columns include: SUM_AMOUNT_30d__status__paid, TIME_SINCE_LAST_DATE_days
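
The filter-then-aggregate order can be reproduced with the standard-library sqlite3 module, which accepts the same WHERE clause text. This is only a sketch of the semantics over an unbounded window; how the library evaluates the query string internally is not stated here, and the in-memory table is an illustrative stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (user_id INTEGER, date TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "2023-12-22", 100.0, "paid"),
        (1, "2023-12-12", 200.0, "paid"),
        (1, "2023-11-17", 300.0, "unpaid"),
        (2, "2023-12-27", 150.0, "paid"),
        (2, "2023-11-02", 250.0, "unpaid"),
    ],
)

query = "status = 'paid'"  # the same string passed as query= above

# The WHERE clause drops unpaid rows first; grouping happens afterwards.
# Interpolating the string is acceptable only for trusted, hand-written specs.
paid_totals = conn.execute(
    "SELECT user_id, SUM(amount), COUNT(amount) FROM transactions "
    f"WHERE {query} GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()
print(paid_totals)
```

Only the paid rows contribute, so the unpaid 300.0 and 250.0 amounts never reach the sum or the count.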

Ratio features

Divide two already-computed aggregation columns. Reference them by their generated column name.

fe = (
    TemporalFeatureEngineer()
    .add_aggregation("amount", ["sum"], windows=["30d", "inf"], table="transactions")
    .add_ratio("SUM_AMOUNT_30d", "SUM_AMOUNT_inf")
)
features = fe.fit_transform(
    {"transactions": transactions},
    entity_col="user_id",
    time_col="date",
    reference_date="2024-01-01",
    primary="transactions",
)
# Column: RATIO_SUM_AMOUNT_30d__SUM_AMOUNT_inf
# Zero or null denominator → null (no inf values)
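
The null-on-bad-denominator rule can be sketched as a small guarded division. safe_ratio is a hypothetical helper, not part of the library, and the input values are illustrative; returning None for a null numerator is an additional assumption.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Illustrative per-user values for the two aggregation columns.
sum_30d = {1: 300.0, 2: 150.0, 3: 0.0}
sum_inf = {1: 600.0, 2: 400.0, 3: 0.0}

ratios = {user: safe_ratio(sum_30d[user], sum_inf[user]) for user in sum_inf}
print(ratios)
```

Returning None instead of dividing keeps inf and NaN out of the feature table, so downstream code only has to handle one kind of missing value.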

Config dict

Equivalent to the fluent builder — useful for serialising feature specs to YAML/JSON.

from datasci_toolkit import TemporalFeatureEngineer

cfg = {
    "meta": {
        "entity_col":     "user_id",
        "time_col":       "date",
        "reference_date": "2024-01-01",
        "primary":        "transactions",
    },
    "aggregations": [
        {
            "variable":  "amount",
            "functions": ["sum", "mean", "count"],
            "windows":   ["30d", "90d", "inf"],
            "table":     "transactions",
        },
        {
            "variable":  "amount",
            "functions": ["sum"],
            "windows":   ["30d"],
            "table":     "transactions",
            "query":     "status = 'paid'",
        },
    ],
    "time_since": [
        {"variable": "date", "from": "last",  "unit": "days",   "table": "transactions"},
        {"variable": "date", "from": "first", "unit": "months", "table": "transactions"},
    ],
    "ratios": [
        {"numerator": "SUM_AMOUNT_30d", "denominator": "SUM_AMOUNT_inf"},
    ],
}

fe = TemporalFeatureEngineer.from_config(cfg)
features = fe.fit_transform({"transactions": transactions})
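
Because a config like this holds only strings, lists, and dicts, it should survive a JSON round trip unchanged. The sketch below checks that with the standard-library json module on a shortened config; the from_config call is left as a comment because this snippet does not import the library.

```python
import json

# Shortened version of the config above; reference_date stays an ISO string.
cfg = {
    "meta": {
        "entity_col": "user_id",
        "time_col": "date",
        "reference_date": "2024-01-01",
        "primary": "transactions",
    },
    "aggregations": [
        {
            "variable": "amount",
            "functions": ["sum"],
            "windows": ["30d", "inf"],
            "table": "transactions",
        },
    ],
    "ratios": [
        {"numerator": "SUM_AMOUNT_30d", "denominator": "SUM_AMOUNT_inf"},
    ],
}

serialised = json.dumps(cfg, indent=2)  # text suitable for a .json file
restored = json.loads(serialised)       # what a later run would read back
print(restored == cfg)

# fe = TemporalFeatureEngineer.from_config(restored)  # as in the example above
```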

Feature naming reference

| Feature type | Column name pattern | Example |
| --- | --- | --- |
| Aggregation | {FUNC}_{VAR}_{WINDOW} | SUM_AMOUNT_30d |
| Aggregation + query | {FUNC}_{VAR}_{WINDOW}__{sanitised_query} | COUNT_AMOUNT_30d__status__paid |
| Time-since | TIME_SINCE_{FROM}_{VAR}_{UNIT} | TIME_SINCE_LAST_DATE_days |
| Ratio | RATIO_{NUMERATOR}__{DENOMINATOR} | RATIO_SUM_AMOUNT_30d__SUM_AMOUNT_inf |
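
When ratio features need to reference generated columns, it can help to predict the names programmatically. The helpers below are hypothetical and simply reproduce the patterns and examples listed here; in particular, the sanitising rule (collapse every run of non-alphanumeric characters into a double underscore) is a guess that matches the one documented example and may not match the library in other cases.

```python
import re


def sanitise_query(query):
    """Guess at the sanitiser: non-alphanumeric runs become '__'."""
    return re.sub(r"[^0-9A-Za-z]+", "__", query).strip("_")


def aggregation_name(func, var, window, query=None):
    """Build {FUNC}_{VAR}_{WINDOW}, plus a query suffix when one is given."""
    name = f"{func.upper()}_{var.upper()}_{window}"
    if query:
        name += f"__{sanitise_query(query)}"
    return name


def time_since_name(from_, var, unit):
    """Build TIME_SINCE_{FROM}_{VAR}_{UNIT}."""
    return f"TIME_SINCE_{from_.upper()}_{var.upper()}_{unit}"


def ratio_name(numerator, denominator):
    """Build RATIO_{NUMERATOR}__{DENOMINATOR} from two column names."""
    return f"RATIO_{numerator}__{denominator}"


print(aggregation_name("count", "amount", "30d", query="status = 'paid'"))
print(time_since_name("last", "date", "days"))
print(ratio_name(aggregation_name("sum", "amount", "30d"),
                 aggregation_name("sum", "amount", "inf")))
```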

Supported aggregation functions: sum, mean, min, max, count, std, mode

Supported time windows: "7d", "30d", "90d", "1mo", "inf" (any Nd or Nmo pattern)
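
The Nd / Nmo / inf grammar is small enough to validate before building a spec. The parser below is a hypothetical sketch of that grammar, not the library's own parsing code; it returns a (count, unit) pair and rejects anything else.

```python
import re

# One or more digits followed by "d" (days) or "mo" (months).
WINDOW_RE = re.compile(r"(\d+)(d|mo)")


def parse_window(window):
    """Split a window string into (count, unit); "inf" maps to (None, None)."""
    if window == "inf":
        return (None, None)
    match = WINDOW_RE.fullmatch(window)
    if match is None:
        raise ValueError(f"unsupported window: {window!r}")
    return (int(match.group(1)), match.group(2))


parsed = [parse_window(w) for w in ["7d", "30d", "90d", "1mo", "inf"]]
print(parsed)
```

Using fullmatch rather than match means a string such as "30days" is rejected instead of being silently read as 30 days.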

Supported time-since units: "days", "months", "hours"