DemingRegressor

mcup.deming.DemingRegressor

Bases: BaseRegressor

Regression estimator using Deming (total least squares) joint optimisation.

Optimises jointly over model parameters and latent true x values, giving an exact treatment of both x and y measurement errors. Slower than XYWeightedRegressor but more accurate when x errors are large or the model is strongly nonlinear.
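For intuition on why treating x errors matters, the linear case has a well-known closed-form Deming estimate. The sketch below is a generic illustration in NumPy, independent of mcup (the function name `deming_slope` is illustrative, not part of this API); ordinary least squares on the same data would attenuate the slope, while the Deming estimate recovers it.

```python
import numpy as np

def deming_slope(x, y, delta=1.0):
    """Closed-form Deming slope for y = a + b*x.

    delta is the ratio var(y_err) / var(x_err) of the two error variances.
    """
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    # Standard Deming slope formula (positive root of the TLS quadratic).
    return (syy - delta * sxx
            + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)

rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 200)
x_obs = x_true + rng.normal(0.0, 0.5, x_true.size)   # noisy x
y_obs = 2.0 * x_true + 1.0 + rng.normal(0.0, 0.5, x_true.size)  # noisy y

b = deming_slope(x_obs, y_obs, delta=1.0)  # close to the true slope 2.0
```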

Supports two solvers selected via the method argument:

  • "analytical" — joint optimisation with (J^T W J)^{-1} covariance.
  • "mc" — Monte Carlo sampling with Welford online covariance (default, robust for nonlinear models).
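The Welford-style online covariance mentioned for the "mc" solver accumulates a running mean and covariance one sample at a time, so the full sample history never needs to be stored. The accumulator below is a minimal generic sketch of that technique, not mcup's internal implementation:

```python
import numpy as np

class OnlineCovariance:
    """Welford-style running mean and covariance over parameter samples."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self._m2 = np.zeros((dim, dim))  # accumulated outer-product deviations

    def update(self, sample):
        sample = np.asarray(sample, dtype=float)
        self.n += 1
        delta = sample - self.mean          # deviation from the pre-update mean
        self.mean += delta / self.n
        # Welford update: pair the pre-update delta with the post-update residual.
        self._m2 += np.outer(delta, sample - self.mean)

    @property
    def covariance(self):
        return self._m2 / (self.n - 1)      # unbiased estimate (ddof=1)

rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 2))
acc = OnlineCovariance(2)
for s in samples:
    acc.update(s)
# acc.mean and acc.covariance now match the batch estimates over all samples.
```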

Parameters:

  • func (Callable, required): Model function with signature func(x, params) -> float, where x is a scalar for 1-D input or a length-k array for k-dimensional input.
  • method (str, default "mc"): Solver to use, either "analytical" or "mc".
  • n_iter (int, default 10_000): Maximum number of Monte Carlo iterations.
  • rtol (Optional[float], default None): Relative tolerance for MC convergence stopping; None disables the check.
  • atol (Optional[float], default None): Absolute tolerance for MC convergence stopping; None disables the check.
  • optimizer (str, default "BFGS"): SciPy optimizer name used for parameter fitting.

Attributes:

  • params_: Fitted parameter array.
  • params_std_: Standard deviations of fitted parameters.
  • covariance_: Full parameter covariance matrix.
  • n_iter_: Actual number of MC iterations run ("mc" method only).

Notes

X may be shape (n,) for scalar input or (n, k) for k-dimensional input per data point. x_err must match X.shape exactly; set a column to zero for any feature that is known exactly. Zero-error features are pinned to their observed values inside the joint optimisation — they are excluded from the latent-variable cost to avoid division by zero.
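The pinning behaviour described above can be seen in the shape of the joint cost: exactly-known features are evaluated at their observed values and masked out of the x-residual sum. The standalone sketch below mirrors that masking in plain NumPy (the name `deming_cost` is illustrative, not part of the mcup API):

```python
import numpy as np

def deming_cost(beta, eta, func, x_obs, y_obs, x_err, y_err):
    """Joint Deming cost over parameters beta and latent inputs eta.

    Features with x_err == 0 are pinned to their observed values and
    contribute nothing to the x-residual term, avoiding division by zero.
    """
    x_var = x_err ** 2
    y_var = y_err ** 2
    x_free = x_var > 0                              # which features have latent values
    x_var_safe = np.where(x_free, x_var, 1.0)       # placeholder variance for pinned cols
    eta_eval = np.where(x_free, eta, x_obs)         # pin exactly-known features
    x_term = np.sum(np.where(x_free, (x_obs - eta) ** 2 / x_var_safe, 0.0))
    y_hat = np.array([func(xi, beta) for xi in eta_eval])
    y_term = np.sum((y_obs - y_hat) ** 2 / y_var)
    return x_term + y_term

# Linear model in two features; the second feature is known exactly
# (its x_err column is all zeros), so only the first carries a latent value.
func = lambda x, b: b[0] * x[0] + b[1] * x[1]
X = np.array([[1.0, 2.0], [2.0, 4.0]])
x_err = np.array([[0.1, 0.0], [0.1, 0.0]])
y = np.array([5.0, 10.0])
y_err = np.array([0.5, 0.5])
beta = np.array([1.0, 2.0])

# With eta at the observed X and an exact model fit, both terms vanish.
c = deming_cost(beta, X.copy(), func, X, y, x_err, y_err)
```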

Source code in mcup/deming.py
class DemingRegressor(BaseRegressor):
    """Regression estimator using Deming (total least squares) joint optimisation.

    Optimises jointly over model parameters and latent true x values, giving an
    exact treatment of both x and y measurement errors. Slower than
    ``XYWeightedRegressor`` but more accurate when x errors are large or the
    model is strongly nonlinear.

    Supports two solvers selected via the ``method`` argument:

    - ``"analytical"`` — joint optimisation with ``(J^T W J)^{-1}`` covariance.
    - ``"mc"`` — Monte Carlo sampling with Welford online covariance
      (default, robust for nonlinear models).

    Parameters:
        func: Model function with signature ``func(x, params) -> float``, where
            ``x`` is a scalar for 1-D input or a length-k array for k-dimensional
            input.
        method: Solver to use, either ``"analytical"`` or ``"mc"``. Default ``"mc"``.
        n_iter: Maximum number of Monte Carlo iterations. Default ``10_000``.
        rtol: Relative tolerance for MC convergence stopping. Default ``None`` (disabled).
        atol: Absolute tolerance for MC convergence stopping. Default ``None`` (disabled).
        optimizer: SciPy optimizer name used for parameter fitting. Default ``"BFGS"``.

    Attributes:
        params_: Fitted parameter array.
        params_std_: Standard deviations of fitted parameters.
        covariance_: Full parameter covariance matrix.
        n_iter_: Actual number of MC iterations run (MC method only).

    Notes:
        ``X`` may be shape ``(n,)`` for scalar input or ``(n, k)`` for k-dimensional
        input per data point.  ``x_err`` must match ``X.shape`` exactly; set a column
        to zero for any feature that is known exactly.  Zero-error features are pinned
        to their observed values inside the joint optimisation — they are excluded from
        the latent-variable cost to avoid division by zero.
    """

    def __init__(
        self,
        func: Callable,
        method: str = "mc",
        n_iter: int = 10_000,
        rtol: Optional[float] = None,
        atol: Optional[float] = None,
        optimizer: str = "BFGS",
    ) -> None:
        super().__init__(
            func,
            method=method,
            n_iter=n_iter,
            rtol=rtol,
            atol=atol,
            optimizer=optimizer,
        )

    def fit(  # type: ignore[override]
        self,
        X: np.ndarray,
        y: np.ndarray,
        x_err: np.ndarray,
        y_err: np.ndarray,
        p0: np.ndarray,
    ) -> "DemingRegressor":
        X, y, y_err, x_err = self._validate_inputs(X, y, y_err, x_err)  # type: ignore[misc]
        p0 = np.asarray(p0, dtype=float)
        n_beta = len(p0)
        n_data = len(y)

        if self.method == "analytical":
            params, cov = deming_analytical_solve(
                self.func, X, y, x_err, y_err, p0, self.optimizer
            )
            self.params_ = params
            self.covariance_ = cov
            self.params_std_ = np.sqrt(np.diag(cov))
        else:
            x_var: np.ndarray = x_err**2  # type: ignore[assignment]
            y_var: np.ndarray = y_err**2  # type: ignore[assignment]
            # Zero-error features are known exactly: pin them to observed values
            # and exclude them from the x-residual cost to avoid division by zero.
            x_free: np.ndarray = x_var > 0  # type: ignore[assignment]
            x_var_safe: np.ndarray = np.where(x_free, x_var, 1.0)  # type: ignore[assignment]

            def cost_fn_builder(
                x_s: np.ndarray, y_s: np.ndarray, params_est: np.ndarray
            ) -> object:
                def cost(theta: np.ndarray) -> float:
                    beta = theta[:n_beta]
                    eta = theta[n_beta:].reshape(X.shape)
                    eta_eval = np.where(x_free, eta, x_s)
                    x_sq = np.where(x_free, (x_s - eta) ** 2 / x_var_safe, 0.0)
                    x_term: float = float(np.sum(x_sq))
                    y_hat = np.array([self.func(eta_eval[i], beta) for i in range(n_data)])
                    y_term: float = float(
                        np.sum(
                            (y_s - y_hat) ** 2
                            / y_var
                        )
                    )
                    return x_term + y_term

                return cost

            def extract_params(theta: np.ndarray) -> np.ndarray:
                return theta[:n_beta]

            def p0_fn(x_s: np.ndarray, y_s: np.ndarray) -> np.ndarray:
                return np.concatenate([p0, x_s.ravel()])  # type: ignore[no-any-return]

            theta0 = np.concatenate([p0, X.ravel()])

            mean, cov, n = mc_solve(
                cost_fn_builder,
                X,
                y,
                x_err,
                y_err,
                theta0,
                self.n_iter,
                self.rtol,
                self.atol,
                self.optimizer,
                extract_params=extract_params,
                p0_fn=p0_fn,
            )
            self.params_ = mean
            self.covariance_ = cov
            self.params_std_ = np.sqrt(np.diag(cov))
            self.n_iter_ = n

        return self