Ensemble Kalman Filter (EnKF)

The Ensemble Kalman Filter uses an ensemble of state vectors to represent the probability distribution, computing the Kalman gain from sample covariances. It is ideal for high-dimensional systems where maintaining a full covariance matrix is impractical.

Fundamental Concepts

The Idea

Instead of tracking a single state estimate and a full $n \times n$ covariance matrix, the EnKF maintains $N$ ensemble members (state samples). Statistics are computed from the ensemble:

\[\hat{x} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)} \quad \text{(ensemble mean)}\]

\[P \approx \frac{1}{N-1} \sum_{i=1}^{N} (x^{(i)} - \hat{x})(x^{(i)} - \hat{x})^T \quad \text{(sample covariance)}\]
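
As a minimal sketch, the two statistics above can be computed with NumPy as follows. The helper name `ensemble_statistics` and the row-per-member layout `(N, n)` are assumptions made for illustration and are not part of the kalbee API.

```python
import numpy as np

def ensemble_statistics(ensemble):
    """Return the ensemble mean and sample covariance.

    ensemble: array of shape (N, n), one state sample per row
    (layout chosen for this sketch only).
    """
    N = ensemble.shape[0]
    mean = ensemble.mean(axis=0)                 # ensemble mean, shape (n,)
    deviations = ensemble - mean                 # x^(i) - x_hat, shape (N, n)
    cov = deviations.T @ deviations / (N - 1)    # sample covariance, shape (n, n)
    return mean, cov

# Example: 100 members of a 2-dimensional state
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(100, 2))
mean, cov = ensemble_statistics(ensemble)
print("mean:", mean)
print("covariance:\n", cov)
```

The result matches `np.cov(ensemble, rowvar=False)`, which uses the same $N-1$ normalization by default.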

The Algorithm

Predict: Propagate each ensemble member through the transition function and add process noise

\[x^{(i)}_{k|k-1} = f(x^{(i)}_{k-1}, dt) + w^{(i)}_k, \quad w^{(i)}_k \sim \mathcal{N}(0, Q)\]
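
The predict step can be sketched in plain NumPy as below. This is an illustration under stated assumptions, not kalbee's implementation: `enkf_predict` and `constant_velocity` are hypothetical names, and the ensemble is stored as an `(N, n)` array with one member per row.

```python
import numpy as np

def enkf_predict(ensemble, f, dt, Q, rng):
    """Propagate every member through f and add noise drawn from N(0, Q).

    ensemble: (N, n) array, one member per row (layout assumed for this sketch).
    """
    N, n = ensemble.shape
    # Apply the (possibly non-linear) transition to each member separately
    propagated = np.array([f(member, dt) for member in ensemble])
    # Draw an independent process-noise sample for each member
    noise = rng.multivariate_normal(np.zeros(n), Q, size=N)
    return propagated + noise

def constant_velocity(x, dt):
    # State is [position, velocity]
    F = np.array([[1.0, dt], [0.0, 1.0]])
    return F @ x

rng = np.random.default_rng(0)
ensemble = rng.multivariate_normal([0.0, 1.0], np.eye(2), size=200)
predicted = enkf_predict(ensemble, constant_velocity, 1.0, np.eye(2) * 0.01, rng)
print("mean before:", ensemble.mean(axis=0))
print("mean after: ", predicted.mean(axis=0))
```

Because each member receives its own noise draw, the spread of the predicted ensemble reflects both the propagated uncertainty and $Q$.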

Update (Stochastic EnKF): Perturb observations and update each member

\[K = P_{xz} P_{zz}^{-1}\]

\[x^{(i)}_k = x^{(i)}_{k|k-1} + K \left(z + \epsilon^{(i)} - h(x^{(i)}_{k|k-1})\right)\]

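where $\epsilon^{(i)} \sim \mathcal{N}(0, R)$ are observation perturbations, $P_{xz}$ is the sample cross-covariance between the predicted states and the predicted measurements $h(x^{(i)}_{k|k-1})$, and $P_{zz}$ is the innovation covariance, which in the standard formulation is the sample covariance of the predicted measurements plus $R$.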
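
A minimal sketch of the stochastic update in plain NumPy follows. It assumes the standard formulation in which $P_{zz}$ is the sample covariance of the predicted measurements plus $R$. The function `enkf_update` and the `(N, n)` row layout are hypothetical choices for this example, and kalbee's internals may differ.

```python
import numpy as np

def enkf_update(ensemble, z, h, R, rng):
    """Stochastic EnKF update with perturbed observations.

    ensemble: (N, n) predicted members, z: (m,) observation,
    h: maps a state (n,) to a measurement (m,), R: (m, m) noise covariance.
    """
    N = ensemble.shape[0]
    m = len(z)
    predicted_z = np.array([h(member) for member in ensemble])   # (N, m)

    x_dev = ensemble - ensemble.mean(axis=0)
    z_dev = predicted_z - predicted_z.mean(axis=0)
    P_xz = x_dev.T @ z_dev / (N - 1)          # cross-covariance, (n, m)
    P_zz = z_dev.T @ z_dev / (N - 1) + R      # innovation covariance, (m, m)

    # K = P_xz P_zz^{-1}; solve instead of inverting (P_zz is symmetric)
    K = np.linalg.solve(P_zz, P_xz.T).T       # (n, m)

    # One perturbed observation per member: z + eps^(i), eps^(i) ~ N(0, R)
    eps = rng.multivariate_normal(np.zeros(m), R, size=N)
    innovations = z + eps - predicted_z       # (N, m)
    return ensemble + innovations @ K.T

rng = np.random.default_rng(0)
prior = rng.multivariate_normal([0.0, 0.0], np.eye(2) * 10.0, size=500)
z = np.array([5.0])                            # observed position
R = np.array([[0.5]])
posterior = enkf_update(prior, z, lambda x: x[:1], R, rng)
print("prior mean:    ", prior.mean(axis=0))
print("posterior mean:", posterior.mean(axis=0))
```

With a broad prior and a comparatively precise observation, the posterior position mean moves most of the way toward the observation and the position spread shrinks.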

When to Use

✅ Use EnKF when	❌ Don't use when
State dimension is high ($n > 10$)	State dimension is small (UKF is more accurate)
Full covariance is too expensive	Need exact Bayesian solution (use PF)
System is non-linear	Noise is strongly non-Gaussian

How to Use

import numpy as np
from kalbee import EnsembleKalmanFilter

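state = np.array([[0.0], [0.0]])  # Initial state [position, velocity] as a column vector
cov = np.eye(2) * 10.0            # Initial state covariance (large = uncertain start)
Q = np.eye(2) * 0.01              # Process noise covariance
R = np.eye(1) * 0.5               # Measurement noise covariance (a variance, not a std)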

def transition(x, dt):
    F = np.array([[1, dt], [0, 1]])
    return F @ x

def measurement(x):
    return x[:1]  # Observe position

enkf = EnsembleKalmanFilter(
    state, cov, Q, R,
    transition_function=transition,
    measurement_function=measurement,
    ensemble_size=100,
)

np.random.seed(42)
for t in range(1, 11):
    enkf.predict(dt=1.0)
    z = np.array([[float(t) + np.random.randn() * 0.5]])
    enkf.update(z)
    print(f"True: {t}  Estimated: {enkf.x[0,0]:.2f}")

Run an Experiment

from kalbee import run_experiment

report = run_experiment(
    signal="sine",
    filters=["kf", "ukf", "enkf"],
    noise_std=0.3,
    duration=10.0,
    seed=42,
)
print(report.summary())

Ensemble Size

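The experiment runner uses 100 ensemble members by default. Increasing the ensemble size reduces sampling error in the covariance estimate but raises the cost of every step, which grows roughly linearly with the number of members. For most problems, 50–200 members are sufficient.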
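
The accuracy side of this trade-off can be illustrated with a small standalone experiment. It does not use kalbee: `covariance_error` is a hypothetical helper, and the 2-dimensional `true_cov` is an arbitrary choice. The sketch measures how far the sample covariance lies from the true covariance for several ensemble sizes.

```python
import numpy as np

def covariance_error(N, true_cov, rng, trials=50):
    """Mean Frobenius-norm error of the sample covariance for N samples."""
    n = true_cov.shape[0]
    errors = []
    for _ in range(trials):
        samples = rng.multivariate_normal(np.zeros(n), true_cov, size=N)
        sample_cov = np.cov(samples, rowvar=False)   # normalized by N - 1
        errors.append(np.linalg.norm(sample_cov - true_cov))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.5], [0.5, 2.0]])
errors = {N: covariance_error(N, true_cov, rng) for N in (10, 50, 200, 1000)}
for N, err in errors.items():
    print(f"N={N:5d}  mean Frobenius error: {err:.3f}")
```

The error falls as the ensemble grows, but with diminishing returns, which is why moderate ensemble sizes are often a reasonable compromise.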
