Wunder Fund LOB Predictorium — Causal Global-Norm GRU Ensemble (CPU/ONNX)

Streaming model for the Wunder Fund "LOB Fairy Predictorium" challenge: predict two anonymized future price-movement targets (t0, t1) from sequences of Limit-Order-Book states, under the official contract — 1 CPU core, 16 GB RAM, offline, ≤60 min, scored by Weighted Pearson Correlation (weights |target|, predictions clipped to [-6, 6]), averaged over the two targets.
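
The metric described above can be sketched in a few lines. This is an illustration only, assuming the standard weighted-Pearson formula with weights |target| and predictions clipped to [-6, 6]; the authoritative implementation is the official scorer in `utils.py`. The function names and the zero-variance fallback are assumptions made for this example.

```python
import numpy as np


def weighted_pearson(y_true, y_pred, clip=6.0):
    """Weighted Pearson correlation with weights |y_true| (sketch, not the official scorer)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Predictions are clipped before scoring, as the contract states.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), -clip, clip)
    w = np.abs(y_true)
    w_sum = w.sum()
    if w_sum == 0.0:
        return 0.0  # assumption: degenerate input scores zero
    mean_t = (w * y_true).sum() / w_sum
    mean_p = (w * y_pred).sum() / w_sum
    dt = y_true - mean_t
    dp = y_pred - mean_p
    cov = (w * dt * dp).sum()
    denom = np.sqrt((w * dt * dt).sum() * (w * dp * dp).sum())
    return float(cov / denom) if denom > 0.0 else 0.0


def competition_score(targets, preds):
    """Average the per-target weighted correlation over the columns (t0, t1)."""
    targets = np.asarray(targets, dtype=np.float64)
    preds = np.asarray(preds, dtype=np.float64)
    per_target = [weighted_pearson(targets[:, k], preds[:, k])
                  for k in range(targets.shape[1])]
    return float(np.mean(per_target))
```

Because the weights are |target|, rows with large moves dominate the score and rows with near-zero targets contribute almost nothing.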

Portfolio project on the completed public competition, trained only on the provided data. All scores are local validation (valid.parquet), not private-leaderboard results. Code: https://github.com/msrishav-28/wunder-fund-LOB-fairy-predictorium

Results (official scorer, full `valid.parquet`, 1,444 sequences)

| Solution | weighted_pearson | t0 | t1 |
|---|---|---|---|
| Provided baseline (vanilla GRU) | 0.2595 | 0.388 | 0.131 |
| This ensemble (per-target weighted blend) | 0.2846 | 0.4163 | 0.1528 |

+0.025 over baseline (+9.7%), ≈3σ. Sequence-bootstrap 95% CI [0.271, 0.301]. Streaming inference reproduces the offline blend exactly; ≈1.1 ms/row on one core → ≈28 min for a 1,500-sequence test set.
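
A sequence-level bootstrap like the one quoted above resamples whole sequences rather than individual rows, so the within-sequence dependence between steps is preserved. The sketch below assumes a percentile interval and a generic `score_fn`; the function names, the number of resamples, and the plain Pearson stand-in are assumptions for illustration, not the procedure used to produce the reported interval.

```python
import numpy as np


def pearson(a, b):
    """Plain Pearson correlation, used here only as a stand-in score function."""
    return float(np.corrcoef(a, b)[0, 1])


def sequence_bootstrap_ci(seq_ids, y_true, y_pred, score_fn,
                          n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI that resamples sequences with replacement."""
    rng = np.random.default_rng(seed)
    seq_ids = np.asarray(seq_ids)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Row indices belonging to each distinct sequence.
    rows_by_seq = [np.flatnonzero(seq_ids == s) for s in np.unique(seq_ids)]
    n_seq = len(rows_by_seq)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        picks = rng.integers(0, n_seq, size=n_seq)  # sequences, with replacement
        idx = np.concatenate([rows_by_seq[i] for i in picks])
        scores[b] = score_fn(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(scores, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)
```

Swapping `pearson` for the competition scorer gives an interval for the actual metric; resampling rows instead of sequences would typically understate the uncertainty.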

Model

Input: 32 raw LOB/trade features per step (p0..p11 bid/ask prices, v0..v11 bid/ask volumes, dp0..dp3 trade prices, dv0..dv3 trade volumes).
Preprocessing: global (train-fixed) z-normalization → causal momentum features [norm, lag1, delta, rolling-mean{5,10,20,40}] (160 or 224 dims; built online, no future leak).
Backbone: ensemble of 10 unidirectional GRUs (2 layers, hidden 96–192), each exported as a stateful one-step ONNX graph (features, h0) → (prediction, h1). State resets per sequence.
Combination: per-target blend — cross-validated non-negative weights for t0 (which generalize), uniform average for t1 (weight-fitting overfits the near-noise target). Weights live in ensemble_config.json.
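
The preprocessing step above can be sketched as an online feature builder that only ever looks at the current and past rows. The class name, the start-of-sequence handling (lag1 equals the current row, rolling means over the rows seen so far), and the feature ordering are assumptions for this example. With four windows the layout gives 32 × 7 = 224 dims; two windows would give 160, though which windows the 160-dim members use is not stated here.

```python
import numpy as np
from collections import deque


class CausalFeatureBuilder:
    """Builds [norm, lag1, delta, rolling means] one step at a time (sketch)."""

    def __init__(self, mean, std, windows=(5, 10, 20, 40), eps=1e-8):
        # Train-fixed statistics: never updated at inference time.
        self.mean = np.asarray(mean, dtype=np.float32)
        self.std = np.asarray(std, dtype=np.float32) + eps
        self.windows = tuple(windows)
        self.reset()

    def reset(self):
        """Call at the start of every sequence so no state crosses sequences."""
        self.history = deque(maxlen=max(self.windows))

    def step(self, raw):
        x = (np.asarray(raw, dtype=np.float32) - self.mean) / self.std
        # Assumption: on the first step there is no previous row, so lag1 = x.
        prev = self.history[-1] if self.history else x
        self.history.append(x)
        hist = np.stack(list(self.history))  # (k, n_features), oldest first
        # Assumption: short histories use a partial window.
        rolls = [hist[-w:].mean(axis=0) for w in self.windows]
        return np.concatenate([x, prev, x - prev, *rolls]).astype(np.float32)
```

Every feature at step t is a function of rows 0..t only, which is what makes the offline and streaming feature paths comparable.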

Files

| File | Purpose |
|---|---|
| `solution.py` | Streaming `PredictionModel.predict(DataPoint)` — the exact inference path |
| `ensemble_config.json` | Global mean/std + per-model ONNX names, input sizes, per-target weights |
| `*.onnx` (×10) | The stateful one-step GRU members |
| `utils.py` | Competition `DataPoint` + official scorer (unchanged) |
| `technical_report.md`, `RESULTS.md`, `FINDINGS.md` | Methodology, experiment ledger, data forensics |
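
As a rough illustration of how per-target weights might be applied at inference, the sketch below blends member predictions column by column. The `blend` function, the weight values, and the normalisation of each column to sum to 1 are assumptions for this example; the real layout of `ensemble_config.json` is not reproduced here.

```python
import numpy as np


def blend(member_preds, weights):
    """Combine member predictions with separate non-negative weights per target.

    member_preds: shape (n_models, 2), one (t0, t1) pair per ensemble member.
    weights:      shape (n_models, 2), column 0 for t0 and column 1 for t1.
    """
    member_preds = np.asarray(member_preds, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    if member_preds.shape != weights.shape:
        raise ValueError("member_preds and weights must have the same shape")
    if np.any(weights < 0):
        raise ValueError("blend weights must be non-negative")
    col_sums = weights.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("each target needs at least one positive weight")
    # Assumption: each target's weights are normalised to sum to 1.
    weights = weights / col_sums
    return (weights * member_preds).sum(axis=0)


# Hypothetical three-member example: fitted weights for t0, uniform for t1.
example_weights = np.array([[0.6, 1.0],
                            [0.4, 1.0],
                            [0.0, 1.0]])
example_preds = np.array([[1.0, 10.0],
                          [2.0, 20.0],
                          [3.0, 30.0]])
blended = blend(example_preds, example_weights)
```

Since correlation is invariant to positive rescaling, the normalisation mainly matters for how the blended values interact with the [-6, 6] clip.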

Usage

# pip install onnxruntime numpy
# Files (solution.py, utils.py, ensemble_config.json, *.onnx) must sit in one folder.
import numpy as np
from utils import DataPoint
from solution import PredictionModel

model = PredictionModel()
# Placeholder input (assumption): one row of the 32 raw LOB/trade features; substitute a real row.
state_32 = np.zeros(32, dtype=np.float32)
# Feed one DataPoint per step in chronological order; reset is automatic on seq_ix change.
# Warm-up steps (0..98) -> returns None; scored steps (99..999) -> np.ndarray shape (2,).
pred = model.predict(DataPoint(seq_ix=0, step_in_seq=99, need_prediction=True, state=state_32))
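
The streaming contract used above (hidden state carried from step to step, reset when `seq_ix` changes, `None` during warm-up) can be illustrated with a toy stand-in. Everything in this sketch is hypothetical: `Point` mimics only the four `DataPoint` fields shown in the usage line, `ToyStreamingModel` uses a simplified GRU cell with random weights in NumPy, and none of it is the code in `solution.py` or the exported ONNX graphs.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical stand-in for the competition DataPoint."""
    seq_ix: int
    step_in_seq: int
    need_prediction: bool
    state: np.ndarray


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, W, U, b):
    """One simplified GRU update: (features, h0) -> h1."""
    n = h.shape[0]
    gx = W @ x + b
    gh = U @ h
    r = sigmoid(gx[:n] + gh[:n])                 # reset gate
    z = sigmoid(gx[n:2 * n] + gh[n:2 * n])       # update gate
    cand = np.tanh(gx[2 * n:] + r * gh[2 * n:])  # candidate state
    return (1.0 - z) * cand + z * h


class ToyStreamingModel:
    """Random-weight recurrent model that follows the one-step streaming contract."""

    def __init__(self, n_features=32, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (3 * hidden, n_features))
        self.U = rng.normal(0.0, 0.1, (3 * hidden, hidden))
        self.b = np.zeros(3 * hidden)
        self.head = rng.normal(0.0, 0.1, (2, hidden))
        self.hidden = hidden
        self.current_seq = None
        self.h = np.zeros(hidden)

    def predict(self, point):
        # A new sequence index means the hidden state starts from zero again.
        if point.seq_ix != self.current_seq:
            self.current_seq = point.seq_ix
            self.h = np.zeros(self.hidden)
        x = np.asarray(point.state, dtype=np.float64)
        # The state is updated on every step, including warm-up steps.
        self.h = gru_step(x, self.h, self.W, self.U, self.b)
        if not point.need_prediction:
            return None
        return np.clip(self.head @ self.h, -6.0, 6.0)
```

Updating the hidden state during warm-up steps, while returning `None`, is what lets the first scored step see the full history of its sequence.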

Key findings (why the score is shaped this way)

t0 is a short-horizon, causally learnable move (ensemble score ≈0.42); a non-causal 3-step reconstruction reaches a correlation of ≈0.75, so most of the remaining gap is genuine future information.
t1 is a long-horizon, near-noise target (causal ceiling ≈0.15), which structurally caps the averaged metric. Microstructure feature engineering and t1-autocorrelation "tricks" added nothing (the latter helped only through target leakage, which is unavailable at inference and therefore invalid).

Limitations & honesty

Local-validation numbers only; the hidden test set is unavailable, so no claim is made to match the published public-LB leader (0.3240), which sits above the bootstrap CI. Weighted correlation is a statistical score, not a claim of trading profit. No competition data is redistributed here.

License: MIT (code/weights). Trained solely on the competition-provided dataset.
