LBJLincoln and Claude Opus 4.6 committed · Commit 25ed3b5 · 1 Parent(s): 853341e

feat: 7 SOTA neural network models for NBA prediction


- LSTM Bidirectional (sequence model, last 10 games)
- Transformer Attention (self-attention over game history)
- TabNet (attention-based tabular, interpretable)
- FT-Transformer (feature tokenizer, SOTA tabular 2025-2026)
- Deep Ensemble (10 ResNet MLPs, uncertainty estimation)
- Conformal Prediction (calibrated intervals, guaranteed coverage)
- AutoGluon Ensemble (auto-search hundreds of configs)

All models: BaseNBAModel interface, NaN handling, early stopping,
save/load, CPU-only PyTorch. Ready for 6021 features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (3)
  1. models/__init__.py +35 -0
  2. models/neural_models.py +1598 -0
  3. requirements.txt +5 -0
models/__init__.py ADDED
@@ -0,0 +1,35 @@
+ """
+ NBA Quant AI — Neural Network Models
+ =====================================
+ SOTA 2025-2026 neural architectures for NBA game prediction.
+
+ All models conform to the same interface:
+ - fit(X_train, y_train, X_val, y_val)
+ - predict_proba(X)
+ - get_params()
+ - save(path) / load(path)
+
+ Runs on HF Spaces (16 GB RAM, CPU-only PyTorch).
+ """
+
+ from .neural_models import (
+     LSTMSequenceModel,
+     TransformerAttentionModel,
+     TabNetModel,
+     FTTransformerModel,
+     DeepEnsemble,
+     ConformalPredictionWrapper,
+     AutoGluonEnsemble,
+     NEURAL_MODEL_REGISTRY,
+ )
+
+ __all__ = [
+     "LSTMSequenceModel",
+     "TransformerAttentionModel",
+     "TabNetModel",
+     "FTTransformerModel",
+     "DeepEnsemble",
+     "ConformalPredictionWrapper",
+     "AutoGluonEnsemble",
+     "NEURAL_MODEL_REGISTRY",
+ ]
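A minimal import sketch (not part of the commit) showing how downstream code would use the package; X_train, y_train and X_test are placeholder numpy arrays:

from models import DeepEnsemble, NEURAL_MODEL_REGISTRY

model = DeepEnsemble(n_members=5)      # any entry of NEURAL_MODEL_REGISTRY exposes the same interface
model.fit(X_train, y_train)            # validation split is carved from the tail if none is passed
probs = model.predict_proba(X_test)    # P(home_win) per row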
models/neural_models.py ADDED
@@ -0,0 +1,1598 @@
+ #!/usr/bin/env python3
+ """
+ NBA Quant AI — Neural Network Models (2025-2026 SOTA)
+ ======================================================
+ Real, production-grade neural architectures for NBA game prediction.
+
+ Models implemented:
+ 1. LSTMSequenceModel — Bidirectional LSTM over last N games
+ 2. TransformerAttentionModel — Self-attention over game history
+ 3. TabNetModel — Attention-based tabular learning (Arik & Pfister 2021)
+ 4. FTTransformerModel — Feature Tokenizer + Transformer (Gorishniy et al. 2021)
+ 5. DeepEnsemble — N independent nets, averaged predictions
+ 6. ConformalPredictionWrapper — Calibrated prediction intervals (any base model)
+ 7. AutoGluonEnsemble — Auto-stacking over hundreds of configs
+
+ All models:
+ - Handle NaN gracefully (median imputation)
+ - Work with 6000+ features
+ - Use early stopping
+ - CPU-only PyTorch (no CUDA needed)
+ - Fit in 16 GB RAM (HF Spaces free tier)
+
+ THIS RUNS ON HF SPACES ONLY — NOT ON VM.
+ """
+
+ from __future__ import annotations
+
+ import copy
+ import json
+ import math
+ import os
+ import pickle
+ import warnings
+ from abc import ABC, abstractmethod
+ from pathlib import Path
+ from typing import Any, Dict, List, Optional, Tuple, Union
+
+ import numpy as np
+ from sklearn.model_selection import train_test_split
+ from sklearn.preprocessing import StandardScaler
+
+ warnings.filterwarnings("ignore", category=UserWarning)
+
+ # ---------------------------------------------------------------------------
+ # Lazy imports — heavy libraries loaded only when a model is instantiated
+ # ---------------------------------------------------------------------------
+
+ def _import_torch():
+     """Import torch lazily to avoid startup cost."""
+     import torch
+     import torch.nn as nn
+     import torch.optim as optim
+     from torch.utils.data import DataLoader, TensorDataset
+     return torch, nn, optim, DataLoader, TensorDataset
55
+
56
+
57
+ # ---------------------------------------------------------------------------
58
+ # Base class — common interface for all models
59
+ # ---------------------------------------------------------------------------
60
+
61
+ class BaseNBAModel(ABC):
62
+ """Abstract base for all NBA prediction models."""
63
+
64
+ def __init__(self, **params):
65
+ self.params = params
66
+ self._scaler: Optional[StandardScaler] = None
67
+ self._feature_medians: Optional[np.ndarray] = None
68
+ self._is_fitted = False
69
+
70
+ # --- public interface ---------------------------------------------------
71
+
72
+ @abstractmethod
73
+ def fit(
74
+ self,
75
+ X_train: np.ndarray,
76
+ y_train: np.ndarray,
77
+ X_val: Optional[np.ndarray] = None,
78
+ y_val: Optional[np.ndarray] = None,
79
+ ) -> "BaseNBAModel":
80
+ """Train the model. Returns self."""
81
+ ...
82
+
83
+ @abstractmethod
84
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
85
+ """Return P(home_win) for each row — shape (n,)."""
86
+ ...
87
+
88
+ def get_params(self) -> Dict[str, Any]:
89
+ """Return hyperparameter dict (JSON-serialisable)."""
90
+ return {k: v for k, v in self.params.items() if _is_jsonable(v)}
91
+
92
+ def save(self, path: Union[str, Path]) -> None:
93
+ """Persist to disk."""
94
+ path = Path(path)
95
+ path.parent.mkdir(parents=True, exist_ok=True)
96
+ with open(path, "wb") as f:
97
+ pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)
98
+
99
+ @classmethod
100
+ def load(cls, path: Union[str, Path]) -> "BaseNBAModel":
101
+ """Load from disk."""
102
+ with open(path, "rb") as f:
103
+ obj = pickle.load(f)
104
+ return obj
105
+
106
+ # --- NaN handling & scaling --------------------------------------------
107
+
108
+ def _impute(self, X: np.ndarray, fit: bool = False) -> np.ndarray:
109
+ """Replace NaN/Inf with column medians. If *fit*, compute medians first."""
110
+ X = np.array(X, dtype=np.float32)
111
+ X = np.where(np.isfinite(X), X, np.nan)
112
+ if fit:
113
+ self._feature_medians = np.nanmedian(X, axis=0)
114
+ self._feature_medians = np.where(
115
+ np.isfinite(self._feature_medians), self._feature_medians, 0.0
116
+ )
117
+ medians = self._feature_medians if self._feature_medians is not None else np.zeros(X.shape[1])
118
+ inds = np.where(np.isnan(X))
119
+ X[inds] = np.take(medians, inds[1])
120
+ return X
121
+
122
+ def _scale(self, X: np.ndarray, fit: bool = False) -> np.ndarray:
123
+ """Standard-scale features."""
124
+ if fit:
125
+ self._scaler = StandardScaler()
126
+ return self._scaler.fit_transform(X).astype(np.float32)
127
+ if self._scaler is not None:
128
+ return self._scaler.transform(X).astype(np.float32)
129
+ return X.astype(np.float32)
130
+
131
+ def _prepare(self, X: np.ndarray, fit: bool = False) -> np.ndarray:
132
+ """Impute + scale."""
133
+ X = self._impute(X, fit=fit)
134
+ X = self._scale(X, fit=fit)
135
+ return X
136
+
137
+ def _auto_val_split(
138
+ self,
139
+ X: np.ndarray,
140
+ y: np.ndarray,
141
+ X_val: Optional[np.ndarray],
142
+ y_val: Optional[np.ndarray],
143
+ val_frac: float = 0.15,
144
+ ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
145
+ """If no validation set provided, carve one from the tail (time-ordered)."""
146
+ if X_val is not None and y_val is not None:
147
+ return X, y, X_val, y_val
148
+ split = int(len(X) * (1 - val_frac))
149
+ return X[:split], y[:split], X[split:], y[split:]
150
+
151
+
152
+ # ===========================================================================
153
+ # 1. LSTM Game Sequence Model
154
+ # ===========================================================================
155
+
156
+ class LSTMSequenceModel(BaseNBAModel):
157
+ """
158
+ Bidirectional LSTM over the last *seq_len* games of features per team.
159
+
160
+ Input shape: (batch, seq_len, n_features)
161
+ Architecture: BiLSTM(128) -> BiLSTM(64) -> Dense(32) -> Sigmoid
162
+
163
+ For flat input (n_samples, n_features), the model internally reshapes
164
+ using a sliding window of *seq_len* rows, treating consecutive games as
165
+ the sequence dimension. For true per-team sequences, pass 3-D arrays
166
+ directly.
167
+ """
168
+
169
+ def __init__(
170
+ self,
171
+ seq_len: int = 10,
172
+ hidden1: int = 128,
173
+ hidden2: int = 64,
174
+ dense_dim: int = 32,
175
+ dropout: float = 0.3,
176
+ lr: float = 1e-3,
177
+ weight_decay: float = 1e-5,
178
+ batch_size: int = 256,
179
+ epochs: int = 120,
180
+ patience: int = 15,
181
+ **kw,
182
+ ):
183
+ super().__init__(
184
+ seq_len=seq_len, hidden1=hidden1, hidden2=hidden2,
185
+ dense_dim=dense_dim, dropout=dropout, lr=lr,
186
+ weight_decay=weight_decay, batch_size=batch_size,
187
+ epochs=epochs, patience=patience, **kw,
188
+ )
189
+ self.seq_len = seq_len
190
+ self.hidden1 = hidden1
191
+ self.hidden2 = hidden2
192
+ self.dense_dim = dense_dim
193
+ self.dropout = dropout
194
+ self.lr = lr
195
+ self.weight_decay = weight_decay
196
+ self.batch_size = batch_size
197
+ self.epochs = epochs
198
+ self.patience = patience
199
+ self._net = None
200
+
201
+ # --- PyTorch module (defined inside method to keep torch lazy) ----------
202
+
203
+ @staticmethod
204
+ def _build_net(n_features: int, cfg: dict):
205
+ torch, nn, _, _, _ = _import_torch()
206
+
207
+ class BiLSTMNet(nn.Module):
208
+ def __init__(self):
209
+ super().__init__()
210
+ self.lstm1 = nn.LSTM(
211
+ input_size=n_features,
212
+ hidden_size=cfg["hidden1"],
213
+ batch_first=True,
214
+ bidirectional=True,
215
+ # built-in LSTM dropout is a no-op for a single layer (needs num_layers > 1);
+ # inter-layer dropout is applied explicitly via self.dropout below
216
+ )
217
+ self.lstm2 = nn.LSTM(
218
+ input_size=cfg["hidden1"] * 2, # bidirectional doubles
219
+ hidden_size=cfg["hidden2"],
220
+ batch_first=True,
221
+ bidirectional=True,
222
+ )
223
+ self.dropout = nn.Dropout(cfg["dropout"])
224
+ self.fc1 = nn.Linear(cfg["hidden2"] * 2, cfg["dense_dim"])
225
+ self.relu = nn.ReLU()
226
+ self.fc2 = nn.Linear(cfg["dense_dim"], 1)
227
+
228
+ def forward(self, x):
229
+ # x: (batch, seq_len, features)
230
+ out, _ = self.lstm1(x)
231
+ out = self.dropout(out)
232
+ out, _ = self.lstm2(out)
233
+ # Take last hidden state
234
+ out = out[:, -1, :]
235
+ out = self.dropout(out)
236
+ out = self.relu(self.fc1(out))
237
+ out = self.dropout(out)
238
+ return torch.sigmoid(self.fc2(out)).squeeze(-1)
239
+
240
+ return BiLSTMNet()
241
+
242
+ # --- Sequence construction from flat arrays ----------------------------
243
+
244
+ def _make_sequences(
245
+ self, X: np.ndarray, y: np.ndarray
246
+ ) -> Tuple[np.ndarray, np.ndarray]:
247
+ """
248
+ Convert flat (n_games, n_features) into (n_sequences, seq_len, n_features).
249
+ Uses a sliding window — game i maps to window [i-seq_len+1 .. i].
250
+ The first seq_len-1 games are dropped (not enough history).
251
+ """
252
+ if X.ndim == 3:
253
+ return X, y # already sequential
254
+ seqs, labels = [], []
255
+ for i in range(self.seq_len - 1, len(X)):
256
+ seqs.append(X[i - self.seq_len + 1 : i + 1])
257
+ labels.append(y[i])
258
+ return np.array(seqs, dtype=np.float32), np.array(labels, dtype=np.float32)
259
+
260
+ # --- fit / predict -----------------------------------------------------
261
+
262
+ def fit(
263
+ self,
264
+ X_train: np.ndarray,
265
+ y_train: np.ndarray,
266
+ X_val: Optional[np.ndarray] = None,
267
+ y_val: Optional[np.ndarray] = None,
268
+ ) -> "LSTMSequenceModel":
269
+ torch, nn, optim, DataLoader, TensorDataset = _import_torch()
270
+
271
+ # Prepare features. A user-supplied validation set is prepared BEFORE the
+ # auto-split so that a tail carved out of the already-prepared training data
+ # is not imputed/scaled a second time.
+ external_val = X_val is not None and y_val is not None
+ X_train = self._prepare(X_train, fit=True)
+ if external_val:
+     X_val = self._prepare(X_val)
+ X_train, y_train, X_val, y_val = self._auto_val_split(X_train, y_train, X_val, y_val)
276
+
277
+ # Build sequences
278
+ X_tr_seq, y_tr_seq = self._make_sequences(X_train, y_train)
279
+ X_va_seq, y_va_seq = self._make_sequences(X_val, y_val)
280
+
281
+ n_features = X_tr_seq.shape[2]
282
+ self._net = self._build_net(n_features, {
283
+ "hidden1": self.hidden1, "hidden2": self.hidden2,
284
+ "dense_dim": self.dense_dim, "dropout": self.dropout,
285
+ })
286
+
287
+ optimizer = optim.AdamW(
288
+ self._net.parameters(), lr=self.lr, weight_decay=self.weight_decay
289
+ )
290
+ scheduler = optim.lr_scheduler.ReduceLROnPlateau(
291
+ optimizer, mode="min", factor=0.5, patience=5, min_lr=1e-6
292
+ )
293
+ criterion = nn.BCELoss()
294
+
295
+ train_ds = TensorDataset(
296
+ torch.from_numpy(X_tr_seq), torch.from_numpy(y_tr_seq)
297
+ )
298
+ train_dl = DataLoader(train_ds, batch_size=self.batch_size, shuffle=True)
299
+
300
+ val_X_t = torch.from_numpy(X_va_seq)
301
+ val_y_t = torch.from_numpy(y_va_seq)
302
+
303
+ best_val_loss = float("inf")
304
+ best_state = None
305
+ wait = 0
306
+
307
+ self._net.train()
308
+ for epoch in range(self.epochs):
309
+ epoch_loss = 0.0
310
+ for xb, yb in train_dl:
311
+ optimizer.zero_grad()
312
+ preds = self._net(xb)
313
+ loss = criterion(preds, yb)
314
+ loss.backward()
315
+ torch.nn.utils.clip_grad_norm_(self._net.parameters(), 1.0)
316
+ optimizer.step()
317
+ epoch_loss += loss.item() * len(xb)
318
+ epoch_loss /= len(train_ds)
319
+
320
+ # Validation
321
+ self._net.eval()
322
+ with torch.no_grad():
323
+ val_preds = self._net(val_X_t)
324
+ val_loss = criterion(val_preds, val_y_t).item()
325
+ self._net.train()
326
+
327
+ scheduler.step(val_loss)
328
+
329
+ if val_loss < best_val_loss - 1e-6:
330
+ best_val_loss = val_loss
331
+ best_state = copy.deepcopy(self._net.state_dict())
332
+ wait = 0
333
+ else:
334
+ wait += 1
335
+ if wait >= self.patience:
336
+ break
337
+
338
+ if best_state is not None:
339
+ self._net.load_state_dict(best_state)
340
+ self._net.eval()
341
+ self._is_fitted = True
342
+ return self
343
+
344
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
345
+ torch, _, _, _, _ = _import_torch()
346
+ assert self._is_fitted, "Model not fitted yet"
347
+
348
+ X = self._prepare(X)
349
+ # If flat, create sequences with padding for early games
350
+ if X.ndim == 2:
351
+ seqs = []
352
+ for i in range(len(X)):
353
+ start = max(0, i - self.seq_len + 1)
354
+ seq = X[start : i + 1]
355
+ if len(seq) < self.seq_len:
356
+ pad = np.zeros((self.seq_len - len(seq), X.shape[1]), dtype=np.float32)
357
+ seq = np.concatenate([pad, seq], axis=0)
358
+ seqs.append(seq)
359
+ X_seq = np.array(seqs, dtype=np.float32)
360
+ else:
361
+ X_seq = X.astype(np.float32)
362
+
363
+ self._net.eval()
364
+ with torch.no_grad():
365
+ preds = self._net(torch.from_numpy(X_seq))
366
+ return preds.numpy()
367
+
368
+
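A sketch (not part of the commit) of the sliding-window reshape that _make_sequences performs on a flat array, assuming seq_len=3 and five games of two features:

import numpy as np

X = np.arange(10, dtype=np.float32).reshape(5, 2)            # 5 games x 2 features
y = np.array([0, 1, 0, 1, 1], dtype=np.float32)
seqs = np.stack([X[i - 2 : i + 1] for i in range(2, 5)])     # shape (3, 3, 2)
labels = y[2:]                                               # the first seq_len-1 labels are dropped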
369
+ # ===========================================================================
370
+ # 2. Transformer Attention Model
371
+ # ===========================================================================
372
+
373
+ class TransformerAttentionModel(BaseNBAModel):
374
+ """
375
+ Self-attention over team performance history.
376
+
377
+ Architecture:
378
+ Linear projection -> Positional encoding ->
379
+ TransformerEncoder (2 layers, 4 heads) ->
380
+ Global average pool -> Dense -> Sigmoid
381
+
382
+ For flat input the model treats each game as one token in a
383
+ sequence of *seq_len* tokens (same sliding-window as LSTM model).
384
+ """
385
+
386
+ def __init__(
387
+ self,
388
+ seq_len: int = 10,
389
+ d_model: int = 128,
390
+ n_heads: int = 4,
391
+ n_layers: int = 2,
392
+ dim_ff: int = 256,
393
+ dropout: float = 0.2,
394
+ lr: float = 5e-4,
395
+ weight_decay: float = 1e-4,
396
+ batch_size: int = 256,
397
+ epochs: int = 120,
398
+ patience: int = 15,
399
+ **kw,
400
+ ):
401
+ super().__init__(
402
+ seq_len=seq_len, d_model=d_model, n_heads=n_heads,
403
+ n_layers=n_layers, dim_ff=dim_ff, dropout=dropout,
404
+ lr=lr, weight_decay=weight_decay, batch_size=batch_size,
405
+ epochs=epochs, patience=patience, **kw,
406
+ )
407
+ self.seq_len = seq_len
408
+ self.d_model = d_model
409
+ self.n_heads = n_heads
410
+ self.n_layers = n_layers
411
+ self.dim_ff = dim_ff
412
+ self.dropout = dropout
413
+ self.lr = lr
414
+ self.weight_decay = weight_decay
415
+ self.batch_size = batch_size
416
+ self.epochs = epochs
417
+ self.patience = patience
418
+ self._net = None
419
+
420
+ @staticmethod
421
+ def _build_net(n_features: int, cfg: dict):
422
+ torch, nn, _, _, _ = _import_torch()
423
+
424
+ class PositionalEncoding(nn.Module):
425
+ """Sinusoidal positional encoding for game order."""
426
+ def __init__(self, d_model: int, max_len: int = 200):
427
+ super().__init__()
428
+ pe = torch.zeros(max_len, d_model)
429
+ position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
430
+ div_term = torch.exp(
431
+ torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
432
+ )
433
+ pe[:, 0::2] = torch.sin(position * div_term)
434
+ pe[:, 1::2] = torch.cos(position * div_term[: d_model // 2]) # handle odd d_model
435
+ pe = pe.unsqueeze(0) # (1, max_len, d_model)
436
+ self.register_buffer("pe", pe)
437
+
438
+ def forward(self, x):
439
+ return x + self.pe[:, : x.size(1), :]
440
+
441
+ class TransformerNet(nn.Module):
442
+ def __init__(self):
443
+ super().__init__()
444
+ self.input_proj = nn.Linear(n_features, cfg["d_model"])
445
+ self.pos_enc = PositionalEncoding(cfg["d_model"], max_len=cfg["seq_len"] + 10)
446
+ self.layer_norm_in = nn.LayerNorm(cfg["d_model"])
447
+ encoder_layer = nn.TransformerEncoderLayer(
448
+ d_model=cfg["d_model"],
449
+ nhead=cfg["n_heads"],
450
+ dim_feedforward=cfg["dim_ff"],
451
+ dropout=cfg["dropout"],
452
+ batch_first=True,
453
+ activation="gelu",
454
+ )
455
+ self.encoder = nn.TransformerEncoder(
456
+ encoder_layer, num_layers=cfg["n_layers"]
457
+ )
458
+ self.dropout = nn.Dropout(cfg["dropout"])
459
+ self.fc1 = nn.Linear(cfg["d_model"], cfg["d_model"] // 2)
460
+ self.gelu = nn.GELU()
461
+ self.fc2 = nn.Linear(cfg["d_model"] // 2, 1)
462
+
463
+ def forward(self, x):
464
+ # x: (batch, seq_len, n_features)
465
+ x = self.input_proj(x)
466
+ x = self.pos_enc(x)
467
+ x = self.layer_norm_in(x)
468
+ x = self.encoder(x)
469
+ # Global average pooling across sequence dim
470
+ x = x.mean(dim=1)
471
+ x = self.dropout(x)
472
+ x = self.gelu(self.fc1(x))
473
+ x = self.dropout(x)
474
+ return torch.sigmoid(self.fc2(x)).squeeze(-1)
475
+
476
+ return TransformerNet()
477
+
478
+ def _make_sequences(self, X: np.ndarray, y: np.ndarray):
479
+ if X.ndim == 3:
480
+ return X, y
481
+ seqs, labels = [], []
482
+ for i in range(self.seq_len - 1, len(X)):
483
+ seqs.append(X[i - self.seq_len + 1 : i + 1])
484
+ labels.append(y[i])
485
+ return np.array(seqs, dtype=np.float32), np.array(labels, dtype=np.float32)
486
+
487
+ def fit(
488
+ self,
489
+ X_train: np.ndarray,
490
+ y_train: np.ndarray,
491
+ X_val: Optional[np.ndarray] = None,
492
+ y_val: Optional[np.ndarray] = None,
493
+ ) -> "TransformerAttentionModel":
494
+ torch, nn, optim, DataLoader, TensorDataset = _import_torch()
495
+
496
+ # Prepare a user-supplied validation set before the auto-split so the
+ # carved-out tail (already prepared) is not scaled twice.
+ external_val = X_val is not None and y_val is not None
+ X_train = self._prepare(X_train, fit=True)
+ if external_val:
+     X_val = self._prepare(X_val)
+ X_train, y_train, X_val, y_val = self._auto_val_split(X_train, y_train, X_val, y_val)
500
+
501
+ X_tr_seq, y_tr_seq = self._make_sequences(X_train, y_train)
502
+ X_va_seq, y_va_seq = self._make_sequences(X_val, y_val)
503
+
504
+ n_features = X_tr_seq.shape[2]
505
+ self._net = self._build_net(n_features, {
506
+ "d_model": self.d_model, "n_heads": self.n_heads,
507
+ "n_layers": self.n_layers, "dim_ff": self.dim_ff,
508
+ "dropout": self.dropout, "seq_len": self.seq_len,
509
+ })
510
+
511
+ optimizer = optim.AdamW(
512
+ self._net.parameters(), lr=self.lr, weight_decay=self.weight_decay
513
+ )
514
+ scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(
515
+ optimizer, T_0=10, T_mult=2, eta_min=1e-6
516
+ )
517
+ criterion = nn.BCELoss()
518
+
519
+ train_ds = TensorDataset(
520
+ torch.from_numpy(X_tr_seq), torch.from_numpy(y_tr_seq)
521
+ )
522
+ train_dl = DataLoader(train_ds, batch_size=self.batch_size, shuffle=True)
523
+
524
+ val_X_t = torch.from_numpy(X_va_seq)
525
+ val_y_t = torch.from_numpy(y_va_seq)
526
+
527
+ best_val_loss = float("inf")
528
+ best_state = None
529
+ wait = 0
530
+
531
+ self._net.train()
532
+ for epoch in range(self.epochs):
533
+ epoch_loss = 0.0
534
+ for xb, yb in train_dl:
535
+ optimizer.zero_grad()
536
+ preds = self._net(xb)
537
+ loss = criterion(preds, yb)
538
+ loss.backward()
539
+ torch.nn.utils.clip_grad_norm_(self._net.parameters(), 1.0)
540
+ optimizer.step()
541
+ epoch_loss += loss.item() * len(xb)
542
+ epoch_loss /= len(train_ds)
543
+ scheduler.step()  # advance the warm-restart schedule once per epoch; the loss is not a valid epoch index
544
+
545
+ self._net.eval()
546
+ with torch.no_grad():
547
+ val_preds = self._net(val_X_t)
548
+ val_loss = criterion(val_preds, val_y_t).item()
549
+ self._net.train()
550
+
551
+ if val_loss < best_val_loss - 1e-6:
552
+ best_val_loss = val_loss
553
+ best_state = copy.deepcopy(self._net.state_dict())
554
+ wait = 0
555
+ else:
556
+ wait += 1
557
+ if wait >= self.patience:
558
+ break
559
+
560
+ if best_state is not None:
561
+ self._net.load_state_dict(best_state)
562
+ self._net.eval()
563
+ self._is_fitted = True
564
+ return self
565
+
566
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
567
+ torch, _, _, _, _ = _import_torch()
568
+ assert self._is_fitted, "Model not fitted yet"
569
+
570
+ X = self._prepare(X)
571
+ if X.ndim == 2:
572
+ seqs = []
573
+ for i in range(len(X)):
574
+ start = max(0, i - self.seq_len + 1)
575
+ seq = X[start : i + 1]
576
+ if len(seq) < self.seq_len:
577
+ pad = np.zeros((self.seq_len - len(seq), X.shape[1]), dtype=np.float32)
578
+ seq = np.concatenate([pad, seq], axis=0)
579
+ seqs.append(seq)
580
+ X_seq = np.array(seqs, dtype=np.float32)
581
+ else:
582
+ X_seq = X.astype(np.float32)
583
+
584
+ self._net.eval()
585
+ with torch.no_grad():
586
+ preds = self._net(torch.from_numpy(X_seq))
587
+ return preds.numpy()
588
+
589
+
590
+ # ===========================================================================
591
+ # 3. TabNet — Attention-based Tabular Model
592
+ # ===========================================================================
593
+
594
+ class TabNetModel(BaseNBAModel):
595
+ """
596
+ TabNet (Arik & Pfister 2021) — SOTA attention-based tabular learning.
597
+
598
+ Uses sequential attention to select features at each decision step,
599
+ providing built-in interpretability via attention masks.
600
+
601
+ Wraps pytorch_tabnet.TabNetClassifier with NaN handling and
602
+ early stopping.
603
+ """
604
+
605
+ def __init__(
606
+ self,
607
+ n_d: int = 32,
608
+ n_a: int = 32,
609
+ n_steps: int = 5,
610
+ gamma: float = 1.5,
611
+ lambda_sparse: float = 1e-4,
612
+ n_independent: int = 2,
613
+ n_shared: int = 2,
614
+ lr: float = 2e-2,
615
+ batch_size: int = 1024,
616
+ virtual_batch_size: int = 256,
617
+ epochs: int = 200,
618
+ patience: int = 20,
619
+ mask_type: str = "entmax",
620
+ **kw,
621
+ ):
622
+ super().__init__(
623
+ n_d=n_d, n_a=n_a, n_steps=n_steps, gamma=gamma,
624
+ lambda_sparse=lambda_sparse, n_independent=n_independent,
625
+ n_shared=n_shared, lr=lr, batch_size=batch_size,
626
+ virtual_batch_size=virtual_batch_size, epochs=epochs,
627
+ patience=patience, mask_type=mask_type, **kw,
628
+ )
629
+ self.n_d = n_d
630
+ self.n_a = n_a
631
+ self.n_steps = n_steps
632
+ self.gamma = gamma
633
+ self.lambda_sparse = lambda_sparse
634
+ self.n_independent = n_independent
635
+ self.n_shared = n_shared
636
+ self.lr = lr
637
+ self.batch_size = batch_size
638
+ self.virtual_batch_size = virtual_batch_size
639
+ self.epochs = epochs
640
+ self.patience = patience
641
+ self.mask_type = mask_type
642
+ self._clf = None
643
+ self._feature_importances: Optional[np.ndarray] = None
644
+
645
+ def fit(
646
+ self,
647
+ X_train: np.ndarray,
648
+ y_train: np.ndarray,
649
+ X_val: Optional[np.ndarray] = None,
650
+ y_val: Optional[np.ndarray] = None,
651
+ ) -> "TabNetModel":
652
+ from pytorch_tabnet.tab_model import TabNetClassifier
+ import torch  # for the optimizer_fn passed below
653
+
654
+ X_train = self._impute(X_train, fit=True)
655
+ X_train, y_train, X_val, y_val = self._auto_val_split(X_train, y_train, X_val, y_val)
656
+ if X_val is not None:
657
+ X_val = self._impute(X_val)
658
+
659
+ y_train = y_train.astype(np.int64)
660
+ y_val = y_val.astype(np.int64)
661
+
662
+ self._clf = TabNetClassifier(
663
+ n_d=self.n_d,
664
+ n_a=self.n_a,
665
+ n_steps=self.n_steps,
666
+ gamma=self.gamma,
667
+ lambda_sparse=self.lambda_sparse,
668
+ n_independent=self.n_independent,
669
+ n_shared=self.n_shared,
670
+ optimizer_fn=None, # default Adam
671
+ optimizer_params={"lr": self.lr},
672
+ mask_type=self.mask_type,
673
+ scheduler_fn=None,
674
+ scheduler_params=None,
675
+ verbose=0,
676
+ device_name="cpu",
677
+ )
678
+
679
+ self._clf.fit(
680
+ X_train=X_train,
681
+ y_train=y_train,
682
+ eval_set=[(X_val, y_val)],
683
+ eval_name=["val"],
684
+ eval_metric=["logloss"],
685
+ max_epochs=self.epochs,
686
+ patience=self.patience,
687
+ batch_size=self.batch_size,
688
+ virtual_batch_size=min(self.virtual_batch_size, self.batch_size),
689
+ drop_last=False,
690
+ )
691
+
692
+ self._feature_importances = self._clf.feature_importances_
693
+ self._is_fitted = True
694
+ return self
695
+
696
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
697
+ assert self._is_fitted, "Model not fitted yet"
698
+ X = self._impute(X)
699
+ proba = self._clf.predict_proba(X) # shape (n, 2)
700
+ return proba[:, 1]
701
+
702
+ def get_feature_importances(self) -> Optional[np.ndarray]:
703
+ """Return TabNet attention-based feature importances."""
704
+ return self._feature_importances
705
+
706
+ def explain(self, X: np.ndarray) -> np.ndarray:
707
+ """Return per-sample feature attention masks."""
708
+ assert self._is_fitted, "Model not fitted yet"
709
+ X = self._impute(X)
710
+ masks, _ = self._clf.explain(X)
711
+ return masks
712
+
713
+
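A usage sketch (not part of the commit), assuming pytorch_tabnet is installed and the X/y arrays are placeholder numpy data with binary labels:

tab = TabNetModel(n_d=16, n_a=16, n_steps=3, epochs=50, patience=10)
tab.fit(X_train, y_train)                    # NaNs are median-imputed internally
p_home = tab.predict_proba(X_test)           # P(home_win), shape (n,)
weights = tab.get_feature_importances()      # one attention-derived weight per feature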
714
+ # ===========================================================================
715
+ # 4. FT-Transformer (Feature Tokenizer + Transformer)
716
+ # ===========================================================================
717
+
718
+ class FTTransformerModel(BaseNBAModel):
719
+ """
720
+ FT-Transformer (Gorishniy et al. 2021) — confirmed SOTA for tabular
721
+ data in 2025-2026 benchmarks.
722
+
723
+ Each numerical feature is projected into a *d_token*-dimensional embedding.
724
+ A [CLS] token is prepended. Self-attention across all feature tokens
725
+ captures cross-feature interactions. The [CLS] representation feeds a
726
+ classification head.
727
+
728
+ Because the full 6000+ features would create 6000+ tokens (too large for
729
+ self-attention on CPU), we first apply a learned linear bottleneck to
730
+ reduce to *n_tokens* feature groups.
731
+ """
732
+
733
+ def __init__(
734
+ self,
735
+ n_tokens: int = 128,
736
+ d_token: int = 64,
737
+ n_heads: int = 4,
738
+ n_layers: int = 3,
739
+ dim_ff: int = 256,
740
+ dropout: float = 0.2,
741
+ attention_dropout: float = 0.1,
742
+ lr: float = 1e-4,
743
+ weight_decay: float = 1e-5,
744
+ batch_size: int = 512,
745
+ epochs: int = 120,
746
+ patience: int = 15,
747
+ **kw,
748
+ ):
749
+ super().__init__(
750
+ n_tokens=n_tokens, d_token=d_token, n_heads=n_heads,
751
+ n_layers=n_layers, dim_ff=dim_ff, dropout=dropout,
752
+ attention_dropout=attention_dropout, lr=lr,
753
+ weight_decay=weight_decay, batch_size=batch_size,
754
+ epochs=epochs, patience=patience, **kw,
755
+ )
756
+ self.n_tokens = n_tokens
757
+ self.d_token = d_token
758
+ self.n_heads = n_heads
759
+ self.n_layers = n_layers
760
+ self.dim_ff = dim_ff
761
+ self.dropout = dropout
762
+ self.attention_dropout = attention_dropout
763
+ self.lr = lr
764
+ self.weight_decay = weight_decay
765
+ self.batch_size = batch_size
766
+ self.epochs = epochs
767
+ self.patience = patience
768
+ self._net = None
769
+
770
+ @staticmethod
771
+ def _build_net(n_features: int, cfg: dict):
772
+ torch, nn, _, _, _ = _import_torch()
773
+
774
+ class FTTransformerNet(nn.Module):
775
+ """
776
+ Feature Tokenizer + Transformer.
777
+
778
+ 1) Bottleneck: Linear(n_features -> n_tokens) — group features
779
+ 2) Token embed: each of *n_tokens* scalars -> d_token vector
780
+ 3) Prepend [CLS] token
781
+ 4) TransformerEncoder
782
+ 5) [CLS] output -> classification head
783
+ """
784
+
785
+ def __init__(self):
786
+ super().__init__()
787
+ n_tok = cfg["n_tokens"]
788
+ d_tok = cfg["d_token"]
789
+
790
+ # Bottleneck projection: reduce 6000 features to n_tokens groups
791
+ self.bottleneck = nn.Linear(n_features, n_tok)
792
+ self.bn_norm = nn.LayerNorm(n_tok)
793
+
794
+ # Per-token embedding: each scalar -> d_token vector
795
+ # Implemented as a shared Linear(1 -> d_token) + per-token bias
796
+ self.token_weight = nn.Parameter(torch.randn(n_tok, d_tok) * 0.02)
797
+ self.token_bias = nn.Parameter(torch.zeros(n_tok, d_tok))
798
+
799
+ # [CLS] token
800
+ self.cls_token = nn.Parameter(torch.randn(1, 1, d_tok) * 0.02)
801
+
802
+ # Transformer
803
+ self.layer_norm = nn.LayerNorm(d_tok)
804
+ encoder_layer = nn.TransformerEncoderLayer(
805
+ d_model=d_tok,
806
+ nhead=cfg["n_heads"],
807
+ dim_feedforward=cfg["dim_ff"],
808
+ dropout=cfg["dropout"],
809
+ batch_first=True,
810
+ activation="gelu",
811
+ )
812
+ self.encoder = nn.TransformerEncoder(
813
+ encoder_layer, num_layers=cfg["n_layers"]
814
+ )
815
+
816
+ # Head
817
+ self.head = nn.Sequential(
818
+ nn.LayerNorm(d_tok),
819
+ nn.Linear(d_tok, d_tok // 2),
820
+ nn.GELU(),
821
+ nn.Dropout(cfg["dropout"]),
822
+ nn.Linear(d_tok // 2, 1),
823
+ )
824
+
825
+ def forward(self, x):
826
+ # x: (batch, n_features)
827
+ batch_size = x.size(0)
828
+
829
+ # Bottleneck: (batch, n_features) -> (batch, n_tokens)
830
+ x = self.bn_norm(self.bottleneck(x))
831
+
832
+ # Token embedding: (batch, n_tokens) -> (batch, n_tokens, d_token)
833
+ # x_i * weight_i + bias_i for each token
834
+ x = x.unsqueeze(-1) * self.token_weight.unsqueeze(0) + self.token_bias.unsqueeze(0)
835
+
836
+ # Prepend [CLS]
837
+ cls = self.cls_token.expand(batch_size, -1, -1)
838
+ x = torch.cat([cls, x], dim=1) # (batch, 1 + n_tokens, d_token)
839
+
840
+ x = self.layer_norm(x)
841
+ x = self.encoder(x)
842
+
843
+ # Extract [CLS] output
844
+ cls_out = x[:, 0, :]
845
+ return torch.sigmoid(self.head(cls_out)).squeeze(-1)
846
+
847
+ return FTTransformerNet()
848
+
849
+ def fit(
850
+ self,
851
+ X_train: np.ndarray,
852
+ y_train: np.ndarray,
853
+ X_val: Optional[np.ndarray] = None,
854
+ y_val: Optional[np.ndarray] = None,
855
+ ) -> "FTTransformerModel":
856
+ torch, nn, optim, DataLoader, TensorDataset = _import_torch()
857
+
858
+ # Prepare a user-supplied validation set before the auto-split so the
+ # carved-out tail (already prepared) is not scaled twice.
+ external_val = X_val is not None and y_val is not None
+ X_train = self._prepare(X_train, fit=True)
+ if external_val:
+     X_val = self._prepare(X_val)
+ X_train, y_train, X_val, y_val = self._auto_val_split(X_train, y_train, X_val, y_val)
862
+
863
+ y_train = y_train.astype(np.float32)
864
+ y_val = y_val.astype(np.float32)
865
+
866
+ n_features = X_train.shape[1]
867
+ self._net = self._build_net(n_features, {
868
+ "n_tokens": min(self.n_tokens, n_features),
869
+ "d_token": self.d_token,
870
+ "n_heads": self.n_heads,
871
+ "n_layers": self.n_layers,
872
+ "dim_ff": self.dim_ff,
873
+ "dropout": self.dropout,
874
+ })
875
+
876
+ optimizer = optim.AdamW(
877
+ self._net.parameters(), lr=self.lr, weight_decay=self.weight_decay
878
+ )
879
+ scheduler = optim.lr_scheduler.OneCycleLR(
880
+ optimizer, max_lr=self.lr * 10, total_steps=self.epochs,
881
+ pct_start=0.1, anneal_strategy="cos",
882
+ )
883
+ criterion = nn.BCELoss()
884
+
885
+ train_ds = TensorDataset(
886
+ torch.from_numpy(X_train), torch.from_numpy(y_train)
887
+ )
888
+ train_dl = DataLoader(train_ds, batch_size=self.batch_size, shuffle=True)
889
+
890
+ val_X_t = torch.from_numpy(X_val)
891
+ val_y_t = torch.from_numpy(y_val)
892
+
893
+ best_val_loss = float("inf")
894
+ best_state = None
895
+ wait = 0
896
+
897
+ self._net.train()
898
+ for epoch in range(self.epochs):
899
+ epoch_loss = 0.0
900
+ for xb, yb in train_dl:
901
+ optimizer.zero_grad()
902
+ preds = self._net(xb)
903
+ loss = criterion(preds, yb)
904
+ loss.backward()
905
+ torch.nn.utils.clip_grad_norm_(self._net.parameters(), 1.0)
906
+ optimizer.step()
907
+ epoch_loss += loss.item() * len(xb)
908
+ epoch_loss /= len(train_ds)
909
+ scheduler.step()
910
+
911
+ self._net.eval()
912
+ with torch.no_grad():
913
+ val_preds = self._net(val_X_t)
914
+ val_loss = criterion(val_preds, val_y_t).item()
915
+ self._net.train()
916
+
917
+ if val_loss < best_val_loss - 1e-6:
918
+ best_val_loss = val_loss
919
+ best_state = copy.deepcopy(self._net.state_dict())
920
+ wait = 0
921
+ else:
922
+ wait += 1
923
+ if wait >= self.patience:
924
+ break
925
+
926
+ if best_state is not None:
927
+ self._net.load_state_dict(best_state)
928
+ self._net.eval()
929
+ self._is_fitted = True
930
+ return self
931
+
932
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
933
+ torch, _, _, _, _ = _import_torch()
934
+ assert self._is_fitted, "Model not fitted yet"
935
+
936
+ X = self._prepare(X)
937
+ X_t = torch.from_numpy(X)
938
+
939
+ self._net.eval()
940
+ # Batch to avoid OOM on large inputs
941
+ preds_list = []
942
+ bs = self.batch_size
943
+ for i in range(0, len(X_t), bs):
944
+ with torch.no_grad():
945
+ p = self._net(X_t[i : i + bs])
946
+ preds_list.append(p.numpy())
947
+ return np.concatenate(preds_list)
948
+
949
+
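A shape walk-through (not part of the commit) of the tokenizer described above, assuming 6021 input features, n_tokens=128, d_token=64 and a batch of 32:

import torch

x = torch.randn(32, 6021)
bottleneck = torch.nn.Linear(6021, 128)
tok_w, tok_b = torch.randn(128, 64), torch.zeros(128, 64)
tokens = bottleneck(x).unsqueeze(-1) * tok_w + tok_b     # (32, 128, 64)
cls = torch.zeros(32, 1, 64)
seq = torch.cat([cls, tokens], dim=1)                    # (32, 129, 64) fed to the encoder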
950
+ # ===========================================================================
951
+ # 5. Deep Ensemble
952
+ # ===========================================================================
953
+
954
+ class DeepEnsemble(BaseNBAModel):
955
+ """
956
+ Train N independent neural networks with different random seeds.
957
+
958
+ Average their predictions for:
959
+ - Better calibration (ensemble smoothing)
960
+ - Uncertainty estimation (prediction variance)
961
+
962
+ Each member is a simple but effective MLP with skip connections (ResNet-style),
963
+ which is the 2025 consensus best architecture for tabular deep learning
964
+ when ensembled (Kadra et al. 2021 "Well-Tuned Simple Nets").
965
+ """
966
+
967
+ def __init__(
968
+ self,
969
+ n_members: int = 10,
970
+ hidden_dims: Tuple[int, ...] = (512, 256, 128),
971
+ dropout: float = 0.3,
972
+ lr: float = 1e-3,
973
+ weight_decay: float = 1e-4,
974
+ batch_size: int = 512,
975
+ epochs: int = 100,
976
+ patience: int = 12,
977
+ **kw,
978
+ ):
979
+ super().__init__(
980
+ n_members=n_members, hidden_dims=list(hidden_dims),
981
+ dropout=dropout, lr=lr, weight_decay=weight_decay,
982
+ batch_size=batch_size, epochs=epochs, patience=patience, **kw,
983
+ )
984
+ self.n_members = n_members
985
+ self.hidden_dims = hidden_dims
986
+ self.dropout = dropout
987
+ self.lr = lr
988
+ self.weight_decay = weight_decay
989
+ self.batch_size = batch_size
990
+ self.epochs = epochs
991
+ self.patience = patience
992
+ self._members: List = []
993
+
994
+ @staticmethod
995
+ def _build_mlp(n_features: int, hidden_dims: Tuple[int, ...], dropout: float, seed: int):
996
+ """Build one ResNet-style MLP member."""
997
+ torch, nn, _, _, _ = _import_torch()
998
+ torch.manual_seed(seed)
999
+
1000
+ class ResBlock(nn.Module):
1001
+ """Pre-activation residual block."""
1002
+ def __init__(self, dim: int, drop: float):
1003
+ super().__init__()
1004
+ self.net = nn.Sequential(
1005
+ nn.LayerNorm(dim),
1006
+ nn.GELU(),
1007
+ nn.Linear(dim, dim),
1008
+ nn.Dropout(drop),
1009
+ nn.LayerNorm(dim),
1010
+ nn.GELU(),
1011
+ nn.Linear(dim, dim),
1012
+ nn.Dropout(drop),
1013
+ )
1014
+
1015
+ def forward(self, x):
1016
+ return x + self.net(x)
1017
+
1018
+ layers = []
1019
+ in_dim = n_features
1020
+ for h_dim in hidden_dims:
1021
+ layers.append(nn.Linear(in_dim, h_dim))
1022
+ layers.append(nn.GELU())
1023
+ layers.append(nn.Dropout(dropout))
1024
+ # Add residual block at each hidden layer
1025
+ layers.append(ResBlock(h_dim, dropout))
1026
+ in_dim = h_dim
1027
+ layers.append(nn.Linear(in_dim, 1))
1028
+
1029
+ class EnsembleMLP(nn.Module):
1030
+ def __init__(self, layer_list):
1031
+ super().__init__()
1032
+ self.net = nn.Sequential(*layer_list)
1033
+
1034
+ def forward(self, x):
1035
+ return torch.sigmoid(self.net(x)).squeeze(-1)
1036
+
1037
+ return EnsembleMLP(layers)
1038
+
1039
+ def fit(
1040
+ self,
1041
+ X_train: np.ndarray,
1042
+ y_train: np.ndarray,
1043
+ X_val: Optional[np.ndarray] = None,
1044
+ y_val: Optional[np.ndarray] = None,
1045
+ ) -> "DeepEnsemble":
1046
+ torch, nn, optim, DataLoader, TensorDataset = _import_torch()
1047
+
1048
+ # Prepare a user-supplied validation set before the auto-split so the
+ # carved-out tail (already prepared) is not scaled twice.
+ external_val = X_val is not None and y_val is not None
+ X_train = self._prepare(X_train, fit=True)
+ if external_val:
+     X_val = self._prepare(X_val)
+ X_train, y_train, X_val, y_val = self._auto_val_split(X_train, y_train, X_val, y_val)
1052
+
1053
+ y_train = y_train.astype(np.float32)
1054
+ y_val = y_val.astype(np.float32)
1055
+ n_features = X_train.shape[1]
1056
+
1057
+ val_X_t = torch.from_numpy(X_val)
1058
+ val_y_t = torch.from_numpy(y_val)
1059
+ criterion = nn.BCELoss()
1060
+
1061
+ self._members = []
1062
+ for member_idx in range(self.n_members):
1063
+ seed = 42 + member_idx * 1337
1064
+ net = self._build_mlp(n_features, self.hidden_dims, self.dropout, seed)
1065
+
1066
+ # Each member gets a different random seed for data shuffling too
1067
+ torch.manual_seed(seed)
1068
+ np.random.seed(seed)
1069
+
1070
+ optimizer = optim.AdamW(
1071
+ net.parameters(), lr=self.lr, weight_decay=self.weight_decay
1072
+ )
1073
+ scheduler = optim.lr_scheduler.ReduceLROnPlateau(
1074
+ optimizer, mode="min", factor=0.5, patience=5, min_lr=1e-6
1075
+ )
1076
+
1077
+ train_ds = TensorDataset(
1078
+ torch.from_numpy(X_train), torch.from_numpy(y_train)
1079
+ )
1080
+ train_dl = DataLoader(train_ds, batch_size=self.batch_size, shuffle=True)
1081
+
1082
+ best_val_loss = float("inf")
1083
+ best_state = None
1084
+ wait = 0
1085
+
1086
+ net.train()
1087
+ for epoch in range(self.epochs):
1088
+ for xb, yb in train_dl:
1089
+ optimizer.zero_grad()
1090
+ preds = net(xb)
1091
+ loss = criterion(preds, yb)
1092
+ loss.backward()
1093
+ torch.nn.utils.clip_grad_norm_(net.parameters(), 1.0)
1094
+ optimizer.step()
1095
+
1096
+ net.eval()
1097
+ with torch.no_grad():
1098
+ vp = net(val_X_t)
1099
+ vl = criterion(vp, val_y_t).item()
1100
+ net.train()
1101
+ scheduler.step(vl)
1102
+
1103
+ if vl < best_val_loss - 1e-6:
1104
+ best_val_loss = vl
1105
+ best_state = copy.deepcopy(net.state_dict())
1106
+ wait = 0
1107
+ else:
1108
+ wait += 1
1109
+ if wait >= self.patience:
1110
+ break
1111
+
1112
+ if best_state is not None:
1113
+ net.load_state_dict(best_state)
1114
+ net.eval()
1115
+ self._members.append(net)
1116
+
1117
+ self._is_fitted = True
1118
+ return self
1119
+
1120
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
1121
+ """Return mean prediction across ensemble members."""
1122
+ torch, _, _, _, _ = _import_torch()
1123
+ assert self._is_fitted and self._members, "Model not fitted yet"
1124
+
1125
+ X = self._prepare(X)
1126
+ X_t = torch.from_numpy(X)
1127
+
1128
+ all_preds = []
1129
+ for net in self._members:
1130
+ net.eval()
1131
+ with torch.no_grad():
1132
+ p = net(X_t).numpy()
1133
+ all_preds.append(p)
1134
+
1135
+ return np.mean(all_preds, axis=0)
1136
+
1137
+ def predict_uncertainty(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
1138
+ """
1139
+ Return (mean_prediction, std_prediction) across ensemble members.
1140
+ High std = high model uncertainty = less confident prediction.
1141
+ """
1142
+ torch, _, _, _, _ = _import_torch()
1143
+ assert self._is_fitted and self._members, "Model not fitted yet"
1144
+
1145
+ X = self._prepare(X)
1146
+ X_t = torch.from_numpy(X)
1147
+
1148
+ all_preds = []
1149
+ for net in self._members:
1150
+ net.eval()
1151
+ with torch.no_grad():
1152
+ p = net(X_t).numpy()
1153
+ all_preds.append(p)
1154
+
1155
+ stacked = np.array(all_preds) # (n_members, n_samples)
1156
+ return stacked.mean(axis=0), stacked.std(axis=0)
1157
+
1158
+
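A usage sketch (not part of the commit): the ensemble mean is the point prediction, and member disagreement acts as an uncertainty signal for filtering low-confidence games; the threshold below is illustrative:

ens = DeepEnsemble(n_members=5, hidden_dims=(256, 128))
ens.fit(X_train, y_train)
mean_p, std_p = ens.predict_uncertainty(X_test)
confident_idx = std_p < 0.05          # keep only games where the members agree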
1159
+ # ===========================================================================
1160
+ # 6. Conformal Prediction Wrapper
1161
+ # ===========================================================================
1162
+
1163
+ class ConformalPredictionWrapper(BaseNBAModel):
1164
+ """
1165
+ Wraps ANY model to provide calibrated prediction intervals with
1166
+ guaranteed coverage.
1167
+
1168
+ Uses split conformal prediction:
1169
+ 1. Train base model on training set
1170
+ 2. Compute non-conformity scores on calibration holdout
1171
+ 3. At inference, use quantile of scores to produce prediction sets
1172
+
1173
+ For binary classification:
1174
+ - Returns P(home_win) from base model (point prediction)
1175
+ - Also provides prediction_set() that returns {0}, {1}, or {0,1}
1176
+ with guaranteed marginal coverage >= (1 - alpha)
1177
+ """
1178
+
1179
+ def __init__(
1180
+ self,
1181
+ base_model: BaseNBAModel,
1182
+ alpha: float = 0.10,
1183
+ cal_fraction: float = 0.20,
1184
+ **kw,
1185
+ ):
1186
+ super().__init__(alpha=alpha, cal_fraction=cal_fraction, **kw)
1187
+ self.base_model = base_model
1188
+ self.alpha = alpha
1189
+ self.cal_fraction = cal_fraction
1190
+ self._qhat: Optional[float] = None
1191
+ self._cal_scores: Optional[np.ndarray] = None
1192
+
1193
+ def fit(
1194
+ self,
1195
+ X_train: np.ndarray,
1196
+ y_train: np.ndarray,
1197
+ X_val: Optional[np.ndarray] = None,
1198
+ y_val: Optional[np.ndarray] = None,
1199
+ ) -> "ConformalPredictionWrapper":
1200
+ """
1201
+ Split data into proper-training and calibration sets.
1202
+ Train base model on proper-training, compute conformal scores on calibration.
1203
+ """
1204
+ n = len(X_train)
1205
+ cal_size = int(n * self.cal_fraction)
1206
+ # Use the LAST cal_size samples for calibration (time-ordered)
1207
+ X_proper = X_train[: n - cal_size]
1208
+ y_proper = y_train[: n - cal_size]
1209
+ X_cal = X_train[n - cal_size :]
1210
+ y_cal = y_train[n - cal_size :]
1211
+
1212
+ # Train base model
1213
+ self.base_model.fit(X_proper, y_proper, X_val, y_val)
1214
+
1215
+ # Compute non-conformity scores on calibration set
1216
+ cal_probs = self.base_model.predict_proba(X_cal)
1217
+ # Score = 1 - P(true_class)
1218
+ scores = np.where(y_cal == 1, 1.0 - cal_probs, cal_probs)
1219
+ self._cal_scores = np.sort(scores)
1220
+
1221
+ # Quantile for desired coverage
1222
+ n_cal = len(self._cal_scores)
1223
+ level = np.ceil((1.0 - self.alpha) * (n_cal + 1)) / n_cal
1224
+ level = min(level, 1.0)
1225
+ self._qhat = np.quantile(self._cal_scores, level, method="higher")
1226
+
1227
+ self._is_fitted = True
1228
+ return self
1229
+
1230
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
1231
+ """Return point predictions from base model."""
1232
+ assert self._is_fitted, "Model not fitted yet"
1233
+ return self.base_model.predict_proba(X)
1234
+
1235
+ def predict_sets(self, X: np.ndarray) -> List[set]:
1236
+ """
1237
+ Return prediction sets with guaranteed (1-alpha) coverage.
1238
+
1239
+ Each set is one of:
1240
+ - {1} — confident home win
1241
+ - {0} — confident away win
1242
+ - {0, 1} — uncertain (both plausible)
1243
+ """
1244
+ assert self._is_fitted, "Model not fitted yet"
1245
+ probs = self.base_model.predict_proba(X)
1246
+ sets = []
1247
+ for p in probs:
1248
+ s = set()
1249
+ # Include class 1 if score would be <= qhat
1250
+ if 1.0 - p <= self._qhat:
1251
+ s.add(1)
1252
+ # Include class 0 if score would be <= qhat
1253
+ if p <= self._qhat:
1254
+ s.add(0)
1255
+ if not s:
1256
+ # Shouldn't happen, but include most likely
1257
+ s.add(1 if p >= 0.5 else 0)
1258
+ sets.append(s)
1259
+ return sets
1260
+
1261
+ def predict_intervals(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
1262
+ """
1263
+ Return (lower_bound, upper_bound) calibrated probability intervals.
1264
+
1265
+ Width of interval reflects model uncertainty after conformal calibration.
1266
+ """
1267
+ assert self._is_fitted, "Model not fitted yet"
1268
+ probs = self.base_model.predict_proba(X)
1269
+ lower = np.clip(probs - self._qhat, 0.0, 1.0)
1270
+ upper = np.clip(probs + self._qhat, 0.0, 1.0)
1271
+ return lower, upper
1272
+
1273
+ def get_params(self) -> Dict[str, Any]:
1274
+ base_params = self.base_model.get_params()
1275
+ return {
1276
+ "wrapper": "conformal",
1277
+ "alpha": self.alpha,
1278
+ "cal_fraction": self.cal_fraction,
1279
+ "qhat": float(self._qhat) if self._qhat is not None else None,
1280
+ "base_model": base_params,
1281
+ }
1282
+
1283
+
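A numeric sketch (not part of the commit) of the split-conformal quantile computed in fit, assuming 200 calibration games and alpha=0.10:

import numpy as np

scores = np.sort(np.random.rand(200))                    # stand-in for 1 - P(true class)
level = min(np.ceil(0.90 * (200 + 1)) / 200, 1.0)        # = 181 / 200 = 0.905
qhat = np.quantile(scores, level, method="higher")
# a class c enters the prediction set whenever its score 1 - P(c) <= qhat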
1284
+ # ===========================================================================
1285
+ # 7. AutoGluon Ensemble
1286
+ # ===========================================================================
1287
+
1288
+ class AutoGluonEnsemble(BaseNBAModel):
1289
+ """
1290
+ AutoGluon Tabular — auto-search and stack hundreds of model configurations.
1291
+
1292
+ Time-budgeted: runs for *max_time* seconds, tries GBMs, neural nets,
1293
+ linear models, k-NN, then stacks the best ones.
1294
+
1295
+ Presets: "best_quality" = maximum stacking/bagging (slow but best),
1296
+ "good_quality" = reasonable speed/quality trade-off,
1297
+ "medium_quality" = fastest.
1298
+ """
1299
+
1300
+ def __init__(
1301
+ self,
1302
+ max_time: int = 3600,
1303
+ preset: str = "best_quality",
1304
+ eval_metric: str = "log_loss",
1305
+ num_bag_folds: int = 5,
1306
+ num_stack_levels: int = 1,
1307
+ verbosity: int = 1,
1308
+ **kw,
1309
+ ):
1310
+ super().__init__(
1311
+ max_time=max_time, preset=preset, eval_metric=eval_metric,
1312
+ num_bag_folds=num_bag_folds, num_stack_levels=num_stack_levels,
1313
+ verbosity=verbosity, **kw,
1314
+ )
1315
+ self.max_time = max_time
1316
+ self.preset = preset
1317
+ self.eval_metric = eval_metric
1318
+ self.num_bag_folds = num_bag_folds
1319
+ self.num_stack_levels = num_stack_levels
1320
+ self.verbosity = verbosity
1321
+ self._predictor = None
1322
+
1323
+ def fit(
1324
+ self,
1325
+ X_train: np.ndarray,
1326
+ y_train: np.ndarray,
1327
+ X_val: Optional[np.ndarray] = None,
1328
+ y_val: Optional[np.ndarray] = None,
1329
+ ) -> "AutoGluonEnsemble":
1330
+ try:
1331
+ from autogluon.tabular import TabularPredictor
1332
+ import pandas as pd
1333
+ except ImportError:
1334
+ raise ImportError(
1335
+ "autogluon.tabular not installed. Install with: "
1336
+ "pip install autogluon.tabular"
1337
+ )
1338
+
1339
+ X_train = self._impute(X_train, fit=True)
1340
+
1341
+ # Build DataFrame with feature columns + label
1342
+ n_features = X_train.shape[1]
1343
+ col_names = [f"f_{i}" for i in range(n_features)]
1344
+ df_train = pd.DataFrame(X_train, columns=col_names)
1345
+ df_train["label"] = y_train.astype(int)
1346
+
1347
+ # Validation data (optional tuning set)
1348
+ df_val = None
1349
+ if X_val is not None and y_val is not None:
1350
+ X_val = self._impute(X_val)
1351
+ df_val = pd.DataFrame(X_val, columns=col_names)
1352
+ df_val["label"] = y_val.astype(int)
1353
+
1354
+ self._col_names = col_names
1355
+
1356
+ self._predictor = TabularPredictor(
1357
+ label="label",
1358
+ eval_metric=self.eval_metric,
1359
+ problem_type="binary",
1360
+ verbosity=self.verbosity,
1361
+ )
1362
+
1363
+ fit_kwargs = {
1364
+ "train_data": df_train,
1365
+ "time_limit": self.max_time,
1366
+ "presets": self.preset,
1367
+ "num_bag_folds": self.num_bag_folds,
1368
+ "num_stack_levels": self.num_stack_levels,
1369
+ }
1370
+ if df_val is not None:
1371
+ fit_kwargs["tuning_data"] = df_val
1372
+
1373
+ self._predictor.fit(**fit_kwargs)
1374
+ self._is_fitted = True
1375
+ return self
1376
+
1377
+ def predict_proba(self, X: np.ndarray) -> np.ndarray:
1378
+ import pandas as pd
1379
+
1380
+ assert self._is_fitted, "Model not fitted yet"
1381
+ X = self._impute(X)
1382
+ df = pd.DataFrame(X, columns=self._col_names)
1383
+ proba = self._predictor.predict_proba(df)
1384
+ # Returns DataFrame with columns 0, 1 — we want P(class=1)
1385
+ if isinstance(proba, pd.DataFrame):
1386
+ return proba[1].values
1387
+ return proba
1388
+
1389
+ def leaderboard(self):
1390
+ """Return AutoGluon model leaderboard."""
1391
+ assert self._is_fitted, "Model not fitted yet"
1392
+ return self._predictor.leaderboard(silent=True)
1393
+
1394
+ def feature_importance(self, X: np.ndarray, y: np.ndarray) -> "pd.DataFrame":
1395
+ """Return permutation feature importance."""
1396
+ import pandas as pd
1397
+
1398
+ X = self._impute(X)
1399
+ df = pd.DataFrame(X, columns=self._col_names)
1400
+ df["label"] = y.astype(int)
1401
+ return self._predictor.feature_importance(df)
1402
+
1403
+ def save(self, path: Union[str, Path]) -> None:
1404
+ """AutoGluon has its own save mechanism."""
1405
+ path = Path(path)
1406
+ path.mkdir(parents=True, exist_ok=True)
1407
+ if self._predictor is not None:
1408
+ self._predictor.save(str(path / "autogluon_predictor"))
1409
+ # Save wrapper state
1410
+ state = {
1411
+ "params": self.params,
1412
+ "_col_names": getattr(self, "_col_names", None),
1413
+ "_feature_medians": self._feature_medians.tolist() if self._feature_medians is not None else None,
1414
+ "_is_fitted": self._is_fitted,
1415
+ }
1416
+ with open(path / "wrapper_state.json", "w") as f:
1417
+ json.dump(state, f)
1418
+
1419
+ @classmethod
1420
+ def load(cls, path: Union[str, Path]) -> "AutoGluonEnsemble":
1421
+ from autogluon.tabular import TabularPredictor
1422
+
1423
+ path = Path(path)
1424
+ with open(path / "wrapper_state.json") as f:
1425
+ state = json.load(f)
1426
+
1427
+ obj = cls(**state["params"])
1428
+ obj._col_names = state["_col_names"]
1429
+ if state["_feature_medians"] is not None:
1430
+ obj._feature_medians = np.array(state["_feature_medians"], dtype=np.float32)
1431
+ obj._predictor = TabularPredictor.load(str(path / "autogluon_predictor"))
1432
+ obj._is_fitted = state["_is_fitted"]
1433
+ return obj
1434
+
1435
+
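A usage sketch (not part of the commit), assuming autogluon.tabular is installed and the X/y arrays are placeholders; a shorter budget and lighter preset keep the run inside a CPU-only Space:

ag = AutoGluonEnsemble(max_time=1800, preset="good_quality", num_bag_folds=5)
ag.fit(X_train, y_train, X_val, y_val)
p_home = ag.predict_proba(X_test)
print(ag.leaderboard().head())      # which configs made it into the stack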
1436
+ # ===========================================================================
1437
+ # Utilities
1438
+ # ===========================================================================
1439
+
1440
+ def _is_jsonable(v: Any) -> bool:
1441
+ """Check if a value is JSON serialisable."""
1442
+ try:
1443
+ json.dumps(v)
1444
+ return True
1445
+ except (TypeError, OverflowError, ValueError):
1446
+ return False
1447
+
1448
+
1449
+ # ===========================================================================
1450
+ # Model Registry — maps names to classes for the genetic algorithm
1451
+ # ===========================================================================
1452
+
1453
+ NEURAL_MODEL_REGISTRY: Dict[str, type] = {
1454
+ "lstm": LSTMSequenceModel,
1455
+ "transformer": TransformerAttentionModel,
1456
+ "tabnet": TabNetModel,
1457
+ "ft_transformer": FTTransformerModel,
1458
+ "deep_ensemble": DeepEnsemble,
1459
+ "conformal": ConformalPredictionWrapper,
1460
+ "autogluon": AutoGluonEnsemble,
1461
+ }
1462
+
1463
+
1464
+ def build_neural_model(model_type: str, **params) -> BaseNBAModel:
1465
+ """
1466
+ Factory function to build a neural model by name.
1467
+
1468
+ Usage:
1469
+ model = build_neural_model("ft_transformer", n_tokens=128, d_token=64)
1470
+ model.fit(X_train, y_train)
1471
+ probs = model.predict_proba(X_test)
1472
+
1473
+ For conformal wrapper, pass base_model_type and base_model_params:
1474
+ model = build_neural_model(
1475
+ "conformal",
1476
+ base_model_type="deep_ensemble",
1477
+ base_model_params={"n_members": 5},
1478
+ alpha=0.1,
1479
+ )
1480
+ """
1481
+ if model_type == "conformal":
1482
+ base_type = params.pop("base_model_type", "deep_ensemble")
1483
+ base_params = params.pop("base_model_params", {})
1484
+ base_model = build_neural_model(base_type, **base_params)
1485
+ return ConformalPredictionWrapper(base_model=base_model, **params)
1486
+
1487
+ cls = NEURAL_MODEL_REGISTRY.get(model_type)
1488
+ if cls is None:
1489
+ raise ValueError(
1490
+ f"Unknown model type '{model_type}'. "
1491
+ f"Available: {list(NEURAL_MODEL_REGISTRY.keys())}"
1492
+ )
1493
+ return cls(**params)
1494
+
1495
+
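A sketch (not part of the commit) of how a search loop, such as the genetic algorithm the registry is meant for, could sample and build models; the genome dict and data arrays are hypothetical:

genome = {"model_type": "deep_ensemble", "params": {"n_members": 5, "dropout": 0.2}}
model = build_neural_model(genome["model_type"], **genome["params"])
model.fit(X_train, y_train)
fitness = model.predict_proba(X_val)    # fed back into the search as a fitness signal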
1496
+ # ===========================================================================
1497
+ # Quick smoke test (runs if executed directly)
1498
+ # ===========================================================================
1499
+
1500
+ if __name__ == "__main__":
1501
+ print("=" * 60)
1502
+ print("NBA Quant AI — Neural Models Smoke Test")
1503
+ print("=" * 60)
1504
+
1505
+ np.random.seed(42)
1506
+ N_TRAIN, N_TEST, N_FEAT = 500, 100, 200
1507
+
1508
+ X_train = np.random.randn(N_TRAIN, N_FEAT).astype(np.float32)
1509
+ # Inject some NaNs to test imputation
1510
+ mask = np.random.random(X_train.shape) < 0.05
1511
+ X_train[mask] = np.nan
1512
+ y_train = (np.random.random(N_TRAIN) > 0.5).astype(np.float32)
1513
+
1514
+ X_test = np.random.randn(N_TEST, N_FEAT).astype(np.float32)
1515
+ y_test = (np.random.random(N_TEST) > 0.5).astype(np.float32)
1516
+
1517
+ # Test each model (with small configs for speed)
1518
+ tests = [
1519
+ ("FT-Transformer", FTTransformerModel(
1520
+ n_tokens=32, d_token=16, n_heads=2, n_layers=1,
1521
+ epochs=5, patience=3, batch_size=128,
1522
+ )),
1523
+ ("Deep Ensemble (3 members)", DeepEnsemble(
1524
+ n_members=3, hidden_dims=(64, 32),
1525
+ epochs=5, patience=3, batch_size=128,
1526
+ )),
1527
+ ("LSTM Sequence", LSTMSequenceModel(
1528
+ seq_len=5, hidden1=32, hidden2=16, dense_dim=16,
1529
+ epochs=5, patience=3, batch_size=128,
1530
+ )),
1531
+ ("Transformer Attention", TransformerAttentionModel(
1532
+ seq_len=5, d_model=32, n_heads=2, n_layers=1,
1533
+ dim_ff=64, epochs=5, patience=3, batch_size=128,
1534
+ )),
1535
+ ]
1536
+
1537
+ for name, model in tests:
1538
+ print(f"\n--- {name} ---")
1539
+ try:
1540
+ model.fit(X_train, y_train)
1541
+ probs = model.predict_proba(X_test)
1542
+ print(f" Predictions shape: {probs.shape}")
1543
+ print(f" Mean pred: {probs.mean():.4f}, Std: {probs.std():.4f}")
1544
+ print(f" Min: {probs.min():.4f}, Max: {probs.max():.4f}")
1545
+ print(f" Params: {list(model.get_params().keys())}")
1546
+ except Exception as e:
1547
+ print(f" ERROR: {e}")
1548
+
1549
+ # Test conformal wrapper
1550
+ print("\n--- Conformal Prediction Wrapper ---")
1551
+ try:
1552
+ base = DeepEnsemble(
1553
+ n_members=2, hidden_dims=(64, 32),
1554
+ epochs=5, patience=3, batch_size=128,
1555
+ )
1556
+ conformal = ConformalPredictionWrapper(base_model=base, alpha=0.1)
1557
+ conformal.fit(X_train, y_train)
1558
+ probs = conformal.predict_proba(X_test)
1559
+ sets = conformal.predict_sets(X_test)
1560
+ lower, upper = conformal.predict_intervals(X_test)
1561
+ print(f" Point preds shape: {probs.shape}")
1562
+ print(f" Prediction sets (first 5): {sets[:5]}")
1563
+ print(f" Intervals: [{lower[:3]}] - [{upper[:3]}]")
1564
+ print(f" Avg interval width: {(upper - lower).mean():.4f}")
1565
+ except Exception as e:
1566
+ print(f" ERROR: {e}")
1567
+
1568
+ # Test TabNet (may fail if pytorch_tabnet not installed)
1569
+ print("\n--- TabNet ---")
1570
+ try:
1571
+ tab = TabNetModel(
1572
+ n_d=8, n_a=8, n_steps=3, epochs=5, patience=3, batch_size=128,
1573
+ )
1574
+ tab.fit(X_train, y_train)
1575
+ probs = tab.predict_proba(X_test)
1576
+ print(f" Predictions shape: {probs.shape}")
1577
+ print(f" Mean pred: {probs.mean():.4f}")
1578
+ fi = tab.get_feature_importances()
1579
+ if fi is not None:
1580
+ print(f" Feature importances shape: {fi.shape}")
1581
+ except ImportError:
1582
+ print(" SKIPPED (pytorch_tabnet not installed)")
1583
+ except Exception as e:
1584
+ print(f" ERROR: {e}")
1585
+
1586
+ # Test factory
1587
+ print("\n--- Factory: build_neural_model ---")
1588
+ try:
1589
+ m = build_neural_model("ft_transformer", n_tokens=32, d_token=16,
1590
+ n_heads=2, n_layers=1, epochs=3, batch_size=128)
1591
+ m.fit(X_train, y_train)
1592
+ print(f" Factory FT-Transformer OK, preds mean: {m.predict_proba(X_test).mean():.4f}")
1593
+ except Exception as e:
1594
+ print(f" ERROR: {e}")
1595
+
1596
+ print("\n" + "=" * 60)
1597
+ print("Smoke test complete.")
1598
+ print("=" * 60)
requirements.txt CHANGED
@@ -7,3 +7,8 @@ gradio>=5.0
  uvicorn>=0.30
  catboost>=1.2
  psycopg2-binary>=2.9
+ # --- Neural network models (2025-2026 SOTA) ---
+ # keep the CPU wheel index on its own line; pip requirements files take index options per line, not per requirement
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch>=2.3
+ pytorch_tabnet>=4.1
+ mapie>=0.9
+ # autogluon.tabular>=1.2  # OPTIONAL — large install (~2GB), uncomment if needed