PredictLM v11.0 + Mini ship-bundle
README.md
@@ -82,29 +82,10 @@ That's it. On the first `.predict()` call the package silently downloads its par
 **Edge cases:**
 
 - **No internet / air-gapped.** Pass `auto_duo=False` at load to disable partner download — `.predict()` returns the single-model in-context result.
-- **Want explicit Duo control** (custom `w`, `n_inner`, manual orchestration)? Use the explicit `duo_ttt_predict(mini, base, ...)` helper documented below.
 - **Real-time inference** (<10 ms latency)? Use `auto_duo=False` zero-tuning. Duo + TTT adds ~1-60 s per query depending on table size.
 
 **TTT** ([Test-Time Training](https://arxiv.org/abs/2503.11842), grounded in TabPFN-2.5's [recipe](https://arxiv.org/abs/2511.08667)) does ~15 inner Adam steps of self-supervised fine-tuning on the user's in-context examples before predicting. Per-task specialization on top of a generic ICL prior. 19 / 20 datasets improved vs zero-tuning; no dataset regressed by more than 0.006.
 
-### Advanced — explicit Duo + TTT (manual orchestration)
-
-```python
-from predictlm import PredictLM, duo_ttt_predict
-
-mini = PredictLM.from_pretrained("zerooneresearch/predictlm-mini-13m", auto_duo=False)
-base = PredictLM.from_pretrained("zerooneresearch/predictlm-base-26m", auto_duo=False)
-
-p_mini = mini.fit_and_predict_with_ttt(X_train, y_train, X_test, n_inner=15, lr=1e-4)
-p_base = base.fit_and_predict_with_ttt(X_train, y_train, X_test, n_inner=15, lr=1e-4)
-
-# Weighted softmax average: w=0.40 for cls, w=0.25 for reg
-p = 0.40 * p_mini + 0.60 * p_base
-preds = p.argmax(-1)
-```
-
-Same numerical result as the default `.predict()`, but you control `w` (mini logit weight), `n_inner`, `lr`, etc.
-
 ## Architecture
 
 Unified architecture: a shared backbone with two task heads (regression via a 1024-bin BarDistribution, classification via per-task masked softmax). The model auto-detects task type from the dtype of `y_train` and routes through the matching head. One `fit/predict` API for both. This unified framing follows [TabICLv2](https://huggingface.co/papers/2602.11139) (Soda Inria, Feb 2026); the closest non-unified precedent is [TabPFN v2](https://huggingface.co/Prior-Labs/TabPFN-v2-clf), which ships separate classifier and regressor checkpoints.
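The TTT paragraph in this README change says the model takes ~15 inner Adam steps on the user's in-context examples before predicting. The diff does not show PredictLM's actual objective or which parameters are adapted, so the following is only a generic sketch of that kind of loop, assuming a hypothetical linear softmax head `W` as a stand-in for the model. It copies the prior weights, runs `n_inner` Adam steps on cross-entropy over the context rows (a supervised simplification of the self-supervised objective the README mentions), predicts the test rows, and leaves the prior untouched. `ttt_predict` and its arguments are illustrative names, not the package API.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ttt_predict(W_prior, X_ctx, y_ctx, X_test, n_inner=15, lr=1e-4,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical TTT loop: adapt a copy of W_prior on the context, then predict.

    W_prior: (n_features, n_classes) weights of a stand-in linear softmax head.
    y_ctx:   integer class labels for the context rows.
    """
    W = W_prior.copy()             # never mutate the shared "prior" weights
    m = np.zeros_like(W)           # Adam first-moment estimate
    v = np.zeros_like(W)           # Adam second-moment estimate
    Y = np.eye(W.shape[1])[y_ctx]  # one-hot context labels
    for t in range(1, n_inner + 1):
        P = softmax(X_ctx @ W)
        grad = X_ctx.T @ (P - Y) / len(X_ctx)  # mean cross-entropy gradient
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)             # Adam bias correction
        v_hat = v / (1 - beta2**t)
        W -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return softmax(X_test @ W)                 # class probabilities per test row
```

Each Adam step moves a weight by roughly `lr`, so at the README's `lr=1e-4` fifteen steps are a small nudge away from the prior, which fits the README's framing of TTT as per-task specialization on top of a generic prior rather than training from scratch.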
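The removed "Advanced" snippet combined the two models' outputs as `0.40 * p_mini + 0.60 * p_base`, while its comment quoted w=0.40 for classification and w=0.25 for regression; the snippet itself hard-coded the classification value. Below is a minimal sketch of that weighted average, assuming `p_mini` and `p_base` are per-row probability arrays with the same class (or bin) order. `duo_blend` and `DUO_WEIGHTS` are hypothetical names, not part of PredictLM. The removed prose called `w` a "mini logit weight" while the comment says "softmax average"; this sketch averages probabilities, not logits.

```python
import numpy as np

# Mini-model weights quoted in the removed README comment (assumed mapping).
DUO_WEIGHTS = {"classification": 0.40, "regression": 0.25}


def duo_blend(p_mini, p_base, task="classification"):
    """Convex combination of two models' probability outputs (hypothetical helper)."""
    p_mini = np.asarray(p_mini, dtype=float)
    p_base = np.asarray(p_base, dtype=float)
    if p_mini.shape != p_base.shape:
        raise ValueError("p_mini and p_base must have the same shape")
    w = DUO_WEIGHTS[task]
    # The weights sum to 1, so each blended row is still a probability distribution.
    return w * p_mini + (1.0 - w) * p_base
```

Class predictions then follow as `duo_blend(p_mini, p_base).argmax(-1)`, matching the last line of the removed snippet. Blending regression outputs this way additionally assumes both models share the same bin edges.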
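The Architecture paragraph says the task type is inferred from the dtype of `y_train` and that regression goes through a 1024-bin BarDistribution. The exact dtype rule and bin layout are not given here, so the sketch below assumes floating-point labels mean regression and everything else (integers, booleans, strings) means classification, and it reduces a binned predictive distribution to a point estimate by weighting bin midpoints. `infer_task` and `bar_mean` are hypothetical helpers; real bar-distribution implementations may treat the outermost bins differently.

```python
import numpy as np


def infer_task(y_train) -> str:
    """Guess the task from the label dtype (assumed rule, not PredictLM's code)."""
    y = np.asarray(y_train)
    if np.issubdtype(y.dtype, np.floating):
        return "regression"
    # ints, bools, strings and other non-float labels are treated as classes
    return "classification"


def bar_mean(probs, edges):
    """Point estimate from a binned ("bar") predictive distribution.

    probs: (n_rows, n_bins) probabilities; edges: (n_bins + 1,) bin edges.
    Assumes uniform density within each bin, so each bin contributes its midpoint.
    """
    probs = np.asarray(probs, dtype=float)
    edges = np.asarray(edges, dtype=float)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return probs @ centers
```

Under this assumed rule, integer-coded regression targets (counts, for example) would be routed to the classification head, so the rule is a design choice a real implementation has to make explicitly.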