zerooneresearch/predictlm-base-26m

@@ -82,29 +82,10 @@ That's it. On the first `.predict()` call the package silently downloads its par
 **Edge cases:**
 - **No internet / air-gapped.** Pass `auto_duo=False` at load to disable the partner download; `.predict()` then returns the single-model in-context result.
-- **Want explicit Duo control** (custom `w`, `n_inner`, manual orchestration)? Use the explicit `duo_ttt_predict(mini, base, ...)` helper documented below.
 - **Real-time inference** (<10 ms latency)? Load with `auto_duo=False` for zero-tuning inference. Duo + TTT adds ~1-60 s per query depending on table size.
 **TTT** ([Test-Time Training](https://arxiv.org/abs/2503.11842), grounded in TabPFN-2.5's [recipe](https://arxiv.org/abs/2511.08667)) runs ~15 inner Adam steps of self-supervised fine-tuning on the user's in-context examples before predicting, which adds per-task specialization on top of a generic ICL prior. 19 of 20 datasets improved vs zero-tuning, and no dataset regressed by more than 0.006.
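The inner loop described above can be sketched on a toy model. This is a minimal illustration, not PredictLM's implementation: the softmax-linear model, the `ttt_adapt` helper, the plain cross-entropy objective on the context labels, and the learning rate are all assumptions chosen so the example runs with NumPy alone. Only the count of 15 Adam steps comes from the text.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    # Mean negative log-likelihood of integer labels y under a linear model.
    p = softmax(X @ W)
    return float(-np.log(p[np.arange(len(y)), y] + 1e-12).mean())

def ttt_adapt(W0, X_ctx, y_ctx, n_inner=15, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    # Hypothetical helper: adapt a COPY of the starting weights with a few
    # Adam steps on the context examples, leaving W0 untouched.
    W = W0.copy()
    m = np.zeros_like(W)
    v = np.zeros_like(W)
    onehot = np.eye(W.shape[1])[y_ctx]
    for t in range(1, n_inner + 1):
        p = softmax(X_ctx @ W)
        grad = X_ctx.T @ (p - onehot) / len(y_ctx)  # cross-entropy gradient
        m = b1 * m + (1 - b1) * grad                # first-moment estimate
        v = b2 * v + (1 - b2) * grad ** 2           # second-moment estimate
        m_hat = m / (1 - b1 ** t)                   # bias correction
        v_hat = v / (1 - b2 ** t)
        W -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return W

rng = np.random.default_rng(0)
X_ctx = rng.normal(size=(64, 4))                    # toy in-context features
y_ctx = (X_ctx[:, 0] + X_ctx[:, 1] > 0).astype(int) # toy binary labels

W0 = np.zeros((4, 2))                               # stand-in for pretrained weights
W_adapted = ttt_adapt(W0, X_ctx, y_ctx, n_inner=15)

loss_before = cross_entropy(W0, X_ctx, y_ctx)
loss_after = cross_entropy(W_adapted, X_ctx, y_ctx)
print(f"context loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Adapting a copy rather than the original weights keeps each query independent: the next table starts again from the same generic prior.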
-### Advanced — explicit Duo + TTT (manual orchestration)
-```python
-from predictlm import PredictLM, duo_ttt_predict
-mini = PredictLM.from_pretrained("zerooneresearch/predictlm-mini-13m", auto_duo=False)
-base = PredictLM.from_pretrained("zerooneresearch/predictlm-base-26m", auto_duo=False)
-p_mini = mini.fit_and_predict_with_ttt(X_train, y_train, X_test, n_inner=15, lr=1e-4)
-p_base = base.fit_and_predict_with_ttt(X_train, y_train, X_test, n_inner=15, lr=1e-4)
-# Weighted softmax average: w=0.40 for cls, w=0.25 for reg
-p = 0.40 * p_mini + 0.60 * p_base
-preds = p.argmax(-1)
-```
-Same numerical result as the default `.predict()`, but you control `w` (mini logit weight), `n_inner`, `lr`, etc.
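The weighted average in the snippet above can be checked on small, made-up probability arrays. The sketch below assumes both models return class probabilities of the same shape. `duo_average` is a hypothetical helper, not part of the `predictlm` package, and the input values are invented for illustration. The default `w=0.40` is the classification weight quoted in the snippet's comment.

```python
import numpy as np

def duo_average(p_mini, p_base, w=0.40):
    # Hypothetical helper: convex combination of two probability arrays,
    # where w is the weight given to the mini model.
    p_mini = np.asarray(p_mini, dtype=float)
    p_base = np.asarray(p_base, dtype=float)
    if p_mini.shape != p_base.shape:
        raise ValueError("p_mini and p_base must have the same shape")
    return w * p_mini + (1.0 - w) * p_base

# Invented per-class probabilities for two test rows.
p_mini = np.array([[0.70, 0.30], [0.20, 0.80]])
p_base = np.array([[0.40, 0.60], [0.10, 0.90]])

p = duo_average(p_mini, p_base, w=0.40)
preds = p.argmax(-1)  # most probable class per row
print(p)
print(preds)
```

Because the weights sum to 1, each averaged row is still a valid probability distribution, so `argmax` can be applied directly.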
 ## Architecture
 Unified architecture: a shared backbone with two task heads (regression via a 1024-bin BarDistribution, classification via per-task masked softmax). The model auto-detects task type from the dtype of `y_train` and routes through the matching head. One `fit/predict` API for both. This unified framing follows [TabICLv2](https://huggingface.co/papers/2602.11139) (Soda Inria, Feb 2026); the closest non-unified precedent is [TabPFN v2](https://huggingface.co/Prior-Labs/TabPFN-v2-clf), which ships separate classifier and regressor checkpoints.
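The routing and the binned regression head can be illustrated with a small sketch. The exact detection rule and bin layout are not given here, so the example assumes that floating-point targets mean regression and everything else means classification, and that the 1024 bins are equal-width. `detect_task` and `bar_mean` are hypothetical names, not the package's API.

```python
import numpy as np

def detect_task(y_train):
    # Assumed rule: float targets -> regression, anything else -> classification.
    y = np.asarray(y_train)
    return "regression" if np.issubdtype(y.dtype, np.floating) else "classification"

def bar_mean(probs, edges):
    # Point estimate from a binned distribution: the probability-weighted
    # average of the bin centers.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(np.asarray(probs) @ centers)

print(detect_task([1.5, 2.0, 3.25]))   # float targets
print(detect_task([0, 1, 1, 0]))       # integer targets
print(detect_task(["cat", "dog"]))     # string targets

edges = np.linspace(-3.0, 3.0, 1025)   # 1025 edges define 1024 equal-width bins
probs = np.full(1024, 1.0 / 1024)      # uniform distribution over the bins
estimate = bar_mean(probs, edges)
print(estimate)
```

A binned head turns regression into prediction over a fixed set of intervals, which is what lets both tasks share one backbone and one `fit/predict` interface.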
