PredictLM v11.0 + Mini ship-bundle
README.md
CHANGED
|
@@ -15,12 +15,6 @@ metrics:
|
|
| 15 |
- roc_auc
|
| 16 |
- r2
|
| 17 |
- rmse
|
| 18 |
-
co2_eq_emissions:
|
| 19 |
-
emissions: 20000
|
| 20 |
-
source: estimated from RunPod EU-NL grid factor (~0.3 kg CO₂/kWh) and ~80 GPU-hours at H100 ~700W TDP
|
| 21 |
-
training_type: pre-training
|
| 22 |
-
geographical_location: Netherlands
|
| 23 |
-
hardware_used: 1× NVIDIA H100 80GB SXM5 (primary), 1× NVIDIA A100 40GB SXM4 (initial run), 1× NVIDIA L40S 46GB (failed Plan-B+ probe)
|
| 24 |
model-index:
|
| 25 |
- name: predictlm-base-26m
|
| 26 |
results:
|
|
@@ -44,6 +38,26 @@ model-index:
|
|
| 44 |
- type: r2
|
| 45 |
value: 0.589
|
| 46 |
name: mean R² (n=13, seed=42, fair-set n_features ≤ 128)
|
| 47 |
---
|
| 48 |
|
| 49 |
# predictlm-base-26m
|
|
@@ -99,7 +113,7 @@ Unified architecture: a shared backbone with two task heads (regression via a 10
|
|
| 99 |
|
| 100 |
- **Developed by**: ZeroOne Research
|
| 101 |
- **Model card contact**: open an issue at the [code repo](https://github.com/zerooneresearch/predictlm-v11) or message the org on the Hub
|
| 102 |
-
- **License**: Apache 2.0 — permissive, no attribution-only restriction
|
| 103 |
|
| 104 |
## Intended use
|
| 105 |
|
|
@@ -180,7 +194,7 @@ Per-dataset deltas (predictlm-base-26m minus baseline):
|
|
| 180 |
| Cls vs TabPFN-2.5 | -0.096 | [-0.133, -0.059] | 12 | ✅ significant loss |
|
| 181 |
| Cls vs TabICLv2 | -0.108 | [-0.150, -0.066] | 12 | ✅ significant loss |
|
| 182 |
|
| 183 |
-
**Honest read on the headline number.** The +7.3 pp mean R² advantage over XGBoost on regression is a point estimate; its 95% paired-bootstrap CI is [−4.1 pp, +19.6 pp], which includes zero, **so the regression win is not statistically significant at the 95% level on this 13-dataset sample.** Between-dataset variance is large (on some datasets predictlm wins by 10+ pp, on others XGBoost wins by 5+ pp). What we can say: on this evaluation, predictlm-base-26m trends ahead of XGBoost on regression with a positive point estimate, but neither method has a statistically significant advantage.
|
| 184 |
|
| 185 |
**Significant losses (real, not noise):** loses to XGBoost on classification (-5.8 pp, CI [-9.4, -2.4]); loses to TabPFN-2.5 and TabICLv2 on both axes — these are commercial / SOTA models 2-4× our parameter count.
|
| 186 |
|
|
|
|
| 15 |
- roc_auc
|
| 16 |
- r2
|
| 17 |
- rmse
|
| 18 |
model-index:
|
| 19 |
- name: predictlm-base-26m
|
| 20 |
results:
|
|
|
|
| 38 |
- type: r2
|
| 39 |
value: 0.589
|
| 40 |
name: mean R² (n=13, seed=42, fair-set n_features ≤ 128)
|
| 41 |
+
- task:
|
| 42 |
+
type: tabular-classification
|
| 43 |
+
name: Tabular Classification (Duo + TTT recipe)
|
| 44 |
+
dataset:
|
| 45 |
+
type: openml
|
| 46 |
+
name: Locked OpenML eval (CC-18 + AMLB + TabPFN-extras), fair-set n_features ≤ 128
|
| 47 |
+
metrics:
|
| 48 |
+
- type: accuracy
|
| 49 |
+
value: 0.751
|
| 50 |
+
name: mean accuracy with Duo + TTT recipe (Mini + Base + test-time training)
|
| 51 |
+
- task:
|
| 52 |
+
type: tabular-regression
|
| 53 |
+
name: Tabular Regression (Duo + TTT recipe)
|
| 54 |
+
dataset:
|
| 55 |
+
type: openml
|
| 56 |
+
name: Locked OpenML eval (CTR-23 + AMLB), fair-set n_features ≤ 128
|
| 57 |
+
metrics:
|
| 58 |
+
- type: r2
|
| 59 |
+
value: 0.609
|
| 60 |
+
name: mean R² with Duo + TTT recipe (Mini + Base + test-time training)
|
| 61 |
---
|
| 62 |
|
| 63 |
# predictlm-base-26m
|
|
|
|
| 113 |
|
| 114 |
- **Developed by**: ZeroOne Research
|
| 115 |
- **Model card contact**: open an issue at the [code repo](https://github.com/zerooneresearch/predictlm-v11) or message the org on the Hub
|
| 116 |
+
- **License**: Apache 2.0 — permissive, commercial use allowed, no attribution-only restriction
|
| 117 |
|
| 118 |
## Intended use
|
| 119 |
|
|
|
|
| 194 |
| Cls vs TabPFN-2.5 | -0.096 | [-0.133, -0.059] | 12 | ✅ significant loss |
|
| 195 |
| Cls vs TabICLv2 | -0.108 | [-0.150, -0.066] | 12 | ✅ significant loss |
|
| 196 |
|
| 197 |
+
**Honest read on the headline number.** The +7.3 pp mean R² advantage over XGBoost on regression is a point estimate; its 95% paired-bootstrap CI is [−4.1 pp, +19.6 pp], which includes zero, **so the regression win is not statistically significant at the 95% level on this 13-dataset sample.** Between-dataset variance is large (on some datasets predictlm wins by 10+ pp, on others XGBoost wins by 5+ pp). What we can say: on this evaluation, predictlm-base-26m trends ahead of XGBoost on regression with a positive point estimate, but neither method has a statistically significant advantage.
|
| 198 |
|
| 199 |
**Significant losses (real, not noise):** loses to XGBoost on classification (-5.8 pp, CI [-9.4, -2.4]); loses to TabPFN-2.5 and TabICLv2 on both axes — these are commercial / SOTA models 2-4× our parameter count.
|
| 200 |
|
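
The "Honest read" paragraph rests on a paired-bootstrap confidence interval over per-dataset deltas. The README does not show how that interval was computed, so the following is only a minimal sketch of one common way to do it: resample the per-dataset deltas with replacement, take the mean of each resample, and read off percentile bounds. The function names, the resample count, and the seed are assumptions for illustration, not the repo's evaluation code.

```python
import numpy as np


def paired_bootstrap_ci(deltas, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired per-dataset deltas.

    `deltas` holds one value per dataset: (model score - baseline score).
    Hypothetical helper; parameter defaults are illustrative assumptions.
    """
    deltas = np.asarray(deltas, dtype=float)
    rng = np.random.default_rng(seed)
    n = deltas.size
    # Each row is one bootstrap resample of dataset indices, drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    boot_means = deltas[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return deltas.mean(), lo, hi


def is_significant(lo, hi):
    """A delta counts as significant only if the CI excludes zero."""
    return lo > 0 or hi < 0
```

Under this reading, the regression interval quoted above, [−4.1 pp, +19.6 pp], straddles zero and so is not significant, while the classification interval versus XGBoost, [-9.4, -2.4], lies entirely below zero and is a significant loss. With only 12 or 13 datasets the percentile bootstrap is itself a rough approximation, which is one more reason to treat the point estimate cautiously.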