01RAI committed on
Commit 97464f4 · verified · 1 Parent(s): c867b82

PredictLM v11.0 + Mini ship-bundle

Files changed (1)
  1. README.md +22 -8
README.md CHANGED
@@ -15,12 +15,6 @@ metrics:
15
  - roc_auc
16
  - r2
17
  - rmse
18
- co2_eq_emissions:
19
- emissions: 20000
20
- source: estimated from RunPod EU-NL grid factor (~0.3 kg CO₂/kWh) and ~80 GPU-hours at H100 ~700W TDP
21
- training_type: pre-training
22
- geographical_location: Netherlands
23
- hardware_used: 1× NVIDIA H100 80GB SXM5 (primary), 1× NVIDIA A100 40GB SXM4 (initial run), 1× NVIDIA L40S 46GB (failed Plan-B+ probe)
24
  model-index:
25
  - name: predictlm-base-26m
26
  results:
@@ -44,6 +38,26 @@ model-index:
44
  - type: r2
45
  value: 0.589
46
  name: mean R² (n=13, seed=42, fair-set n_features ≤ 128)
47
  ---
48
 
49
  # predictlm-base-26m
@@ -99,7 +113,7 @@ Unified architecture: a shared backbone with two task heads (regression via a 10
99
 
100
  - **Developed by**: ZeroOne Research
101
  - **Model card contact**: open an issue at the [code repo](https://github.com/zerooneresearch/predictlm-v11) or message the org on the Hub
102
- - **License**: Apache 2.0 — permissive, no attribution-only restriction (more permissive than TabPFN v2's bespoke license)
103
 
104
  ## Intended use
105
 
@@ -180,7 +194,7 @@ Per-dataset deltas (predictlm-base-26m minus baseline):
180
  | Cls vs TabPFN-2.5 | -0.096 | [-0.133, -0.059] | 12 | ✅ significant loss |
181
  | Cls vs TabICLv2 | -0.108 | [-0.150, -0.066] | 12 | ✅ significant loss |
182
 
183
- **Honest read on the headline number.** The +7.3 pp mean R² advantage over XGBoost on regression is the point estimate; the 95% paired-bootstrap CI is [−4.1 pp, +19.6 pp], **so the regression win does not survive 95%-CI hypothesis testing on this 13-dataset sample.** Within-dataset variance is large (some datasets predictlm wins by 10+ pp, others XGBoost wins by 5+ pp). What we can say: on this evaluation, predictlm-base-26m trends ahead of XGBoost on regression with a positive point estimate, while neither method has a statistically dominant advantage. Wider multi-seed evals are planned for v11.0.7.
184
 
185
  **Significant losses (real, not noise):** loses to XGBoost on classification (-5.8 pp, CI [-9.4, -2.4]); loses to TabPFN-2.5 and TabICLv2 on both axes — these are commercial / SOTA models 2-4× our parameter count.
186
 
 
15
  - roc_auc
16
  - r2
17
  - rmse
18
  model-index:
19
  - name: predictlm-base-26m
20
  results:
 
38
  - type: r2
39
  value: 0.589
40
  name: mean R² (n=13, seed=42, fair-set n_features ≤ 128)
41
+ - task:
42
+ type: tabular-classification
43
+ name: Tabular Classification (Duo + TTT recipe)
44
+ dataset:
45
+ type: openml
46
+ name: Locked OpenML eval (CC-18 + AMLB + TabPFN-extras), fair-set n_features ≤ 128
47
+ metrics:
48
+ - type: accuracy
49
+ value: 0.751
50
+ name: mean accuracy with Duo + TTT recipe (Mini + Base + test-time training)
51
+ - task:
52
+ type: tabular-regression
53
+ name: Tabular Regression (Duo + TTT recipe)
54
+ dataset:
55
+ type: openml
56
+ name: Locked OpenML eval (CTR-23 + AMLB), fair-set n_features ≤ 128
57
+ metrics:
58
+ - type: r2
59
+ value: 0.609
60
+ name: mean R² with Duo + TTT recipe (Mini + Base + test-time training)
61
  ---
62
 
63
  # predictlm-base-26m
 
113
 
114
  - **Developed by**: ZeroOne Research
115
  - **Model card contact**: open an issue at the [code repo](https://github.com/zerooneresearch/predictlm-v11) or message the org on the Hub
116
+ - **License**: Apache 2.0 — permissive, commercial use allowed, no attribution-only restriction
117
 
118
  ## Intended use
119
 
 
194
  | Cls vs TabPFN-2.5 | -0.096 | [-0.133, -0.059] | 12 | ✅ significant loss |
195
  | Cls vs TabICLv2 | -0.108 | [-0.150, -0.066] | 12 | ✅ significant loss |
196
 
197
+ **Honest read on the headline number.** The +7.3 pp mean R² advantage over XGBoost on regression is the point estimate; the 95% paired-bootstrap CI is [−4.1 pp, +19.6 pp], **so the regression win does not survive 95%-CI hypothesis testing on this 13-dataset sample.** Within-dataset variance is large (some datasets predictlm wins by 10+ pp, others XGBoost wins by 5+ pp). What we can say: on this evaluation, predictlm-base-26m trends ahead of XGBoost on regression with a positive point estimate, while neither method has a statistically dominant advantage.
198
 
199
  **Significant losses (real, not noise):** loses to XGBoost on classification (-5.8 pp, CI [-9.4, -2.4]); loses to TabPFN-2.5 and TabICLv2 on both axes — these are commercial / SOTA models 2-4× our parameter count.
200
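The "Honest read on the headline number" paragraph in this diff rests on a 95% paired-bootstrap CI over per-dataset deltas. The sketch below shows one common way such an interval can be computed (a percentile bootstrap over the paired differences); it is not confirmed to be the procedure the authors used. The function name `paired_bootstrap_ci`, its parameters, and the `example_deltas` values are all hypothetical and are not taken from the model card's evaluation data.

```python
import random
from statistics import mean


def paired_bootstrap_ci(deltas, n_resamples=10_000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean of paired per-dataset deltas.

    `deltas` holds one value per dataset: model score minus baseline score.
    Returns (point_estimate, ci_low, ci_high).
    """
    rng = random.Random(seed)
    n = len(deltas)
    resampled_means = []
    for _ in range(n_resamples):
        # Resample datasets with replacement; pairing is preserved because
        # each delta already combines both methods on the same dataset.
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(mean(sample))
    resampled_means.sort()
    low_idx = int((alpha / 2) * n_resamples)
    high_idx = int((1 - alpha / 2) * n_resamples) - 1
    return mean(deltas), resampled_means[low_idx], resampled_means[high_idx]


# Hypothetical deltas for 13 datasets (illustrative only, NOT the card's data).
example_deltas = [0.31, -0.06, 0.12, -0.05, 0.02, 0.45, -0.08,
                  0.10, -0.03, 0.04, 0.20, -0.07, 0.00]

point, ci_low, ci_high = paired_bootstrap_ci(example_deltas)
# The difference counts as significant at the 95% level only if the
# interval excludes zero.
significant = ci_low > 0 or ci_high < 0
print(f"mean delta={point:.3f}, 95% CI=[{ci_low:.3f}, {ci_high:.3f}], "
      f"significant={significant}")
```

With only 13 datasets and large between-dataset spread, an interval of this kind can easily straddle zero even when the point estimate is positive, which is the situation the paragraph describes for the regression comparison against XGBoost.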