stableai-org committed
Commit dea6dbf · verified · 1 Parent(s): 744a3c7

Upload 7 files

.gitattributes CHANGED
@@ -52,3 +52,5 @@ images/image-4.png filter=lfs diff=lfs merge=lfs -text
  images/image.png filter=lfs diff=lfs merge=lfs -text
  images/TabArena-CLS.png filter=lfs diff=lfs merge=lfs -text
  images/TabZilla-CLS.png filter=lfs diff=lfs merge=lfs -text
+ images/image-2.png filter=lfs diff=lfs merge=lfs -text
+ images/image-5.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -2,17 +2,9 @@
 
  **LimiX** is a new class of tabular AI model designed to overcome one of modern machine learning’s longest-standing bottlenecks: structured data. With only **2M parameters**, **LimiX-2M** sets a new state-of-the-art across classification, regression, and missing-value imputation, surpassing XGBoost, CatBoost, AutoGluon, and TabPFN, and approaching the performance level of the larger LimiX-16M. Its lightweight, training-free design makes advanced tabular modeling accessible on ordinary hardware while preserving full transparency and offline deployability.
 
- ![](images/BCCO-CLS.png)
-
- ![](images/TabArena-CLS.png)
-
- ![](images/TabZilla-CLS.png)
-
- ![](images/BCCO-REG.png)
+ ![](images/image.png)
 
- ![](images/TabArena-REG.png)
 
- ![](images/CTR23-REG.png)
 
 
 
@@ -38,19 +30,19 @@ LimiX adopts a 12-block transformer architecture with axis-wise attention to fea
 
  To learn the joint distribution of tabular variables, LimiX is pretrained through Context-Conditional Masked Modeling (CCMM). By masking table cells and conditioning predictions on a small set of context rows, the model internalizes a wide range of conditional dependencies while adapting to new datasets without training or labels.
 
- ![](images/image.png)
+ ![](images/image-5.png)
 
  # 3. Evaluation Results
 
  ## Classification
 
- ![](images/image-1.png)
+ ![](images/image-4.png)
 
  On the BCCO-CLS benchmark, LimiX-16M establishes leading performance by significantly outperforming AutoGluon and all PFN variants in mean AUC, Accuracy, and F1 scores, with substantially better ranks. LimiX-2M also marks a clear lead over these models in most metrics, except for its AUC rank.
 
  ## Regression
 
- ![](images/image-2.png)
+ ![](images/image-3.png)
 
  LimiX-16M achieves the best overall scores and rankings on TALENT-REG, with the PFN models and LimiX-2M emerging as close runners-up in both R² and RMSE.
 
@@ -58,13 +50,13 @@ LimiX-16M achieves the best overall scores and rankings on TALENT-REG, with the
 
  LimiX introduces the first training-free, in-context approach for missing-value imputation on entirely new datasets. Across a wide set of real-world benchmarks, LimiX-16M delivers the best performance, achieving lower RMSE and error rates than classical and learned imputers including KNN, MICE, MissForest, GAIN, and MIWAE. Unlike all prior methods, which depend on additional fitting, LimiX performs imputation directly from context with consistently superior accuracy.
 
- ![](images/image-3.png)
+ ![](images/image-1.png)
 
  ## Finetune
 
  Using an attention-based retrieval–guided downsampling strategy, LimiX-16M fine-tunes on compact, highly relevant in-context episodes rather than full long contexts, substantially improving sample efficiency and reducing training cost. This approach enables LimiX-16M to significantly outperform strong baselines such as TabDPT and TabPFN-v2, with notable AUC gains across BCCO-CLS datasets.
 
- ![](images/image-4.png)
+ ![](images/image-2.png)
 
  # 4. Deployment
 
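The Context-Conditional Masked Modeling (CCMM) objective described in the README text above can be sketched in a few lines. This is a minimal illustration, not LimiX code: the toy table, the mask pattern, and the column-mean stand-in for the transformer are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy table: 8 rows x 4 numeric features.
table = rng.normal(size=(8, 4))

# Split into context rows (fully visible) and target rows (cells to predict).
context, target = table[:6], table[6:]

# Deterministically mask every other column of the target rows.
mask = np.zeros_like(target, dtype=bool)
mask[:, ::2] = True
masked_target = np.where(mask, np.nan, target)

# Stand-in "model": predict each masked cell with the column mean over the
# context rows; a real CCMM model would condition a transformer on the
# context instead of computing means.
col_means = context.mean(axis=0)
predictions = np.where(mask, col_means, masked_target)

# Training signal: reconstruction error on the masked cells only.
loss = float(np.mean((predictions[mask] - target[mask]) ** 2))
print(predictions.shape, round(loss, 4))
```

Because the loss is computed only over masked cells conditioned on visible context, the same procedure applies to a new table without any fitting, which is what lets the pretrained model adapt in-context.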
images/image-1.png CHANGED

Git LFS Details (before)

  • SHA256: 01302c57b9fefce9e157bb2c78cedb577e314acbe98b742a6f3588f907b4fb95
  • Pointer size: 131 Bytes
  • Size of remote file: 152 kB

Git LFS Details (after)

  • SHA256: a56da429b53a15278f730d7bc4254c4579cbfd8cb52ef87738d62f9f6610f7a8
  • Pointer size: 131 Bytes
  • Size of remote file: 201 kB
images/image-2.png CHANGED

Git LFS Details

  • SHA256: 01ba90cbbc80b3e7e0861cc0f5830bb93c36ecc889884ec7628c3c18e6e8f604
  • Pointer size: 131 Bytes
  • Size of remote file: 125 kB
images/image-3.png CHANGED

Git LFS Details (before)

  • SHA256: a56da429b53a15278f730d7bc4254c4579cbfd8cb52ef87738d62f9f6610f7a8
  • Pointer size: 131 Bytes
  • Size of remote file: 201 kB

Git LFS Details (after)

  • SHA256: 50c338c024b5b056d99b1b1380ca2f1a0745472d60989309b42d61c85abe76b5
  • Pointer size: 130 Bytes
  • Size of remote file: 87.5 kB
images/image-4.png CHANGED

Git LFS Details (before)

  • SHA256: 01ba90cbbc80b3e7e0861cc0f5830bb93c36ecc889884ec7628c3c18e6e8f604
  • Pointer size: 131 Bytes
  • Size of remote file: 125 kB

Git LFS Details (after)

  • SHA256: 61ad6baffb7971131d506ca4a91f2126ee1beee4415361657483aa95eb8d5dd5
  • Pointer size: 131 Bytes
  • Size of remote file: 221 kB
images/image-5.png ADDED

Git LFS Details

  • SHA256: 4a79e90bfb6d66f2b31f13979e47068585ccc65124bf499f1e6ccca44c4db318
  • Pointer size: 131 Bytes
  • Size of remote file: 239 kB
images/image.png CHANGED

Git LFS Details (before)

  • SHA256: 4a79e90bfb6d66f2b31f13979e47068585ccc65124bf499f1e6ccca44c4db318
  • Pointer size: 131 Bytes
  • Size of remote file: 239 kB

Git LFS Details (after)

  • SHA256: 918d472b826d1d32bae645aa41f9372cbea8f3865287239ff4cb5d71a21d0e9a
  • Pointer size: 131 Bytes
  • Size of remote file: 162 kB