Upload 7 files
- .gitattributes +2 -0
- README.md +6 -14
- images/image-1.png +2 -2
- images/image-2.png +0 -0
- images/image-3.png +2 -2
- images/image-4.png +2 -2
- images/image-5.png +3 -0
- images/image.png +2 -2
.gitattributes
CHANGED
@@ -52,3 +52,5 @@ images/image-4.png filter=lfs diff=lfs merge=lfs -text
images/image.png filter=lfs diff=lfs merge=lfs -text
images/TabArena-CLS.png filter=lfs diff=lfs merge=lfs -text
images/TabZilla-CLS.png filter=lfs diff=lfs merge=lfs -text
+images/image-2.png filter=lfs diff=lfs merge=lfs -text
+images/image-5.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -2,17 +2,9 @@

**LimiX** is a new class of tabular AI model designed to overcome one of modern machine learning’s longest-standing bottlenecks: structured data. With only **2M parameters**, **LimiX-2M** sets a new state-of-the-art across classification, regression, and missing-value imputation, surpassing XGBoost, CatBoost, AutoGluon, and TabPFN, and approaching the performance level of the larger LimiX-16M. Its lightweight, training-free design makes advanced tabular modeling accessible on ordinary hardware while preserving full transparency and offline deployability.

-
-
-
-
-
+[one added line; not recoverable from the rendered diff]
@@ -38,19 +30,19 @@ LimiX adopts a 12-block transformer architecture with axis-wise attention to fea

To learn the joint distribution of tabular variables, LimiX is pretrained through Context-Conditional Masked Modeling (CCMM). By masking table cells and conditioning predictions on a small set of context rows, the model internalizes a wide range of conditional dependencies while adapting to new datasets without training or labels.
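The CCMM setup is easy to picture in code: hide a random subset of cells in the query rows, keep a handful of context rows fully visible, and ask the model to reconstruct the hidden values. A minimal NumPy sketch of that masking step (toy arrays and invented names, not the authors' pretraining code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy table: 8 rows x 4 columns of tabular features.
table = rng.normal(size=(8, 4))

# A few fully observed context rows; the rest become query rows.
context, queries = table[:5], table[5:]

# CCMM-style corruption: mask ~30% of the query cells.
mask = rng.random(queries.shape) < 0.3       # True = cell is hidden
corrupted = np.where(mask, np.nan, queries)  # model sees NaN at masked cells

# Pretraining target: the original values at the masked positions,
# predicted conditional on (context, corrupted).
targets = queries[mask]
print(f"{mask.sum()} cells to reconstruct given {len(context)} context rows")
```

Because the conditioning set changes from episode to episode, the model learns to read whatever context it is handed at inference time, which is what enables adaptation to a new dataset without training.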

-
+

# 3. Evaluation Results
## Classification

-
+

On the BCCO-CLS benchmark, LimiX-16M delivers leading performance, significantly outperforming AutoGluon and all PFN variants in mean AUC, Accuracy, and F1 while earning substantially better ranks. LimiX-2M also holds a clear lead over these models on most metrics, the one exception being its AUC rank.
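Mean metrics and mean ranks answer different questions (ranks are computed per dataset, then averaged), which is how LimiX-2M can lead on means yet trail on AUC rank. A small sketch of that aggregation with invented numbers:

```python
import numpy as np
from scipy.stats import rankdata

# Toy AUCs: rows = datasets, columns = models (numbers are made up).
models = ["LimiX-16M", "LimiX-2M", "AutoGluon", "TabPFN"]
aucs = np.array([[0.93, 0.91, 0.90, 0.92],
                 [0.88, 0.86, 0.85, 0.87],
                 [0.95, 0.94, 0.92, 0.93]])

# Rank within each dataset (1 = best AUC), then average over datasets.
ranks = rankdata(-aucs, axis=1)
for name, auc, rank in zip(models, aucs.mean(axis=0), ranks.mean(axis=0)):
    print(f"{name:10s}  mean AUC {auc:.3f}  mean rank {rank:.2f}")
```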
## Regression

+

LimiX-16M achieves the best overall scores and rankings on TALENT-REG, with the PFN models and LimiX-2M emerging as close runners-up in both R² and RMSE.

## Missing Value Imputation

LimiX introduces the first training-free, in-context approach for missing-value imputation on entirely new datasets. Across a wide set of real-world benchmarks, LimiX-16M delivers the best performance, achieving lower RMSE and error rates than classical and learned imputers including KNN, MICE, MissForest, GAIN, and MIWAE. Unlike all prior methods, which depend on additional fitting, LimiX performs imputation directly from context with consistently superior accuracy.
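The classical baselines named here are easy to reproduce: scikit-learn ships a KNN imputer and a MICE-style iterative imputer, and both must be fit on each new dataset, which is exactly the extra step LimiX avoids. A sketch of the baseline side of the comparison (toy data; LimiX's own imputation API is not shown in this diff):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

rng = np.random.default_rng(0)

# Toy complete table, then hide 20% of the cells at random.
full = rng.normal(size=(200, 6))
mask = rng.random(full.shape) < 0.2
observed = np.where(mask, np.nan, full)

for name, imputer in [("KNN", KNNImputer(n_neighbors=5)),
                      ("MICE-style", IterativeImputer(random_state=0))]:
    filled = imputer.fit_transform(observed)  # the per-dataset fitting step
    rmse = np.sqrt(np.mean((filled[mask] - full[mask]) ** 2))
    print(f"{name:10s} RMSE on masked cells: {rmse:.3f}")
```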

+

## Finetune
Using an attention-based retrieval-guided downsampling strategy, LimiX-16M fine-tunes on compact, highly relevant in-context episodes rather than full long contexts, substantially improving sample efficiency and reducing training cost. This approach enables LimiX-16M to significantly outperform strong baselines such as TabDPT and TabPFN-v2, with notable AUC gains across BCCO-CLS datasets.
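The retrieval idea can be sketched independently of the model: score every row in the training pool against the current query batch and keep only the top-k as the episode's context. The version below uses cosine similarity as a stand-in for the attention scores (function and variable names are assumptions, not the repo's API):

```python
import numpy as np

def retrieve_context(queries: np.ndarray, pool: np.ndarray, k: int) -> np.ndarray:
    """Return the k pool rows most relevant to a batch of queries."""
    # Normalize rows so dot products become cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    # Score each pool row by its best match over the query batch.
    scores = (p @ q.T).max(axis=1)
    return pool[np.argsort(scores)[-k:]]

rng = np.random.default_rng(0)
pool = rng.normal(size=(10_000, 16))  # full training table (toy)
queries = rng.normal(size=(32, 16))   # one fine-tuning batch
episode = retrieve_context(queries, pool, k=256)
print(episode.shape)  # (256, 16): a compact, relevant context
```

Fine-tuning on such episodes keeps the context length per step bounded by k rather than by the full table size, which is consistent with the sample-efficiency and cost claims above.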

+

# 4. Deployment
images/image-1.png
CHANGED
images/image-2.png
CHANGED
images/image-3.png
CHANGED
images/image-4.png
CHANGED
images/image-5.png
ADDED
images/image.png
CHANGED