---
license: mit
---
# Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning
Orion-MSP is a tabular foundation model for in-context learning. It uses multi-scale sparse attention and Perceiver-style memory to process tabular data at multiple granularities, capturing both local feature interactions and global dataset-level patterns.
Orion-MSP can be used either directly via its own Python package or through [TabTune](https://github.com/Lexsi-Labs/TabTune), which provides a unified interface over several tabular foundation models.
## Key Features
- **Multi-Scale Sparse Attention:** Processes features at three levels (scales 1, 4, 16) using windowed, global, and random attention patterns, reducing quadratic complexity to near-linear.
- **Hierarchical Feature Understanding:** Captures patterns from individual cells to feature groups through scale-aware attention.
- **Perceiver-Style Memory:** Cross-component memory that compresses dataset information for efficient processing across samples.
- **Memory-Efficient:** Block-sparse masking enables efficient processing of large tabular datasets.
- **Scikit-learn Compatible:** Drop-in replacement with `.fit()` and `.predict()` methods.
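To make the sparse-attention bullet concrete, here is a minimal NumPy sketch (not the actual Orion-MSP implementation) of how windowed, global, and random patterns can be combined into one boolean attention mask whose per-row cost stays roughly constant, so total work grows near-linearly with sequence length:

```python
import numpy as np

def sparse_attention_mask(n, window=4, n_global=2, n_random=2, seed=0):
    """Toy block-sparse mask: mask[i, j] is True if position i attends to j.

    Combines the three patterns described above (illustrative only):
      - windowed: each position attends to its local neighborhood,
      - global:   a few designated tokens attend to / are attended by all,
      - random:   a handful of random long-range links per row.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True                         # local window (includes self)
        mask[i, rng.choice(n, size=n_random)] = True  # random long-range links
    mask[:n_global, :] = True                         # global tokens see everything
    mask[:, :n_global] = True                         # everything sees global tokens
    return mask

mask = sparse_attention_mask(64)
density = mask.mean()  # fraction of attended pairs, well below the dense 1.0
```

Each row attends to roughly `2 * window + 1 + n_random + n_global` positions regardless of `n`, which is the sense in which such masking reduces quadratic attention cost to near-linear.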
## Architecture
Orion-MSP consists of four main components:
- **Column-wise Embedding:** Distribution-aware feature embeddings using Induced Set Attention Blocks (ISAB)
- **Multi-Scale Row Interaction:** Sparse attention with windowed, global, and random patterns across multiple scales
- **Cross-Component Memory:** Perceiver-style memory for efficient dataset-level context
- **Dataset-wise ICL:** Enhanced predictor leveraging enriched representations for few-shot tabular classification
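The multiple granularities mentioned above (scales 1, 4, 16) can be pictured as coarsened views of the feature-token sequence. The following is a hypothetical mean-pooling sketch, not the model's actual operator: each coarser view averages contiguous groups of `s` feature tokens, zero-padding the tail so the length divides evenly.

```python
import numpy as np

def multi_scale_views(tokens, scales=(1, 4, 16)):
    """Mean-pool a (n_features, d) token matrix at several granularities.

    Illustrative only: Orion-MSP attends over representations at scales
    1, 4, and 16; here the view at scale s averages each contiguous block
    of s feature tokens (zero-padding the tail to a multiple of s).
    """
    n, d = tokens.shape
    views = {}
    for s in scales:
        pad = (-n) % s                        # pad so n is divisible by s
        x = np.pad(tokens, ((0, pad), (0, 0)))
        views[s] = x.reshape(-1, s, d).mean(axis=1)
    return views

tokens = np.random.default_rng(0).normal(size=(20, 8))
views = multi_scale_views(tokens)
# views[1] keeps all 20 tokens, views[4] has 5 rows, views[16] has 2
```

Scale 1 preserves individual cells, while scales 4 and 16 expose feature-group and near-dataset-level structure, which is the hierarchy the row-interaction component attends over.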
## Performance
Performance comparison across three benchmark suites: TALENT, OpenML-CC18, and TabZilla. Ranks are mean ranks based on accuracy (lower is better). Metrics: ACC = Accuracy, F1 = Weighted F1.
| Models | All Rank | TALENT Rank | TALENT ACC | TALENT F1 | OpenML-CC18 Rank | OpenML-CC18 ACC | OpenML-CC18 F1 | TabZilla Rank | TabZilla ACC | TabZilla F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 6.70 | 6.02 | 0.8403 | 0.8360 | 5.89 | 0.8558 | 0.8537 | 6.07 | 0.8612 | 0.8326 |
| CatBoost | 6.43 | 5.57 | 0.8336 | 0.8259 | 6.25 | 0.8588 | 0.8520 | 7.13 | 0.8579 | 0.8384 |
| Random Forest | 7.38 | 6.15 | 0.8285 | 0.8209 | 6.36 | 0.8547 | 0.8497 | 8.42 | 0.8358 | 0.8399 |
| LightGBM | 6.78 | 6.11 | 0.8331 | 0.8245 | 6.18 | 0.8581 | 0.8493 | 5.25 | 0.8618 | 0.8211 |
| TabICL | 4.96 | 4.09 | 0.8471 | 0.8379 | 4.69 | 0.8667 | 0.8623 | 5.89 | 0.8734 | 0.8698 |
| OrionBiX | 5.37 | 4.59 | 0.8346 | 0.8260 | 4.98 | 0.8653 | 0.8596 | 4.89 | 0.8728 | 0.8628 |
| OrionMSP | 3.58 | 3.26 | 0.8461 | 0.8360 | 4.12 | 0.8722 | 0.8676 | 3.84 | 0.8821 | 0.8786 |
| TabPFN | 4.61 | 3.72 | 0.8514 | 0.8412 | 4.76 | 0.8714 | 0.8663 | 4.86 | 0.8752 | 0.8716 |
| Mitra | 11.77 | 10.38 | 0.3921 | 0.2868 | 10.52 | 0.3614 | 0.2522 | 11.21 | 0.3152 | 0.1830 |
| ContextTab | 9.70 | 9.84 | 0.5474 | 0.4596 | 6.28 | 0.8639 | 0.8581 | 7.13 | 0.8389 | 0.8334 |
| TabDPT | 5.42 | 5.19 | 0.8408 | 0.8318 | 4.64 | 0.8672 | 0.8625 | 3.94 | 0.8814 | 0.8775 |
Orion-MSP is the most consistent top performer across all three benchmarks, achieving the best overall rank.
- On TALENT, Orion-MSP ranks 1st overall, while TabPFN edges out the highest ACC/F1 by a hair.
- On OpenML-CC18, Orion-MSP attains the top ACC/F1 (0.8722/0.8676), narrowly ahead of TabPFN and TabDPT.
- On TabZilla, it leads with the highest ACC/F1 and the best rank.
- Classical baselines (XGBoost/LightGBM/CatBoost/RF) trail noticeably, highlighting Orion-MSP’s robustness across diverse tabular tasks.
Performance variation by dataset size across all benchmark suites. Rank = mean rank by accuracy (lower is better).
ACC = Accuracy; F1 = Weighted F1. Size buckets: Small (<1K), Medium (1K–10K), Large (>10K).
| Models | Small Rank | Small ACC | Small F1 | Medium Rank | Medium ACC | Medium F1 | Large Rank | Large ACC | Large F1 |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 7.70 | 0.8168 | 0.7964 | 6.88 | 0.8363 | 0.8314 | 5.41 | 0.8969 | 0.8920 |
| CatBoost | 7.88 | 0.8124 | 0.7935 | 6.47 | 0.8340 | 0.8264 | 5.48 | 0.8797 | 0.8733 |
| Random Forest | 8.55 | 0.7988 | 0.8187 | 7.16 | 0.8285 | 0.8221 | 7.30 | 0.8694 | 0.8628 |
| LightGBM | 7.80 | 0.8143 | 0.7789 | 6.94 | 0.8314 | 0.8226 | 5.63 | 0.8827 | 0.8764 |
| TabICL | 6.04 | 0.8301 | 0.8338 | 4.77 | 0.8486 | 0.8398 | 4.61 | 0.8802 | 0.8743 |
| OrionBiX | 6.32 | 0.8330 | 0.8150 | 5.48 | 0.8348 | 0.8260 | 4.42 | 0.8729 | 0.8670 |
| OrionMSP | 5.93 | 0.8232 | 0.8194 | 3.70 | 0.8494 | 0.8402 | 3.04 | 0.8843 | 0.8768 |
| TabPFN | 6.50 | 0.8325 | 0.8131 | 3.81 | 0.8557 | 0.8462 | 5.73 | 0.8783 | 0.8713 |
| Mitra | 13.88 | 0.4334 | 0.3236 | 11.59 | 0.3600 | 0.2553 | 11.11 | 0.3837 | 0.2754 |
| ContextTab | 9.60 | 0.7578 | 0.7363 | 9.52 | 0.6210 | 0.5566 | 10.22 | 0.6388 | 0.5638 |
| TabDPT | 5.48 | 0.8333 | 0.8271 | 5.40 | 0.8424 | 0.8339 | 5.26 | 0.8831 | 0.8765 |
Orion-MSP is the most consistent top-ranked model as data grows (especially on Medium and Large datasets), while TabPFN peaks on Medium and GBDTs (e.g., XGBoost) catch up in raw ACC/F1 on Large.
Performance vs. feature dimensionality. Rank = mean accuracy rank (lower is better). ACC = Accuracy; F1 = Weighted F1. Groups: Narrow (<10), Medium (10–100), Wide (>100).
| Models | Narrow Rank | Narrow ACC | Narrow F1 | Medium Rank | Medium ACC | Medium F1 | Wide Rank | Wide ACC | Wide F1 |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 6.77 | 0.8222 | 0.8159 | 6.90 | 0.8482 | 0.8410 | 4.79 | 0.9140 | 0.9039 |
| CatBoost | 5.63 | 0.8145 | 0.8067 | 6.88 | 0.8441 | 0.8344 | 5.50 | 0.9157 | 0.9084 |
| Random Forest | 7.15 | 0.8005 | 0.7044 | 7.44 | 0.8410 | 0.8235 | 7.52 | 0.9034 | 0.8936 |
| LightGBM | 6.15 | 0.8128 | 0.7907 | 6.92 | 0.8458 | 0.8326 | 7.47 | 0.8999 | 0.8908 |
| TabICL | 5.14 | 0.8208 | 0.8119 | 4.61 | 0.8627 | 0.8549 | 6.46 | 0.9101 | 0.8936 |
| OrionBiX | 4.64 | 0.8112 | 0.8043 | 5.46 | 0.8510 | 0.8417 | 6.73 | 0.8859 | 0.8849 |
| OrionMSP | 3.76 | 0.8394 | 0.8314 | 4.09 | 0.8572 | 0.8478 | 5.69 | 0.8860 | 0.8837 |
| TabPFN | 5.30 | 0.8187 | 0.8092 | 4.07 | 0.8676 | 0.8589 | 6.141 | 0.9129 | 0.9111 |
| Mitra | 11.25 | 0.3737 | 0.2683 | 11.84 | 0.3886 | 0.2781 | 13.03 | 0.2521 | 0.1497 |
| ContextTab | 9.52 | 0.6391 | 0.5719 | 9.59 | 0.6480 | 0.5843 | 10.97 | 0.6017 | 0.5651 |
| TabDPT | 4.66 | 0.8262 | 0.8189 | 5.45 | 0.8566 | 0.8483 | 7.23 | 0.8845 | 0.8820 |
Orion-MSP leads on narrow feature spaces and stays strong at medium width, while TabPFN narrowly edges ahead on medium-width features and GBDTs (XGBoost/CatBoost) shine on wide feature spaces.
## Usage
### Direct (OrionMSP Python package)
```python
from orion_msp.sklearn import OrionMSPClassifier
# Initialize and use
clf = OrionMSPClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```
This code will automatically download the pre-trained model from Hugging Face and use a GPU if available.
### Via TabTune (unified TFM library)
```python
from tabtune import TabularPipeline
pipeline = TabularPipeline(
    model_name="OrionMSP",         # use OrionMSP through TabTune
    tuning_strategy="inference",   # zero-shot / in-context mode
    tuning_params={"device": "cuda"}  # or "cpu"
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
```
When used through TabTune, the OrionMSP weights are automatically downloaded from this Hugging Face repository on first use, and TabTune handles model-aware preprocessing for you.
## Installation
### Via TabTune (recommended if you want multiple tabular FMs)
```bash
pip install tabtune
```
This installs TabTune and its built-in OrionMSP integration; no separate orion-msp install is required.
### From the OrionMSP source
#### Option 1: From the local clone
```bash
cd orion-msp
pip install -e .
```
#### Option 2: From the Git Remote
```bash
pip install git+https://github.com/Lexsi-Labs/Orion-MSP.git
```
## Citation
If you use Orion-MSP, please cite our [paper](https://arxiv.org/abs/2511.02818):
```bibtex
@article{bouadi25orionmsp,
  title={Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning},
  author={Mohamed Bouadi and Pratinav Seth and Aditya Tanna and Vinay Kumar Sankarapu},
  year={2025},
  eprint={2511.02818},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2511.02818},
}
```