---
license: mit
---
<div align="center">
  <img src="logo.png" alt="Orion-MSP Logo" width="700"/>
</div>

<div align="center">
  <a href="https://www.lexsi.ai">
    <img src="https://img.shields.io/badge/Lexsi-Homepage-FF6B6B?style=for-the-badge" alt="Homepage"/>
  </a>
  <a href="https://huggingface.co/Lexsi">
    <img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Lexsi AI-FFD21E?style=for-the-badge" alt="Hugging Face"/>
  </a>
  <a href="https://discord.gg/dSB62Q7A">
    <img src="https://img.shields.io/badge/Discord-Join-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"/>
  </a>
  <a href="https://github.com/Lexsi-Labs/Orion-MSP">
    <img src="https://img.shields.io/badge/GitHub-Orion%20MSP-181717?style=for-the-badge&logo=github&logoColor=white" alt="Orion MSP GitHub"/>
  </a>
  <!-- TabTune repo -->
  <a href="https://github.com/Lexsi-Labs/TabTune">
    <img src="https://img.shields.io/badge/GitHub-TabTune-181717?style=for-the-badge&logo=github&logoColor=white" alt="TabTune GitHub"/>
  </a>
</div>


# Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning

Orion-MSP is a tabular foundation model for in-context learning. It uses multi-scale sparse attention and Perceiver-style memory to process tabular data at multiple granularities, capturing both local feature interactions and global dataset-level patterns.

OrionMSP can be used either directly via its own Python package or through [TabTune](https://github.com/Lexsi-Labs/TabTune), which provides a unified interface over several tabular foundation models.

## Key Features

- **Multi-Scale Sparse Attention:** Processes features at three levels (scales 1, 4, 16) using windowed, global, and random attention patterns, reducing quadratic complexity to near-linear.
- **Hierarchical Feature Understanding:** Captures patterns from individual cells to feature groups through scale-aware attention.
- **Perceiver-Style Memory:** Cross-component memory that compresses dataset information for efficient processing across samples.
- **Memory-Efficient:** Block-sparse masking enables efficient processing of large tabular datasets.
- **Scikit-learn Compatible:** Drop-in replacement with `.fit()` and `.predict()` methods.
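
To make the windowed, global, and random attention patterns concrete, here is a minimal sketch of how such a sparse mask could be built in pure Python. It is an illustration only, not the model's implementation: the function name and the `window`, `n_global`, and `n_random` values are hypothetical and are not Orion-MSP's actual settings.

```python
import random

def sparse_attention_mask(n_tokens, window=2, n_global=1, n_random=1, seed=0):
    """Return an n_tokens x n_tokens boolean mask; True means 'may attend'.

    Illustrative only: parameter names and defaults are assumptions.
    """
    rng = random.Random(seed)
    mask = [[False] * n_tokens for _ in range(n_tokens)]
    for i in range(n_tokens):
        # Windowed: each token attends to neighbours within `window` positions.
        for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
            mask[i][j] = True
        # Global: the first `n_global` tokens attend to, and are attended by, every token.
        for g in range(min(n_global, n_tokens)):
            mask[i][g] = True
            mask[g][i] = True
        # Random: a few extra randomly chosen links per token.
        for j in rng.sample(range(n_tokens), min(n_random, n_tokens)):
            mask[i][j] = True
    return mask

mask = sparse_attention_mask(n_tokens=64)
allowed = sum(sum(row) for row in mask)
print(f"{allowed} of {64 * 64} query-key pairs kept")
```

Apart from the global tokens, each row keeps a fixed number of entries, so the number of kept pairs grows roughly linearly with the number of tokens rather than quadratically.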
  

## Architecture
Orion-MSP consists of four main components:
- **Column-wise Embedding:** Distribution-aware feature embeddings using Induced Set Attention Blocks (ISAB)
- **Multi-Scale Row Interaction:** Sparse attention with windowed, global, and random patterns across multiple scales
- **Cross-Component Memory:** Perceiver-style memory for efficient dataset-level context
- **Dataset-wise ICL:** Enhanced predictor leveraging enriched representations for few-shot tabular classification
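
As a rough illustration of what processing features at scales 1, 4, and 16 can mean, the sketch below builds coarser views of a token sequence by averaging consecutive groups of tokens. Mean pooling is an assumption made for this example; the model's actual grouping operator may differ, and `pool_tokens` is a hypothetical helper.

```python
def pool_tokens(tokens, scale):
    """Average consecutive groups of `scale` token vectors (last group may be shorter)."""
    pooled = []
    for start in range(0, len(tokens), scale):
        group = tokens[start:start + scale]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled

# 32 toy feature tokens with 2-dimensional embeddings (made-up values).
tokens = [[float(i), float(i % 3)] for i in range(32)]

# One view per scale: scale 1 keeps every token, larger scales summarize groups.
multi_scale = {scale: pool_tokens(tokens, scale) for scale in (1, 4, 16)}
for scale, seq in multi_scale.items():
    print(f"scale {scale:>2}: {len(seq)} tokens")
```

Attention over the coarser views touches far fewer tokens, which is one way fine-grained and group-level interactions can be covered at different costs.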

## Performance
<!-- Orion-MSP: Benchmark Summary Table -->
<style>
  /* Container adds horizontal scroll on small screens */
  .hf-table-wrap { overflow-x: auto; margin: 1rem 0; }
  table.bench { border-collapse: collapse; width: 100%; font-size: 0.9rem; }
  table.bench caption { font-weight: 600; text-align: left; margin-bottom: .5rem; }
  table.bench th, table.bench td { border-bottom: 1px solid #e5e7eb; padding: 6px 8px; }
  table.bench thead th { border-bottom: 2px solid #d1d5db; background: #fafafa; }
  table.bench th.sticky { position: sticky; top: 0; z-index: 1; }
  /* Alignment */
  table.bench th, table.bench td { text-align: center; }
  table.bench th:first-child, table.bench td:first-child { text-align: left; white-space: nowrap; }
  /* Emphasis helpers */
  .first { font-weight: 700; }                 /* 1st place (bold) */
  .second { text-decoration: underline; }      /* 2nd place (underline) */
  .first.second { font-weight: 700; text-decoration: underline; } /* bold+underline */
</style>

<div class="hf-table-wrap">
  <table class="bench">
    <caption>Performance comparison across three benchmark suites—TALENT, OpenML-CC18, and TabZilla. Ranks are mean ranks based on accuracy (lower is better). Metrics: ACC = Accuracy, F1 = Weighted F1. <span class="first">1st</span>; <span class="second">2nd</span>.</caption>
    <thead>
      <tr>
        <th class="sticky" rowspan="2">Models</th>
        <th class="sticky" colspan="1">All</th>
        <th class="sticky" colspan="3">TALENT</th>
        <th class="sticky" colspan="3">OpenML-CC18</th>
        <th class="sticky" colspan="3">TabZilla</th>
      </tr>
      <tr>
        <th class="sticky">Rank</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>XGBoost</td>
        <td>6.70</td>
        <td>6.02</td><td>0.8403</td><td>0.8360</td>
        <td>5.89</td><td>0.8558</td><td>0.8537</td>
        <td>6.07</td><td>0.8612</td><td>0.8326</td>
      </tr>
      <tr>
        <td>CatBoost</td>
        <td>6.43</td>
        <td>5.57</td><td>0.8336</td><td>0.8259</td>
        <td>6.25</td><td>0.8588</td><td>0.8520</td>
        <td>7.13</td><td>0.8579</td><td>0.8384</td>
      </tr>
      <tr>
        <td>Random Forest</td>
        <td>7.38</td>
        <td>6.15</td><td>0.8285</td><td>0.8209</td>
        <td>6.36</td><td>0.8547</td><td>0.8497</td>
        <td>8.42</td><td>0.8358</td><td>0.8399</td>
      </tr>
      <tr>
        <td>LightGBM</td>
        <td>6.78</td>
        <td>6.11</td><td>0.8331</td><td>0.8245</td>
        <td>6.18</td><td>0.8581</td><td>0.8493</td>
        <td>5.25</td><td>0.8618</td><td>0.8211</td>
      </tr>
      <tr>
        <td>TabICL</td>
        <td>4.96</td>
        <td>4.09</td><td class="second">0.8471</td><td class="second">0.8379</td>
        <td>4.69</td><td>0.8667</td><td>0.8623</td>
        <td>5.89</td><td>0.8734</td><td>0.8698</td>
      </tr>
      <tr>
        <td>OrionBiX</td>
        <td>5.37</td>
        <td>4.59</td><td>0.8346</td><td>0.8260</td>
        <td>4.98</td><td>0.8653</td><td>0.8596</td>
        <td>4.89</td><td>0.8728</td><td>0.8628</td>
      </tr>
      <tr>
        <td><span class="first second">OrionMSP</span></td>
        <td>3.58</td>
        <td class="first second">3.26</td><td>0.8461</td><td>0.8360</td>
        <td class="first second">4.12</td><td class="first second">0.8722</td><td class="first second">0.8676</td>
        <td class="first second">3.84</td><td class="first second">0.8821</td><td class="first second">0.8786</td>
      </tr>
      <tr>
        <td><span class="second">TabPFN</span></td>
        <td>4.61</td>
        <td class="second">3.72</td><td class="first second">0.8514</td><td class="first second">0.8412</td>
        <td>4.76</td><td class="second">0.8714</td><td class="second">0.8663</td>
        <td>4.86</td><td>0.8752</td><td>0.8716</td>
      </tr>
      <tr>
        <td>Mitra</td>
        <td>11.77</td>
        <td>10.38</td><td>0.3921</td><td>0.2868</td>
        <td>10.52</td><td>0.3614</td><td>0.2522</td>
        <td>11.21</td><td>0.3152</td><td>0.1830</td>
      </tr>
      <tr>
        <td>ContextTab</td>
        <td>9.70</td>
        <td>9.84</td><td>0.5474</td><td>0.4596</td>
        <td>6.28</td><td>0.8639</td><td>0.8581</td>
        <td>7.13</td><td>0.8389</td><td>0.8334</td>
      </tr>
      <tr>
        <td>TabDPT</td>
        <td>5.42</td>
        <td>5.19</td><td>0.8408</td><td>0.8318</td>
        <td class="second">4.64</td><td>0.8672</td><td>0.8625</td>
        <td class="second">3.94</td><td class="second">0.8814</td><td class="second">0.8775</td>
      </tr>
    </tbody>
  </table>
</div>

Orion-MSP achieves the best overall mean rank (3.58) and the best mean rank on each of the three benchmark suites.
- On TALENT, it ranks **1** (mean rank 3.26), while TabPFN reaches slightly higher ACC/F1 (0.8514/0.8412 vs. 0.8461/0.8360).
- On OpenML-CC18, Orion-MSP attains the top ACC/F1 (0.8722/0.8676), narrowly ahead of TabPFN (0.8714/0.8663) and TabDPT (0.8672/0.8625).
- On TabZilla, it leads with the highest ACC/F1 (0.8821/0.8786) and the best rank (3.84).
- Classical baselines (XGBoost, LightGBM, CatBoost, Random Forest) trail with overall mean ranks between 6.43 and 7.38, highlighting Orion-MSP’s robustness across diverse tabular tasks.
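
The "mean rank by accuracy" reported in the tables can be read as follows: rank the models on each dataset by accuracy, then average each model's rank over datasets. The sketch below shows that computation on made-up numbers. The models `A`, `B`, `C` and their accuracies are hypothetical, and giving tied models their average rank is an assumption, since the tie-handling rule is not stated here.

```python
def rank_by_accuracy(accuracies):
    """Rank models on one dataset: 1 = highest accuracy, ties share the average rank."""
    ordered = sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with the model at position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks

def mean_ranks(per_dataset):
    """Average each model's per-dataset rank (lower is better)."""
    collected = {}
    for accuracies in per_dataset:
        for model, r in rank_by_accuracy(accuracies).items():
            collected.setdefault(model, []).append(r)
    return {model: sum(rs) / len(rs) for model, rs in collected.items()}

# Made-up accuracies for three hypothetical datasets (not taken from the benchmarks).
toy = [
    {"A": 0.90, "B": 0.85, "C": 0.85},
    {"A": 0.70, "B": 0.80, "C": 0.75},
    {"A": 0.95, "B": 0.60, "C": 0.90},
]
result = mean_ranks(toy)
print(result)
```

Because ranks are averaged per dataset, a model can have the best mean rank without having the highest average accuracy, which is the situation on TALENT above.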


<!-- Orion-MSP: Size Analysis Table -->

<div class="hf-table-wrap">
  <table class="bench">
    <caption>
      Performance variation by dataset size across all benchmark suites. Rank = mean rank by accuracy (lower is better).
      ACC = Accuracy; F1 = Weighted F1. Size buckets: Small (&lt;1K), Medium (1K–10K), Large (&gt;10K).
      <span class="first">1st</span>; <span class="second">2nd</span> within each group.
    </caption>
    <thead>
      <tr>
        <th class="sticky" rowspan="2">Models</th>
        <th class="sticky" colspan="3">Small (&lt;1K)</th>
        <th class="sticky" colspan="3">Medium (1K–10K)</th>
        <th class="sticky" colspan="3">Large (&gt;10K)</th>
      </tr>
      <tr>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>XGBoost</td>
        <td>7.70</td><td>0.8168</td><td>0.7964</td>
        <td>6.88</td><td>0.8363</td><td>0.8314</td>
        <td>5.41</td><td class="first">0.8969</td><td class="first">0.8920</td>
      </tr>
      <tr>
        <td>CatBoost</td>
        <td>7.88</td><td>0.8124</td><td>0.7935</td>
        <td>6.47</td><td>0.8340</td><td>0.8264</td>
        <td>5.48</td><td>0.8797</td><td>0.8733</td>
      </tr>
      <tr>
        <td>Random Forest</td>
        <td>8.55</td><td>0.7988</td><td>0.8187</td>
        <td>7.16</td><td>0.8285</td><td>0.8221</td>
        <td>7.30</td><td>0.8694</td><td>0.8628</td>
      </tr>
      <tr>
        <td>LightGBM</td>
        <td>7.80</td><td>0.8143</td><td>0.7789</td>
        <td>6.94</td><td>0.8314</td><td>0.8226</td>
        <td>5.63</td><td>0.8827</td><td>0.8764</td>
      </tr>
      <tr>
        <td>TabICL</td>
        <td>6.04</td><td>0.8301</td><td class="first">0.8338</td>
        <td>4.77</td><td>0.8486</td><td>0.8398</td>
        <td>4.61</td><td>0.8802</td><td>0.8743</td>
      </tr>
      <tr>
        <td>OrionBiX</td>
        <td>6.32</td><td class="second">0.8330</td><td>0.8150</td>
        <td>5.48</td><td>0.8348</td><td>0.8260</td>
        <td class="second">4.42</td><td>0.8729</td><td>0.8670</td>
      </tr>
      <tr>
        <td>OrionMSP</td>
        <td class="second">5.93</td><td>0.8232</td><td>0.8194</td>
        <td class="first">3.70</td><td class="second">0.8494</td><td class="second">0.8402</td>
        <td class="first">3.04</td><td class="second">0.8843</td><td class="second">0.8768</td>
      </tr>
      <tr>
        <td>TabPFN</td>
        <td>6.50</td><td>0.8325</td><td>0.8131</td>
        <td class="second">3.81</td><td class="first">0.8557</td><td class="first">0.8462</td>
        <td>5.73</td><td>0.8783</td><td>0.8713</td>
      </tr>
      <tr>
        <td>Mitra</td>
        <td>13.88</td><td>0.4334</td><td>0.3236</td>
        <td>11.59</td><td>0.3600</td><td>0.2553</td>
        <td>11.11</td><td>0.3837</td><td>0.2754</td>
      </tr>
      <tr>
        <td>ContextTab</td>
        <td>9.60</td><td>0.7578</td><td>0.7363</td>
        <td>9.52</td><td>0.6210</td><td>0.5566</td>
        <td>10.22</td><td>0.6388</td><td>0.5638</td>
      </tr>
      <tr>
        <td>TabDPT</td>
        <td class="first">5.48</td><td class="first">0.8333</td><td class="second">0.8271</td>
        <td>5.40</td><td>0.8424</td><td>0.8339</td>
        <td>5.26</td><td>0.8831</td><td>0.8765</td>
      </tr>
    </tbody>
  </table>
</div>

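OrionMSP has the best mean rank on Medium (3.70) and Large (3.04) datasets and the second-best on Small (5.93, behind TabDPT at 5.48). TabPFN reaches the highest ACC/F1 on Medium (0.8557/0.8462), and on Large datasets GBDTs become competitive in raw scores: XGBoost posts the highest ACC/F1 (0.8969/0.8920) despite a weaker mean rank (5.41).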

<!-- Orion-MSP: Width (Feature Dimensionality) Analysis Table -->

<div class="hf-table-wrap">
  <table class="bench">
    <caption>
      Performance vs. feature dimensionality. Rank = mean accuracy rank (lower is better). ACC = Accuracy; F1 = Weighted F1. Groups: Narrow (&lt;10), Medium (10–100), Wide (&gt;100).
      <span class="first">1st</span>; <span class="second">2nd</span> within each group.
    </caption>
    <thead>
      <tr>
        <th class="sticky" rowspan="2">Models</th>
        <th class="sticky" colspan="3">Narrow (&lt;10)</th>
        <th class="sticky" colspan="3">Medium (10–100)</th>
        <th class="sticky" colspan="3">Wide (&gt;100)</th>
      </tr>
      <tr>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
        <th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>XGBoost</td>
        <td>6.77</td><td>0.8222</td><td>0.8159</td>
        <td>6.90</td><td>0.8482</td><td>0.8410</td>
        <td class="first">4.79</td><td>0.9140</td><td>0.9039</td>
      </tr>
      <tr>
        <td>CatBoost</td>
        <td>5.63</td><td>0.8145</td><td>0.8067</td>
        <td>6.88</td><td>0.8441</td><td>0.8344</td>
        <td class="second">5.50</td><td class="first">0.9157</td><td class="second">0.9084</td>
      </tr>
      <tr>
        <td>Random Forest</td>
        <td>7.15</td><td>0.8005</td><td>0.7044</td>
        <td>7.44</td><td>0.8410</td><td>0.8235</td>
        <td>7.52</td><td>0.9034</td><td>0.8936</td>
      </tr>
      <tr>
        <td>LightGBM</td>
        <td>6.15</td><td>0.8128</td><td>0.7907</td>
        <td>6.92</td><td>0.8458</td><td>0.8326</td>
        <td>7.47</td><td>0.8999</td><td>0.8908</td>
      </tr>
      <tr>
        <td>TabICL</td>
        <td>5.14</td><td>0.8208</td><td>0.8119</td>
        <td>4.61</td><td class="second">0.8627</td><td class="second">0.8549</td>
        <td>6.46</td><td>0.9101</td><td>0.8936</td>
      </tr>
      <tr>
        <td>OrionBiX</td>
        <td class="second">4.64</td><td>0.8112</td><td>0.8043</td>
        <td>5.46</td><td>0.8510</td><td>0.8417</td>
        <td>6.73</td><td>0.8859</td><td>0.8849</td>
      </tr>
      <tr>
        <td>OrionMSP</td>
        <td class="first">3.76</td><td class="first">0.8394</td><td class="first">0.8314</td>
        <td class="second">4.09</td><td>0.8572</td><td>0.8478</td>
        <td>5.69</td><td>0.8860</td><td>0.8837</td>
      </tr>
      <tr>
        <td>TabPFN</td>
        <td>5.30</td><td>0.8187</td><td>0.8092</td>
        <td class="first">4.07</td><td class="first">0.8676</td><td class="first">0.8589</td>
        <td>6.141</td><td class="second">0.9129</td><td class="first">0.9111</td>
      </tr>
      <tr>
        <td>Mitra</td>
        <td>11.25</td><td>0.3737</td><td>0.2683</td>
        <td>11.84</td><td>0.3886</td><td>0.2781</td>
        <td>13.03</td><td>0.2521</td><td>0.1497</td>
      </tr>
      <tr>
        <td>ContextTab</td>
        <td>9.52</td><td>0.6391</td><td>0.5719</td>
        <td>9.59</td><td>0.6480</td><td>0.5843</td>
        <td>10.97</td><td>0.6017</td><td>0.5651</td>
      </tr>
      <tr>
        <td>TabDPT</td>
        <td>4.66</td><td class="second">0.8262</td><td class="second">0.8189</td>
        <td>5.45</td><td>0.8566</td><td>0.8483</td>
        <td>7.23</td><td>0.8845</td><td>0.8820</td>
      </tr>
    </tbody>
  </table>
</div>

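OrionMSP has the best rank, ACC, and F1 on narrow datasets (3.76, 0.8394, 0.8314) and stays close to the top on medium width (rank 4.09 vs. 4.07 for TabPFN), where TabPFN has the highest ACC/F1 (0.8676/0.8589). On wide feature spaces, GBDTs (XGBoost/CatBoost) take the two best ranks (4.79 and 5.50).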
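
To see which row of the size and width tables applies to your own data, the bucket boundaries from the table captions can be written as two small helpers. This is a sketch: the function names are hypothetical, and it assumes "size" means the number of samples and "width" the number of features.

```python
def size_bucket(n_samples):
    """Small (<1K), Medium (1K-10K), Large (>10K), as in the size table caption."""
    if n_samples < 1_000:
        return "Small"
    if n_samples <= 10_000:
        return "Medium"
    return "Large"

def width_bucket(n_features):
    """Narrow (<10), Medium (10-100), Wide (>100), as in the width table caption."""
    if n_features < 10:
        return "Narrow"
    if n_features <= 100:
        return "Medium"
    return "Wide"

# Example: a hypothetical dataset with 5,000 rows and 8 features.
print(size_bucket(5_000), width_bucket(8))
```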

## Usage
### Direct (OrionMSP Python package)

```python
from orion_msp.sklearn import OrionMSPClassifier

# X_train, y_train, and X_test are your own training features, training labels, and test features.
# Initialize and use
clf = OrionMSPClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```

This code will automatically download the pre-trained model from Hugging Face and use a GPU if available.
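
To score predictions with the same two metrics used in the tables above (accuracy and weighted F1), a minimal pure-Python sketch follows. The labels are made up for illustration. With scikit-learn installed, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score(..., average="weighted")` compute the same quantities.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n > 0
        total += f1 * n
    return total / len(y_true)

# Made-up labels standing in for y_test and clf.predict(X_test).
y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]
print(f"ACC={accuracy(y_true, y_pred):.4f}  F1={weighted_f1(y_true, y_pred):.4f}")
```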

### Via TabTune (unified TFM library)

```python
from tabtune import TabularPipeline

pipeline = TabularPipeline(
    model_name="OrionMSP",          # use OrionMSP through TabTune
    tuning_strategy="inference",    # zero-shot / in-context mode
    tuning_params={"device": "cuda"}  # or "cpu"
)

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
```

When used through TabTune, the OrionMSP weights are automatically downloaded from this Hugging Face repository on first use, and TabTune handles model-aware preprocessing for you.


## Installation

### Via TabTune (recommended if you want multiple tabular FMs)

```bash
pip install tabtune
```

This installs TabTune and its built-in OrionMSP integration; no separate orion-msp install is required.


### From the OrionMSP source
#### Option 1: From the local clone

```bash
git clone https://github.com/Lexsi-Labs/Orion-MSP.git
cd Orion-MSP
pip install -e .
```

#### Option 2: From the Git Remote

```bash
pip install git+https://github.com/Lexsi-Labs/Orion-MSP.git
```
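
After installing, a quick way to confirm that the packages are importable is to look up their module specs. The sketch below uses only the standard library; the module names `orion_msp` and `tabtune` are taken from the import statements in the Usage section, and `is_installed` is a hypothetical helper.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ("orion_msp", "tabtune"):
    print(f"{name}: {'installed' if is_installed(name) else 'not found'}")
```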


## Citation
If you use Orion-MSP, please cite our [paper](https://arxiv.org/abs/2511.02818):

```bibtex
@article{bouadi25orionmsp,
  title={Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning},
  author={Mohamed Bouadi and Pratinav Seth and Aditya Tanna and Vinay Kumar Sankarapu},
  year={2025},
  eprint={2511.02818},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2511.02818},
}
```