	---
	license: mit
	---
	<div align="center">
	<img src="logo.png" alt="Orion-MSP Logo" width="700"/>
	</div>

	<div align="center">
	<a href="https://www.lexsi.ai">
	<img src="https://img.shields.io/badge/Lexsi-Homepage-FF6B6B?style=for-the-badge" alt="Homepage"/>
	</a>
	<a href="https://huggingface.co/Lexsi">
	<img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Lexsi%20AI-FFD21E?style=for-the-badge" alt="Hugging Face"/>
	</a>
	<a href="https://discord.gg/dSB62Q7A">
	<img src="https://img.shields.io/badge/Discord-Join-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"/>
	</a>
	<a href="https://github.com/Lexsi-Labs/Orion-MSP">
	<img src="https://img.shields.io/badge/GitHub-Orion%20MSP-181717?style=for-the-badge&logo=github&logoColor=white" alt="Orion MSP GitHub"/>
	</a>
	<!-- TabTune repo -->
	<a href="https://github.com/Lexsi-Labs/TabTune">
	<img src="https://img.shields.io/badge/GitHub-TabTune-181717?style=for-the-badge&logo=github&logoColor=white" alt="TabTune GitHub"/>
	</a>
	</div>


	# Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning

	Orion-MSP is a tabular foundation model for in-context learning. It uses multi-scale sparse attention and Perceiver-style memory to process tabular data at multiple granularities, capturing both local feature interactions and global dataset-level patterns.

	Orion-MSP can be used either directly via its own Python package or through [TabTune](https://github.com/Lexsi-Labs/TabTune), which provides a unified interface over several tabular foundation models.

	## Key Features

	- Multi-Scale Sparse Attention: Processes features at three scales (1, 4, and 16) using windowed, global, and random attention patterns, reducing the quadratic cost of dense attention to near-linear.
	- Hierarchical Feature Understanding: Captures patterns from individual cells to feature groups through scale-aware attention.
	- Perceiver-Style Memory: Cross-component memory that compresses dataset information for efficient processing across samples.
	- Memory-Efficient: Block-sparse masking enables efficient processing of large tabular datasets.
	- Scikit-learn Compatible: Drop-in replacement for scikit-learn classifiers, with `.fit()` and `.predict()` methods.
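
To make the sparsity idea concrete, the following is a minimal sketch of how windowed, global, and random patterns can be combined into a single attention mask. It is not Orion-MSP's implementation: the function name `sparse_attention_mask` and the parameters `window`, `n_global`, and `n_random` are hypothetical and chosen only for illustration.

```python
import random


def sparse_attention_mask(n_tokens, window=2, n_global=1, n_random=1, seed=0):
    """Return an n_tokens x n_tokens boolean mask; True means "may attend".

    Hypothetical illustration, not the Orion-MSP code.
    """
    rng = random.Random(seed)
    mask = [[False] * n_tokens for _ in range(n_tokens)]
    for i in range(n_tokens):
        # Windowed: each token sees neighbours within `window` positions.
        for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
            mask[i][j] = True
        # Global: the first `n_global` tokens see, and are seen by, every token.
        for g in range(min(n_global, n_tokens)):
            mask[i][g] = True
            mask[g][i] = True
        # Random: a few extra links per token, drawn reproducibly.
        for j in rng.sample(range(n_tokens), min(n_random, n_tokens)):
            mask[i][j] = True
    return mask


def density(mask):
    """Fraction of allowed (query, key) pairs in the mask."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


mask = sparse_attention_mask(64)
print(f"allowed pairs: {density(mask):.1%} of dense attention")
```

With the window and the number of global and random links held fixed, each non-global row allows a bounded number of keys, so the count of allowed pairs grows roughly linearly with the number of tokens rather than quadratically.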


	## Architecture
	Orion-MSP consists of four main components:
	- Column-wise Embedding: Distribution-aware feature embeddings using Induced Set Attention Blocks (ISAB)
	- Multi-Scale Row Interaction: Sparse attention with windowed, global, and random patterns across multiple scales
	- Cross-Component Memory: Perceiver-style memory for efficient dataset-level context
	- Dataset-wise ICL: Enhanced predictor leveraging enriched representations for few-shot tabular classification
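
One way to picture the scales listed under Key Features (1, 4, 16) is as progressively coarser groupings of the same sequence of feature tokens. The sketch below mean-pools a list of token vectors at each scale. The pooling choice and the `pool_tokens` helper are assumptions made for illustration and may differ from how Orion-MSP actually builds its scales.

```python
def pool_tokens(tokens, scale):
    """Mean-pool consecutive groups of `scale` token vectors.

    The last group is shorter when len(tokens) is not a multiple of `scale`.
    """
    pooled = []
    for start in range(0, len(tokens), scale):
        group = tokens[start:start + scale]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


# 32 toy token vectors of dimension 2 (values are arbitrary).
tokens = [[float(i), float(i % 4)] for i in range(32)]

# Coarser scales yield fewer, more aggregated tokens.
multi_scale = {scale: pool_tokens(tokens, scale) for scale in (1, 4, 16)}
for scale, pooled in multi_scale.items():
    print(f"scale {scale}: {len(pooled)} tokens")
```

Attention over the coarser sequences is cheaper because there are fewer tokens, while the scale-1 sequence keeps per-feature detail.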

	## Performance
	<!-- Orion-MSP: Benchmark Summary Table -->
	<style>
	/* Container adds horizontal scroll on small screens */
	.hf-table-wrap { overflow-x: auto; margin: 1rem 0; }
	table.bench { border-collapse: collapse; width: 100%; font-size: 0.9rem; }
	table.bench caption { font-weight: 600; text-align: left; margin-bottom: .5rem; }
	table.bench th, table.bench td { border-bottom: 1px solid #e5e7eb; padding: 6px 8px; }
	table.bench thead th { border-bottom: 2px solid #d1d5db; background: #fafafa; }
	table.bench th.sticky { position: sticky; top: 0; z-index: 1; }
	/* Alignment */
	table.bench th, table.bench td { text-align: center; }
	table.bench th:first-child, table.bench td:first-child { text-align: left; white-space: nowrap; }
	/* Emphasis helpers */
	.first { font-weight: 700; } /* 1st place (bold) */
	.second { text-decoration: underline; } /* 2nd place (underline) */
	.first.second { font-weight: 700; text-decoration: underline; } /* bold+underline */
	</style>

	<div class="hf-table-wrap">
	<table class="bench">
	<caption>Performance comparison across three benchmark suites: TALENT, OpenML-CC18, and TabZilla. Ranks are mean ranks based on accuracy (lower is better). Metrics: ACC = Accuracy, F1 = Weighted F1. <span class="first second">1st</span>; <span class="second">2nd</span>.</caption>
	<thead>
	<tr>
	<th class="sticky" rowspan="2">Models</th>
	<th class="sticky" colspan="1">All</th>
	<th class="sticky" colspan="3">TALENT</th>
	<th class="sticky" colspan="3">OpenML-CC18</th>
	<th class="sticky" colspan="3">TabZilla</th>
	</tr>
	<tr>
	<th class="sticky">Rank</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>XGBoost</td>
	<td>6.70</td>
	<td>6.02</td><td>0.8403</td><td>0.8360</td>
	<td>5.89</td><td>0.8558</td><td>0.8537</td>
	<td>6.07</td><td>0.8612</td><td>0.8326</td>
	</tr>
	<tr>
	<td>CatBoost</td>
	<td>6.43</td>
	<td>5.57</td><td>0.8336</td><td>0.8259</td>
	<td>6.25</td><td>0.8588</td><td>0.8520</td>
	<td>7.13</td><td>0.8579</td><td>0.8384</td>
	</tr>
	<tr>
	<td>Random Forest</td>
	<td>7.38</td>
	<td>6.15</td><td>0.8285</td><td>0.8209</td>
	<td>6.36</td><td>0.8547</td><td>0.8497</td>
	<td>8.42</td><td>0.8358</td><td>0.8399</td>
	</tr>
	<tr>
	<td>LightGBM</td>
	<td>6.78</td>
	<td>6.11</td><td>0.8331</td><td>0.8245</td>
	<td>6.18</td><td>0.8581</td><td>0.8493</td>
	<td>5.25</td><td>0.8618</td><td>0.8211</td>
	</tr>
	<tr>
	<td>TabICL</td>
	<td>4.96</td>
	<td>4.09</td><td class="second">0.8471</td><td class="second">0.8379</td>
	<td>4.69</td><td>0.8667</td><td>0.8623</td>
	<td>5.89</td><td>0.8734</td><td>0.8698</td>
	</tr>
	<tr>
	<td>OrionBiX</td>
	<td>5.37</td>
	<td>4.59</td><td>0.8346</td><td>0.8260</td>
	<td>4.98</td><td>0.8653</td><td>0.8596</td>
	<td>4.89</td><td>0.8728</td><td>0.8628</td>
	</tr>
	<tr>
	<td><span class="first second">OrionMSP</span></td>
	<td>3.58</td>
	<td class="first second">3.26</td><td>0.8461</td><td>0.8360</td>
	<td class="first second">4.12</td><td class="first second">0.8722</td><td class="first second">0.8676</td>
	<td class="first second">3.84</td><td class="first second">0.8821</td><td class="first second">0.8786</td>
	</tr>
	<tr>
	<td><span class="second">TabPFN</span></td>
	<td>4.61</td>
	<td class="second">3.72</td><td class="first second">0.8514</td><td class="first second">0.8412</td>
	<td>4.76</td><td class="second">0.8714</td><td class="second">0.8663</td>
	<td>4.86</td><td>0.8752</td><td>0.8716</td>
	</tr>
	<tr>
	<td>Mitra</td>
	<td>11.77</td>
	<td>10.38</td><td>0.3921</td><td>0.2868</td>
	<td>10.52</td><td>0.3614</td><td>0.2522</td>
	<td>11.21</td><td>0.3152</td><td>0.1830</td>
	</tr>
	<tr>
	<td>ContextTab</td>
	<td>9.70</td>
	<td>9.84</td><td>0.5474</td><td>0.4596</td>
	<td>6.28</td><td>0.8639</td><td>0.8581</td>
	<td>7.13</td><td>0.8389</td><td>0.8334</td>
	</tr>
	<tr>
	<td>TabDPT</td>
	<td>5.42</td>
	<td>5.19</td><td>0.8408</td><td>0.8318</td>
	<td class="second">4.64</td><td>0.8672</td><td>0.8625</td>
	<td class="second">3.94</td><td class="second">0.8814</td><td class="second">0.8775</td>
	</tr>
	</tbody>
	</table>
	</div>

	Orion-MSP achieves the best overall mean rank (3.58, ahead of TabPFN at 4.61) and the best mean rank on each of the three benchmark suites.
	- On TALENT, it has the best mean rank (3.26), although TabPFN (0.8514/0.8412) and TabICL (0.8471/0.8379) post slightly higher ACC/F1 than Orion-MSP (0.8461/0.8360).
	- On OpenML-CC18, Orion-MSP attains the top ACC/F1 (0.8722/0.8676), narrowly ahead of TabPFN (0.8714/0.8663) and TabDPT (0.8672/0.8625).
	- On TabZilla, it has the highest ACC/F1 (0.8821/0.8786) and the best rank (3.84), just ahead of TabDPT.
	- Classical baselines (XGBoost/LightGBM/CatBoost/RF) trail in overall mean rank (6.43 to 7.38), which suggests that Orion-MSP is robust across diverse tabular tasks.
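
The Rank columns are mean ranks by accuracy. As a rough sketch of how such a number can be computed, the code below ranks models on each dataset (highest accuracy gets rank 1) and averages the ranks. Giving tied models the average of their positions is an assumption, since the tie rule is not stated here, and the model names and accuracies are made up for illustration.

```python
def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average position of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def mean_ranks(acc_by_dataset, models):
    """Average each model's per-dataset accuracy rank (lower is better)."""
    totals = {m: 0.0 for m in models}
    for accs in acc_by_dataset:
        for model, rank in zip(models, ranks_desc([accs[m] for m in models])):
            totals[model] += rank
    return {m: totals[m] / len(acc_by_dataset) for m in models}


# Made-up accuracies for three hypothetical datasets (not benchmark results).
models = ["model_a", "model_b", "model_c"]
toy = [
    {"model_a": 0.90, "model_b": 0.85, "model_c": 0.80},
    {"model_a": 0.70, "model_b": 0.75, "model_c": 0.75},
    {"model_a": 0.88, "model_b": 0.86, "model_c": 0.91},
]
print(mean_ranks(toy, models))
```

Because ranks only record ordering, a model can have the best mean rank without having the highest average accuracy, which is the pattern seen on TALENT.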


	<!-- Orion-MSP: Size Analysis Table -->

	<div class="hf-table-wrap">
	<table class="bench">
	<caption>
	Performance variation by dataset size across all benchmark suites. Rank = mean rank by accuracy (lower is better).
	ACC = Accuracy; F1 = Weighted F1. Size buckets: Small (<1K), Medium (1K–10K), Large (>10K).
	</caption>
	<thead>
	<tr>
	<th class="sticky" rowspan="2">Models</th>
	<th class="sticky" colspan="3">Small (<1K)</th>
	<th class="sticky" colspan="3">Medium (1K–10K)</th>
	<th class="sticky" colspan="3">Large (>10K)</th>
	</tr>
	<tr>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>XGBoost</td>
	<td>7.70</td><td>0.8168</td><td>0.7964</td>
	<td>6.88</td><td>0.8363</td><td>0.8314</td>
	<td>5.41</td><td class="first">0.8969</td><td class="first">0.8920</td>
	</tr>
	<tr>
	<td>CatBoost</td>
	<td>7.88</td><td>0.8124</td><td>0.7935</td>
	<td>6.47</td><td>0.8340</td><td>0.8264</td>
	<td>5.48</td><td>0.8797</td><td>0.8733</td>
	</tr>
	<tr>
	<td>Random Forest</td>
	<td>8.55</td><td>0.7988</td><td>0.8187</td>
	<td>7.16</td><td>0.8285</td><td>0.8221</td>
	<td>7.30</td><td>0.8694</td><td>0.8628</td>
	</tr>
	<tr>
	<td>LightGBM</td>
	<td>7.80</td><td>0.8143</td><td>0.7789</td>
	<td>6.94</td><td>0.8314</td><td>0.8226</td>
	<td>5.63</td><td>0.8827</td><td>0.8764</td>
	</tr>
	<tr>
	<td>TabICL</td>
	<td>6.04</td><td>0.8301</td><td class="first">0.8338</td>
	<td>4.77</td><td>0.8486</td><td>0.8398</td>
	<td>4.61</td><td>0.8802</td><td>0.8743</td>
	</tr>
	<tr>
	<td>OrionBiX</td>
	<td>6.32</td><td class="second">0.8330</td><td>0.8150</td>
	<td>5.48</td><td>0.8348</td><td>0.8260</td>
	<td class="second">4.42</td><td>0.8729</td><td>0.8670</td>
	</tr>
	<tr>
	<td>OrionMSP</td>
	<td class="second">5.93</td><td>0.8232</td><td>0.8194</td>
	<td class="first">3.70</td><td class="second">0.8494</td><td class="second">0.8402</td>
	<td class="first">3.04</td><td class="second">0.8843</td><td class="second">0.8768</td>
	</tr>
	<tr>
	<td>TabPFN</td>
	<td>6.50</td><td>0.8325</td><td>0.8131</td>
	<td class="second">3.81</td><td class="first">0.8557</td><td class="first">0.8462</td>
	<td>5.73</td><td>0.8783</td><td>0.8713</td>
	</tr>
	<tr>
	<td>Mitra</td>
	<td>13.88</td><td>0.4334</td><td>0.3236</td>
	<td>11.59</td><td>0.3600</td><td>0.2553</td>
	<td>11.11</td><td>0.3837</td><td>0.2754</td>
	</tr>
	<tr>
	<td>ContextTab</td>
	<td>9.60</td><td>0.7578</td><td>0.7363</td>
	<td>9.52</td><td>0.6210</td><td>0.5566</td>
	<td>10.22</td><td>0.6388</td><td>0.5638</td>
	</tr>
	<tr>
	<td>TabDPT</td>
	<td class="first">5.48</td><td class="first">0.8333</td><td class="second">0.8271</td>
	<td>5.40</td><td>0.8424</td><td>0.8339</td>
	<td>5.26</td><td>0.8831</td><td>0.8765</td>
	</tr>
	</tbody>
	</table>
	</div>

	By mean rank, Orion-MSP is the strongest model on Medium (3.70) and Large (3.04) datasets and second on Small (5.93, behind TabDPT at 5.48). TabPFN has the highest ACC/F1 on Medium (0.8557/0.8462), and XGBoost has the highest raw ACC/F1 on Large (0.8969/0.8920).
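
For readers who want to slice their own results the same way, here is a small sketch of the size bucketing named in the caption. It assumes that "size" means the number of rows and that datasets with exactly 1,000 or 10,000 rows count as Medium; neither detail is stated in the caption. The dataset names and sizes are hypothetical.

```python
from collections import defaultdict


def size_bucket(n_rows):
    """Map a row count to Small (<1K), Medium (1K-10K), or Large (>10K).

    Boundary handling is an assumption: 1,000 and 10,000 are treated as Medium.
    """
    if n_rows < 1_000:
        return "Small"
    if n_rows <= 10_000:
        return "Medium"
    return "Large"


# Hypothetical dataset sizes, for illustration only.
dataset_rows = {"toy_a": 350, "toy_b": 4_200, "toy_c": 10_000, "toy_d": 58_000}

groups = defaultdict(list)
for name, n_rows in dataset_rows.items():
    groups[size_bucket(n_rows)].append(name)
print(dict(groups))
```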

	<!-- Orion-MSP: Width (Feature Dimensionality) Analysis Table -->

	<div class="hf-table-wrap">
	<table class="bench">
	<caption>
	Performance vs. feature dimensionality. Rank = mean accuracy rank (lower is better). ACC = Accuracy; F1 = Weighted F1. Groups: Narrow (<10), Medium (10–100), Wide (>100).
	<span class="first">1st</span> ; <span class="second">2nd</span> within each group.
	</caption>
	<thead>
	<tr>
	<th class="sticky" rowspan="2">Models</th>
	<th class="sticky" colspan="3">Narrow (<10)</th>
	<th class="sticky" colspan="3">Medium (10–100)</th>
	<th class="sticky" colspan="3">Wide (>100)</th>
	</tr>
	<tr>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	<th class="sticky">Rank</th><th class="sticky">ACC</th><th class="sticky">F1</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td>XGBoost</td>
	<td>6.77</td><td>0.8222</td><td>0.8159</td>
	<td>6.90</td><td>0.8482</td><td>0.8410</td>
	<td class="first">4.79</td><td>0.9140</td><td>0.9039</td>
	</tr>
	<tr>
	<td>CatBoost</td>
	<td>5.63</td><td>0.8145</td><td>0.8067</td>
	<td>6.88</td><td>0.8441</td><td>0.8344</td>
	<td class="second">5.50</td><td class="first">0.9157</td><td class="second">0.9084</td>
	</tr>
	<tr>
	<td>Random Forest</td>
	<td>7.15</td><td>0.8005</td><td>0.7044</td>
	<td>7.44</td><td>0.8410</td><td>0.8235</td>
	<td>7.52</td><td>0.9034</td><td>0.8936</td>
	</tr>
	<tr>
	<td>LightGBM</td>
	<td>6.15</td><td>0.8128</td><td>0.7907</td>
	<td>6.92</td><td>0.8458</td><td>0.8326</td>
	<td>7.47</td><td>0.8999</td><td>0.8908</td>
	</tr>
	<tr>
	<td>TabICL</td>
	<td>5.14</td><td>0.8208</td><td>0.8119</td>
	<td>4.61</td><td class="second">0.8627</td><td class="second">0.8549</td>
	<td>6.46</td><td>0.9101</td><td>0.8936</td>
	</tr>
	<tr>
	<td>OrionBiX</td>
	<td class="second">4.64</td><td>0.8112</td><td>0.8043</td>
	<td>5.46</td><td>0.8510</td><td>0.8417</td>
	<td>6.73</td><td>0.8859</td><td>0.8849</td>
	</tr>
	<tr>
	<td>OrionMSP</td>
	<td class="first">3.76</td><td class="first">0.8394</td><td class="first">0.8314</td>
	<td class="second">4.09</td><td>0.8572</td><td>0.8478</td>
	<td>5.69</td><td>0.8860</td><td>0.8837</td>
	</tr>
	<tr>
	<td>TabPFN</td>
	<td>5.30</td><td>0.8187</td><td>0.8092</td>
	<td class="first">4.07</td><td class="first">0.8676</td><td class="first">0.8589</td>
	<td>6.14</td><td class="second">0.9129</td><td class="first">0.9111</td>
	</tr>
	<tr>
	<td>Mitra</td>
	<td>11.25</td><td>0.3737</td><td>0.2683</td>
	<td>11.84</td><td>0.3886</td><td>0.2781</td>
	<td>13.03</td><td>0.2521</td><td>0.1497</td>
	</tr>
	<tr>
	<td>ContextTab</td>
	<td>9.52</td><td>0.6391</td><td>0.5719</td>
	<td>9.59</td><td>0.6480</td><td>0.5843</td>
	<td>10.97</td><td>0.6017</td><td>0.5651</td>
	</tr>
	<tr>
	<td>TabDPT</td>
	<td>4.66</td><td class="second">0.8262</td><td class="second">0.8189</td>
	<td>5.45</td><td>0.8566</td><td>0.8483</td>
	<td>7.23</td><td>0.8845</td><td>0.8820</td>
	</tr>
	</tbody>
	</table>
	</div>

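	Orion-MSP leads on narrow datasets (rank 3.76, ACC/F1 0.8394/0.8314) and stays close to the top on medium width (rank 4.09 vs. 4.07 for TabPFN). TabPFN has the best ACC/F1 on medium-width data, and GBDTs (XGBoost/CatBoost) have the best ranks on wide feature spaces (4.79 and 5.50).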
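
The tables in this section report weighted F1 alongside accuracy. The sketch below shows the usual support-weighted definition (per-class F1 averaged with weights proportional to each class's frequency in the true labels), assuming that is the variant meant here. The labels are toy values, not benchmark data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        support = tp + fn  # number of true examples of this class
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support / total
    return score


# Toy labels with an imbalanced class distribution; class 1 is never predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 2, 2]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

In this toy case the weighted F1 comes out below the accuracy because the missed class contributes an F1 of zero, which is why the two metrics can order models differently.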

	## Usage
	### Direct (OrionMSP Python package)

	```python
	from orion_msp.sklearn import OrionMSPClassifier

	# X_train, y_train, and X_test are your own array-like features and labels
	clf = OrionMSPClassifier()
	clf.fit(X_train, y_train)
	predictions = clf.predict(X_test)
	```

	Running this code automatically downloads the pre-trained weights from Hugging Face and uses a GPU if one is available.
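
Because the classifier follows the scikit-learn `fit`/`predict` convention, a generic hold-out helper can score it like any other estimator. The sketch below is an illustration only: `holdout_accuracy` and `MajorityClassifier` are hypothetical names, and the stand-in classifier exists so the example runs without the package or a weights download. Whether `OrionMSPClassifier` accepts plain Python lists (rather than NumPy arrays or DataFrames) is an assumption.

```python
import random


class MajorityClassifier:
    """Stand-in with the same fit/predict shape; predicts the most common label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def holdout_accuracy(clf, X, y, test_fraction=0.25, seed=0):
    """Shuffle, split, fit on the training part, and score on the held-out part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    clf.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    preds = clf.predict([X[i] for i in test_idx])
    return sum(p == y[i] for p, i in zip(preds, test_idx)) / n_test


# Toy data: 40 rows, 2 features, label 1 for three rows out of four.
X = [[i, i % 5] for i in range(40)]
y = [0 if i % 4 == 0 else 1 for i in range(40)]
print(holdout_accuracy(MajorityClassifier(), X, y))

# To evaluate Orion-MSP instead (requires the package and a weights download):
# from orion_msp.sklearn import OrionMSPClassifier
# print(holdout_accuracy(OrionMSPClassifier(), X, y))
```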

	### Via TabTune (unified TFM library)

	```python
	from tabtune import TabularPipeline

	pipeline = TabularPipeline(
	    model_name="OrionMSP",             # use OrionMSP through TabTune
	    tuning_strategy="inference",       # zero-shot / in-context mode
	    tuning_params={"device": "cuda"},  # or "cpu"
	)

	pipeline.fit(X_train, y_train)
	predictions = pipeline.predict(X_test)
	```

	When used through TabTune, the OrionMSP weights are automatically downloaded from this Hugging Face repository on first use, and TabTune handles model-aware preprocessing for you.
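
If the same script has to run on machines with and without a GPU, the `device` value can be chosen at runtime instead of being hard-coded. This is a minimal sketch that assumes PyTorch is the backend in use; `pick_device` is a hypothetical helper, not part of TabTune.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional here; fall back to CPU if it is not installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Pass this dictionary as `tuning_params` when building the pipeline.
tuning_params = {"device": pick_device()}
print(tuning_params)
```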


	## Installation

	### Via TabTune (recommended if you want multiple tabular FMs)

	```bash
	pip install tabtune
	```

	This installs TabTune and its built-in OrionMSP integration; no separate orion-msp install is required.


	### From the OrionMSP source
	#### Option 1: From a local clone

	```bash
	git clone https://github.com/Lexsi-Labs/Orion-MSP.git
	cd Orion-MSP
	pip install -e .
	```

	#### Option 2: From the Git remote

	```bash
	pip install git+https://github.com/Lexsi-Labs/Orion-MSP.git
	```
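
After installing, a quick check such as the one below can confirm which of the two packages is visible to the current Python environment. The module names `tabtune` and `orion_msp` are taken from the import statements in the Usage section; the `installed` helper itself is hypothetical.

```python
from importlib.util import find_spec


def installed(module_names=("tabtune", "orion_msp")):
    """Report which top-level modules can be found, without importing them."""
    return {name: find_spec(name) is not None for name in module_names}


print(installed())
```

A value of `False` means the corresponding package is not importable from the interpreter that ran the check, which usually points to a different virtual environment than the one used for `pip install`.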


	## Citation
	If you use Orion-MSP, please cite our [paper](https://arxiv.org/abs/2511.02818):

	```bibtex
	@article{bouadi25orionmsp,
	title={Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning},
	author={Mohamed Bouadi and Pratinav Seth and Aditya Tanna and Vinay Kumar Sankarapu},
	year={2025},
	eprint={2511.02818},
	archivePrefix={arXiv},
	primaryClass={cs.AI},
	url={https://arxiv.org/abs/2511.02818},
	}
	```