Improve model card: add pipeline tag and library name
#1
by nielsr (HF Staff) - opened

README.md CHANGED
```diff
@@ -1,10 +1,40 @@
 ---
-license: cc-by-4.0
 base_model:
 - intfloat/e5-small-v2
+license: cc-by-4.0
+pipeline_tag: tabular-regression
 ---
 
+# Paper title and link
+
+The model was presented in the paper [TabSTAR: A Foundation Tabular Model With Semantically Target-Aware
+Representations](https://arxiv.org/abs/2505.18125).
+
+# Paper abstract
+
+The abstract of the paper is the following:
+
+While deep learning has achieved remarkable success across many domains, it
+has historically underperformed on tabular learning tasks, which remain
+dominated by gradient boosting decision trees (GBDTs). However, recent
+advancements are paving the way for Tabular Foundation Models, which can
+leverage real-world knowledge and generalize across diverse datasets,
+particularly when the data contains free-text. Although incorporating language
+model capabilities into tabular tasks has been explored, most existing methods
+utilize static, target-agnostic textual representations, limiting their
+effectiveness. We introduce TabSTAR: a Foundation Tabular Model with
+Semantically Target-Aware Representations. TabSTAR is designed to enable
+transfer learning on tabular data with textual features, with an architecture
+free of dataset-specific parameters. It unfreezes a pretrained text encoder and
+takes as input target tokens, which provide the model with the context needed
+to learn task-specific embeddings. TabSTAR achieves state-of-the-art
+performance for both medium- and large-sized datasets across known benchmarks
+of classification tasks with text features, and its pretraining phase exhibits
+scaling laws in the number of datasets, offering a pathway for further
+performance improvements.
 
 We’re working on making **TabSTAR** available to everyone. In the meantime, you can find the research code to pretrain the model here:
 
 [🔗 GitHub Repository: alanarazi7/TabSTAR](https://github.com/alanarazi7/TabSTAR)
+
+Project page: https://eilamshapira.com/TabSTAR/
```