---
license: mit
language:
- en
metrics:
- accuracy
- bleu
pipeline_tag: table-question-answering
tags:
- code
---
# TableDART Gating Network Checkpoint

This repository provides the trained gating network checkpoint for **TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding**.

TableDART is a training-efficient framework that dynamically routes each table-query pair through the most appropriate reasoning path (Text-only, Image-only, or Fusion) while keeping all pretrained expert models **frozen**.

---

## πŸ” Overview

Modeling semantic and structural information from tabular data remains a core challenge for effective table understanding.
Existing LLM-based approaches face several limitations:

- Table-as-Text methods flatten tables into text sequences, losing structural cues.
- Table-as-Image methods preserve layout but struggle with precise semantics.
- Static multimodal methods process all modalities for every query, introducing redundancy and potential cross-modal conflicts.
- Most approaches require expensive fine-tuning of large LLMs or multimodal models.

**Our Solution: TableDART** addresses these limitations through:

- Reusing pretrained single-modality expert models (kept frozen, plug-and-play)
- Learning only a lightweight 2.59M-parameter MLP gating network
- Dynamically selecting the optimal path for each table-query pair (instance-level)
- Introducing an LLM agent that mediates cross-modal knowledge integration when needed

This design avoids full LLM/MLLM fine-tuning, reduces computational redundancy, and maintains strong efficiency-performance trade-offs.
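The routing idea can be sketched as follows. This is a minimal, illustrative example only, not the released implementation: the `GatingMLP` class, feature dimensions, and path names are hypothetical stand-ins, whereas the actual gating network is a 2.59M-parameter MLP operating on features derived from the frozen experts (see the official repository for the real architecture).

```python
# Minimal sketch of instance-level routing with a small MLP gating network.
# All names, dimensions, and features below are illustrative only.
import math
import random

PATHS = ["text", "image", "fusion"]  # the three reasoning paths

class GatingMLP:
    """Tiny two-layer MLP producing a probability per reasoning path."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(len(PATHS))]

    def __call__(self, features):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
                  for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        # Numerically stable softmax over the three paths.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

def route(gate, features):
    """Pick the highest-probability path; the expert models stay frozen."""
    probs = gate(features)
    return PATHS[max(range(len(PATHS)), key=probs.__getitem__)]

gate = GatingMLP(in_dim=8, hidden_dim=16)
features = [0.3] * 8  # placeholder table-query features
path = route(gate, features)
```

Only the gating network's parameters would be trained; the text and image experts it routes between remain untouched, which is what keeps the approach plug-and-play.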

---

## πŸš€ Performance

Across 7 benchmarks, TableDART:

- Achieves state-of-the-art results on 4/7 benchmarks among open-source models
- Outperforms the strongest baseline by +4.02% accuracy on average
- Achieves these results at substantially lower training cost, since only the lightweight gating network is trained


## πŸ“¦ What This Checkpoint Contains

This Hugging Face model includes:

- The trained MLP gating network checkpoint

⚠️ Note: This checkpoint does not include the pretrained text or image expert models. Please load those separately according to the official repository instructions.

---

## πŸ›  Code and Usage

Full training scripts, inference pipelines, and reproduction details are available in our GitHub repository: https://github.com/xiaobo-xing/TableDART

---

## πŸ“„ Paper

ICLR 2026 OpenReview Version:  
https://openreview.net/forum?id=4aZTiLH3fm

ArXiv Version:  
https://arxiv.org/abs/2509.14671

---

## πŸ“š Citation

If you find TableDART helpful, please cite our paper and consider starring the repository.

### ICLR 2026 Version

```bibtex
@inproceedings{xing2026tabledart,
    title={Table{DART}: Dynamic Adaptive Multi-Modal Routing for Table Understanding},
    author={Xiaobo Xing and Wei Yuan and Tong Chen and Quoc Viet Hung Nguyen and Xiangliang Zhang and Hongzhi Yin},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=4aZTiLH3fm}
}
```

### ArXiv Version
```bibtex
@misc{xing2025tabledartdynamicadaptivemultimodal,
    title={TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding}, 
    author={Xiaobo Xing and Wei Yuan and Tong Chen and Quoc Viet Hung Nguyen and Xiangliang Zhang and Hongzhi Yin},
    year={2025},
    eprint={2509.14671},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2509.14671}
}
```