---
license: mit
language:
- en
metrics:
- accuracy
- bleu
pipeline_tag: table-question-answering
tags:
- code
---

# TableDART Gating Network Checkpoint

This repository provides the trained gating network checkpoint for **TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding**.

TableDART is a training-efficient framework that dynamically routes each table-query pair through the most appropriate reasoning path (Text-only, Image-only, or Fusion) while keeping all pretrained expert models **frozen**.

---

## Overview

Modeling semantic and structural information in tabular data remains a core challenge for effective table understanding. Existing LLM-based approaches face several limitations:

- Table-as-Text methods flatten tables into text sequences, losing structural cues.
- Table-as-Image methods preserve layout but struggle with precise semantics.
- Static multimodal methods process all modalities for every query, introducing redundancy and potential cross-modal conflicts.
- Most approaches require expensive fine-tuning of large LLMs or multimodal models.

**Our solution, TableDART,** addresses these limitations by:

- Reusing pretrained single-modality expert models (kept frozen, plug-and-play)
- Learning only a lightweight 2.59M-parameter MLP gating network
- Dynamically selecting the optimal path for each table-query pair (instance-level routing)
- Introducing an LLM agent that mediates cross-modal knowledge integration when needed

This design avoids full LLM/MLLM fine-tuning, reduces computational redundancy, and maintains a strong efficiency-performance trade-off.
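The instance-level routing idea can be sketched in a few lines. The snippet below is a minimal illustration only, not the actual TableDART implementation: the dimensions, weights, and input embedding are made up, and the real gating network is trained rather than randomly initialized. It shows how a small MLP can score the three reasoning paths for a table-query embedding and route the instance to the highest-scoring one:

```python
import numpy as np

PATHS = ["text", "image", "fusion"]

class GatingMLP:
    """Toy two-layer MLP gate: embedding -> probabilities over the three paths.

    Illustrative only; sizes and weights are not TableDART's actual ones.
    """

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, len(PATHS)))

    def route(self, x):
        h = np.maximum(x @ self.w1, 0.0)       # ReLU hidden layer
        logits = h @ self.w2                   # one score per path
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        return PATHS[int(np.argmax(probs))], probs

gate = GatingMLP(dim=16, hidden=32)
emb = np.ones(16)  # stand-in for a table-query embedding
path, probs = gate.route(emb)
print(path, probs.round(3))
```

Because only this small gate is trained, the frozen text, image, and fusion experts can be swapped without retraining anything else.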

---

## Performance

Across 7 benchmarks, TableDART:

- Achieves state-of-the-art results on 4 of 7 benchmarks among open-source models
- Outperforms the strongest baseline by +4.02% accuracy on average
- Maintains significant computational efficiency gains

---

## What This Checkpoint Contains

This Hugging Face model repository includes:

- The trained MLP gating network checkpoint

⚠️ Note: This checkpoint does not include the pretrained text or image expert models. Please load those separately, following the official repository instructions.

---

## Code and Usage

Full training scripts, inference pipelines, and reproduction details are available in our GitHub repository: https://github.com/xiaobo-xing/TableDART

---

## Paper

ICLR 2026 OpenReview version: https://openreview.net/forum?id=4aZTiLH3fm

arXiv version: https://arxiv.org/abs/2509.14671

---

## Citation

If you find TableDART helpful, please cite our paper and consider starring the repository.

### ICLR 2026 Version

```bibtex
@inproceedings{xing2026tabledart,
  title={Table{DART}: Dynamic Adaptive Multi-Modal Routing for Table Understanding},
  author={Xiaobo Xing and Wei Yuan and Tong Chen and Quoc Viet Hung Nguyen and Xiangliang Zhang and Hongzhi Yin},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=4aZTiLH3fm}
}
```

### arXiv Version

```bibtex
@misc{xing2025tabledartdynamicadaptivemultimodal,
  title={TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding},
  author={Xiaobo Xing and Wei Yuan and Tong Chen and Quoc Viet Hung Nguyen and Xiangliang Zhang and Hongzhi Yin},
  year={2025},
  eprint={2509.14671},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.14671}
}
```