---
license: apache-2.0
language:
- zh
- en
base_model:
- Qwen/Qwen3-8B
---

# TableGPT-R1

## Model details

We developed and released TableGPT-R1, a specialized large language model optimized for complex tabular reasoning and data analysis. Unlike traditional models that rely solely on Supervised Fine-Tuning (SFT), TableGPT-R1 is trained using a systematic Reinforcement Learning (RL) framework. It is designed to bridge the gap between natural language understanding and professional data science requirements, such as multi-step logic, robust code execution, and autonomous environment interaction.

**Model Developers**

Zhejiang University & Institute of Computing Innovation, Zhejiang University

**Key Technical Breakthroughs**

* **Autonomous Agentic Reasoning**: The model is trained to "think" before acting. It generates a visible reasoning chain within `<think>` tags, plans Python-based data manipulations, and refines its strategy based on environment feedback (Code Interpreter).
* **Unified Reward System**: We introduced a hybrid reward mechanism that combines rule-based verification (for deterministic SQL/Code tasks) with a **Criteria-Injected Reward Model** (for open-ended analytical questions), ensuring both accuracy and interpretability.
* **GRPO++ Framework**: Utilizing an enhanced version of Group Relative Policy Optimization, the model optimizes its decision-making process across diverse table structures while maintaining its general-purpose reasoning capabilities (see the sketch after this list).
* **Cold-Start Data Engineering**: Bootstrapped with high-quality, long-chain reasoning trajectories, allowing the model to handle extreme table heterogeneity and complex multi-table joins.
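
To make the reward and optimization bullets concrete, here is a minimal, hypothetical sketch (not the released training code) of how a rule-based reward feeds the group-relative advantage at the heart of vanilla GRPO; the paper's GRPO++ and Criteria-Injected Reward Model add further machinery on top of this idea:

```python
import statistics

def rule_based_reward(pred: str, gold: str) -> float:
    """Deterministic check, e.g. comparing SQL/code execution results."""
    return 1.0 if pred.strip() == gold.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Vanilla GRPO: normalize each rollout's reward against the
    mean/std of its own sampling group (no value network needed)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: 4 rollouts for one deterministic task whose gold answer is "42"
rewards = [rule_based_reward(p, "42") for p in ["42", "41", "42", "7"]]
print(group_relative_advantages(rewards))  # positive for correct rollouts
```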

**Input**

TableGPT-R1 accepts both natural language instructions and tabular data. It uniquely supports **table-path inputs**, enabling the model to autonomously load and retrieve information from files using a built-in code interpreter.
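
For illustration, a table-path request could look like the following sketch; the file path and prompt wording are hypothetical, not a fixed API:

```python
# Hypothetical table-path style request: instead of pasting table contents,
# the user points the model at a file, and the model is expected to load
# and inspect it through its built-in code interpreter.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": (
            "The table is stored at ./data/games.csv (hypothetical path). "
            "Load it and report the average attendance."
        ),
    },
]
```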

**Output**

TableGPT-R1 supports two output behaviors depending on the task:

**Reasoning-only mode**: for tasks requiring logical deduction, metadata explanation, or semantic understanding without external execution.

* **Format**: `<think> ... </think> [Answer]`
* **Behavior**: The model performs internal "Chain-of-Thought" to verify its logic before presenting the final result.

**Agentic mode**: for data-intensive tasks requiring precise calculation, visualization, or large-scale data processing.

* **Format**:
  1. **Plan**: `<think> ... </think>` (analyze the goal and plan the code)
  2. **Act**: `<tool_call> ... </tool_call>` (generate Python/SQL code)
  3. **Observe**: `<observation> ... </observation>` (receive environment feedback)
  4. **Finalize**: `<answer> ... </answer>` (summarize results)
* **Behavior**: The model operates as an autonomous agent, reacting to execution errors or intermediate data results to ensure accuracy.
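
A schematic of one turn of this loop, using the example table from the QuickStart below (hand-written for illustration, not actual model output; the exact payload inside `<tool_call>` depends on the chat template):

```text
<think> The user wants the average attendance, so I will compute it with pandas. </think>
<tool_call> df["Attendance"].str.replace(",", "").astype(int).mean() </tool_call>
<observation> 26758.4 </observation>
<answer> The average attendance across the listed games is about 26,758. </answer>
```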

Additionally, to enforce reasoning, the default chat template automatically prepends an opening `<think>` tag. It is therefore normal for the model's output to contain only a closing `</think>` without an explicit opening `<think>` tag.

**Language**

Our model places a strong emphasis on Chinese corpora, and currently, queries in other languages may have limited support.

**Model Architecture**

TableGPT-R1 is built upon the **Qwen3-8B** transformer architecture, significantly enhanced for long-context tabular understanding and agentic workflows.

* **Base Backbone**: Qwen3-8B (Dense Transformer).
* **Context Window**: 128K tokens, optimized for processing large-scale table schemas, extensive metadata, and long execution logs.
* **Specialized Tokenizer**: Enhanced to handle structural delimiters, whitespace in tables, and code-specific syntax (Python/SQL) more efficiently.
* **Agentic Loop Integration**: The architecture is designed to support a seamless **"Think-Act-Observe"** cycle. It treats the environment's feedback (Code Interpreter output) as a first-class sequence input, allowing for real-time error correction and iterative reasoning.
* **Instruction Following**: Optimized via RL to strictly adhere to formatting constraints, distinguishing between internal thought process and external tool calls.

**Status**

This model is static, trained on an offline dataset. Future versions may be released to enhance its performance on specialized tasks.

**QuickStart**

This code snippet demonstrates how to build a prompt with table information, load the tokenizer and model, and generate a response.

> Note that you need `transformers>=4.51.0` to use `TableGPT-R1` (the version spec is quoted so the shell does not treat `>=` as a redirect):
>
> ```sh
> pip install "transformers>=4.51.0"
> ```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Using pandas to read some structured data
import pandas as pd
from io import StringIO

# single table
EXAMPLE_CSV_CONTENT = """
"Loss","Date","Score","Opponent","Record","Attendance"
"Hampton (14–12)","September 25","8–7","Padres","67–84","31,193"
"Speier (5–3)","September 26","3–1","Padres","67–85","30,711"
"Elarton (4–9)","September 22","3–1","@ Expos","65–83","9,707"
"Lundquist (0–1)","September 24","15–11","Padres","67–83","30,774"
"Hampton (13–11)","September 6","9–5","Dodgers","61–78","31,407"
"""

csv_file = StringIO(EXAMPLE_CSV_CONTENT)
df = pd.read_csv(csv_file)

model_name = "tablegpt/TableGPT-R1"

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

example_prompt_template = """Given access to several pandas dataframes, write the Python code to answer the user's question.

/*
"{var_name}.head(5).to_string(index=False)" as follows:
{df_info}
*/

Question: {user_question}
"""
# "Which games reached a record of 40 wins and 40 losses?"
question = "哪些比赛的战绩达到了40胜40负？"

prompt = example_prompt_template.format(
    var_name="df",
    df_info=df.head(5).to_string(index=False),
    user_question=question,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=8192)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
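
Because the default chat template pre-fills the opening `<think>` tag (noted above), the decoded `response` usually contains only a closing `</think>`. A minimal way to separate the reasoning trace from the final answer:

```python
# Everything before the closing tag is the reasoning trace; everything
# after it is the visible answer. Falls back to the full text when the
# tag is absent (partition then leaves the tail parts empty).
reasoning, _, answer = response.partition("</think>")
print(answer.strip() or reasoning.strip())
```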

**Deployment**

For deployment, you can use `sglang>=0.5.2` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:

- SGLang:

```shell
python -m sglang.launch_server --model-path tablegpt/TableGPT-R1 --served-model-name TableGPT-R1 --reasoning-parser qwen3
```

- vLLM:

```shell
vllm serve tablegpt/TableGPT-R1 --served-model-name TableGPT-R1 --enable-reasoning --reasoning-parser deepseek_r1
```

Then you can access the Chat API by:

```bash
curl http://localhost:xxx/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TableGPT-R1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "xxxxx?"}
    ]
  }'
```
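
Equivalently, the endpoint can be called from Python with the `openai` client (`pip install openai`). This is a sketch; the port below assumes `vllm serve`'s default of 8000 (SGLang's default is 30000), so adjust it to match your launch command:

```python
from openai import OpenAI

# Local OpenAI-compatible endpoint; no real API key is needed.
# Assumed port: 8000 (vLLM default). Use 30000 for SGLang's default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="TableGPT-R1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what TableGPT-R1 can do with tabular data."},
    ],
)
print(completion.choices[0].message.content)
```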

**License**

TableGPT-R1 is released under the Apache-2.0 license.

**Research Paper**

TableGPT-R1 is introduced and validated in the paper "[TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning](https://arxiv.org/xxxx)" available on arXiv.

**Where to send questions or comments about the model**

Inquiries and feedback are welcome at [j.zhao@zju.edu.cn](mailto:j.zhao@zju.edu.cn).

## Evaluation Results

Performance comparison grouped by model scale. Left group: models with parameter counts comparable to TableGPT-R1. Right group: significantly larger models and proprietary closed-source models. Bold marks the best result within each group. Abbreviations: Q3: Qwen3; QwQ: QwQ-32B; DS-V3: DeepSeek-V3; Q-Plus: Qwen-Plus; T-LLM: TableLLM; Llama: Llama-3.1-8B; TGPT2: TableGPT2-7B; TGPT-R1: TableGPT-R1-8B; FC: Fact Checking; NR: Numerical Reasoning; SC: Structure Comprehending; DA: Data Analysis; CG: Chart Generation. Met.: evaluation metric (Acc: accuracy; EX: execution accuracy; EM: exact match; Rge: ROUGE).

| Benchmark | Task | Met. | Q3-8B | T-LLM | Llama | TGPT2 | **TGPT-R1 (8B)** | Q3-14B | Q3-32B | Q3-70B | QwQ | GPT-4o | DS-V3 | Q-Plus |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Internal Bench** | Table Info | Acc | 69.20 | 0.97 | 37.26 | - | **82.00** | 66.10 | 72.58 | 51.10 | 69.68 | 67.26 | 66.00 | **76.90** |
| | Table Path | Acc | 73.90 | 0.65 | 31.77 | - | **85.00** | 74.70 | 78.55 | 60.50 | 75.00 | - | 72.90 | **81.50** |
| **NL2SQL** | Spider | EX | 86.07 | 65.30 | 73.59 | 74.38 | **86.73** | 87.61 | 87.80 | 61.71 | 85.33 | 87.98 | 88.54 | **89.19** |
| | BIRD | EX | 61.67 | 30.64 | 40.03 | 49.28 | **63.04** | 61.80 | 63.04 | 53.91 | 54.30 | 65.25 | 65.65 | **68.32** |
| **Holistic Table** | TableBench DP | Rge | 42.10 | 3.63 | 18.04 | 42.10 | **47.58** | 47.41 | **52.18** | 48.61 | 49.33 | 40.91 | 36.56 | 31.01 |
| **Evaluation** | PoT | Rge | 28.01 | 0.00 | 6.73 | **39.80** | 34.86 | 36.61 | 37.78 | 27.72 | 40.03 | **51.96** | 33.05 | 41.79 |
| | SCoT | Rge | 41.86 | 1.99 | 21.94 | 40.70 | **48.68** | 47.36 | 47.47 | 45.68 | 44.84 | 41.43 | **50.11** | 44.06 |
| | TCoT | Rge | 41.71 | 3.18 | 15.26 | 46.19 | **48.16** | 46.07 | 51.74 | 47.63 | 48.83 | 45.71 | **54.28** | 52.07 |
| **RealHitBench** | FC | EM | 58.83 | 33.44 | 30.32 | 43.06 | **62.85** | 62.36 | **65.00** | 60.23 | 28.95 | 55.22 | **65.08** | 56.53 |
| | NR | EM | 39.43 | 13.51 | 18.25 | 31.75 | **44.91** | 45.62 | 47.45 | 42.82 | 42.61 | 48.66 | **53.89** | 49.88 |

## Citation

If you find our work helpful, please cite us by

```bibtex

```