---
license: apache-2.0
language:
- zh
- en
base_model:
- Qwen/Qwen3-8B
---
# TableGPT-R1
## Model details
We developed and released TableGPT-R1, a specialized large language model optimized for complex tabular reasoning and data analysis. Unlike traditional models that rely solely on Supervised Fine-Tuning (SFT), TableGPT-R1 is trained using a systematic Reinforcement Learning (RL) framework. It is designed to bridge the gap between natural language understanding and professional data science requirements, such as multi-step logic, robust code execution, and autonomous environment interaction.
**Model Developers**
Zhejiang University & Institute of Computing Innovation, Zhejiang University
**Key Technical Breakthroughs**
* **Autonomous Agentic Reasoning**: The model is trained to "think" before acting. It generates a visible reasoning chain within `<think>` tags, plans Python-based data manipulations, and refines its strategy based on environment feedback (Code Interpreter).
* **Unified Reward System**: We introduced a hybrid reward mechanism that combines rule-based verification (for deterministic SQL/Code tasks) with a **Criteria-Injected Reward Model** (for open-ended analytical questions), ensuring both accuracy and interpretability; a minimal sketch of this idea appears after this list.
* **GRPO++ Framework**: Utilizing an enhanced version of Group Relative Policy Optimization, the model optimizes its decision-making process across diverse table structures while maintaining its general-purpose reasoning capabilities.
* **Cold-Start Data Engineering**: Bootstrapped with high-quality, long-chain reasoning trajectories, allowing the model to handle extreme table heterogeneity and complex multi-table joins.
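As a rough illustration of the hybrid reward design (not the actual training code), the sketch below combines an exact-match rule for deterministic tasks with a hypothetical `criteria_reward_model` callable that stands in for the criteria-injected reward model:
```python
# Minimal sketch of the hybrid reward idea, assuming an exact-match rule for
# deterministic SQL/code tasks; `criteria_reward_model` is a hypothetical
# stand-in for the criteria-injected reward model.
from typing import Callable


def rule_based_reward(predicted: str, gold: str) -> float:
    """Deterministic check, e.g. comparing normalized execution results."""
    return 1.0 if predicted.strip() == gold.strip() else 0.0


def hybrid_reward(
    predicted: str,
    gold: str,
    deterministic: bool,
    criteria_reward_model: Callable[[str, str], float],
) -> float:
    if deterministic:
        return rule_based_reward(predicted, gold)
    # Open-ended analysis: score the answer against injected criteria (0.0-1.0).
    return criteria_reward_model(predicted, gold)
```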
**Input**
TableGPT-R1 accepts both natural language instructions and tabular data. It uniquely supports **table-path inputs**, enabling the model to autonomously load and retrieve information from files using a built-in code interpreter.
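The card does not pin down an exact prompt convention for table-path inputs. As a purely illustrative sketch, a request might simply reference the file location in the user turn and let the model's code interpreter load it:
```python
# Purely illustrative: the file path and phrasing below are assumptions, not an
# officially documented prompt convention for table-path inputs.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": (
            "The table is stored at ./data/games.csv. "
            "Load it and tell me how many games were played in September."
        ),
    },
]
```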
**Output**
TableGPT-R1 supports two output behaviors depending on the task:

**Direct reasoning**: for tasks requiring logical deduction, metadata explanation, or semantic understanding without external execution.
* **Format**: `<think> ... </think> [Answer]`
* **Behavior**: The model performs an internal chain of thought to verify its logic before presenting the final result.

**Agentic execution**: for data-intensive tasks requiring precise calculation, visualization, or large-scale data processing.
* **Format**:
  1. **Plan**: `<think> ... </think>` (analyze the goal and plan the code)
  2. **Act**: `<tool_call> ... </tool_call>` (generate Python/SQL code)
  3. **Observe**: `<observation> ... </observation>` (receive environment feedback)
  4. **Finalize**: `<answer> ... </answer>` (summarize results)
* **Behavior**: The model operates as an autonomous agent, reacting to execution errors and intermediate data results to ensure accuracy.
Additionally, to enforce reasoning, the default chat template automatically opens a `<think>` block. It is therefore normal for the model's output to contain only a closing `</think>` tag without an explicit opening `<think>` tag.
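For downstream use, the tagged output can be split apart with ordinary string handling. The helper below is our own convenience sketch based on the tag names listed above; it is not part of the model or its tokenizer:
```python
import re


def extract_sections(text: str) -> dict:
    """Split a decoded completion into the tagged sections described above.

    The default chat template already opens <think>, so the reasoning is
    everything before the first closing </think> tag.
    """
    reasoning, _, rest = text.partition("</think>")
    sections = {"think": reasoning.strip()}
    for tag in ("tool_call", "observation", "answer"):
        matches = re.findall(rf"<{tag}>(.*?)</{tag}>", rest, flags=re.DOTALL)
        sections[tag] = [m.strip() for m in matches]
    return sections
```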
**Language**
Our model places a strong emphasis on Chinese corpora, and currently, queries in other languages may have limited support.
**Model Architecture**
TableGPT-R1 is built upon the **Qwen3-8B** transformer architecture, significantly enhanced for long-context tabular understanding and agentic workflows.
* **Base Backbone**: Qwen3-8B (Dense Transformer).
* **Context Window**: 128K tokens, optimized for processing large-scale table schemas, extensive metadata, and long execution logs.
* **Specialized Tokenizer**: Enhanced to handle structural delimiters, whitespace in tables, and code-specific syntax (Python/SQL) more efficiently.
* **Agentic Loop Integration**: The architecture is designed to support a seamless **"Think-Act-Observe"** cycle. It treats the environment's feedback (Code Interpreter output) as a first-class sequence input, allowing for real-time error correction and iterative reasoning (a rough driver-loop sketch follows this list).
* **Instruction Following**: Optimized via RL to strictly adhere to formatting constraints, distinguishing between internal thought process and external tool calls.
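As a rough sketch of how such a cycle could be driven from the caller's side, the loop below assumes two hypothetical stand-ins, `generate` (wrapping the model call) and `sandbox_execute` (running generated code in a sandbox); the exact role and formatting used to feed observations back is also an assumption:
```python
import re


def agentic_loop(generate, sandbox_execute, messages, max_turns=4):
    """Drive a Think-Act-Observe cycle from the caller's side.

    `generate(messages) -> str` wraps the model call and
    `sandbox_execute(code) -> str` runs generated code in a sandbox;
    both are hypothetical stand-ins.
    """
    completion = ""
    for _ in range(max_turns):
        completion = generate(messages)
        messages.append({"role": "assistant", "content": completion})
        call = re.search(r"<tool_call>(.*?)</tool_call>", completion, re.DOTALL)
        if call is None:
            # No further code to run: the completion contains the final answer.
            return completion
        observation = sandbox_execute(call.group(1))
        # Feed the interpreter output back; the exact role/format is an assumption.
        messages.append(
            {"role": "user", "content": f"<observation>{observation}</observation>"}
        )
    return completion
```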
**Status**
This model is static, trained on an offline dataset. Future versions may be released to enhance its performance on specialized tasks.
**QuickStart**
The code snippet below demonstrates how to build a prompt containing table information, load the tokenizer and model, and generate a response.
> Note that you need `transformers>=4.51.0` to use `TableGPT-R1`:
> ```sh
> pip install "transformers>=4.51.0"
> ```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Using pandas to read some structured data
import pandas as pd
from io import StringIO
# single table
EXAMPLE_CSV_CONTENT = """
"Loss","Date","Score","Opponent","Record","Attendance"
"Hampton (14β12)","September 25","8β7","Padres","67β84","31,193"
"Speier (5β3)","September 26","3β1","Padres","67β85","30,711"
"Elarton (4β9)","September 22","3β1","@ Expos","65β83","9,707"
"Lundquist (0β1)","September 24","15β11","Padres","67β83","30,774"
"Hampton (13β11)","September 6","9β5","Dodgers","61β78","31,407"
"""
csv_file = StringIO(EXAMPLE_CSV_CONTENT)
df = pd.read_csv(csv_file)
model_name = "tablegpt/TableGPT-R1"
model = AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
example_prompt_template = """Given access to several pandas dataframes, write the Python code to answer the user's question.
/*
"{var_name}.head(5).to_string(index=False)" as follows:
{df_info}
*/
Question: {user_question}
"""
question = "哪些比赛的战绩达到了40胜40负？"  # "Which games reached a record of 40 wins and 40 losses?"
prompt = example_prompt_template.format(
var_name="df",
df_info=df.head(5).to_string(index=False),
user_question=question,
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=8192)
generated_ids = [
output_ids[len(input_ids) :]
for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
**Deployment**
For deployment, you can use `sglang>=0.5.2` or `vllm>=0.10.2` to create an OpenAI-compatible API endpoint:
- SGLang:
```shell
python -m sglang.launch_server --model-path tablegpt/TableGPT-R1 --port 8080 --served-model-name TableGPT-R1 --reasoning-parser qwen3
```
- vLLM:
```shell
vllm serve tablegpt/TableGPT-R1 --port 8080 --served-model-name TableGPT-R1 --reasoning-parser deepseek_r1
```
Then you can access the Chat API as follows:
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "TableGPT-R1",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Give me a short introduction to large language model."}
]
}'
```
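The same endpoint can also be called from Python via the `openai` client; the snippet below is a minimal sketch assuming the server launched above is listening on port 8080:
```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible server started above.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="TableGPT-R1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me a short introduction to large language model."},
    ],
)
print(completion.choices[0].message.content)
```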
**License**
TableGPT-R1 is released under the Apache-2.0 license.
**Research Paper**
TableGPT-R1 is introduced and validated in the paper "[TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning](https://arxiv.org/xxxx)" available on arXiv.
**Where to send questions or comments about the model**
Inquiries and feedback are welcome at [j.zhao@zju.edu.cn](mailto:j.zhao@zju.edu.cn).
## Evaluation Results
Performance comparison grouped by model scale. Left Group: Models with comparable
parameters to TableGPT-R1. Right Group: Significantly larger models and proprietary closed-source
models. Bold indicates the best result within each group. Gray background highlights TableGPT-R1. Abbreviations: Q3: Qwen3; QwQ: QwQ-32B; DS-V3: DeepSeek-V3; Q-Plus: Qwen-Plus;
T-LLM: TableLLM; Llama: Llama-3.1-8B; TGPT2: TableGPT2-7B; TGPT-R1: TableGPT-R1-8B;
FC: Fact Checking; NR: Numerical Reasoning; SC: Structure Comprehending; DA: Data Analysis;
CG: Chart Generation.
| Benchmark | Task | Metric | Q3-8B | T-LLM | Llama | TGPT2 | **TGPT-R1 (8B)** | Q3-14B | Q3-32B | Q3-70B | QwQ | GPT-4o | DS-V3 | Q-Plus |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Internal Bench** | Table Info | Acc | 69.20 | 0.97 | 37.26 | - | **82.00** | 66.10 | 72.58 | 51.10 | 69.68 | 67.26 | 66.00 | **76.90** |
| | Table Path | Acc | 73.90 | 0.65 | 31.77 | - | **85.00** | 74.70 | 78.55 | 60.50 | 75.00 | - | 72.90 | **81.50** |
| **NL2SQL** | Spider | EX | 86.07 | 65.30 | 73.59 | 74.38 | **86.73** | 87.61 | 87.80 | 61.71 | 85.33 | 87.98 | 88.54 | **89.19** |
| | BIRD | EX | 61.67 | 30.64 | 40.03 | 49.28 | **63.04** | 61.80 | 63.04 | 53.91 | 54.30 | 65.25 | 65.65 | **68.32** |
| **Holistic Table Evaluation** | TableBench DP | Rge | 42.10 | 3.63 | 18.04 | 42.10 | **47.58** | 47.41 | **52.18** | 48.61 | 49.33 | 40.91 | 36.56 | 31.01 |
| | PoT | Rge | 28.01 | 0.00 | 6.73 | **39.80** | 34.86 | 36.61 | 37.78 | 27.72 | 40.03 | **51.96** | 33.05 | 41.79 |
| | SCoT | Rge | 41.86 | 1.99 | 21.94 | 40.70 | **48.68** | 47.36 | 47.47 | 45.68 | 44.84 | 41.43 | **50.11** | 44.06 |
| | TCoT | Rge | 41.71 | 3.18 | 15.26 | 46.19 | **48.16** | 46.07 | 51.74 | 47.63 | 48.83 | 45.71 | **54.28** | 52.07 |
| **RealHitBench** | FC | EM | 58.83 | 33.44 | 30.32 | 43.06 | **62.85** | 62.36 | **65.00** | 60.23 | 28.95 | 55.22 | **65.08** | 56.53 |
| | NR | EM | 39.43 | 13.51 | 18.25 | 31.75 | **44.91** | 45.62 | 47.45 | 42.82 | 42.61 | 48.66 | **53.89** | 49.88 |
## Citation
If you find our work helpful, please cite us as follows:
```bibtex
```