---
license: apache-2.0
language:
- zh
- en
base_model:
- Qwen/Qwen3-8B
---

# TableGPT-R1

## Model details

We developed and released TableGPT-R1, a specialized large language model optimized for complex tabular reasoning and data analysis. Unlike traditional models that rely solely on Supervised Fine-Tuning (SFT), TableGPT-R1 is trained using a systematic Reinforcement Learning (RL) framework. It is designed to bridge the gap between natural language understanding and professional data science requirements, such as multi-step logic, robust code execution, and autonomous environment interaction.

**Model Developers**

Zhejiang University & Institute of Computing Innovation, Zhejiang University

**Key Technical Breakthroughs**

* **Autonomous Agentic Reasoning**: The model is trained to "think" before acting. It generates a visible reasoning chain within `<think>` tags, plans Python-based data manipulations, and refines its strategy based on environment feedback (Code Interpreter).
* **Unified Reward System**: We introduced a hybrid reward mechanism that combines rule-based verification (for deterministic SQL/Code tasks) with a **Criteria-Injected Reward Model** (for open-ended analytical questions), ensuring both accuracy and interpretability.
* **GRPO++ Framework**: Utilizing an enhanced version of Group Relative Policy Optimization, the model optimizes its decision-making process across diverse table structures while maintaining its general-purpose reasoning capabilities.
* **Cold-Start Data Engineering**: Bootstrapped with high-quality, long-chain reasoning trajectories, allowing the model to handle extreme table heterogeneity and complex multi-table joins.

**Input**

TableGPT-R1 accepts both natural language instructions and tabular data. It uniquely supports **table-path inputs**, enabling the model to autonomously load and retrieve information from files using a built-in code interpreter.

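
As a rough illustration of the group-relative idea behind GRPO, mentioned under Key Technical Breakthroughs, the sketch below scores several sampled answers to one question and normalizes the rewards within that group. It shows only the standard GRPO advantage computation, not the GRPO++ enhancements, which are not described in this card. The function names, the exact-match reward rule, the use of the population standard deviation, and the sample answers are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def rule_based_reward(predicted: str, expected: str) -> float:
    """Hypothetical rule-based check: 1.0 for an exact match after trimming, else 0.0."""
    return 1.0 if predicted.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization within one group: (r - mean) / (std + eps).

    Assumption: population standard deviation; eps avoids division by zero
    when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to the same question (made-up values for illustration).
samples = ["42", "41", "42 ", "forty-two"]
rewards = [rule_based_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)

print(rewards)
print(advantages)
```

Answers that beat the group average receive a positive advantage and the rest a negative one. When every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.
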
**Output**

TableGPT-R1 supports two output behaviors, depending on the task.

The first applies to tasks that require logical deduction, metadata explanation, or semantic understanding without external execution.

* **Format**: `<think> ... </think> [Answer]`
* **Behavior**: The model performs internal chain-of-thought reasoning to verify its logic before presenting the final result.

The second applies to data-intensive tasks that require precise calculation, visualization, or large-scale data processing.

* **Format**:
  1. **Plan**: ` ... ` (Analyze the goal and plan the code)
  2. **Act**: ` ... ` (Generate Python/SQL code)
  3. **Observe**: ` ... ` (Receive environment feedback)
  4. **Finalize**: ` ... ` (Summarize results)
* **Behavior**: The model operates as an autonomous agent, reacting to execution errors or intermediate data results to ensure accuracy.

Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.

**Language**

The model places a strong emphasis on Chinese corpora; queries in other languages may currently have limited support.

**Model Architecture**

TableGPT-R1 is built upon the **Qwen3-8B** transformer architecture, significantly enhanced for long-context tabular understanding and agentic workflows.

* **Base Backbone**: Qwen3-8B (Dense Transformer).
* **Context Window**: 128K tokens, optimized for processing large-scale table schemas, extensive metadata, and long execution logs.
* **Specialized Tokenizer**: Enhanced to handle structural delimiters, whitespace in tables, and code-specific syntax (Python/SQL) more efficiently.
* **Agentic Loop Integration**: The architecture is designed to support a seamless **"Think-Act-Observe"** cycle. It treats the environment's feedback (Code Interpreter output) as a first-class sequence input, allowing for real-time error correction and iterative reasoning.
* **Instruction Following**: Optimized via RL to strictly adhere to formatting constraints, distinguishing between internal thought process and external tool calls.

**Status**

This model is static, trained on an offline dataset. Future versions may be released to enhance its performance on specialized tasks.

**QuickStart**

This code snippet demonstrates how to build a prompt with table information, load the tokenizer and the model, and generate content.

> Note that you need `transformers>=4.51.0` to use `TableGPT-R1`:
> ```sh
> pip install "transformers>=4.51.0"
> ```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Using pandas to read some structured data
import pandas as pd
from io import StringIO

# Single example table
EXAMPLE_CSV_CONTENT = """
"Loss","Date","Score","Opponent","Record","Attendance"
"Hampton (14–12)","September 25","8–7","Padres","67–84","31,193"
"Speier (5–3)","September 26","3–1","Padres","67–85","30,711"
"Elarton (4–9)","September 22","3–1","@ Expos","65–83","9,707"
"Lundquist (0–1)","September 24","15–11","Padres","67–83","30,774"
"Hampton (13–11)","September 6","9–5","Dodgers","61–78","31,407"
"""

csv_file = StringIO(EXAMPLE_CSV_CONTENT)
df = pd.read_csv(csv_file)

model_name = "tablegpt/TableGPT-R1"

# Load the model weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

example_prompt_template = """Given access to several pandas dataframes, write the Python code to answer the user's question.

/*
"{var_name}.head(5).to_string(index=False)" as follows:
{df_info}
*/

Question: {user_question}
"""
# English: "Which games had a record of 40 wins and 40 losses?"
question = "哪些比赛的战绩达到了40胜40负?"

# Fill the template with a preview of the table and the user's question
prompt = example_prompt_template.format(
    var_name="df",
    df_info=df.head(5).to_string(index=False),
    user_question=question,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=8192)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

**Deployment**

For deployment, you can use `sglang>=0.5.2` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:

- SGLang:

  ```shell
  python -m sglang.launch_server --model-path tablegpt/TableGPT-R1 --served-model-name TableGPT-R1 --reasoning-parser qwen3
  ```

- vLLM:

  ```shell
  vllm serve tablegpt/TableGPT-R1 --served-model-name TableGPT-R1 --enable-reasoning --reasoning-parser deepseek_r1
  ```

Then you can access the Chat API with:

```bash
curl http://localhost:xxx/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TableGPT-R1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "xxxxx?"}
    ]
  }'
```

**License**

TableGPT-R1 is released under the Apache-2.0 license.

**Research Paper**

TableGPT-R1 is introduced and validated in the paper "[TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning](https://arxiv.org/xxxx)", available on arXiv.

**Where to send questions or comments about the model**

Inquiries and feedback are welcome at [j.zhao@zju.edu.cn](mailto:j.zhao@zju.edu.cn).

## Evaluation Results

Performance comparison grouped by model scale. Left Group: Models with comparable parameters to TableGPT-R1. Right Group: Significantly larger models and proprietary closed-source models.
Bold indicates the best result within each group. Gray background highlights TableGPT-R1.

Abbreviations: Q3: Qwen3; QwQ: QwQ-32B; DS-V3: DeepSeek-V3; Q-Plus: Qwen-Plus; T-LLM: TableLLM; Llama: Llama-3.1-8B; TGPT2: TableGPT2-7B; TGPT-R1: TableGPT-R1-8B; FC: Fact Checking; NR: Numerical Reasoning; SC: Structure Comprehending; DA: Data Analysis; CG: Chart Generation.

| Benchmark | Task | Met. | Q3-8B | T-LLM | Llama | TGPT2 | **TGPT-R1 (8B)** | Q3-14B | Q3-32B | Q3-70B | QwQ | GPT-4o | DS-V3 | Q-Plus |
| :--- | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Internal Bench** | Table Info | Acc | 69.20 | 0.97 | 37.26 | - | **82.00** | 66.10 | 72.58 | 51.10 | 69.68 | 67.26 | 66.00 | **76.90** |
| | Table Path | Acc | 73.90 | 0.65 | 31.77 | - | **85.00** | 74.70 | 78.55 | 60.50 | 75.00 | - | 72.90 | **81.50** |
| **NL2SQL** | Spider | EX | 86.07 | 65.30 | 73.59 | 74.38 | **86.73** | 87.61 | 87.80 | 61.71 | 85.33 | 87.98 | 88.54 | **89.19** |
| | BIRD | EX | 61.67 | 30.64 | 40.03 | 49.28 | **63.04** | 61.80 | 63.04 | 53.91 | 54.30 | 65.25 | 65.65 | **68.32** |
| **Holistic Table Evaluation** | TableBench DP | Rge | 42.10 | 3.63 | 18.04 | 42.10 | **47.58** | 47.41 | **52.18** | 48.61 | 49.33 | 40.91 | 36.56 | 31.01 |
| | PoT | Rge | 28.01 | 0.00 | 6.73 | **39.80** | 34.86 | 36.61 | 37.78 | 27.72 | 40.03 | **51.96** | 33.05 | 41.79 |
| | SCoT | Rge | 41.86 | 1.99 | 21.94 | 40.70 | **48.68** | 47.36 | 47.47 | 45.68 | 44.84 | 41.43 | **50.11** | 44.06 |
| | TCoT | Rge | 41.71 | 3.18 | 15.26 | 46.19 | **48.16** | 46.07 | 51.74 | 47.63 | 48.83 | 45.71 | **54.28** | 52.07 |
| **RealHitBench** | FC | EM | 58.83 | 33.44 | 30.32 | 43.06 | **62.85** | 62.36 | **65.00** | 60.23 | 28.95 | 55.22 | **65.08** | 56.53 |
| | NR | EM | 39.43 | 13.51 | 18.25 | 31.75 | **44.91** | 45.62 | 47.45 | 42.82 | 42.61 | 48.66 | **53.89** | 49.88 |

## Citation

If you find our work helpful, please cite us:

```bibtex
```