---
datasets:
- RUCKBReasoning/TableLLM-SFT
language:
- en
license: llama2
tags:
- Table
- QA
- Code
pipeline_tag: table-question-answering
library_name: transformers
---
# TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
| **[Paper](https://arxiv.org/abs/2403.19318)** | **[Training set](https://huggingface.co/datasets/RUCKBReasoning/TableLLM-SFT)** | **[GitHub](https://github.com/TableLLM/TableLLM)** | **[Homepage](https://tablellm.github.io/)** |
We present **TableLLM**, a powerful large language model designed to handle tabular data manipulation tasks efficiently in real office scenarios, whether the tables are embedded in spreadsheets or in documents. The TableLLM series comes in two scales: [TableLLM-7B](https://huggingface.co/RUCKBReasoning/TableLLM-7b) and [TableLLM-13B](https://huggingface.co/RUCKBReasoning/TableLLM-13b), fine-tuned from [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) and [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf), respectively.
Depending on the scenario, TableLLM generates either a code solution or a direct text answer. Code generation handles spreadsheet-embedded tabular data, which typically involves insert, delete, update, query, merge, and plot operations on tables. Text generation handles document-embedded tabular data, which typically involves query operations on short tables.
## Evaluation Results
We evaluate TableLLM's code solution generation on three benchmarks: WikiSQL, Spider, and a self-created table operation benchmark. Its text answer generation is evaluated on four benchmarks: WikiTableQuestions (WikiTQ), TAT-QA, FeTaQA, and OTTQA. The results are shown below:
| Model | WikiTQ | TAT-QA | FeTaQA | OTTQA | WikiSQL | Spider | Self-created | Average |
| :------------------- | :----: | :----: | :----: | :-----: | :-----: | :----: | :----------: | :-----: |
| TaPEX | 38.5 | – | – | – | 83.9 | 15.0 | / | 45.8 |
| TaPas | 31.5 | – | – | – | 74.2 | 23.1 | / | 42.92 |
| TableLlama | 24.0 | 22.2 | 20.5 | 6.4 | 43.7 | 9.0 | / | 20.7 |
| GPT3.5 | 58.5 |<ins>72.1</ins>| 71.2 | 60.8 | 81.7 | 67.4 | 77.1 | 69.8 |
| GPT4 |**74.1**|**77.1**|**78.4**|**69.5** | 84.0 | 69.5 | 77.8 | **75.8**|
| Llama2-Chat (13B) | 48.8 | 49.6 | 67.7 | 61.5 | – | – | – | 56.9 |
| CodeLlama (13B) | 43.4 | 47.2 | 57.2 | 49.7 | 38.3 | 21.9 | 47.6 | 43.6 |
| Deepseek-Coder (33B) | 6.5 | 11.0 | 7.1 | 7.4 | 72.5 | 58.4 | 73.9 | 33.8 |
| StructGPT (GPT3.5) | 52.5 | 27.5 | 11.8 | 14.0 | 67.8 |**84.8**| / | 48.9 |
| Binder (GPT3.5) | 61.6 | 12.8 | 6.8 | 5.1 | 78.6 | 52.6 | / | 42.5 |
| DATER (GPT3.5) | 53.4 | 28.4 | 18.3 | 13.0 | 58.2 | 26.5 | / | 37.0 |
| TableLLM-7B (Ours) | 58.8 | 66.9 | 72.6 |<ins>63.1</ins>|<ins>86.6</ins>| 82.6 |<ins>78.8</ins>| 72.8 |
| TableLLM-13B (Ours) |<ins>62.4</ins>| 68.2 |<ins>74.5</ins>| 62.5 | **90.7**|<ins>83.4</ins>| **80.8** |<ins>74.7</ins>|
## Prompt Template
The prompts we used for generating code solutions and text answers are introduced below.
### Code Solution
The prompt template for the insert, delete, update, query, and plot operations on a single table.
```
[INST]Below are the first few lines of a CSV file. You need to write a Python program to solve the provided question.
Header and first few lines of CSV file:
{csv_data}
Question: {question}[/INST]
```
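A minimal sketch of filling this template in Python, assuming `csv_data` is already a string holding the header and first few rows. The helper name `build_code_prompt` is hypothetical; the template text is copied verbatim from the prompt above.

```python
# Single-table code-solution template, copied verbatim from the prompt above.
CODE_PROMPT_TEMPLATE = (
    "[INST]Below are the first few lines of a CSV file. "
    "You need to write a Python program to solve the provided question.\n"
    "Header and first few lines of CSV file:\n"
    "{csv_data}\n"
    "Question: {question}[/INST]"
)


def build_code_prompt(csv_data: str, question: str) -> str:
    """Hypothetical helper: fill the single-table template.

    csv_data: header plus first few rows, as plain CSV text.
    """
    # Trailing newlines are stripped so "Question:" starts on the next line.
    return CODE_PROMPT_TEMPLATE.format(
        csv_data=csv_data.rstrip("\n"), question=question
    )
```

Braces inside `csv_data` or `question` are safe here, because `str.format` only interprets placeholders in the template string, not in the substituted values.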
The prompt template for the merge operation on two tables.
```
[INST]Below are the first few lines two CSV file. You need to write a Python program to solve the provided question.
Header and first few lines of CSV file 1:
{csv_data1}
Header and first few lines of CSV file 2:
{csv_data2}
Question: {question}[/INST]
```
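A minimal sketch for filling the two-table template in Python. The helper name `build_merge_prompt` is hypothetical. The template text, including its original wording, is kept exactly as written above, on the assumption that matching the fine-tuning prompt verbatim is the safest choice.

```python
# Two-table (merge) code-solution template, copied verbatim from the prompt
# above, original wording included.
MERGE_PROMPT_TEMPLATE = (
    "[INST]Below are the first few lines two CSV file. "
    "You need to write a Python program to solve the provided question.\n"
    "Header and first few lines of CSV file 1:\n"
    "{csv_data1}\n"
    "Header and first few lines of CSV file 2:\n"
    "{csv_data2}\n"
    "Question: {question}[/INST]"
)


def build_merge_prompt(csv_data1: str, csv_data2: str, question: str) -> str:
    """Hypothetical helper: fill the two-table template for merge questions."""
    # Trailing newlines are stripped so each section header starts on its own line.
    return MERGE_PROMPT_TEMPLATE.format(
        csv_data1=csv_data1.rstrip("\n"),
        csv_data2=csv_data2.rstrip("\n"),
        question=question,
    )
```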
The `{csv_data}` field is filled with the header and first few lines of your table file. For example:
```
Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
```
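One way to produce `{csv_data}` from your own file is to keep the header plus a fixed number of data rows. The sketch below assumes five data rows, matching the example above; the helper name `csv_head` and that row count are illustrative assumptions, not settings taken from the repository.

```python
import csv
import io
from itertools import islice


def csv_head(f, n_rows: int = 5) -> str:
    """Hypothetical helper: return the header plus the first n_rows data rows.

    f: an open text file (or any iterable of CSV lines).
    n_rows=5 mirrors the example above and is an assumption.
    """
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(out, lineterminator="\n")
    # csv.reader handles quoted fields that contain commas; islice stops after
    # the header and n_rows data rows, so large files are not read in full.
    for row in islice(csv.reader(f), n_rows + 1):
        writer.writerow(row)
    return out.getvalue().rstrip("\n")
```

Typical usage opens the file with `newline=""`, as the `csv` module recommends: `with open(path, newline="", encoding="utf-8") as f: csv_data = csv_head(f)`.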
### Text Answer
The prompt template for direct text answer generation on short tables.
````
[INST]Offer a thorough and accurate solution that directly addresses the Question outlined in the [Question].
### [Table Text]
{table_descriptions}
### [Table]
```
{table_in_csv}
```
### [Question]
{question}
### [Solution][/INST]
````
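A minimal sketch of filling the text-answer template in Python. The helper name `build_text_prompt` is hypothetical. The prompt closes with `[/INST]` by default, the Llama 2 style tag used by the code-solution templates; this is an assumption, so the closing tag is exposed as a parameter. The triple backticks around the table are built programmatically so the snippet does not clash with Markdown fences.

```python
# A run of three backticks, built programmatically so this snippet can itself
# sit inside a Markdown code fence.
FENCE = "`" * 3


def build_text_prompt(
    table_descriptions: str,
    table_in_csv: str,
    question: str,
    closing_tag: str = "[/INST]",
) -> str:
    """Hypothetical helper: fill the text-answer template for a short table.

    closing_tag defaults to "[/INST]" (assumption: Llama 2 style closing tag).
    """
    lines = [
        "[INST]Offer a thorough and accurate solution that directly addresses "
        "the Question outlined in the [Question].",
        "### [Table Text]",
        table_descriptions,
        "### [Table]",
        FENCE,
        table_in_csv.rstrip("\n"),  # table serialized as CSV text
        FENCE,
        "### [Question]",
        question,
        "### [Solution]" + closing_tag,
    ]
    return "\n".join(lines)
```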
## Environment Setup
Install the requirements with pip:
```
pip install -r requirements.txt
```
## Inference
The inference results of TableLLM are provided in the `inference/results` folder, and you can also generate them yourself. Example commands for spreadsheet-embedded tabular data (e.g., WikiSQL) and document-embedded tabular data (e.g., WTQ) are shown below:
```
cd inference
python inference_code.py --dataset wikisql --model_path TableLLM-13b
python inference_text.py --dataset wtq --model_path TableLLM-13b
```
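For trying a single prompt outside the repository scripts, a minimal sketch using the Hugging Face `transformers` API is shown below. The model ID comes from the links at the top of this card; greedy decoding and `max_new_tokens=512` are assumptions, not the settings used by `inference_code.py` or `inference_text.py`, and the function name `generate` is hypothetical.

```python
def generate(
    prompt: str,
    model_id: str = "RUCKBReasoning/TableLLM-7b",
    max_new_tokens: int = 512,
) -> str:
    """Hypothetical helper: run one prompt and return only the new text.

    Greedy decoding and max_new_tokens=512 are assumptions.
    """
    # Imported lazily so this module loads without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    model.to("cuda" if torch.cuda.is_available() else "cpu")
    model.eval()

    # The tokenizer prepends the BOS token before the "[INST]" prompt text.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The prompt passed in should be one of the filled templates from the Prompt Template section. Because this sketch reloads the model on every call, it only suits quick experiments; for whole benchmarks, the repository scripts above are the intended path.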
## Evaluation
The Python code in the `evaluation` folder reproduces the evaluation results. For code generation benchmarks, run the following commands to reproduce the result of TableLLM-13b on WikiSQL:
```
cd evaluation/wikisql-eval
tar -zxvf csv_tables.tar.gz
python eval.py --infer_data ../../inference/results/TableLLM-13b/Infer_wikisql.jsonl
```
For text generation, we use [CritiqueLLM](https://github.com/thu-coai/CritiqueLLM) as the judge. The judgement results from our own runs are provided in the `inference/results` folder, and you can reproduce the scores with the following commands:
```
cd evaluation/text-eval
python get_sum_grade.py --grade_data ../../inference/results/TableLLM-13b/Grade_wtq.jsonl
```
## Deployment
The code in the `deployment` folder provides the frontend and backend for deploying TableLLM.

Deploy TableLLM with vLLM. Remember to set `PORT` and `MODEL_PATH` in the script and in `config.json`:
```
cd deployment
bash scripts/deploy_tablellm.sh
```
Install MongoDB and set the username and password in `config.json` to your own. Then prepare the default tables and questions:
```
bash prepare_default.sh
```
Deploy the Streamlit app, replacing `PORT` with the port to serve it on:
```
streamlit run streamlit.py --server.port PORT
```
## Citation
```
@article{zhang2024tablellm,
title={TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios},
author={Zhang, Xiaokang and Zhang, Jing and Ma, Zeyao and Li, Yang and Zhang, Bohan and Li, Guanlin and Yao, Zijun and Xu, Kangli and Zhou, Jinchang and Zhang-Li, Daniel and others},
journal={arXiv preprint arXiv:2403.19318},
year={2024}
}
```
## Contact
If you have any questions, please open a GitHub issue or contact us at <zhang2718@ruc.edu.cn>, <zeyaoma@ruc.edu.cn>, or <zhang-jing@ruc.edu.cn>.