Add Transformers library and pipeline tag
#2 by nielsr (HF Staff) - opened

README.md CHANGED
---
datasets:
- RUCKBReasoning/TableLLM-SFT
language:
- en
license: llama2
tags:
- Table
- QA
- Code
pipeline_tag: table-question-answering
library_name: transformers
---

# TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

| **[Paper](https://arxiv.org/abs/2403.19318)** | **[Training set](https://huggingface.co/datasets/RUCKBReasoning/TableLLM-SFT)** | **[Github](https://github.com/TableLLM/TableLLM)** | **[Homepage](https://tablellm.github.io/)** |

We present **TableLLM**, a powerful large language model designed to efficiently handle tabular data manipulation tasks, whether the tables are embedded in spreadsheets or documents, meeting the demands of real office scenarios. The TableLLM series comes in two scales: [TableLLM-7B](https://huggingface.co/RUCKBReasoning/TableLLM-7b) and [TableLLM-13B](https://huggingface.co/RUCKBReasoning/TableLLM-13b), fine-tuned from [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) and [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf), respectively.
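
The new `library_name: transformers` metadata means the checkpoints can be loaded through the standard `transformers` API. A minimal sketch — `build_prompt` is a hypothetical placeholder for illustration; substitute the actual prompt templates documented in this model card:

```python
MODEL_ID = "RUCKBReasoning/TableLLM-13b"  # or RUCKBReasoning/TableLLM-7b

def build_prompt(table_csv: str, question: str) -> str:
    # Hypothetical glue: joins a CSV-serialized table with the user question.
    # Replace with the real prompt templates from this model card.
    return f"{table_csv}\n\n{question}"

def generate_answer(table_csv: str, question: str, max_new_tokens: int = 256) -> str:
    # Requires `pip install transformers torch` and enough GPU memory for the model.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(table_csv, question), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```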

## Environment Setup

Install the requirements with pip:
```
pip install -r requirements.txt
```

## Inference

The inference results of TableLLM are provided in the `inference/results` folder. You can also run inference yourself. Example commands for spreadsheet-embedded tabular data (e.g., WikiSQL) and document-embedded tabular data (e.g., WTQ) are shown below:
```
cd inference

python inference_code.py --dataset wikisql --model_path TableLLM-13b

python inference_text.py --dataset wtq --model_path TableLLM-13b
```

## Evaluation

The Python code in the `evaluation` folder reproduces the evaluation results. For code-generation benchmarks, you can run the following commands to reproduce the result of TableLLM-13b on WikiSQL:
```
cd evaluation/wikisql-eval
tar -zxvf csv_tables.tar.gz
python eval.py --infer_data ../../inference/results/TableLLM-13b/Infer_wikisql.jsonl
```

For text generation, we use [CritiqueLLM](https://github.com/thu-coai/CritiqueLLM) as the judge. We also provide the judgement results from our own runs; you can find them in the `inference/results` folder and reproduce the aggregate scores with the following commands:
```
cd evaluation/text-eval
python get_sum_grade.py --grade_data ../../inference/results/TableLLM-13b/Grade_wtq.jsonl
```
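
The aggregation step above can be sketched in plain Python. Note that the `grade` field name and the record shape below are assumptions for illustration, not the actual schema of `Grade_wtq.jsonl`:

```python
import json

def mean_grade(path: str, field: str = "grade") -> float:
    # Average a numeric judgement score across all records of a .jsonl file,
    # where each non-empty line is one standalone JSON record.
    grades = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                grades.append(float(json.loads(line)[field]))
    return sum(grades) / len(grades) if grades else 0.0

# Tiny synthetic file standing in for a real Grade_*.jsonl:
with open("Grade_example.jsonl", "w", encoding="utf-8") as f:
    f.write('{"grade": 8}\n{"grade": 6}\n')

print(mean_grade("Grade_example.jsonl"))  # prints: 7.0
```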

## Deployment

You can use the code in the `deployment` folder as the frontend and backend for deploying TableLLM.



Deploy TableLLM using [vLLM](https://github.com/vllm-project/vllm). Remember to modify PORT and MODEL_PATH in the script and in `config.json`.
```
cd deployment
bash scripts/deploy_tablellm.sh
```
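
A hedged client sketch, assuming the deploy script starts vLLM's OpenAI-compatible server on the chosen PORT — the endpoint path, model name, and payload fields below are assumptions to verify against your deployment:

```python
import json
from urllib import request

PORT = 8000  # assumption: match the PORT you set in the deploy script

def build_payload(prompt: str) -> dict:
    # OpenAI-style completions payload; field values are illustrative.
    return {
        "model": "TableLLM-13b",  # assumption: match MODEL_PATH in the script
        "prompt": prompt,
        "max_tokens": 256,
        "temperature": 0.0,
    }

def query(prompt: str) -> str:
    # POST to the (assumed) OpenAI-compatible completions endpoint.
    req = request.Request(
        f"http://localhost:{PORT}/v1/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```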

Install MongoDB and change the username and password in `config.json` to your own. Prepare the default tables and questions:
```
bash prepare_default.sh
```
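
An illustrative `config.json` fragment covering the values mentioned above — the key names here are assumptions, and the actual keys in the repo's `config.json` may differ:

```json
{
  "PORT": 8000,
  "MODEL_PATH": "RUCKBReasoning/TableLLM-13b",
  "mongodb": {
    "username": "your-username",
    "password": "your-password"
  }
}
```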

Deploy the Streamlit app:
```
streamlit run streamlit.py --server.port PORT
```

## Citation
```
@article{zhang2024tablellm,
  title={TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios},
  author={Zhang, Xiaokang and Zhang, Jing and Ma, Zeyao and Li, Yang and Zhang, Bohan and Li, Guanlin and Yao, Zijun and Xu, Kangli and Zhou, Jinchang and Zhang-Li, Daniel and others},
  journal={arXiv preprint arXiv:2403.19318},
  year={2024}
}
```

## Contact

If you have any questions, we encourage you to open a GitHub issue or get in touch with us at <zhang2718@ruc.edu.cn>, <zeyaoma@ruc.edu.cn>, or <zhang-jing@ruc.edu.cn>.