Add Transformers library and pipeline tag

#2 opened by nielsr (HF Staff)

Files changed (1): README.md (+68 −3)
@@ -1,18 +1,20 @@
 ---
-license: llama2
 datasets:
 - RUCKBReasoning/TableLLM-SFT
 language:
 - en
 tags:
 - Table
 - QA
 - Code
 ---
 
 # TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
 
-| **[Paper](https://arxiv.org/abs/2403.19318)** | **[Training set](https://huggingface.co/datasets/RUCKBReasoning/TableLLM-SFT)** | **[Github](https://github.com/RUCKBReasoning/TableLLM)** | **[Homepage](https://tablellm.github.io/)** |
 
 We present **TableLLM**, a powerful large language model designed to handle tabular data manipulation tasks efficiently, whether they are embedded in spreadsheets or documents, meeting the demands of real office scenarios. The TableLLM series encompasses two distinct scales: [TableLLM-7B](https://huggingface.co/RUCKBReasoning/TableLLM-7b) and [TableLLM-13B](https://huggingface.co/RUCKBReasoning/TableLLM-13b), which are fine-tuned from [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) and [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf).
@@ -92,4 +94,67 @@ The prompt template for direct text answer generation on short tables.
 ### [Solution][INST/]
 ````
 
-For more details about how to use TableLLM, please refer to our GitHub page: <https://github.com/TableLLM/TableLLM>
 ---
 datasets:
 - RUCKBReasoning/TableLLM-SFT
 language:
 - en
+license: llama2
 tags:
 - Table
 - QA
 - Code
+pipeline_tag: table-question-answering
+library_name: transformers
 ---
 
 # TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
 
+| **[Paper](https://arxiv.org/abs/2403.19318)** | **[Training set](https://huggingface.co/datasets/RUCKBReasoning/TableLLM-SFT)** | **[Github](https://github.com/TableLLM/TableLLM)** | **[Homepage](https://tablellm.github.io/)** |
 
 We present **TableLLM**, a powerful large language model designed to handle tabular data manipulation tasks efficiently, whether they are embedded in spreadsheets or documents, meeting the demands of real office scenarios. The TableLLM series encompasses two distinct scales: [TableLLM-7B](https://huggingface.co/RUCKBReasoning/TableLLM-7b) and [TableLLM-13B](https://huggingface.co/RUCKBReasoning/TableLLM-13b), which are fine-tuned from [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) and [CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf).
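Since both checkpoints are fine-tuned from CodeLlama-Instruct, they expect `[INST] ... [/INST]`-style prompts with the table serialized into the instruction. A minimal sketch of assembling such a prompt for a short table; the wrapper text and layout here are illustrative assumptions, not the official TableLLM templates (those are given in full elsewhere in this README):

```python
# Illustrative only: wrap a small table plus a question into a CodeLlama-style
# [INST] prompt. The surrounding instruction text is an assumption, not the
# exact TableLLM template.

def serialize_table(header, rows):
    """Render a small table as CSV-style text for embedding in the prompt."""
    lines = [",".join(header)]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

def build_prompt(question, header, rows):
    table_text = serialize_table(header, rows)
    return (
        "[INST] Answer the question based on the table below.\n\n"
        f"{table_text}\n\nQuestion: {question} [/INST]"
    )

prompt = build_prompt(
    "Which city has the largest population?",
    ["city", "population"],
    [["Beijing", 21893095], ["Shanghai", 24870895]],
)
print(prompt)
```

The resulting string can then be fed to the tokenizer and model as a single completion-style input.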
 
 
 ### [Solution][INST/]
 ````
 
+## Environment Setup
+
+Install the requirements with pip:
+```
+pip install -r requirements.txt
+```
+
+## Inference
+
+The inference results of TableLLM are provided in the `inference/results` folder. You can also reproduce them yourself. Example commands for spreadsheet-embedded tabular data (e.g., WikiSQL) and document-embedded tabular data (e.g., WTQ) are shown below:
+```
+cd inference
+
+python inference_code.py --dataset wikisql --model_path TableLLM-13b
+
+python inference_text.py --dataset wtq --model_path TableLLM-13b
+```
+
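The released result files are JSON Lines, one record per example. A small sketch of parsing such a file; the field names `question` and `prediction` are assumptions for illustration, not the repo's documented schema:

```python
# Hypothetical sketch: parse a JSON Lines result file like those in
# inference/results. Field names here are assumed, not confirmed by the repo.
import io
import json

# Stand-in for open("inference/results/TableLLM-13b/Infer_wikisql.jsonl")
sample = io.StringIO(
    '{"question": "How many rows?", "prediction": "3"}\n'
    '{"question": "Max price?", "prediction": "42"}\n'
)

records = [json.loads(line) for line in sample if line.strip()]
print(len(records))  # number of parsed predictions
```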
+## Evaluation
+
+The Python code in the `evaluation` folder reproduces the evaluation results. For code-generation benchmarks, you can run the following command to reproduce the result of TableLLM-13b on WikiSQL:
+```
+cd evaluation/wikisql-eval
+tar -zxvf csv_tables.tar.gz
+python eval.py --infer_data ../../inference/results/TableLLM-13b/Infer_wikisql.jsonl
+```
+
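At its core such a benchmark reduces to comparing predicted answers against gold answers. A simplified exact-match sketch in that spirit; the real `eval.py` executes the generated code against the extracted CSV tables, so this string comparison is illustrative only:

```python
# Simplified, illustrative scoring: exact-match accuracy after light
# normalization. The actual eval.py runs generated code on CSV tables.

def exact_match_accuracy(predictions, golds):
    """Fraction of predictions that match gold answers, ignoring case/whitespace."""
    assert len(predictions) == len(golds)
    hits = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, golds)
    )
    return hits / len(golds)

acc = exact_match_accuracy(["3", "Berlin ", "7"], ["3", "berlin", "8"])
print(acc)  # 2 of 3 correct
```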
+For text generation, we use [CritiqueLLM](https://github.com/thu-coai/CritiqueLLM) as the judge. We also provide the judgement results from our own runs; you can find them in the `inference/results` folder and reproduce the scores with the following command:
+```
+cd evaluation/text-eval
+python get_sum_grade.py --grade_data ../../inference/results/TableLLM-13b/Grade_wtq.jsonl
+```
+
+## Deployment
+
+You can use the code in the `deployment` folder as the frontend and backend for deploying TableLLM.
+
+![platform](images/platform.png)
+
+Deploy TableLLM using vLLM. Remember to modify the PORT and MODEL_PATH in the script and in `config.json`.
+```
+cd deployment
+bash scripts/deploy_tablellm.sh
+```
+
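vLLM exposes an OpenAI-compatible HTTP API, so once the server is up you could POST a completion request to it. A sketch of the request body, assuming the deploy script launches vLLM's OpenAI-compatible server; the model name and endpoint are placeholders, and no request is actually sent here:

```python
# Assumption: deploy_tablellm.sh starts vLLM's OpenAI-compatible server.
# Placeholder endpoint: http://localhost:PORT/v1/completions
import json

payload = {
    "model": "TableLLM-13b",         # placeholder: must match the served model
    "prompt": "[INST] ... [/INST]",  # an instruction-formatted prompt
    "max_tokens": 512,
    "temperature": 0.0,
}
body = json.dumps(payload)
print(body)
```

Send `body` with any HTTP client (e.g. `curl -d @- http://localhost:PORT/v1/completions`) after substituting the real port and prompt.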
+Install MongoDB and change the username and password in `config.json` to your own. Prepare the default tables and questions:
+```
+bash prepare_default.sh
+```
+
+Deploy the Streamlit app:
+```
+streamlit run streamlit.py --server.port PORT
+```
+
+## Citation
+```
+@article{zhang2024tablellm,
+  title={TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios},
+  author={Zhang, Xiaokang and Zhang, Jing and Ma, Zeyao and Li, Yang and Zhang, Bohan and Li, Guanlin and Yao, Zijun and Xu, Kangli and Zhou, Jinchang and Zhang-Li, Daniel and others},
+  journal={arXiv preprint arXiv:2403.19318},
+  year={2024}
+}
+```
+
+## Contact
+
+If you have any questions, we encourage you to either open GitHub issues or get in touch with us at <zhang2718@ruc.edu.cn>, <zeyaoma@ruc.edu.cn>, or <zhang-jing@ruc.edu.cn>.