Commit f191612 (verified) · parent 27465ff · committed by Zheyuan Zhao

Add model card with training details and benchmark results

Files changed: README.md (+162 lines)
---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
tags:
- text-to-sql
- pipe-sql
- sqlglot
- tool-calling
- qwen2
datasets:
- spider
pipeline_tag: text-generation
model-index:
- name: pipe-sql-1.5b
  results:
  - task:
      type: text-to-sql
      name: Text-to-SQL
    dataset:
      type: spider
      name: Spider 1.0 Dev
    metrics:
    - type: execution_accuracy
      value: 60.66
      name: Execution Accuracy
---

# Pipe SQL 1.5B

A fine-tuned [Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) model for generating **Pipe SQL** through multi-turn tool-calling conversations.

## What is Pipe SQL?

Pipe SQL is a more readable SQL syntax that uses the `|>` (pipe) operator to chain operations in a linear, top-to-bottom flow:
37
+
38
+ ```sql
39
+ FROM employees
40
+ |> WHERE department = 'Engineering'
41
+ |> AGGREGATE AVG(salary) AS avg_salary GROUP BY level
42
+ |> ORDER BY avg_salary DESC
43
+ ```

This is transpiled to standard SQL via [sqlglot](https://github.com/tobymao/sqlglot), an open-source SQL parser and transpiler.

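To make the mapping concrete, here is a minimal sketch of how the example above could be rewritten as standard SQL. This is a toy string-based converter that handles only the four operators in the example; it is purely illustrative (the function name `pipe_to_sql` is not part of sqlglot's API), and the real transpilation is done by sqlglot's parser.

```python
def pipe_to_sql(pipe_query: str) -> str:
    """Illustrative-only converter for the four pipe operators above.

    Each `|>` stage maps onto one clause of a single SELECT statement.
    """
    stages = [s.strip() for s in pipe_query.split("|>")]
    table = stages[0].removeprefix("FROM").strip()
    select, where, group, order = "*", "", "", ""
    for stage in stages[1:]:
        op, _, rest = stage.partition(" ")
        if op == "WHERE":
            where = f" WHERE {rest}"
        elif op == "AGGREGATE":
            # AGGREGATE <exprs> [GROUP BY <keys>] -> SELECT keys, exprs
            exprs, _, keys = rest.partition("GROUP BY")
            select = f"{keys.strip()}, {exprs.strip()}" if keys else exprs.strip()
            group = f" GROUP BY {keys.strip()}" if keys else ""
        elif op == "ORDER":
            order = f" ORDER {rest}"
    return f"SELECT {select} FROM {table}{where}{group}{order}"
```

Running it on the example above yields `SELECT level, AVG(salary) AS avg_salary FROM employees WHERE department = 'Engineering' GROUP BY level ORDER BY avg_salary DESC`.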
## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | Qwen2.5-Coder-1.5B-Instruct |
| **Architecture** | Qwen2ForCausalLM |
| **Parameters** | 1.5B |
| **Hidden Size** | 1536 |
| **Layers** | 28 |
| **Attention Heads** | 12 (2 KV heads) |
| **Context Length** | 2048 tokens (training) |

## Training

The model was fine-tuned using **QLoRA** on multi-turn tool-calling conversations for text-to-SQL generation.

### Training Data

Conversations were generated from the [Spider 1.0](https://yale-lily.github.io/spider) training set, where each conversation follows an agentic workflow:

1. **Explore** the database schema using the `list_tables`, `describe_table`, and `sample_data` tools
2. **Write** pipe SQL queries using the `execute_pipe_sql` and `validate_pipe_sql` tools
3. **Iterate** based on execution results until the query is correct

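The three steps above can be sketched as a simple driver loop. Everything here is illustrative: `llm`, `run_episode`, and the tool implementations are hypothetical stand-ins, since the actual data-generation pipeline is not published.

```python
def run_episode(llm, tools, question, max_turns=8):
    """Multi-turn agentic loop: the model explores the schema, then writes
    and refines a pipe SQL query until it stops calling tools."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = llm(messages)  # assistant turn: text plus optional tool call
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply.get("tool") is None:  # no tool call -> final answer, stop
            break
        name, args = reply["tool"]
        result = tools[name](**args)  # run the named tool
        # Tool output goes back to the model as a user-role tool response
        messages.append({
            "role": "user",
            "content": f"<tool_response>\n{result}\n</tool_response>",
        })
    return messages
```

The `max_turns` cap keeps a model that never converges from looping forever; the trace of `messages` is what becomes one training conversation.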
### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Method** | QLoRA (4-bit NF4) |
| **LoRA rank** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Epochs** | 3 |
| **Learning rate** | 2e-4 |
| **LR scheduler** | Cosine |
| **Warmup ratio** | 0.05 |
| **Batch size** | 2 (per device) |
| **Gradient accumulation** | 8 steps |
| **Weight decay** | 0.01 |
| **Loss** | Assistant-only (tool responses masked) |

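The table above translates roughly into the following `peft`/`transformers` configuration. This is a reconstruction from the listed hyperparameters, not the actual training script; details not in the table, such as the compute dtype and output directory, are assumptions.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization for the frozen base model (compute dtype assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings from the table above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Optimizer and schedule settings from the table above
training_args = TrainingArguments(
    output_dir="out",  # assumption
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    weight_decay=0.01,
)
```

With gradient accumulation of 8 over a per-device batch of 2, the effective batch size is 16 per device.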
## Evaluation Results

Evaluated on the **Spider 1.0 dev set** (1,034 questions) using an agentic benchmark pipeline with execution accuracy as the metric.

| Metric | Value |
|--------|-------|
| **Execution Accuracy** | **60.66%** (626 / 1,032) |
| **Prediction Rate** | 99.7% (1,031 / 1,034) |

### Status Breakdown

| Status | Count | Percentage |
|--------|-------|------------|
| Match | 626 | 60.5% |
| Mismatch | 209 | 20.2% |
| Execution Error | 170 | 16.4% |
| Transpile Error | 24 | 2.3% |
| No Prediction | 3 | 0.3% |
| Gold Error (excluded) | 2 | 0.2% |

> **Note**: This is an **in-distribution** evaluation: the model was trained on Spider training data, and the dev set uses the same 20 databases. The 2 gold errors (questions where the reference SQL itself fails) are excluded from the accuracy denominator, giving 626 / 1,032; the status-breakdown percentages above use all 1,034 questions.

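As a quick sanity check on the headline numbers, the accuracy denominator is the 1,034 dev questions minus the 2 excluded gold errors:

```python
total_questions = 1034
gold_errors = 2     # reference SQL itself fails; excluded from denominator
matches = 626
predictions = 1031

execution_accuracy = 100 * matches / (total_questions - gold_errors)
prediction_rate = 100 * predictions / total_questions

print(round(execution_accuracy, 2))  # 60.66
print(round(prediction_rate, 1))     # 99.7
```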
## Tools

The model was trained to use 5 tools in a multi-turn conversation:

| Tool | Description |
|------|-------------|
| `list_tables` | List all tables in a database |
| `describe_table` | Get column names, types, and constraints for a table |
| `sample_data` | Retrieve sample rows from a table |
| `execute_pipe_sql` | Execute a pipe SQL query against the database |
| `validate_pipe_sql` | Validate pipe SQL syntax without executing |

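For serving, the five tools might be declared as JSON-Schema-style function definitions like the following. The parameter names (`db_id`, `table`, `query`) are inferred from the examples in this card; the exact schema used in training is not published, and the `_tool` helper is purely illustrative.

```python
def _tool(name, description, *params):
    """Build one OpenAI-style function definition (illustrative helper)."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {p: {"type": "string"} for p in params},
            "required": list(params),
        },
    }

TOOLS = [
    _tool("list_tables", "List all tables in a database", "db_id"),
    _tool("describe_table",
          "Get column names, types, and constraints for a table",
          "db_id", "table"),
    _tool("sample_data", "Retrieve sample rows from a table",
          "db_id", "table"),
    _tool("execute_pipe_sql", "Execute a pipe SQL query against the database",
          "db_id", "query"),
    _tool("validate_pipe_sql", "Validate pipe SQL syntax without executing",
          "query"),
]
```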
## Usage

### Chat Template

The model uses a custom chat template with `<tool_call>` tags for tool invocations:

```
<|im_start|>assistant
Let me explore the database first.
<tool_call>
list_tables({"db_id": "concert_singer"})
</tool_call><|im_end|>
```

Tool responses are formatted as:

```
<|im_start|>user
<tool_response>
Tables in database 'concert_singer':
- stadium
- singer
- concert
- singer_in_concert
</tool_response><|im_end|>
```

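A server consuming this template needs to pull the tool call back out of the assistant text. Here is a minimal sketch of such a parser, including a bare-call fallback for when the model omits the `<tool_call>` tags (see Limitations below); the names and regexes are illustrative, not the actual server code.

```python
import json
import re

# Tagged form first; bare `name({...})` as a fallback.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*(\w+)\((\{.*?\})\)\s*</tool_call>", re.DOTALL)
BARE_CALL_RE = re.compile(r"\b(\w+)\((\{.*?\})\)", re.DOTALL)

KNOWN_TOOLS = {"list_tables", "describe_table", "sample_data",
               "execute_pipe_sql", "validate_pipe_sql"}

def parse_tool_call(text):
    """Return (tool_name, args) from an assistant turn, or None.

    Restricting matches to KNOWN_TOOLS keeps the bare-call fallback
    from firing on arbitrary parenthesized text.
    """
    m = TOOL_CALL_RE.search(text) or BARE_CALL_RE.search(text)
    if not m or m.group(1) not in KNOWN_TOOLS:
        return None
    try:
        return m.group(1), json.loads(m.group(2))
    except json.JSONDecodeError:
        return None
```

Note the non-greedy `\{.*?\}` is a simplification that assumes flat JSON arguments, which holds for the five tools above.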
### Inference

For inference with the correct chat template, see the evaluation server code in the [sqlglot repository](https://github.com/nittygritty-zzy/sqlglot/tree/main/evaluation/server).

## Limitations

- Trained and evaluated only on Spider 1.0 (SQLite databases)
- Context window limited to 2,048 tokens during training
- The 1.5B model may generate garbled special tokens instead of proper `<tool_call>` tags; the inference server includes fallback parsing for bare function calls
- Performance on out-of-distribution databases (different schemas and domains) has not been extensively tested

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the base Qwen2.5-Coder model license.