---
license: apache-2.0
language:
- en
- vi
base_model:
- Qwen/Qwen3-4B-Instruct-2507
tags:
- text-generation-inference
---
# Text-to-SQL Evaluation Pipeline

## Overview

This repository implements a **Text-to-SQL evaluation pipeline** using the OpenAI API. The system targets the **banking domain**, where strict business rules and deterministic SQL generation are required.

Key goals:

- Translate **natural language questions** into **valid SQLite queries**
- Enforce **domain-specific constraints** via prompt engineering
- Benchmark model outputs using **multiple evaluation metrics**

---

## Code Preservation Notice

> **Important:** The existing `MyModel` class and its formatting are **kept exactly as-is**.\
> This project does **not** modify, refactor, or reformat the demo code. The README only documents how the current implementation works.

- The `MyModel` class structure, method names, and prompt formatting remain unchanged
- No code auto-formatting or refactoring is applied
- All behavior described below reflects the **original demo code**

## Key Features

- Deterministic SQL generation (`temperature = 0`)
- Strong prompt constraints (no markdown, no explanations, SQL only)
- Banking-specific metric grouping and unit conversion logic
- Multi-metric evaluation for both syntactic and semantic quality

---

## High-level Architecture

```text
User Question
      ↓
Prompt Builder (System + User)
      ↓
OpenAI ChatCompletion API
      ↓
Generated SQL
      ↓
Evaluation Metrics
```

---

## Suggested Project Structure

```text
.
├── main.py           # Entry point, runs inference and prints metrics
├── model.py          # OpenAI client wrapper (MyModel)
├── evaluator.py      # Evaluation metrics implementation
├── prompts/
│   └── text2sql.txt  # System prompt with banking rules
├── README.md
└── requirements.txt
```

---

## Prompt Design

### System Prompt Responsibilities

The system prompt enforces the following rules:

- **Always perform** an `INNER JOIN` between `system_data` and `branch`
- **Always SELECT** the following columns:
  - `system_data.data_code`
  - `system_data.year`
  - `system_data.branch_id`
  - `branch.name`
  - `system_data.value`
- SQL keywords must be **UPPERCASE**
- Text filters must use `LIKE '%keyword%'`
- Vietnamese location names must use **exact accents**
- Output **SQL only** (no markdown, no explanations)
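
These rules can be checked mechanically. Below is a small, hypothetical validator (not part of the demo code; the function and constant names are my own) that flags generated SQL violating the constraints above:

```python
import re

# Columns the system prompt requires in every SELECT (assumption: taken
# from the rule list above, not from the demo code).
REQUIRED_COLUMNS = [
    "system_data.data_code",
    "system_data.year",
    "system_data.branch_id",
    "branch.name",
    "system_data.value",
]

def check_sql_rules(sql: str) -> list[str]:
    """Return a list of rule violations found in a generated SQL string."""
    violations = []
    if "INNER JOIN" not in sql:
        violations.append("missing INNER JOIN between system_data and branch")
    for col in REQUIRED_COLUMNS:
        if col not in sql:
            violations.append(f"missing required column {col}")
    # Keywords must be UPPERCASE: flag lowercase forms of common keywords.
    for kw in ("select", "from", "where", "inner join"):
        if re.search(rf"\b{kw}\b", sql):
            violations.append(f"lowercase keyword: {kw}")
    if "```" in sql:
        violations.append("markdown fence in output")
    return violations
```

A compliant query returns an empty list; anything else pinpoints which rule was broken.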

### Metric Grouping Logic

Metrics are classified by `metric_code` prefix:

| Group | Description                          |
| ----- | ------------------------------------ |
| A     | Inbound metrics (`MET_A_%`)          |
| B     | Outbound metrics (`MET_B_%`)         |
| C     | Stock / snapshot metrics (`MET_C_%`) |
| D     | Exposure / obligation metrics        |
| E     | Resource mobilization metrics        |
| F     | Ratio & efficiency metrics           |

### Unit Conversion Rule

- Stored unit: **Million VND**
- If the question mentions **"Billion VND"** (Vietnamese: *Tỷ*), multiply the value by `1000`
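
A minimal sketch of this rule (hypothetical helper, not part of the demo code), applying the multiply-by-1000 convention exactly as stated above:

```python
def convert_value(stored_value: float, question: str) -> float:
    """Values are stored in million VND; scale by 1000 when the
    question asks in billions ('Tỷ'), per the rule above."""
    q = question.lower()
    if "billion" in q or "tỷ" in q:
        return stored_value * 1000
    return stored_value
```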

---

## Example Input (Schema)

```sql
CREATE TABLE entity_a (
    id INTEGER,
    group_id INTEGER,
    org_id INTEGER,
    code VARCHAR(100),
    name VARCHAR(255),
    attr_1 VARCHAR(255),
    attr_2 VARCHAR(255),
    attr_3 TEXT
);

CREATE TABLE entity_b (
    id INTEGER,
    group_id INTEGER,
    entity_a_id INTEGER,
    time_key INTEGER,
    metric_name VARCHAR(255),
    metric_code VARCHAR(100),
    metric_value REAL,
    metric_unit VARCHAR(100)
);
```

---

## Evaluation Metrics

Evaluation results are printed **at the very top of the output**:

| Label          | Value |
| -------------- | ----- |
| rouge          | 0.96  |
| meteor         | 0.95  |
| binary         | 0.65  |
| llm-as-a-judge | 0.82  |

### Metric Definitions

- **ROUGE**: Token-level overlap between generated and reference SQL
- **METEOR**: Semantic similarity with synonym awareness
- **Binary Match**: Exact string match (0 or 1)
- **LLM-as-a-Judge**: LLM-based holistic judgment of correctness
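
For illustration, the binary metric could be computed like this (a hypothetical sketch, not the evaluator shipped with this repository; it normalizes whitespace and a trailing semicolon before comparing):

```python
import re

def normalize_sql(sql: str) -> str:
    # Collapse runs of whitespace and drop a trailing semicolon.
    return re.sub(r"\s+", " ", sql.strip().rstrip(";")).strip()

def binary_match(generated: str, reference: str) -> int:
    """Exact-match metric: 1 if normalized strings are identical, else 0."""
    return int(normalize_sql(generated) == normalize_sql(reference))
```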

---

## Full Demo Code (Kept Exactly As-Is)

The following is the **original demo code**, included verbatim for clarity and ease of understanding. No refactoring, no reformatting, and no behavioral changes have been applied.

```python
import argparse

from openai import OpenAI

DEFAULT_QUESTION = """CREATE TABLE entity_a (
    id INTEGER,
    group_id INTEGER,
    org_id INTEGER,
    code VARCHAR(100),
    name VARCHAR(255),
    attr_1 VARCHAR(255),
    attr_2 VARCHAR(255),
    attr_3 TEXT
);
CREATE TABLE entity_b (
    id INTEGER,
    group_id INTEGER,
    entity_a_id INTEGER,
    time_key INTEGER,
    metric_name VARCHAR(255),
    metric_code VARCHAR(100),
    metric_value REAL,
    metric_unit VARCHAR(100)
);
ENTITIES = {
    "metric": {
        "metric_code": "METRIC_X",
        "metric_unit": "UNIT_A"
    },
    "entity_a_field": {
        "attr_1": [],
        "attr_2": [],
        "attr_3": [],
        "id": []
    },
    "time_key": [year],
    Query:
}


"""


class MyModel(object):
    def __init__(self, model_name: str, api_key: str):
        self.model_name = model_name
        self.client = OpenAI(base_url="", api_key=api_key)

    def get_prompt(
        self,
        question: str,
    ) -> list[dict[str, str]]:
        return [
            {
                "role": "system",
                "content": """
You are a problem solving model working on task_description XML block:
<task_description>You are a specialized Text-to-SQL assistant in the banking domain. Your objective is to translate natural language questions into valid SQLite queries using the provided schema and banking business logic.
### Input:
- Schema: Table definitions in SQL DDL format.
- Relationships: Key linking logic between tables (system_data.branch_id = branch.id).
- Data Content Context:
  Indicator_Categories:
    Group_A:
      description: Primary metrics – inbound type
      rule:
        - metric_code LIKE 'MET_A_%'

    Group_B:
      description: Primary metrics – outbound type
      rule:
        - metric_code LIKE 'MET_B_%'

    Group_C:
      description: Stock / snapshot metrics
      rule:
        - metric_code LIKE 'MET_C_%'

    Group_D:
      description: Exposure / obligation related metrics
      rule:
        - metric_code LIKE 'MET_D_%'
        - metric_code LIKE 'MET_D_TOTAL_%'
        - metric_code = 'MET_D_SPECIAL'

    Group_E:
      description: Resource mobilization metrics
      rule:
        - metric_code LIKE 'MET_E_%'

    Group_F:
      description: Ratio & efficiency indicators
      rule:
- Unit Logic: {Which dmain} data is stored in 'Triệu VND'. If the Question mentions 'Tỷ', multiply the value by 1000.
- Entities: Extracted key information including data_code, year, and branch filtering criteria.
### Rules:
1. ALWAYS perform an INNER JOIN between system_data and branch on system_data.branch_id = branch.id.
2. ALWAYS SELECT system_data.data_code, system_data.year, system_data.branch_id, branch.name, system_data.value.
3. Use exact Vietnamese accents for location values.
4. Use LIKE '%keyword%' for text matching.
5. Use UPPERCASE for SQL keywords.
6. Output ONLY the SQL query. No explanations or markdown blocks.</task_description>
You will be given a single task in the question XML block
Solve only the task in question block.
Generate only the answer, do not generate anything else
""",
            },
            {
                "role": "user",
                "content": f"""
Now for the real task, solve the task in question block.
Generate only the solution, do not generate anything else
<question>{question}</question>
""",
            },
        ]

    def invoke(self, question: str) -> str:
        chat_response = self.client.chat.completions.create(
            model=self.model_name,
            messages=self.get_prompt(question),
            temperature=0,
            reasoning_effort="none",
        )
        return chat_response.choices[0].message.content


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--question", type=str, default=DEFAULT_QUESTION, required=False)
    parser.add_argument("--api-key", type=str, default="", required=False)
    parser.add_argument("--model", type=str, default="model", required=False)
    args = parser.parse_args()
    client = MyModel(model_name=args.model, api_key=args.api_key)
    print(client.invoke(args.question))
```

---

## How to Run

```bash
python main.py \
  --question "<QUESTION_TEXT>" \
  --api-key "YOUR_OPENAI_API_KEY" \
  --model "gpt-4.1-mini"
```

---

## Important Notes

- `temperature = 0` ensures reproducible results
- Function calling is intentionally avoided to prevent JSON-wrapped SQL
- The prompt is optimized for the **SQLite dialect**

---

## Possible Extensions

- Multi-year queries using `IN` or ranges
- Queries combining multiple metric groups
- Execution-based evaluation (SQL result comparison)
- Support for additional SQL dialects (PostgreSQL, MySQL)