Tnt3o5 committed on
Commit e0983b7 · verified · 1 Parent(s): 5c91ccd

Update README.md

Files changed (1):
  1. README.md +189 -12

README.md CHANGED
@@ -1,3 +1,161 @@
  import argparse
 
  from openai import OpenAI
@@ -12,7 +170,6 @@ DEFAULT_QUESTION = """CREATE TABLE entity_a (
  attr_2 VARCHAR(255),
  attr_3 TEXT
  );
-
  CREATE TABLE entity_b (
  id INTEGER,
  group_id INTEGER,
@@ -36,7 +193,6 @@ ENTITIES = {
  },
  "time_key": [year],
  Query:
-
  }
 
 
@@ -46,8 +202,7 @@ Query:
  class MyModel(object):
  def __init__(self, model_name: str, api_key: str):
  self.model_name = model_name
- self.client = OpenAI(base_url=""", api_key=api_key)
-
  def get_prompt(
  self,
  question: str,
@@ -58,7 +213,6 @@ class MyModel(object):
  "content": """
  You are a problem solving model working on task_description XML block:
  <task_description>You are a specialized Text-to-SQL assistant in the banking domain. Your objective is to translate natural language questions into valid SQLite queries using the provided schema and banking business logic.
-
  ### Input:
  - Schema: Table definitions in SQL DDL format.
  - Relationships: Key linking logic between tables (system_data.branch_id = branch.id).
@@ -96,7 +250,6 @@ You are a problem solving model working on task_description XML block:
  rule:
  - Unit Logic: {Which dmain} data is stored in 'Triệu VND'. If the Question mentions 'Tỷ', multiply the value by 1000.
  - Entities: Extracted key information including data_code, year, and branch filtering criteria.
-
  ### Rules:
  1. ALWAYS perform an INNER JOIN between system_data and branch on system_data.branch_id = branch.id.
  2. ALWAYS SELECT system_data.data_code, system_data.year, system_data.branch_id, branch.name, system_data.value.
@@ -112,14 +265,12 @@ Generate only the answer, do not generate anything else
  {
  "role": "user",
  "content": f"""
-
  Now for the real task, solve the task in question block.
  Generate only the solution, do not generate anything else
  <question>{question}</question>
  """,
  },
  ]
-
  def invoke(self, question: str) -> str:
  chat_response = self.client.chat.completions.create(
  model=self.model_name,
@@ -129,15 +280,41 @@ Generate only the solution, do not generate anything else
  )
  return chat_response.choices[0].message.content
 
-
  if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.add_argument("--question", type=str, default=DEFAULT_QUESTION, required=False)
  parser.add_argument("--api-key", type=str, default="", required=False)
  parser.add_argument("--model", type=str, default="model", required=False)
-
  args = parser.parse_args()
-
  client = MyModel(model_name=args.model, api_key=args.api_key)
 
- print(client.invoke(args.question))

+ # Text-to-SQL Evaluation Pipeline
+
+ ## Overview
+
+ This repository implements a **Text-to-SQL evaluation pipeline** using the OpenAI API. The system is designed for the **banking domain**, where strict business rules and deterministic SQL generation are required.
+
+ Key goals:
+
+ - Translate **natural language questions** into **valid SQLite queries**
+ - Enforce **domain-specific constraints** via prompt engineering
+ - Benchmark model outputs using **multiple evaluation metrics**
+
+ ---
+
+ ## Code Preservation Note
+
+ > **Important:** The existing `MyModel` class and its formatting are **kept exactly as-is**.\
+ > This project does **not** modify, refactor, or reformat the demo code. The README only documents how the current implementation works.
+
+ - The `MyModel` class structure, method names, and prompt formatting remain unchanged
+ - No code auto-formatting or refactoring is applied
+ - All behavior described below reflects the **original demo code**
+
+ ## Key Features
+
+ - Deterministic SQL generation (`temperature = 0`)
+ - Strong prompt constraints (no markdown, no explanations, SQL only)
+ - Banking-specific metric grouping and unit conversion logic
+ - Multi-metric evaluation for both syntactic and semantic quality
+
+ ---
+
+ ## High-level Architecture
+
+ ```text
+ User Question
+       │
+       ▼
+ Prompt Builder (System + User)
+       │
+       ▼
+ OpenAI ChatCompletion API
+       │
+       ▼
+ Generated SQL
+       │
+       ▼
+ Evaluation Metrics
+ ```
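The stages in the diagram can be sketched as plain functions; the following is a minimal illustration with a stub in place of the OpenAI call (the function names and the stub response are hypothetical, not part of the repository):

```python
# Hypothetical sketch of the pipeline stages: build prompt -> call model -> evaluate.
def build_prompt(question: str) -> list:
    # Prompt Builder stage: system rules plus the user question (abbreviated here).
    return [
        {"role": "system", "content": "Translate the question to a SQLite query."},
        {"role": "user", "content": f"<question>{question}</question>"},
    ]

def call_model(messages: list) -> str:
    # Stand-in for the OpenAI ChatCompletion call in the demo code.
    return "SELECT 1"

def evaluate(generated: str, reference: str) -> dict:
    # Evaluation Metrics stage: here just a binary exact match.
    return {"binary": int(generated == reference)}

sql = call_model(build_prompt("How many branches are there?"))
metrics = evaluate(sql, "SELECT 1")
```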
+
+ ---
+
+ ## Suggested Project Structure
+
+ ```text
+ .
+ ├── main.py          # Entry point, runs inference and prints metrics
+ ├── model.py         # OpenAI client wrapper (MyModel)
+ ├── evaluator.py     # Evaluation metrics implementation
+ ├── prompts/
+ │   └── text2sql.txt # System prompt with banking rules
+ ├── README.md
+ └── requirements.txt
+ ```
+
+ ---
+
+ ## Prompt Design
+
+ ### System Prompt Responsibilities
+
+ The system prompt enforces the following rules:
+
+ - **Always perform** an `INNER JOIN` between `system_data` and `branch`
+ - **Always SELECT** the following columns:
+   - `system_data.data_code`
+   - `system_data.year`
+   - `system_data.branch_id`
+   - `branch.name`
+   - `system_data.value`
+ - SQL keywords must be **UPPERCASE**
+ - Text filters must use `LIKE '%keyword%'`
+ - Vietnamese location names must use **exact accents**
+ - Output **SQL only** (no markdown, no explanations)
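As an illustration only, a candidate query satisfying these rules might look like the string below, together with a naive rule checker (the `check_rules` helper and the sample query are hypothetical; real validation would parse the SQL rather than match substrings):

```python
import re

# Hypothetical example query following the rules above; system_data/branch and
# the mandated column list come from the task description in the prompt.
CANDIDATE_SQL = (
    "SELECT system_data.data_code, system_data.year, system_data.branch_id, "
    "branch.name, system_data.value "
    "FROM system_data "
    "INNER JOIN branch ON system_data.branch_id = branch.id "
    "WHERE branch.name LIKE '%Hà Nội%' AND system_data.year = 2023"
)

REQUIRED_COLUMNS = [
    "system_data.data_code", "system_data.year", "system_data.branch_id",
    "branch.name", "system_data.value",
]

def check_rules(sql: str) -> list:
    """Return a list of rule violations (empty list means compliant)."""
    violations = []
    if "INNER JOIN branch ON system_data.branch_id = branch.id" not in sql:
        violations.append("missing mandated INNER JOIN")
    select_clause = sql.split("FROM")[0]
    for col in REQUIRED_COLUMNS:
        if col not in select_clause:
            violations.append(f"SELECT is missing {col}")
    for kw in ("select", "from", "where", "inner join"):
        if re.search(rf"\b{kw}\b", sql):  # lowercase keyword found
            violations.append(f"keyword '{kw}' must be UPPERCASE")
    if sql.strip().startswith("```"):
        violations.append("output must be SQL only, no markdown fences")
    return violations
```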
+
+ ### Metric Grouping Logic
+
+ Metrics are classified by `metric_code` prefix:
+
+ | Group | Description |
+ | ----- | ------------------------------------ |
+ | A | Inbound metrics (`MET_A_%`) |
+ | B | Outbound metrics (`MET_B_%`) |
+ | C | Stock / snapshot metrics (`MET_C_%`) |
+ | D | Exposure / obligation metrics |
+ | E | Resource mobilization metrics |
+ | F | Ratio & efficiency metrics |
+
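A minimal sketch of this prefix classification (the `metric_group` helper is hypothetical, and treating groups D–F as `MET_D_%`–`MET_F_%` is an assumption by analogy with the first three rows):

```python
from typing import Optional

# Assumed mapping from group letter to description, mirroring the table above.
GROUP_DESCRIPTIONS = {
    "A": "Inbound metrics",
    "B": "Outbound metrics",
    "C": "Stock / snapshot metrics",
    "D": "Exposure / obligation metrics",
    "E": "Resource mobilization metrics",
    "F": "Ratio & efficiency metrics",
}

def metric_group(metric_code: str) -> Optional[str]:
    """Return the group letter for a code like 'MET_A_001', else None."""
    parts = metric_code.split("_")
    if len(parts) >= 2 and parts[0] == "MET" and parts[1] in GROUP_DESCRIPTIONS:
        return parts[1]
    return None
```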
+ ### Unit Conversion Rule
+
+ - Stored unit: **Million VND** ('Triệu VND')
+ - If the question mentions **"Billion VND"** ('Tỷ'), multiply the value by `1000`
+
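A one-function sketch of this rule, assuming the question's unit has already been extracted (the `to_stored_unit` helper is hypothetical):

```python
# Values are stored in million VND, so an amount phrased in billion VND ("Tỷ")
# must be scaled by 1000 before it is compared against stored values.
def to_stored_unit(amount: float, unit: str) -> float:
    """Convert a question amount to the stored unit (million VND)."""
    if unit in ("billion", "tỷ"):
        return amount * 1000  # 1 billion VND = 1000 million VND
    return amount  # already in million VND ("triệu")
```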
+ ---
+
+ ## Example Input (Schema)
+
+ ```sql
+ CREATE TABLE entity_a (
+     id INTEGER,
+     group_id INTEGER,
+     org_id INTEGER,
+     code VARCHAR(100),
+     name VARCHAR(255),
+     attr_1 VARCHAR(255),
+     attr_2 VARCHAR(255),
+     attr_3 TEXT
+ );
+
+ CREATE TABLE entity_b (
+     id INTEGER,
+     group_id INTEGER,
+     entity_a_id INTEGER,
+     time_key INTEGER,
+     metric_name VARCHAR(255),
+     metric_code VARCHAR(100),
+     metric_value REAL,
+     metric_unit VARCHAR(100)
+ );
+ ```
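The DDL above is valid SQLite; as a quick sanity check, it can be loaded into an in-memory database and queried with a join in the spirit of the prompt rules (the sample rows are invented for illustration):

```python
import sqlite3

# Same tables as the example schema, compacted for readability.
SCHEMA = """
CREATE TABLE entity_a (id INTEGER, group_id INTEGER, org_id INTEGER,
    code VARCHAR(100), name VARCHAR(255), attr_1 VARCHAR(255),
    attr_2 VARCHAR(255), attr_3 TEXT);
CREATE TABLE entity_b (id INTEGER, group_id INTEGER, entity_a_id INTEGER,
    time_key INTEGER, metric_name VARCHAR(255), metric_code VARCHAR(100),
    metric_value REAL, metric_unit VARCHAR(100));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented sample rows, only to demonstrate the join shape.
conn.execute("INSERT INTO entity_a VALUES (1, 1, 1, 'BR01', 'Branch One', '', '', '')")
conn.execute(
    "INSERT INTO entity_b VALUES (1, 1, 1, 2023, 'Deposits', 'MET_A_001', 1500.0, 'Triệu VND')"
)

rows = conn.execute(
    "SELECT entity_b.metric_code, entity_b.time_key, entity_a.name, entity_b.metric_value "
    "FROM entity_b INNER JOIN entity_a ON entity_b.entity_a_id = entity_a.id "
    "WHERE entity_a.name LIKE '%Branch%' AND entity_b.time_key = 2023"
).fetchall()
```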
+
+ ---
+
+ ## Evaluation Metrics
+
+ Evaluation results are printed **at the very top of the output**:
+
+ | Label | Value |
+ | -------------- | ------------ |
+ | rouge | 0.9290708304 |
+ | meteor | 0.9191570862 |
+ | binary | 0.55 |
+ | llm-as-a-judge | 0.65 |
+
+ ### Metric Definitions
+
+ - **ROUGE**: Token-level overlap between generated and reference SQL
+ - **METEOR**: Semantic similarity with synonym awareness
+ - **Binary Match**: Exact string match (0 or 1)
+ - **LLM-as-a-Judge**: LLM-based holistic judgment of correctness
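A rough sketch of the two simplest metrics (real runs would use a ROUGE/METEOR library; `token_f1` here is only a naive stand-in for token-level overlap, and both helper names are hypothetical):

```python
def binary_match(generated: str, reference: str) -> int:
    """Exact string match after whitespace normalization: 1 or 0."""
    return int(" ".join(generated.split()) == " ".join(reference.split()))

def token_f1(generated: str, reference: str) -> float:
    """F1 over bags of whitespace tokens (rough ROUGE-1-style analogue)."""
    gen, ref = generated.split(), reference.split()
    overlap = sum(min(gen.count(t), ref.count(t)) for t in set(gen))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(gen), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```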
+
+ ---
+
+ ## Full Demo Code (Kept Exactly As-Is)
+
+ The following is the **original demo code**, included verbatim for clarity and ease of understanding. No refactoring, no reformatting, no behavioral changes have been applied.
+
+ ```python
  import argparse
 
  from openai import OpenAI

  attr_2 VARCHAR(255),
  attr_3 TEXT
  );

  CREATE TABLE entity_b (
  id INTEGER,
  group_id INTEGER,

  },
  "time_key": [year],
  Query:

  }
 
 

  class MyModel(object):
  def __init__(self, model_name: str, api_key: str):
  self.model_name = model_name
+ self.client = OpenAI(base_url="", api_key=api_key)

  def get_prompt(
  self,
  question: str,

  "content": """
  You are a problem solving model working on task_description XML block:
  <task_description>You are a specialized Text-to-SQL assistant in the banking domain. Your objective is to translate natural language questions into valid SQLite queries using the provided schema and banking business logic.

  ### Input:
  - Schema: Table definitions in SQL DDL format.
  - Relationships: Key linking logic between tables (system_data.branch_id = branch.id).

  rule:
  - Unit Logic: {Which dmain} data is stored in 'Triệu VND'. If the Question mentions 'Tỷ', multiply the value by 1000.
  - Entities: Extracted key information including data_code, year, and branch filtering criteria.

  ### Rules:
  1. ALWAYS perform an INNER JOIN between system_data and branch on system_data.branch_id = branch.id.
  2. ALWAYS SELECT system_data.data_code, system_data.year, system_data.branch_id, branch.name, system_data.value.

  {
  "role": "user",
  "content": f"""

  Now for the real task, solve the task in question block.
  Generate only the solution, do not generate anything else
  <question>{question}</question>
  """,
  },
  ]

  def invoke(self, question: str) -> str:
  chat_response = self.client.chat.completions.create(
  model=self.model_name,

  )
  return chat_response.choices[0].message.content
 

  if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.add_argument("--question", type=str, default=DEFAULT_QUESTION, required=False)
  parser.add_argument("--api-key", type=str, default="", required=False)
  parser.add_argument("--model", type=str, default="model", required=False)

  args = parser.parse_args()

  client = MyModel(model_name=args.model, api_key=args.api_key)
+ print(client.invoke(args.question))
+ ```
+
+ ---
+
+ ## How to Run
+
+ ```bash
+ python main.py \
+   --question "<QUESTION_TEXT>" \
+   --api-key "YOUR_OPENAI_API_KEY" \
+   --model "gpt-4.1-mini"
+ ```
+
+ ---
+
+ ## Important Notes
+
+ - `temperature = 0` ensures reproducible results
+ - Function calling is intentionally avoided to prevent JSON-wrapped SQL
+ - The prompt is optimized for the **SQLite dialect**
+
+ ---
+
+ ## Possible Extensions
+
+ - Multi-year queries using `IN` or ranges
+ - Queries combining multiple metric groups
+ - Execution-based evaluation (SQL result comparison)
+ - Support for additional SQL dialects (PostgreSQL, MySQL)