LightRT committed on
Commit ba6ff48 · 1 Parent(s): 033a38b

Database Connection Fix

Files changed (4)
  1. pyproject.toml +5 -0
  2. requirements.txt +8 -1
  3. src/graph.py +57 -40
  4. uv.lock +0 -0
pyproject.toml CHANGED
@@ -21,4 +21,9 @@ dependencies = [
     "sqlalchemy>=2.0.50",
     "streamlit>=1.58.0",
     "pymysql",
+    "psycopg2-binary>=2.9.12",
+    "pyodbc>=5.3.0",
+    "oracledb>=4.0.1",
+    "snowflake-sqlalchemy>=1.10.0",
+    "sqlalchemy-bigquery>=1.17.0",
 ]
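The driver packages in this hunk line up with SQLAlchemy connection-URL schemes. As a rough sketch only (the mapping table and the `required_package` helper are illustrative and not part of this commit), a connection URL could be checked against the installed drivers before an engine is created:

```python
# Illustrative mapping from SQLAlchemy URL schemes to the pip packages
# listed in this commit. The table and helper name are hypothetical.
DRIVER_PACKAGES = {
    "postgresql+psycopg2": "psycopg2-binary",
    "mysql+pymysql": "pymysql",
    "mssql+pyodbc": "pyodbc",
    "oracle+oracledb": "oracledb",
    "snowflake": "snowflake-sqlalchemy",
    "bigquery": "sqlalchemy-bigquery",
}


def required_package(db_url: str) -> str:
    """Return the pip package that provides the driver for db_url."""
    scheme, sep, _rest = db_url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {db_url!r}")
    try:
        return DRIVER_PACKAGES[scheme.lower()]
    except KeyError:
        # Unknown scheme: report it instead of guessing a package.
        raise ValueError(f"no known driver package for scheme {scheme!r}") from None


if __name__ == "__main__":
    print(required_package("mysql+pymysql://user:secret@localhost:3306/shop"))
```

Only the scheme before `://` is inspected, so credentials and host names are never parsed or logged by this helper.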
requirements.txt CHANGED
@@ -17,4 +17,11 @@ streamlit
 requests
 psycopg-pool
 langsmith
-pymysql
+
+# --- Database Drivers ---
+psycopg2-binary
+pymysql
+pyodbc
+oracledb
+snowflake-sqlalchemy
+sqlalchemy-bigquery
src/graph.py CHANGED
@@ -81,37 +81,50 @@ INSTRUCTION: Analyze the error message and the schema carefully. Fix the syntax,
     else :
         error_context = ""
 
-    system_prompt = SystemMessage(content=f"""You are an expert Data Analyst and Database Engineer.
-Your job is to write highly optimized, perfectly accurate database queries based on user requests.
+    system_prompt = SystemMessage(content=f"""
+You are an expert Data Analyst and SQL Engineer.
+
+Your task is to generate ONE valid SELECT query for the latest user request.
 
 === DATABASE SCHEMA & DIALECT ===
-Look at the metadata below to identify the targeted database engine dialect and table layout:
 {scheme}
 
 === CONVERSATION HISTORY ===
-Use this previous context to resolve ambiguous terms (e.g., if the user says "filter those by...", look here to see what "those" refers to):
 {history_text}
+
+=== ERROR CORRECTION MODE ===
 {error_context}
 
 === CRITICAL RULES ===
-1. ALIGNMENT: Only use the tables and columns provided in the schema above. Do not hallucinate column names.
-2. DIALECT MATCHING: Look at the 'Dialect:' specified above and write strict queries matching that exact syntax.
-3. JOINS: Pay close attention to the FOREIGN KEY constraints provided in the schema to perform accurate JOINs.
-4. CURRENT DATE: Today's date is {current_date}. Use this exact date for any relative time filters (e.g., "last month", "this year").
-5. CASE SENSITIVITY: When filtering by strings, use case-insensitive comparisons (e.g., LOWER(column) = LOWER('value')) unless instructed otherwise.
-6. SECURITY: NEVER generate DML queries (INSERT, UPDATE, DELETE, DROP). Only generate SELECT statements.
-
-=== OUTPUT SELECTION RULES ===
-1. If the user asks WHO / WHICH / WHAT IS THE NAME / identify a person, customer, user, product, company, or entity, return the human-readable name field, not just the ID.
-2. If the schema has both an ID column and a name column, prefer selecting the name column in the final output.
-3. If the name is in another table, use the required JOIN to fetch it.
-4. Only return an ID alone when the user explicitly asks for the ID, or when no name-like field exists in the schema.
-5. For count/number questions, return an aggregate numeric result, not a list of rows.
-6. For "who/which" questions, do not answer with only identifiers if a readable label exists in the schema.
-
-=== INSTRUCTIONS ===
-First, think through the necessary tables, filters, joins, and the exact type of answer expected.
-Then, provide the final executable SQL query specifically for the LATEST USER REQUEST.""")
+1. Use ONLY tables and columns that exist in the schema.
+2. Never hallucinate columns, joins, or tables.
+3. Generate only SELECT queries. No INSERT, UPDATE, DELETE, DROP, TRUNCATE, ALTER.
+4. Use the exact SQL dialect implied by the schema metadata.
+5. For any output columns, ALWAYS use clear aliases.
+Example:
+- customer_id AS customer_id
+- customer_name AS customer_name
+- SUM(amount) AS total_amount
+6. When the user asks for a person/customer/company/product/entity, return BOTH:
+- the readable name field if it exists
+- the matching ID field
+7. If a name exists in another table, join to fetch it.
+8. If no readable name exists, return the best human-readable identifier available, and the ID.
+9. For aggregate queries, include a label column when possible so the answer layer can explain the result.
+10. If fixing an error, preserve the original user intent and correct only the broken parts.
+
+=== PRIORITY RULE FOR ID VS NAME ===
+- Priority 1: name + id together, if possible
+- Priority 2: name only, if name exists but id cannot be included
+- Priority 3: id only, only if no readable name exists
+
+=== OUTPUT FORMAT REQUIREMENT ===
+Return a SQL query whose selected columns are self-explanatory.
+Do not rely on positional meaning like column 1, column 2 without aliases.
+
+=== CURRENT DATE ===
+Today's date is {current_date}.
+""")
 
     final_msg = [
         system_prompt,
@@ -149,6 +162,7 @@ def routing(state : State) :
 def answer_node(state : State) :
     messages = state.get("messages")
     query_result = state.get("query_result" , "No records found.")
+    sql_query = state.get("sql_query", "")
     error = state.get("error")
 
     history_messages = messages[:-1]
@@ -162,29 +176,32 @@ def answer_node(state : State) :
     else:
         history_text = "This is the first user request. No history exists."
 
-    system_prompt = f"""You are a helpful Data Analyst communicating directly with a user.
+    system_prompt = f"""
+You are a helpful Data Analyst communicating directly with a user.
 
 === CONVERSATION HISTORY ===
-Use this to maintain the context and tone of the conversation:
 {history_text}
 
-=== EXECUTION CONTEXT ===\n"""
+=== EXECUTION CONTEXT ===
+SQL QUERY USED:
+{sql_query}
 
-    if error:
-        system_prompt += f"""Unfortunately, the database returned an error and the data could not be retrieved.
-Error details: {error}
-INSTRUCTION: Politely apologize to the user and briefly explain that you encountered a technical issue retrieving their specific request."""
-    else:
-        system_prompt += f"""The database returned this raw data: {query_result}
-
-INSTRUCTIONS:
-1. Answer using ONLY the returned data.
-2. Never invent a name, value, or entity that is not present in the result.
-3. If the result contains both an ID and a name, use the name in the final answer and mention the ID only if helpful.
-4. If the result contains only an ID and the user asked for a name/person/entity, say that the returned data only contains an identifier and no readable name.
-5. Do not substitute or guess a name from a customer_id or any other identifier.
-6. Do not mention SQL, the database, schemas, or how you got the data.
-7. Give a clean, professional, and conversational response."""
+RAW DATABASE RESULT:
+{query_result}
+
+=== INSTRUCTIONS ===
+1. Use ONLY the returned data.
+2. Interpret the result using the SQL query and its selected aliases.
+3. If the query selected columns like customer_id, customer_name, total_amount, use those exact labels in the final response.
+4. If the result is positional, map values to the SQL SELECT order.
+5. Never invent a name or ID.
+6. For who/which questions:
+- prefer name + id
+- if name is missing, give the id and clearly say no readable name was returned
+7. If the result contains an ID and a value like total_amount, explain them clearly.
+8. Do not mention SQL or the database in the final answer.
+9. Give a clean, professional response.
+"""
 
     final_msg = [
         SystemMessage(content=system_prompt),
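The new answer prompt asks the model to map positional result values onto the aliases in the SELECT list. One possible alternative, sketched here under assumptions, is to do that pairing in code before the result is placed in the prompt. The `label_rows` helper and the sample data are hypothetical and do not appear in this commit. With SQLAlchemy, the column names could plausibly be taken from the result object's `keys()`.

```python
from typing import Any, Sequence


def label_rows(
    columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> list[dict[str, Any]]:
    """Pair each positional value in a row with its column alias."""
    labelled = []
    for row in rows:
        # A length mismatch means the aliases do not describe this row,
        # so fail loudly rather than mislabel a value.
        if len(row) != len(columns):
            raise ValueError(
                f"row has {len(row)} values but {len(columns)} columns were given"
            )
        labelled.append(dict(zip(columns, row)))
    return labelled


if __name__ == "__main__":
    # Sample data is invented purely for illustration.
    columns = ["customer_id", "customer_name", "total_amount"]
    rows = [(7, "Acme Ltd", 1250.0)]
    print(label_rows(columns, rows))
```

Passing labelled rows to the answer layer would make rule 4 of the prompt a fallback instead of the main way values get their meaning.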
uv.lock CHANGED
The diff for this file is too large to render. See raw diff