Spaces: LightRT/text2sql_backend

LightRT committed 3 days ago

Commit ba6ff48
1 parent: 033a38b

Database Connection Fix

Files changed (4)

pyproject.toml +5 -0
requirements.txt +8 -1
src/graph.py +57 -40
uv.lock +0 -0

pyproject.toml CHANGED

@@ -21,4 +21,9 @@ dependencies = [
     "sqlalchemy>=2.0.50",
     "streamlit>=1.58.0",
     "pymysql",
+    "psycopg2-binary>=2.9.12",
+    "pyodbc>=5.3.0",
+    "oracledb>=4.0.1",
+    "snowflake-sqlalchemy>=1.10.0",
+    "sqlalchemy-bigquery>=1.17.0",
 ]

requirements.txt CHANGED

@@ -17,4 +17,11 @@ streamlit
 requests
 psycopg-pool
 langsmith
-pymysql
+# --- Database Drivers ---
+psycopg2-binary
+pymysql
+pyodbc
+oracledb
+snowflake-sqlalchemy
+sqlalchemy-bigquery
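
The drivers added above each correspond to a SQLAlchemy URL prefix of the form `dialect+driver://`. As a rough illustration (not code from this commit), the sketch below maps such a prefix to the pip package that provides it. The names `DRIVER_PACKAGES` and `required_package` are hypothetical, and the assumption that the backend receives a SQLAlchemy-style connection URL is mine, not something the diff confirms.

```python
# Hypothetical helper, not part of the commit: look up which of the newly
# listed packages a SQLAlchemy-style connection URL depends on.
DRIVER_PACKAGES = {
    "postgresql+psycopg2": "psycopg2-binary",
    "mysql+pymysql": "pymysql",
    "mssql+pyodbc": "pyodbc",
    "oracle+oracledb": "oracledb",
    "snowflake": "snowflake-sqlalchemy",
    "bigquery": "sqlalchemy-bigquery",
}


def required_package(database_url: str) -> str:
    """Return the pip package needed for the URL's dialect+driver prefix."""
    scheme, sep, _ = database_url.partition("://")
    if not sep:
        raise ValueError("not a database URL: missing '://'")
    try:
        return DRIVER_PACKAGES[scheme.lower()]
    except KeyError:
        # Unknown or implicit driver (e.g. plain "mysql://"): refuse to guess.
        raise ValueError(f"no driver package known for {scheme!r}") from None


print(required_package("postgresql+psycopg2://user:secret@localhost:5432/sales"))
print(required_package("bigquery://my-project/my_dataset"))
```

The lookup deliberately rejects a bare `mysql://` prefix: without an explicit driver, SQLAlchemy falls back to its own default driver for that dialect, which is not necessarily one of the packages listed here.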

src/graph.py CHANGED

@@ -81,37 +81,50 @@ INSTRUCTION: Analyze the error message and the schema carefully. Fix the syntax,
     else :
         error_context = ""
-        system_prompt = SystemMessage(content=f"""You are an expert Data Analyst and Database Engineer.
-Your job is to write highly optimized, perfectly accurate database queries based on user requests.
+    system_prompt = SystemMessage(content=f"""
+You are an expert Data Analyst and SQL Engineer.
+Your task is to generate ONE valid SELECT query for the latest user request.
 === DATABASE SCHEMA & DIALECT ===
-Look at the metadata below to identify the targeted database engine dialect and table layout:
 {scheme}
 === CONVERSATION HISTORY ===
-Use this previous context to resolve ambiguous terms (e.g., if the user says "filter those by...", look here to see what "those" refers to):
 {history_text}
+=== ERROR CORRECTION MODE ===
 {error_context}
 === CRITICAL RULES ===
-1. ALIGNMENT: Only use the tables and columns provided in the schema above. Do not hallucinate column names.
-2. DIALECT MATCHING: Look at the 'Dialect:' specified above and write strict queries matching that exact syntax.
-3. JOINS: Pay close attention to the FOREIGN KEY constraints provided in the schema to perform accurate JOINs.
-4. CURRENT DATE: Today's date is {current_date}. Use this exact date for any relative time filters (e.g., "last month", "this year").
-5. CASE SENSITIVITY: When filtering by strings, use case-insensitive comparisons (e.g., LOWER(column) = LOWER('value')) unless instructed otherwise.
-6. SECURITY: NEVER generate DML queries (INSERT, UPDATE, DELETE, DROP). Only generate SELECT statements.
-=== OUTPUT SELECTION RULES ===
-1. If the user asks WHO / WHICH / WHAT IS THE NAME / identify a person, customer, user, product, company, or entity, return the human-readable name field, not just the ID.
-2. If the schema has both an ID column and a name column, prefer selecting the name column in the final output.
-3. If the name is in another table, use the required JOIN to fetch it.
-4. Only return an ID alone when the user explicitly asks for the ID, or when no name-like field exists in the schema.
-5. For count/number questions, return an aggregate numeric result, not a list of rows.
-6. For "who/which" questions, do not answer with only identifiers if a readable label exists in the schema.
-=== INSTRUCTIONS ===
-First, think through the necessary tables, filters, joins, and the exact type of answer expected.
-Then, provide the final executable SQL query specifically for the LATEST USER REQUEST.""")
+1. Use ONLY tables and columns that exist in the schema.
+2. Never hallucinate columns, joins, or tables.
+3. Generate only SELECT queries. No INSERT, UPDATE, DELETE, DROP, TRUNCATE, ALTER.
+4. Use the exact SQL dialect implied by the schema metadata.
+5. For any output columns, ALWAYS use clear aliases.
+   Example:
+   - customer_id AS customer_id
+   - customer_name AS customer_name
+   - SUM(amount) AS total_amount
+6. When the user asks for a person/customer/company/product/entity, return BOTH:
+   - the readable name field if it exists
+   - the matching ID field
+7. If a name exists in another table, join to fetch it.
+8. If no readable name exists, return the best human-readable identifier available, and the ID.
+9. For aggregate queries, include a label column when possible so the answer layer can explain the result.
+10. If fixing an error, preserve the original user intent and correct only the broken parts.
+=== PRIORITY RULE FOR ID VS NAME ===
+- Priority 1: name + id together, if possible
+- Priority 2: name only, if name exists but id cannot be included
+- Priority 3: id only, only if no readable name exists
+=== OUTPUT FORMAT REQUIREMENT ===
+Return a SQL query whose selected columns are self-explanatory.
+Do not rely on positional meaning like column 1, column 2 without aliases.
+=== CURRENT DATE ===
+Today's date is {current_date}.
+""")
     final_msg = [
         system_prompt,
@@ -149,6 +162,7 @@ def routing(state : State) :
 def answer_node(state : State) :
     messages = state.get("messages")
     query_result = state.get("query_result" , "No records found.")
+    sql_query = state.get("sql_query", "")
     error = state.get("error")
     history_messages = messages[:-1]
@@ -162,29 +176,32 @@ def answer_node(state : State) :
     else:
         history_text = "This is the first user request. No history exists."
-    system_prompt = f"""You are a helpful Data Analyst communicating directly with a user.
+    system_prompt = f"""
+You are a helpful Data Analyst communicating directly with a user.
 === CONVERSATION HISTORY ===
-Use this to maintain the context and tone of the conversation:
 {history_text}
-=== EXECUTION CONTEXT ===\n"""
-    if error:
-        system_prompt += f"""Unfortunately, the database returned an error and the data could not be retrieved.
-Error details: {error}
-INSTRUCTION: Politely apologize to the user and briefly explain that you encountered a technical issue retrieving their specific request."""
-    else:
-        system_prompt += f"""The database returned this raw data: {query_result}
-INSTRUCTIONS:
-1. Answer using ONLY the returned data.
-2. Never invent a name, value, or entity that is not present in the result.
-3. If the result contains both an ID and a name, use the name in the final answer and mention the ID only if helpful.
-4. If the result contains only an ID and the user asked for a name/person/entity, say that the returned data only contains an identifier and no readable name.
-5. Do not substitute or guess a name from a customer_id or any other identifier.
-6. Do not mention SQL, the database, schemas, or how you got the data.
-7. Give a clean, professional, and conversational response."""
+=== EXECUTION CONTEXT ===
+SQL QUERY USED:
+{sql_query}
+RAW DATABASE RESULT:
+{query_result}
+=== INSTRUCTIONS ===
+1. Use ONLY the returned data.
+2. Interpret the result using the SQL query and its selected aliases.
+3. If the query selected columns like customer_id, customer_name, total_amount, use those exact labels in the final response.
+4. If the result is positional, map values to the SQL SELECT order.
+5. Never invent a name or ID.
+6. For who/which questions:
+   - prefer name + id
+   - if name is missing, give the id and clearly say no readable name was returned
+7. If the result contains an ID and a value like total_amount, explain them clearly.
+8. Do not mention SQL or the database in the final answer.
+9. Give a clean, professional response.
+"""
     final_msg = [
         SystemMessage(content=system_prompt),
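
The new answer prompt asks the model to map positional result values back to the aliases in the SELECT list. One way to take that guesswork out is to label rows before they reach the prompt. The sketch below is an illustration only, not code from this commit: it uses the standard-library `sqlite3` module and an in-memory table as a stand-in for the real database, and `label_rows` is a hypothetical helper name.

```python
import sqlite3


def label_rows(cursor: sqlite3.Cursor) -> list[dict]:
    """Pair each positional value with the column alias the query selected."""
    # cursor.description holds one 7-tuple per result column; item 0 is its name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in data (assumed for the example; the real schema is not in the diff).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, customer_name TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ada", 10.0), (1, "Ada", 5.0), (2, "Grace", 7.5)],
)

# A query written the way the prompt's rule 5 asks: every output column aliased.
cursor = conn.execute(
    "SELECT customer_id AS customer_id, customer_name AS customer_name, "
    "SUM(amount) AS total_amount FROM orders "
    "GROUP BY customer_id, customer_name ORDER BY total_amount DESC"
)
labeled = label_rows(cursor)
print(labeled)
```

With labeled dictionaries in `query_result`, the answer layer no longer depends on the model correctly reading the SELECT order out of `sql_query`. Passing the query text, as the commit does, remains useful as a fallback when only raw tuples are available.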

uv.lock CHANGED

The diff for this file is too large to render.