Space: Shizu0n/phi3-mini-sql-generator-demo

Shizu0n committed 29 days ago

Commit bc39556 (1 parent: 03cc0b0)

chore: double call fix and build_generation_prompt history injection bug fix

Files changed (6)

.gitignore +5 -0
README.md +12 -16
app.py +1094 -122
requirements.txt +1 -1
tests/e2e_flow_test.py +250 -0
tests/test_chatbot_behavior.py +672 -0
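The commit message does not describe the "double call" fix in detail. However, the app.py diff below adds fail-fast guards:

- `load_model` now calls `_model_lock.acquire(blocking=False)`.
- The new `_run_generation` does the same with `_model_activity_lock`.

Both raise `RuntimeError` instead of waiting when another model operation is in progress. Below is a minimal standard-library sketch of that pattern. It assumes these guards are what prevent the double call. `run_exclusive`, `_activity_lock` and `demo` are hypothetical names, not the app's code.

```python
import threading

# Hypothetical stand-in for the app's _model_activity_lock (a plain, non-reentrant Lock).
_activity_lock = threading.Lock()


def run_exclusive(task, *args, **kwargs):
    """Run task only if no other guarded operation currently holds the lock.

    acquire(blocking=False) returns False immediately instead of waiting,
    so a duplicate request fails fast rather than queuing a second call.
    """
    if not _activity_lock.acquire(blocking=False):
        raise RuntimeError("Another model operation is still running.")
    try:
        return task(*args, **kwargs)
    finally:
        # Always release, even if task raises.
        _activity_lock.release()


def demo():
    """Start a slow call, then try a second call while the first still holds the lock."""
    started = threading.Event()
    release = threading.Event()
    results = {}

    def slow_job():
        started.set()  # the lock is held by now: acquire happens before task runs
        release.wait(timeout=5)
        return "first call finished"

    def worker():
        results["first"] = run_exclusive(slow_job)

    thread = threading.Thread(target=worker)
    thread.start()
    started.wait(timeout=5)
    try:
        run_exclusive(lambda: "second call finished")
    except RuntimeError as exc:
        results["second"] = f"rejected: {exc}"
    release.set()
    thread.join()
    # Once the first call has finished, the lock is free again.
    results["third"] = run_exclusive(lambda: "third call finished")
    return results


if __name__ == "__main__":
    print(demo())
```

Failing fast means a duplicate request surfaces an error right away instead of running a second generation after the first one finishes.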

.gitignore CHANGED

@@ -48,3 +48,8 @@ logs/
 # AI-generated code artifacts
 *.gen.py
 .claude
+# Local agent/workspace notes
+/AGENTS.md
+/CLAUDE.md
+/PROGRESS.md

README.md CHANGED

@@ -13,34 +13,30 @@ short_description: "SQL generator powered by Phi-3 Mini fine-tuning"
 # Phi-3 Mini SQL Generator
-Generates SQL queries from a table schema and a natural-language question, comparing the base Phi-3 Mini model with a fine-tuned text-to-SQL version.
+Generates SQL queries from a table schema and a natural-language question using a QLoRA fine-tuned Phi-3 Mini model.
 ## What the App Does
-Transforms simple table descriptions and questions into SQL using Phi-3 Mini, with a choice between the base model and a QLoRA fine-tuned model.
+Transforms simple table descriptions and questions into SQL using the fine-tuned Phi-3 Mini model. The base model is shown as offline evaluation evidence instead of a second live CPU-loaded model.
 ## How to Use
-1. Select a model by clicking the card or the selection button:
-   - **Base Phi-3 Mini**: the non-fine-tuned baseline.
-   - **Fine-tuned QLoRA model**: the main model, selected by default.
-2. Click **Load selected model**.
+1. Click **Load fine-tuned model**.
    - Loading is lazy: the model is only downloaded and loaded when you request it.
    - On CPU, the first load can take a few minutes.
-3. Enter or edit the **SQL table schema**.
+2. Enter or edit the **SQL table schema**.
    - You can use the presets: `employees`, `orders`, `students`, `products`, `sales`.
    - You can also write your own schema manually.
-4. Enter the question in the **Question** field.
-5. Click **Generate SQL**.
-6. Review the result in `gr.Code(language="sql")`.
+3. Enter the question in the chat input.
+4. Click **Send**.
+5. Review the result in `gr.Code(language="sql")`.
    - The app shows a validation badge powered by `sqlparse`.
-7. Optional: click **Save for comparison** to compare the saved query with the current query.
 ## Models
 - Fine-tuned adapter: [Shizu0n/phi3-mini-sql-generator](https://huggingface.co/Shizu0n/phi3-mini-sql-generator)
 - Fine-tuned merged model used in the app: [Shizu0n/phi3-mini-sql-generator-merged](https://huggingface.co/Shizu0n/phi3-mini-sql-generator-merged)
-- Base comparison model: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
+- Offline baseline model used for evaluation: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
 ## Metrics
@@ -53,10 +49,10 @@ Reported gain: **+71.5 percentage points** over the base model.
 ## Current Features
-- Gradio UI with a step-by-step flow: select model, load, enter schema/question, generate SQL, and compare outputs.
-- Clickable model cards in addition to the selection buttons.
-- Lazy loading with unloading of the previous model to reduce memory use.
-- Preserved Phi-3 patches: `rope_scaling`, `use_cache=False`, and `trust_remote_code`.
+- Gradio UI with a step-by-step flow: load the fine-tuned model, enter schema/question, and generate SQL.
+- Offline baseline metrics shown in the UI without loading a second 3.8B model on the CPU Space.
+- Lazy loading to reduce startup cost.
+- Preserved Phi-3 patches for local/Spaces compatibility.
 - Schema presets without blocking manual input.
 - SQL output separated from errors/status so booleans, integers, and error messages do not appear inside the SQL block.
 - Centered loading overlay to make the loading state obvious.
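The README describes lazy loading. The app.py diff implements it in `load_model`, which returns the cached `_model` and `_tokenizer` early when the same `model_id` is already loaded.

The sketch below shows the load-once-then-reuse idea with the standard library only. `LazyModel` and `fake_loader` are hypothetical; `fake_loader` stands in for the `transformers` `from_pretrained` calls.

One deliberate difference from the diff:
- The diff fails fast with `RuntimeError` if a load is already running.
- This sketch blocks instead and uses double-checked locking, so concurrent first callers wait and still trigger only one load.

```python
import threading


class LazyModel:
    """Hypothetical load-once wrapper: nothing is loaded until get() is called."""

    def __init__(self, loader):
        self._loader = loader          # zero-argument callable doing the expensive work
        self._value = None
        self._lock = threading.Lock()
        self.load_count = 0            # how many times loader actually ran

    def get(self):
        # Fast path: already loaded, reuse it (like load_model's early return).
        if self._value is not None:
            return self._value
        with self._lock:
            # Re-check under the lock: a concurrent caller may have loaded it meanwhile.
            if self._value is None:
                self._value = self._loader()
                self.load_count += 1
        return self._value


def fake_loader():
    # Stand-in for downloading the tokenizer and weights.
    return {"model_id": "Shizu0n/phi3-mini-sql-generator-merged"}


if __name__ == "__main__":
    lazy = LazyModel(fake_loader)
    print("loads before first request:", lazy.load_count)
    first = lazy.get()
    second = lazy.get()
    print("same object:", first is second, "loads:", lazy.load_count)
```

Construction is cheap, so the UI can start immediately. The cost is paid on the first request, matching the README's note that the first CPU load can take a few minutes.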

app.py CHANGED

@@ -1,9 +1,13 @@
 import gc
 import html
 import inspect
 import re
 import threading
 import time
 import gradio as gr
 import sqlparse
@@ -24,7 +28,7 @@ MODEL_CATALOG = {
         "title": "Phi-3 Mini base",
         "model_id": BASE_MODEL_ID,
         "exact_match": "2.0%",
-        "trust_remote_code": True,
         "ready_text": "Base model ready",
         "metadata": (
             "Model: microsoft/Phi-3-mini-4k-instruct\n"
@@ -50,17 +54,12 @@ MODEL_CATALOG = {
     },
 }
-MODEL_OPTIONS = {
-    MODEL_CATALOG[FINE_TUNED_MODEL_KEY]["label"]: FINE_TUNED_MODEL_ID,
-    MODEL_CATALOG[BASE_MODEL_KEY]["label"]: BASE_MODEL_ID,
-}
 PRESETS = {
-    "employees": "employees (id, name, department, salary)",
-    "orders": "orders (id, customer_id, product, amount, date)",
-    "students": "students (id, name, course, grade, year)",
-    "products": "products (id, name, category, price, stock)",
-    "sales": "sales (id, product_id, quantity, total, date)",
 }
 PROMPT_TEMPLATE = (
@@ -73,15 +72,18 @@ PROMPT_TEMPLATE = (
 GENERAL_PROMPT_TEMPLATE = (
     "<|user|>\n"
-    "You are Phi-3 Mini in a SQL generator demo. Reply naturally and briefly. "
-    "If the user asks for SQL, provide only the SQL query.\n\n"
-    "User: {message}<|end|>\n"
     "<|assistant|>"
 )
 EMPTY_VALIDATOR = '<span class="validator-badge validator-empty">No SQL yet</span>'
 CHAT_VALIDATOR = '<span class="validator-badge validator-empty">Chat response</span>'
 EMPTY_CHAT_OUTPUT = ""
 LOAD_SCROLL_JS = """
 (selectedKey) => {
   setTimeout(() => {
@@ -98,6 +100,7 @@ _current_model_id = None
 _model = None
 _tokenizer = None
 _model_lock = threading.RLock()
 def import_model_runtime():
@@ -115,13 +118,144 @@ def import_model_runtime():
     return torch, AutoConfig, AutoModelForCausalLM, AutoTokenizer
 def patch_phi3_config(config):
     if hasattr(config, "rope_scaling") and config.rope_scaling:
         rope_type = config.rope_scaling.get("rope_type", "longrope")
-        if rope_type == "default":
-            config.rope_scaling = None
-        elif "type" not in config.rope_scaling:
             config.rope_scaling["type"] = rope_type
     return config
@@ -147,36 +281,77 @@ def unload_model():
 def load_model(model_id):
     global _current_model_id, _model, _tokenizer
-    with _model_lock:
         if _current_model_id == model_id and _model is not None and _tokenizer is not None:
             return _model, _tokenizer
-        _, AutoConfig, AutoModelForCausalLM, AutoTokenizer = import_model_runtime()
-        unload_model()
-        model_def = model_by_id(model_id)
-        config = AutoConfig.from_pretrained(
-            model_id,
-            trust_remote_code=model_def["trust_remote_code"],
-        )
-        config = patch_phi3_config(config)
-        tokenizer = AutoTokenizer.from_pretrained(
-            model_id,
-            trust_remote_code=model_def["trust_remote_code"],
-        )
-        model = AutoModelForCausalLM.from_pretrained(
-            model_id,
-            config=config,
-            trust_remote_code=model_def["trust_remote_code"],
-            device_map={"": "cpu"},
-            torch_dtype="auto",
-            low_cpu_mem_usage=True,
-        )
-        model.eval()
-        _model = model
-        _tokenizer = tokenizer
-        _current_model_id = model_id
-        return model, tokenizer
 def model_by_key(model_key):
@@ -197,8 +372,39 @@ def model_key_by_id(model_id):
     return None
 def clean_generation(text):
-    cleaned = (text or "").strip()
     if cleaned.startswith("```"):
         lines = cleaned.splitlines()
         if lines and lines[0].strip().lower() in {"```", "```sql"}:
@@ -209,9 +415,19 @@ def clean_generation(text):
     for marker in ("<|end|>", "<|user|>", "<|assistant|>", "</s>"):
         if marker in cleaned:
             cleaned = cleaned.split(marker, 1)[0].strip()
     return cleaned
 def is_sql_like(text):
     text = (text or "").strip()
     if not text:
@@ -232,43 +448,121 @@ def is_sql_like(text):
 def is_sql_intent(message, schema):
-    message = (message or "").strip().lower()
     schema = (schema or "").strip()
-    if schema:
-        return True
     if not message:
         return False
     sql_terms = {
-        "sql",
-        "query",
-        "select",
-        "table",
-        "schema",
         "database",
-        "join",
         "group by",
         "order by",
-        "where",
-        "average",
-        "count",
-        "sum",
         "rows",
-        "columns",
     }
-    return any(term in message for term in sql_terms)
-def build_generation_prompt(schema, message):
     schema = (schema or "").strip()
     message = (message or "").strip()
     if is_sql_intent(message, schema):
-        table_schema = schema or "No explicit schema provided. Infer the table and columns only if the request includes them."
-        return PROMPT_TEMPLATE.format(schema=table_schema, question=message)
     return GENERAL_PROMPT_TEMPLATE.format(message=message)
 def format_generation_result(text):
-    cleaned = clean_generation(text)
     if is_sql_like(cleaned):
         return str(cleaned), EMPTY_CHAT_OUTPUT, validate_sql(cleaned)
     return "", str(cleaned), CHAT_VALIDATOR
@@ -332,7 +626,7 @@ def render_model_card(model_key, selected_key):
     selected = model_key == selected_key
     state_class = " selected" if selected else ""
     return f"""
-    <article class="model-card{state_class}" role="button" tabindex="0">
       <div class="model-tag">{model_def["tag"]}</div>
       <h3>{model_def["title"]}</h3>
       <code>{model_def["model_id"]}</code>
@@ -391,6 +685,34 @@ def model_metadata(model_key=None):
     """
 def schema_name_by_value(schema):
     schema = (schema or "").strip()
     for name, value in PRESETS.items():
@@ -399,6 +721,376 @@ def schema_name_by_value(schema):
     return "custom"
 def render_schema_context(schema=""):
     schema = (schema or "").strip()
     if not schema:
@@ -416,7 +1108,8 @@ def render_schema_context(schema=""):
 def query_control_updates(can_generate):
     context_updates = [gr.update(interactive=True) for _ in range(6)]
-    return [*context_updates, gr.update(interactive=True), gr.update(interactive=can_generate)]
 def render_message(message="", kind="error"):
@@ -441,9 +1134,13 @@ def select_model(model_key, loaded_key):
     )
-def load_selected_model(selected_key):
-    selected_key = selected_key if selected_key in MODEL_CATALOG else DEFAULT_MODEL_KEY
     model_def = model_by_key(selected_key)
     yield (
         None,
         render_status(selected_key, None, state="loading"),
@@ -459,15 +1156,28 @@ def load_selected_model(selected_key):
     )
     started = time.time()
     try:
-        load_model(model_def["model_id"])
     except Exception as exc:
         error = f"Load failed for {model_def['model_id']}: {type(exc).__name__}: {exc}"
         yield (
             None,
             render_status(selected_key, None),
             render_loading_overlay(visible=False),
             model_metadata(selected_key),
-            gr.update(interactive=True),
             *query_control_updates(False),
             "",
             EMPTY_VALIDATOR,
@@ -483,7 +1193,7 @@ def load_selected_model(selected_key):
         render_status(selected_key, selected_key),
         render_loading_overlay(visible=False),
         model_metadata(selected_key),
-        gr.update(interactive=True),
         *query_control_updates(True),
         "",
         EMPTY_VALIDATOR,
@@ -529,11 +1239,91 @@ def render_compare_label(prefix, model_label, metric):
     )
 def generate_response(message, chat_history, active_schema, loaded_key, saved_state):
     message = (message or "").strip()
     active_schema = (active_schema or "").strip()
     chat_history = list(chat_history or [])
-    if not loaded_key or _model is None or _tokenizer is None:
         compare = comparison_updates(saved_state, "", loaded_key)
         return (
             chat_history,
@@ -543,20 +1333,62 @@ def generate_response(message, chat_history, active_schema, loaded_key, saved_st
             "",
             EMPTY_VALIDATOR,
             gr.update(interactive=False, visible=False),
-            render_message("Load a model before generating SQL."),
             *compare,
         )
-    if not message:
         compare = comparison_updates(saved_state, "", loaded_key)
         return (
             chat_history,
             "",
             active_schema,
             "",
             "",
             EMPTY_VALIDATOR,
             gr.update(interactive=False, visible=False),
-            render_message("Type a message before sending."),
             *compare,
         )
@@ -577,20 +1409,34 @@ def generate_response(message, chat_history, active_schema, loaded_key, saved_st
     started = time.time()
     try:
-        torch, _, _, _ = import_model_runtime()
         with _model_lock:
-            prompt = build_generation_prompt(active_schema, message)
             inputs = _tokenizer(prompt, return_tensors="pt")
             input_length = inputs["input_ids"].shape[-1]
-            with torch.no_grad():
-                output_ids = _model.generate(
-                    **inputs,
-                    max_new_tokens=80,
-                    do_sample=False,
-                    use_cache=False,
-                )
             generated_ids = output_ids[0][input_length:]
-            generated_text = _tokenizer.decode(generated_ids, skip_special_tokens=False)
     except Exception as exc:
         compare = comparison_updates(saved_state, "", loaded_key)
         return (
@@ -624,7 +1470,7 @@ def generate_response(message, chat_history, active_schema, loaded_key, saved_st
         message,
         str(sql_text),
         validator,
-        gr.update(interactive=bool(sql_text.strip()), visible=bool(sql_text.strip())),
         render_message(f"Generated {response_kind} with {model_def['model_id']} in {elapsed}s.", kind="ok"),
         *compare,
     )
@@ -664,9 +1510,50 @@ def save_for_comparison(sql_text, loaded_key, active_schema, last_message):
     )
 CSS = """
 @import url('https://fonts.googleapis.com/css2?family=Space+Mono:wght@400;500;700&display=swap');
 :root {
   --bg-base: #0c0c0b;
   --bg-surface: #1a1a18;
@@ -757,19 +1644,19 @@ CSS = """
 .badge-green,
 .validator-ok {
   background: var(--teal-soft);
-  color: var(--teal-text);
 }
 .badge-cream,
 .validator-warn {
   background: var(--amber-soft);
-  color: var(--amber-text);
 }
 .badge-light,
 .validator-empty {
   background: var(--bg-raised);
-  color: var(--text-secondary);
   border: 0.5px solid var(--border);
 }
@@ -805,29 +1692,24 @@ CSS = """
   background: var(--bg-surface);
   border: 0.5px solid var(--border);
   border-radius: 6px;
-  cursor: pointer;
   min-height: 176px;
   padding: 16px;
   transition: border-color 160ms ease, background 160ms ease;
 }
-.model-card:hover {
-  border-color: var(--border-hi);
-}
 .model-card.selected {
   border: 1.5px solid var(--teal);
 }
 .model-tag {
   background: var(--amber-soft);
-  color: var(--amber-text);
   margin-bottom: 18px;
 }
 .model-card.selected .model-tag {
   background: var(--teal-soft);
-  color: var(--teal-text);
 }
 .model-card h3 {
@@ -882,6 +1764,64 @@ CSS = """
   display: flex;
 }
 #load-button,
 #generate-button,
 #save-button {
@@ -906,6 +1846,11 @@ CSS = """
   width: 100% !important;
 }
 #load-button button:hover,
 #generate-button button:hover {
   background: var(--text-primary) !important;
@@ -969,7 +1914,7 @@ CSS = """
 }
 .stat-card strong {
-  color: var(--text-primary);
   display: block;
   font-size: 15px;
   font-weight: 500;
@@ -978,7 +1923,7 @@ CSS = """
 }
 .stat-card span {
-  color: var(--text-secondary);
   display: block;
   font-size: 11px;
   font-weight: 400;
@@ -1063,16 +2008,32 @@ CSS = """
 }
 .composer-row {
-  align-items: stretch;
   gap: 8px !important;
 }
 #message-input {
   flex: 1 1 auto;
 }
 #message-input textarea {
   min-height: 42px !important;
 }
 #clear-schema-button button {
@@ -1178,7 +2139,7 @@ textarea {
 }
 .validator-detail {
-  color: var(--text-secondary);
   font-size: 11px;
   margin-left: 8px;
 }
@@ -1228,7 +2189,7 @@ textarea {
 .compare-head {
   align-items: center;
   background: var(--amber-soft);
-  color: var(--amber-text);
   display: flex;
   font-size: 11px;
   font-weight: 500;
@@ -1241,7 +2202,7 @@ textarea {
 .compare-card.current .compare-head,
 .current-compare-head .compare-head {
   background: var(--teal-soft);
-  color: var(--teal-text);
 }
 .compare-head strong {
@@ -1316,7 +2277,8 @@ textarea {
 @media (max-width: 860px) {
   .top-panel,
   .model-grid,
-  .compare-grid {
     grid-template-columns: 1fr;
   }
@@ -1334,8 +2296,7 @@ textarea {
 }
 """
-with gr.Blocks(css=CSS, title="Phi-3 Mini SQL Generator") as demo:
-    selected_model_key = gr.State(value=DEFAULT_MODEL_KEY)
     loaded_key_state = gr.State(value=None)
     saved_output = gr.State(value=None)
     active_schema = gr.State(value="")
@@ -1347,11 +2308,11 @@ with gr.Blocks(css=CSS, title="Phi-3 Mini SQL Generator") as demo:
         gr.HTML(render_step("01", "Model"))
         with gr.Row(elem_classes=["model-grid"]):
-            base_model_card = gr.HTML(render_model_card(BASE_MODEL_KEY, DEFAULT_MODEL_KEY))
             fine_tuned_model_card = gr.HTML(render_model_card(FINE_TUNED_MODEL_KEY, DEFAULT_MODEL_KEY))
-        load_button = gr.Button("Load selected model", variant="primary", elem_id="load-button")
         model_status = gr.HTML(render_status(DEFAULT_MODEL_KEY, None))
         model_info = gr.HTML(model_metadata(DEFAULT_MODEL_KEY))
         with gr.Column(elem_id="query-section", elem_classes=["query-section"]):
             gr.HTML(render_step("02", "Chat"))
@@ -1406,7 +2367,7 @@ with gr.Blocks(css=CSS, title="Phi-3 Mini SQL Generator") as demo:
                 show_label=False,
             )
         save_button = gr.Button(
-            "Save for comparison",
             interactive=False,
             visible=False,
             elem_id="save-button",
@@ -1423,8 +2384,6 @@ with gr.Blocks(css=CSS, title="Phi-3 Mini SQL Generator") as demo:
                     current_sql = gr.Code(label="", language="sql", lines=6, show_label=False)
     model_state_outputs = [
-        selected_model_key,
-        base_model_card,
         fine_tuned_model_card,
         model_status,
         model_info,
@@ -1439,20 +2398,10 @@ with gr.Blocks(css=CSS, title="Phi-3 Mini SQL Generator") as demo:
         save_button,
         error_output,
     ]
-    base_model_card.click(
-        select_model,
-        inputs=[gr.State(BASE_MODEL_KEY), loaded_key_state],
-        outputs=model_state_outputs,
-    )
-    fine_tuned_model_card.click(
-        select_model,
-        inputs=[gr.State(FINE_TUNED_MODEL_KEY), loaded_key_state],
-        outputs=model_state_outputs,
-    )
     load_button.click(
         load_selected_model,
-        inputs=selected_model_key,
         outputs=[
             loaded_key_state,
             model_status,
@@ -1523,6 +2472,29 @@ with gr.Blocks(css=CSS, title="Phi-3 Mini SQL Generator") as demo:
             error_output,
         ],
     )
 queue_kwargs = {}
 if "default_concurrency_limit" in inspect.signature(demo.queue).parameters:
@@ -1531,4 +2503,4 @@ demo.queue(**queue_kwargs)
 if __name__ == "__main__":
-    demo.launch()
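The new side of app.py, listed next, adds `normalize_text`. It does three things:

1. Lowercases the message.
2. Decomposes it with Unicode NFKD.
3. Drops combining marks, so accented Portuguese input such as "você" matches the unaccented pattern "voce".

Below is a standard-library sketch of the same steps; the helper `matches_any` is hypothetical. It also shows a consequence for `is_sql_intent`: the message is normalized but the pattern sets are not. Accented entries such as "olá" and "como você funciona" can therefore never match, and their unaccented twins do the actual matching.

```python
import re
import unicodedata


def normalize_text(value):
    """Lowercase, strip accents, and collapse whitespace.

    Same steps as the diff's normalize_text, minus its content_to_text
    unwrapping (this sketch accepts only str or None).
    """
    text = (value or "").lower()
    # NFKD splits "á" into "a" + U+0301 (combining acute accent) ...
    text = unicodedata.normalize("NFKD", text)
    # ... so dropping combining marks leaves the bare letter.
    text = "".join(char for char in text if not unicodedata.combining(char))
    return re.sub(r"\s+", " ", text).strip()


def matches_any(message, patterns):
    """Hypothetical helper: which patterns occur as substrings of the normalized message."""
    normalized = normalize_text(message)
    return sorted(p for p in patterns if p in normalized)


if __name__ == "__main__":
    patterns = {"como você funciona", "como voce funciona"}
    print(normalize_text("  Olá,   VOCÊ "))
    # Only the unaccented pattern can ever match a normalized message.
    print(matches_any("Como você funciona?", patterns))
```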

+import concurrent.futures
 import gc
 import html
 import inspect
+import os
 import re
 import threading
 import time
+import traceback
+import unicodedata
 import gradio as gr
 import sqlparse
         "title": "Phi-3 Mini base",
         "model_id": BASE_MODEL_ID,
         "exact_match": "2.0%",
+        "trust_remote_code": False,
         "ready_text": "Base model ready",
         "metadata": (
             "Model: microsoft/Phi-3-mini-4k-instruct\n"
     },
 }
 PRESETS = {
+    "employees": "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary NUMERIC)",
+    "orders": "CREATE TABLE orders (id INTEGER, customer_id INTEGER, product TEXT, amount NUMERIC, date DATE)",
+    "students": "CREATE TABLE students (id INTEGER, name TEXT, course TEXT, grade NUMERIC, year INTEGER)",
+    "products": "CREATE TABLE products (id INTEGER, name TEXT, category TEXT, price NUMERIC, stock INTEGER)",
+    "sales": "CREATE TABLE sales (id INTEGER, product_id INTEGER, quantity INTEGER, total NUMERIC, date DATE)",
 }
 PROMPT_TEMPLATE = (
 GENERAL_PROMPT_TEMPLATE = (
     "<|user|>\n"
+    "You are a SQL assistant. Answer the user's question.\n\n"
+    "Question: {message}<|end|>\n"
     "<|assistant|>"
 )
 EMPTY_VALIDATOR = '<span class="validator-badge validator-empty">No SQL yet</span>'
 CHAT_VALIDATOR = '<span class="validator-badge validator-empty">Chat response</span>'
 EMPTY_CHAT_OUTPUT = ""
+LOAD_TIMEOUT_SECONDS = 900
+GENERATION_MAX_TIME_SECONDS = 285
+GENERATION_TIMEOUT_SECONDS = 320
+LOCAL_FILES_ONLY_ENV = "PHI3_SQL_LOCAL_FILES_ONLY"
 LOAD_SCROLL_JS = """
 (selectedKey) => {
   setTimeout(() => {
 _model = None
 _tokenizer = None
 _model_lock = threading.RLock()
+_model_activity_lock = threading.Lock()
 def import_model_runtime():
     return torch, AutoConfig, AutoModelForCausalLM, AutoTokenizer
+def log_load_step(model_id, step, started=None):
+    elapsed = "" if started is None else f" elapsed={time.time() - started:.1f}s"
+    print(f"[LOAD_STEP] model={model_id} step={step}{elapsed}", flush=True)
+def cached_model_weights_available(model_id):
+    try:
+        from huggingface_hub import try_to_load_from_cache
+    except ModuleNotFoundError:
+        return False
+    weight_files = (
+        "model.safetensors",
+        "model.safetensors.index.json",
+        "pytorch_model.bin",
+        "pytorch_model.bin.index.json",
+    )
+    for filename in weight_files:
+        try:
+            cached_path = try_to_load_from_cache(model_id, filename)
+        except Exception:
+            cached_path = None
+        if isinstance(cached_path, str) and os.path.exists(cached_path):
+            return True
+    return False
+def cached_file_path(model_id, filename):
+    try:
+        from huggingface_hub import try_to_load_from_cache
+    except ModuleNotFoundError:
+        return None
+    try:
+        cached_path = try_to_load_from_cache(model_id, filename)
+    except Exception:
+        return None
+    if isinstance(cached_path, str) and os.path.exists(cached_path):
+        return cached_path
+    return None
+def cached_snapshot_path(model_id):
+    config_path = cached_file_path(model_id, "config.json")
+    if not config_path or not cached_model_weights_available(model_id):
+        return None
+    return os.path.dirname(config_path)
+def local_files_only_for(model_id):
+    explicit_local = os.getenv(LOCAL_FILES_ONLY_ENV, "").strip().lower() in {"1", "true", "yes", "on"}
+    offline_mode = bool(os.getenv("HF_HUB_OFFLINE") or os.getenv("TRANSFORMERS_OFFLINE"))
+    return explicit_local or offline_mode
+def running_on_spaces():
+    return bool(os.getenv("SPACE_ID"))
+def resolve_model_source(model_id):
+    if local_files_only_for(model_id):
+        return cached_snapshot_path(model_id) or model_id
+    return model_id
+def dtype_from_name(torch, dtype_name):
+    if not dtype_name:
+        return None
+    normalized = str(dtype_name).replace("torch.", "")
+    return {
+        "float16": torch.float16,
+        "bfloat16": torch.bfloat16,
+        "float32": torch.float32,
+    }.get(normalized)
+def dtype_from_safetensors(torch, source):
+    safetensors_path = os.path.join(source, "model.safetensors")
+    if not os.path.exists(safetensors_path):
+        return None
+    try:
+        from safetensors import safe_open
+        with safe_open(safetensors_path, framework="pt", device="cpu") as handle:
+            keys = list(handle.keys())
+            if not keys:
+                return None
+            return handle.get_tensor(keys[0]).dtype
+    except Exception:
+        return None
+def cpu_model_dtype(torch):
+    return torch.bfloat16
+def model_load_kwargs(torch, config, source):
+    return {
+        "attn_implementation": "eager",
+        "device_map": {"": "cpu"},
+        "low_cpu_mem_usage": True,
+        "torch_dtype": "auto",
+    }
+def force_eager_attention(config):
+    for attr in ("attn_implementation", "_attn_implementation"):
+        try:
+            setattr(config, attr, "eager")
+        except Exception:
+            pass
+    return config
+def _run_generation(model, inputs, kwargs):
+    if not _model_activity_lock.acquire(blocking=False):
+        raise RuntimeError(
+            "Another model operation is still running. Wait for it to finish before starting another request."
+        )
+    torch, _, _, _ = import_model_runtime()
+    try:
+        with torch.no_grad():
+            return model.generate(**inputs, **kwargs)
+    finally:
+        _model_activity_lock.release()
+def _run_model_load(model_id):
+    return load_model(model_id)
 def patch_phi3_config(config):
     if hasattr(config, "rope_scaling") and config.rope_scaling:
         rope_type = config.rope_scaling.get("rope_type", "longrope")
+        if "type" not in config.rope_scaling:
             config.rope_scaling["type"] = rope_type
+        if hasattr(config, "rope_parameters") and config.rope_parameters is None:
+            config.rope_parameters = dict(config.rope_scaling)
     return config
 def load_model(model_id):
     global _current_model_id, _model, _tokenizer
+    started = time.time()
+    log_load_step(model_id, "requested", started)
+    if not _model_lock.acquire(blocking=False):
+        raise RuntimeError("Another model load is still running. Wait for it to finish before retrying.")
+    try:
         if _current_model_id == model_id and _model is not None and _tokenizer is not None:
+            log_load_step(model_id, "already_loaded", started)
             return _model, _tokenizer
+        if not _model_activity_lock.acquire(blocking=False):
+            raise RuntimeError(
+                "Another model operation is still running. Wait for it to finish before switching models."
+            )
+        try:
+            log_load_step(model_id, "runtime_import_start", started)
+            torch, AutoConfig, AutoModelForCausalLM, AutoTokenizer = import_model_runtime()
+            log_load_step(model_id, "runtime_import_done", started)
+            local_files_only = local_files_only_for(model_id)
+            model_source = resolve_model_source(model_id)
+            log_load_step(model_id, f"cache_mode local_files_only={local_files_only}", started)
+            log_load_step(model_id, f"model_source {model_source}", started)
+            log_load_step(model_id, "unload_previous_start", started)
+            unload_model()
+            log_load_step(model_id, "unload_previous_done", started)
+            model_def = model_by_id(model_id)
+            common_kwargs = {
+                "trust_remote_code": model_def["trust_remote_code"],
+                "local_files_only": local_files_only,
+            }
+            log_load_step(model_id, "config_start", started)
+            config = AutoConfig.from_pretrained(
+                model_source,
+                **common_kwargs,
+            )
+            if model_def["trust_remote_code"]:
+                config = patch_phi3_config(config)
+            config = force_eager_attention(config)
+            log_load_step(model_id, "config_done", started)
+            load_kwargs = model_load_kwargs(torch, config, model_source)
+            log_load_step(model_id, f"model_kwargs {load_kwargs}", started)
+            log_load_step(model_id, "tokenizer_start", started)
+            tokenizer = AutoTokenizer.from_pretrained(
+                model_source,
+                **common_kwargs,
+            )
+            if tokenizer.pad_token_id is None and tokenizer.eos_token is not None:
+                tokenizer.pad_token = tokenizer.eos_token
+            log_load_step(model_id, "tokenizer_done", started)
+            log_load_step(model_id, "weights_start", started)
+            model = AutoModelForCausalLM.from_pretrained(
+                model_source,
+                config=config,
+                **common_kwargs,
+                **load_kwargs,
+            )
+            log_load_step(model_id, "weights_done", started)
+            log_load_step(model_id, f"loaded_dtype {getattr(model, 'dtype', 'unknown')}", started)
+            log_load_step(model_id, "eval_start", started)
+            model.config.use_cache = False
+            model.eval()
+            log_load_step(model_id, "eval_done", started)
+            _model = model
+            _tokenizer = tokenizer
+            _current_model_id = model_id
+            log_load_step(model_id, "state_set_done", started)
+            return model, tokenizer
+        finally:
+            _model_activity_lock.release()
+    finally:
+        _model_lock.release()
 def model_by_key(model_key):
     return None
+def content_to_text(value):
+    if value is None:
+        return ""
+    if isinstance(value, str):
+        return value
+    if isinstance(value, dict):
+        for key in ("text", "content", "value"):
+            if key in value:
+                return content_to_text(value[key])
+        return " ".join(content_to_text(item) for item in value.values())
+    if isinstance(value, (list, tuple)):
+        return "\n".join(content_to_text(item) for item in value)
+    return str(value)
+def normalize_text(value):
+    text = content_to_text(value).lower()
+    text = unicodedata.normalize("NFKD", text)
+    text = "".join(char for char in text if not unicodedata.combining(char))
+    return re.sub(r"\s+", " ", text).strip()
+def safe_chat_fallback(_message=""):
+    return (
+        "Selecione um schema e faça uma pergunta SQL, "
+        "ou peça para criar ou editar uma tabela. "
+        "Exemplo: 'crie tabela produtos com id nome preco' "
+        "ou 'qual o produto mais caro?'."
+    )
 def clean_generation(text):
+    cleaned = content_to_text(text).strip()
     if cleaned.startswith("```"):
         lines = cleaned.splitlines()
         if lines and lines[0].strip().lower() in {"```", "```sql"}:
     for marker in ("<|end|>", "<|user|>", "<|assistant|>", "</s>"):
         if marker in cleaned:
             cleaned = cleaned.split(marker, 1)[0].strip()
+    if cleaned.upper().startswith("SQL:"):
+        cleaned = cleaned[4:].strip()
     return cleaned
+def extract_sql_candidate(text):
+    cleaned = clean_generation(text)
+    match = re.search(r"\b(SELECT|WITH|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)\b", cleaned, flags=re.IGNORECASE)
+    if not match:
+        return cleaned
+    return cleaned[match.start() :].strip()
 def is_sql_like(text):
     text = (text or "").strip()
     if not text:
 def is_sql_intent(message, schema):
+    message = normalize_text(message)
     schema = (schema or "").strip()
     if not message:
         return False
+    # P1 fix: with an active schema, treat any substantive message as SQL intent
+    # (the user is most likely asking a question about the known schema).
+    # Short greetings/acknowledgments that may accompany a schema setup are excluded.
+    short_greetings = {
+        "oi", "olá", "ola", "hi", "hello", "hey", "bom", "boa",
+        "obrigado", "thanks", "ok", "sim", "claro", "de nada",
+    }
+    # Extended exclusions for FAQ/off-topic messages while a schema is active
+    off_topic_patterns = {
+        "obrigado", "thanks", "thank you", "muito obrigado", "obrigada",
+        "como você funciona", "como voce funciona", "como funciona",
+        "o que você faz", "o que voce faz", "o que faz",
+        "como foi treinado", "como voce foi treinado", "treinado",
+        "quais habilidades", "o que consegue", "o que pode fazer",
+        "me ajude", "help me", "ajuda", "help",
+        # Edit/table manipulation terms: keep the schema catch-all below from routing these to the model
+        "troca", "trocar", "renomeia", "renomear", "renomeie",
+        "muda", "mudar", "altera", "alterar", "edita", "editar",
+        "adiciona", "adicionar", "adicione", "remove", "remover",
+        "apaga", "apagar", "delete column", "drop column",
+        "coluna nova", "nova coluna", "novo campo", "campo novo",
+        "trocando", "mudando", "alterando", "editando",
+    }
+    words = message.split()
+    if schema and len(words) >= 2:
+        if message in short_greetings:
+            return False
+        # Substring match, so inflections such as "ajudar" still hit "ajuda".
+        # Patterns are normalized like the message, so accented entries can match too.
+        if any(normalize_text(pattern) in message for pattern in off_topic_patterns):
+            return False
+        # Catch-all: any other multi-word message about an active schema is a SQL question.
+        return True
     sql_terms = {
+        "all",
+        "average",
+        "count",
+        "columns",
         "database",
+        "find",
+        "get",
         "group by",
+        "join",
+        "list",
         "order by",
+        "query",
         "rows",
+        "schema",
+        "select",
+        "show",
+        "sql",
+        "sum",
+        "table",
+        "where",
+        "consulta",
+        "consultar",
+        "contar",
+        "colunas",
+        "linhas",
+        "liste",
+        "listar",
+        "maior",
+        "mais caro",
+        "menor",
+        "media",
+        "média",
+        "mostre",
+        "mostrar",
+        "ordene",
+        "por departamento",
+        "selecione",
+        "some",
+        "soma",
+        "tabela",
     }
+    return any(
+        re.search(rf"(?<!\w){re.escape(normalize_text(term))}(?!\w)", message)
+        for term in sql_terms
+    )
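The keyword check above uses `(?<!\w)` and `(?!\w)` lookarounds instead of a plain `\b`. The lookarounds still match whole words, and they also work for multi-word phrases such as "group by". The sketch below isolates that matching rule. `has_term` and the sample term set are hypothetical.

```python
import re


def has_term(message: str, terms: set[str]) -> bool:
    """True if any term appears in message as a whole word or phrase."""
    # (?<!\w) / (?!\w): the match must not be glued to other word characters.
    return any(
        re.search(rf"(?<!\w){re.escape(term)}(?!\w)", message)
        for term in terms
    )


terms = {"count", "group by", "sum"}
print(has_term("count orders group by status", terms))  # True
print(has_term("show the summary", terms))  # False: "sum" only occurs inside "summary"
```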
+def build_generation_prompt(schema, message, chat_history=None):
     schema = (schema or "").strip()
     message = (message or "").strip()
     if is_sql_intent(message, schema):
+        table_schema = schema or "CREATE TABLE unknown (id INTEGER)"
+        # Inject up to the last 3 exchanges for multi-turn context. The history is
+        # prepended so the question template (and any answer cue it ends with)
+        # stays at the end of the prompt.
+        history_context = ""
+        if chat_history:
+            lines = []
+            for entry in trim_chat_history(chat_history, max_exchanges=3):
+                if isinstance(entry, dict):
+                    # Messages format: {"role": "user" | "assistant", "content": ...}
+                    label = "Assistant" if entry.get("role") == "assistant" else "User"
+                    text = content_to_text(entry.get("content", "")).strip()
+                    if text:
+                        lines.append(f"{label}: {text}")
+                elif isinstance(entry, (list, tuple)) and len(entry) == 2:
+                    # Tuples format: (user_message, assistant_message)
+                    for label, part in zip(("User", "Assistant"), entry):
+                        text = content_to_text(part).strip()
+                        if text:
+                            lines.append(f"{label}: {text}")
+            if lines:
+                history_context = "Previous conversation:\n" + "\n".join(lines) + "\n\n"
+        return history_context + PROMPT_TEMPLATE.format(schema=table_schema, question=message)
     return GENERAL_PROMPT_TEMPLATE.format(message=message)
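The multi-turn context in `build_generation_prompt` can be sketched on its own. The sketch assumes Gradio-style `messages` history (a list of `{"role": ..., "content": ...}` dicts). It keeps the last few exchanges and renders them as labelled lines. `render_history` and its `max_exchanges` parameter are hypothetical; this is not the app's `trim_chat_history`.

```python
def render_history(history: list[dict], max_exchanges: int = 3) -> str:
    """Render the last `max_exchanges` user/assistant pairs as labelled lines (hypothetical helper)."""
    # One exchange = one user message + one assistant reply, so keep 2 * max_exchanges entries.
    recent = history[-2 * max_exchanges:] if max_exchanges > 0 else []
    lines = []
    for entry in recent:
        label = "Assistant" if entry.get("role") == "assistant" else "User"
        text = str(entry.get("content", "")).strip()
        if text:
            lines.append(f"{label}: {text}")
    if not lines:
        return ""
    return "Previous conversation:\n" + "\n".join(lines) + "\n\n"


history = [
    {"role": "user", "content": "create table t with id name"},
    {"role": "assistant", "content": "CREATE TABLE t (id INTEGER, name TEXT);"},
]
prompt = render_history(history) + "Question: list all names"
print(prompt)
```

In this sketch the history goes before the question, so the current question is the last thing a causal model reads before it starts answering.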
 def format_generation_result(text):
+    cleaned = extract_sql_candidate(text)
     if is_sql_like(cleaned):
         return str(cleaned), EMPTY_CHAT_OUTPUT, validate_sql(cleaned)
     return "", str(cleaned), CHAT_VALIDATOR
     selected = model_key == selected_key
     state_class = " selected" if selected else ""
     return f"""
+    <article class="model-card{state_class}">
       <div class="model-tag">{model_def["tag"]}</div>
       <h3>{model_def["title"]}</h3>
       <code>{model_def["model_id"]}</code>
     """
+def render_baseline_evidence():
+    return """
+    <section class="evidence-panel">
+      <div class="evidence-copy">
+        <h2>Offline baseline comparison</h2>
+        <p>The live Space loads only the fine-tuned model to keep the CPU demo testable. Instead of a second live 3.8B-parameter load on CPU, the base-model comparison is shown here as offline evaluation results.</p>
+      </div>
+      <div class="evidence-grid">
+        <div class="evidence-card">
+          <span>Base Phi-3 Mini</span>
+          <strong>2.0%</strong>
+          <small>exact match</small>
+        </div>
+        <div class="evidence-card highlighted">
+          <span>Fine-tuned QLoRA</span>
+          <strong>73.5%</strong>
+          <small>exact match</small>
+        </div>
+        <div class="evidence-card">
+          <span>Gain</span>
+          <strong>+71.5pp</strong>
+          <small>same comparison setup</small>
+        </div>
+      </div>
+    </section>
+    """
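The gain card reports an absolute difference in percentage points (pp), not a relative improvement. Here is a quick check using the two exact-match scores shown in the panel:

```python
base_em = 2.0    # exact-match accuracy of base Phi-3 Mini, in %
tuned_em = 73.5  # exact-match accuracy of the fine-tuned QLoRA model, in %

gain_pp = tuned_em - base_em  # absolute gain: percentage points
ratio = tuned_em / base_em    # relative: how many times higher the tuned score is

print(f"+{gain_pp:.1f}pp ({ratio:.2f}x the base score)")  # +71.5pp (36.75x the base score)
```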
 def schema_name_by_value(schema):
     schema = (schema or "").strip()
     for name, value in PRESETS.items():
     return "custom"
+def is_create_table_intent(message):
+    message = (message or "").strip().lower()
+    return bool(
+        re.search(r"\b(create|make|build|generate|criar|crie|cria|gerar|gere|faz|faça)\b", message)
+        and re.search(r"\b(table|schema|tabela)\b", message)
+    )
+def is_table_edit_intent(message):
+    message = (message or "").strip().lower()
+    edit_terms = r"\b(edit|update|modify|alter|add|include|remove|delete|drop|edita|editar|altera|altere|alterar|mude|mudar|adicione|adicionar|inclua|incluir|acrescente|remova|remover|delete|deletar|exclua|excluir|novo|nova)\b"
+    direct_add_terms = r"\b(add|include|adicione|adicionar|adicionando|inclua|incluir|acrescente)\b"
+    direct_remove_terms = r"\b(remove|delete|drop|remova|remover|deletar|exclua|excluir)\b"
+    target_terms = r"\b(column|field|element|coluna|campo|elemento|item)\b"
+    # Words that, right after an add verb, signal an aggregation query rather than a table edit:
+    # "add up the total" is a SQL query; "add email and phone" is a table edit.
+    sql_aggregation_terms = {"up", "sum", "total", "count", "average", "avg", "max", "min", "by"}
+    add_match = re.search(direct_add_terms, message)
+    if add_match:
+        words_after_add = message[add_match.end():].split()
+        first_word_after = words_after_add[0] if words_after_add else ""
+        is_add_intent = first_word_after not in sql_aggregation_terms
+    else:
+        is_add_intent = False
+    return bool(
+        is_add_intent
+        or re.search(direct_remove_terms, message)
+        or is_rename_intent(message)
+        or re.search(r"\b(?:altere|alterar|mude|mudar)\b.*\bter\b", message)
+        or (re.search(edit_terms, message) and (re.search(target_terms, message) or ":" in message))
+    )
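The add-verb check above has to separate "add up the total sales" (a query) from "add email and phone" (a schema edit). It does this by inspecting only the word that follows the verb. Below is a reduced standalone sketch of that rule. `looks_like_add_column` is a hypothetical name, and its verb list is shorter than the app's.

```python
import re

ADD_VERB = re.compile(r"\b(add|include|adicione|adicionar|inclua|incluir|acrescente)\b")
# Words that, right after "add", signal an aggregation query rather than a schema edit.
AGGREGATION_WORDS = {"up", "sum", "total", "count", "average", "avg", "max", "min", "by"}


def looks_like_add_column(message: str) -> bool:
    """Hypothetical check: an add verb NOT immediately followed by an aggregation word."""
    message = message.strip().lower()
    match = ADD_VERB.search(message)
    if not match:
        return False
    following = message[match.end():].split()
    return not following or following[0] not in AGGREGATION_WORDS


print(looks_like_add_column("add up the total sales"))  # False
print(looks_like_add_column("Add email and phone"))     # True
```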
+def infer_column_type(column_name):
+    name = column_name.strip().lower()
+    if name == "id" or name.endswith("_id") or name in {"quantity", "quantidade", "stock", "estoque", "year"}:
+        return "INTEGER"
+    if name in {
+        "salary",
+        "price",
+        "preco",
+        "amount",
+        "total",
+        "grade",
+        "peso",
+        "weight",
+        "idade",
+        "age",
+        "altura",
+        "height",
+        "largura",
+        "width",
+        "comprimento",
+        "length",
+        "desconto",
+        "discount",
+    }:
+        return "NUMERIC"
+    if name in {"date", "created_at", "updated_at"} or name.endswith("_date"):
+        return "DATE"
+    return "TEXT"
+def normalize_identifier(value):
+    identifier = re.sub(r"\W+", "_", normalize_text(value)).strip("_")
+    if not identifier:
+        return ""
+    if identifier[0].isdigit():
+        identifier = f"col_{identifier}"
+    return identifier
+def parse_column_definition(raw_column):
+    raw_column = re.sub(r"\b(for me|please|por favor)\b", "", raw_column or "", flags=re.IGNORECASE)
+    raw_column = raw_column.strip(" .;:")
+    if not raw_column:
+        return None
+    # P2 fix: take the LAST type token as the column type, not the first match.
+    # "date DATE" must parse as name="date", type="DATE", not as name="" with type="date".
+    type_matches = list(
+        re.finditer(
+            r"\b(integer|int|numeric|decimal|real|float|double|text|varchar|char|date|datetime|timestamp|boolean|bool)\b",
+            raw_column,
+            flags=re.IGNORECASE,
+        )
+    )
+    explicit_type = type_matches[-1] if type_matches else None
+    if explicit_type:
+        name_part = raw_column[: explicit_type.start()].strip()
+        column_type = explicit_type.group(1).upper()
+        if column_type == "INT":
+            column_type = "INTEGER"
+        elif column_type == "BOOL":
+            column_type = "BOOLEAN"
+        elif column_type == "DECIMAL":
+            column_type = "NUMERIC"
+        elif column_type in {"FLOAT", "DOUBLE"}:
+            column_type = "REAL"
+        if not name_part.strip():
+            column_type = None
+            name_part = raw_column
+    else:
+        name_part = raw_column
+        column_type = None
+    name_part = re.sub(r"\b(column|field|coluna|campo)\b", "", name_part, flags=re.IGNORECASE)
+    column_name = normalize_identifier(name_part)
+    if not column_name:
+        return None
+    return column_name, column_type or infer_column_type(column_name)
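The "last type token wins" rule is what lets a column literally named `date` carry an explicit `DATE` type. Below is a reduced standalone sketch of that rule. `split_name_and_type` is hypothetical, and its type list is a subset of the one the app recognizes.

```python
import re
from typing import Optional

TYPE_RE = re.compile(r"\b(integer|int|numeric|text|date|boolean)\b", re.IGNORECASE)
ALIASES = {"INT": "INTEGER"}


def split_name_and_type(raw: str) -> tuple[str, Optional[str]]:
    """Split "name TYPE" on the LAST type keyword (hypothetical, reduced type list)."""
    matches = list(TYPE_RE.finditer(raw))
    if matches:
        last = matches[-1]
        name = raw[: last.start()].strip()
        if name:  # a type keyword with nothing before it is really the column name
            sql_type = last.group(1).upper()
            return name, ALIASES.get(sql_type, sql_type)
    return raw.strip(), None  # no explicit type: caller infers one from the name


print(split_name_and_type("date DATE"))  # ('date', 'DATE')
print(split_name_and_type("date"))       # ('date', None)
```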
+def split_column_list(columns_text):
+    columns_text = re.sub(r"\s+(and|e)\s+", ",", columns_text or "", flags=re.IGNORECASE)
+    parts = []
+    type_pattern = (
+        r"\b(integer|int|numeric|decimal|real|float|double|text|varchar|char|date|datetime|timestamp|boolean|bool)\b"
+    )
+    type_tokens = {
+        "integer",
+        "int",
+        "numeric",
+        "decimal",
+        "real",
+        "float",
+        "double",
+        "text",
+        "varchar",
+        "char",
+        "date",
+        "datetime",
+        "timestamp",
+        "boolean",
+        "bool",
+    }
+    STOPWORDS = {
+        "to", "from", "into", "as", "for",
+        "o", "a", "os", "de", "do", "da", "dos", "das",
+    }
+    for part in (item.strip() for item in columns_text.split(",") if item.strip()):
+        tokens = [token.strip() for token in re.split(r"\s+", part) if token.strip()]
+        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
+        if not tokens:
+            continue
+        if re.search(type_pattern, part, flags=re.IGNORECASE) and len(tokens) > 2:
+            index = 0
+            # Column names that could be confused with SQL types when followed by date/datetime/timestamp
+            # These should be treated as column names, not as part of type specification
+            inferrable_names = {"total", "date", "time", "timestamp", "int", "text", "real", "char"}
+            while index < len(tokens):
+                current = tokens[index]
+                next_token = tokens[index + 1].lower() if index + 1 < len(tokens) else ""
+                # If current could be inferred as a different type, don't pair with date/datetime/timestamp
+                # This preserves "total date" → "total" (inferred NUMERIC) + "date" (type)
+                if next_token in type_tokens and not (current.lower() in inferrable_names and next_token in {"date", "datetime", "timestamp"}):
+                    parts.append(f"{current} {tokens[index + 1]}")
+                    index += 2
+                else:
+                    parts.append(current)
+                    index += 1
+            continue
+        if re.search(type_pattern, part, flags=re.IGNORECASE):
+            parts.append(part)
+            continue
+        if len(tokens) > 1 and all(re.match(r"^[A-Za-z_][\wàáâãçèéêíóôõúÀÁÂÃÇÈÉÊÍÓÔÕÚ]*$", token) for token in tokens):
+            parts.extend(tokens)
+        else:
+            parts.append(part)
+    return parts
+def format_create_table(table_name, columns):
+    if not table_name or not columns:
+        return ""
+    seen = set()
+    column_lines = []
+    for column_name, column_type in columns:
+        if column_name in seen:
+            continue
+        seen.add(column_name)
+        column_lines.append(f"    {column_name} {column_type}")
+    if not column_lines:
+        return ""
+    return f"CREATE TABLE {table_name} (\n" + ",\n".join(column_lines) + "\n);"
+def create_table_from_message(message):
+    message = (message or "").strip()
+    patterns = (
+        r"\b(?:table|tabela)\s+(?:called\s+|named\s+|chamada?\s+|nomeada?\s+)?([A-Za-z_][\w]*)\s+(?:with|containing|including|com)\s+(.+)$",
+        r"\b(?:create|make|build|generate|criar|crie|gerar|gere)\b.*?\b(?:table|tabela)\b\s+([A-Za-z_][\w]*)\s+(?:with|containing|including|com)\s+(.+)$",
+    )
+    for pattern in patterns:
+        match = re.search(pattern, message, flags=re.IGNORECASE)
+        if not match:
+            continue
+        table_name = normalize_identifier(match.group(1))
+        columns = [
+            parsed
+            for parsed in (parse_column_definition(column) for column in split_column_list(match.group(2)))
+            if parsed
+        ]
+        return format_create_table(table_name, columns)
+    return ""
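Taken together, `create_table_from_message`, `split_column_list`, `infer_column_type`, and `format_create_table` turn "table NAME with COLUMNS" into DDL without calling the model. Below is a reduced end-to-end sketch under simplifying assumptions: bare column names only, and a small subset of the name-based type rules. `sketch_create_table` and `infer_type` are hypothetical names.

```python
import re

CREATE_RE = re.compile(
    r"\b(?:table|tabela)\s+([A-Za-z_]\w*)\s+(?:with|com)\s+(.+)$", re.IGNORECASE
)


def infer_type(name: str) -> str:
    """Reduced version of name-based type inference (assumption: subset of the app's rules)."""
    if name == "id" or name.endswith("_id"):
        return "INTEGER"
    if name in {"price", "preco", "salary", "amount"}:
        return "NUMERIC"
    return "TEXT"


def sketch_create_table(message: str) -> str:
    match = CREATE_RE.search(message)
    if not match:
        return ""
    table = match.group(1).lower()
    # Treat "and"/"e" and commas as separators, then split on whitespace.
    names = re.split(r"[\s,]+", re.sub(r"\s+(?:and|e)\s+", ",", match.group(2)))
    seen, lines = set(), []
    for name in (n.lower() for n in names if n):
        if name not in seen:  # skip duplicate column names
            seen.add(name)
            lines.append(f"    {name} {infer_type(name)}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


print(sketch_create_table("crie tabela produtos com id nome preco"))
```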
+def parse_create_table_schema(schema):
+    schema = (schema or "").strip()
+    match = re.match(
+        r"^\s*(?:CREATE\s+TABLE\s+)?([A-Za-z_][\w]*)\s*\((.*?)\)\s*;?\s*$",
+        schema,
+        flags=re.IGNORECASE | re.DOTALL,
+    )
+    if not match:
+        return "", []
+    table_name = normalize_identifier(match.group(1))
+    columns = [
+        parsed
+        for parsed in (parse_column_definition(column) for column in split_column_list(match.group(2)))
+        if parsed
+    ]
+    return table_name, columns
+def create_table_from_schema(schema):
+    table_name, columns = parse_create_table_schema(schema)
+    return format_create_table(table_name, columns)
+def extract_create_table_statement(text):
+    cleaned = extract_sql_candidate(text)
+    match = re.search(
+        r"\bCREATE\s+TABLE\s+[A-Za-z_][\w]*\s*\(.*?\)\s*;?",
+        cleaned,
+        flags=re.IGNORECASE | re.DOTALL,
+    )
+    return clean_generation(match.group(0)) if match else ""
+def last_create_table_from_history(chat_history):
+    for item in reversed(list(chat_history or [])):
+        if not isinstance(item, dict) or item.get("role") != "assistant":
+            continue
+        statement = extract_create_table_statement(item.get("content", ""))
+        if statement:
+            return statement
+    return ""
+def extract_added_columns(message):
+    message = (message or "").strip()
+    patterns = (
+        r":\s*(.+)$",
+        r"\b(?:add|include|with|adicionar|adicione|adicionando|inclua|incluir|acrescente|ter)\b\s+(?:um\s+|uma\s+|a\s+|an\s+)?(?:novo\s+|nova\s+|new\s+)?(?:column|field|element|coluna|campo|elemento|item)?\s*(.+)$",
+    )
+    for pattern in patterns:
+        match = re.search(pattern, message, flags=re.IGNORECASE)
+        if not match:
+            continue
+        columns = [
+            parsed
+            for parsed in (parse_column_definition(column) for column in split_column_list(match.group(1)))
+            if parsed
+        ]
+        if columns:
+            return columns
+    return []
+def extract_removed_columns(message):
+    message = (message or "").strip()
+    patterns = (
+        r"\b(?:remove|delete|drop|remova|remover|deletar|exclua|excluir)\b\s+(?:a\s+|o\s+|the\s+)?(?:column|field|element|coluna|campo|elemento|item)?\s*(.+)$",
+    )
+    for pattern in patterns:
+        match = re.search(pattern, message, flags=re.IGNORECASE)
+        if not match:
+            continue
+        columns = [normalize_identifier(column) for column in split_column_list(match.group(1))]
+        columns = [column for column in columns if column]
+        if columns:
+            return columns
+    return []
+# Rename clause, e.g. "rename price to cost" or "renomeie a coluna preco para valor".
+# An optional column noun (with an optional article) may sit between the verb and the old name.
+RENAME_PATTERN = (
+    r"\b(?:rename|edit|change|renomeie|renomear|altere|mude)\s+"
+    r"(?:(?:(?:the|a|o)\s+)?(?:column|field|coluna|campo)\s+)?"
+    r"(\w+)\s+(?:to|para|as|como)\s+(\w+)"
+)
+def is_rename_intent(message):
+    message = (message or "").strip().lower()
+    return bool(re.search(RENAME_PATTERN, message, flags=re.IGNORECASE))
+def extract_renamed_columns(message):
+    matches = re.findall(RENAME_PATTERN, message or "", flags=re.IGNORECASE)
+    return [
+        (normalize_identifier(old), normalize_identifier(new))
+        for old, new in matches
+        if normalize_identifier(old) and normalize_identifier(new)
+    ]
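`is_rename_intent` and `extract_renamed_columns` share one rename-clause shape: verb, old name, "to"/"para", new name. Below is a standalone sketch of the extraction, written for this illustration rather than copied from the app. It also allows an optional column noun (with an optional article) before the old name, so "rename the column price to cost" parses too.

```python
import re

RENAME_RE = re.compile(
    r"\b(?:rename|renomeie|renomear)\s+"
    r"(?:(?:(?:the|a|o)\s+)?(?:column|coluna)\s+)?"  # optional "[article] column" noun
    r"(\w+)\s+(?:to|para)\s+(\w+)",
    re.IGNORECASE,
)


def find_renames(message: str) -> list[tuple[str, str]]:
    """Return (old, new) pairs for every rename clause in message."""
    return [(old.lower(), new.lower()) for old, new in RENAME_RE.findall(message)]


print(find_renames("rename the column price to cost"))  # [('price', 'cost')]
```

Because `findall` resumes after each match, a compound request such as "rename price to cost and rename qty to quantity" yields both pairs.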
+def parse_compound_edit(message):
+    """Split a compound prompt into segments and extract add/remove/rename edits."""
+    segment_pattern = (
+        r"\s+(?:and|e)\s+"
+        r"(?=\b(?:add|include|remove|delete|drop|rename|edit|change|"
+        r"adicione|adicionar|inclua|acrescente|remova|remover|deletar|"
+        r"exclua|renomeie|renomear|altere|mude)\b)"
+    )
+    segments = re.split(segment_pattern, message or "", flags=re.IGNORECASE)
+    added, removed, renamed = [], [], []
+    for seg in segments:
+        seg = seg.strip()
+        if not seg:
+            continue
+        if is_rename_intent(seg):
+            renamed.extend(extract_renamed_columns(seg))
+        elif re.search(
+            r"\b(remove|delete|drop|remova|remover|deletar|exclua|excluir)\b",
+            seg,
+            flags=re.IGNORECASE,
+        ):
+            removed.extend(extract_removed_columns(seg))
+        else:
+            cols = extract_added_columns(seg)
+            if cols:
+                added.extend(cols)
+    return added, removed, renamed
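`parse_compound_edit` splits only on an "and"/"e" that is immediately followed by an edit verb, using a lookahead. That keeps a column list such as "email and phone" inside one segment. Below is a reduced standalone sketch. `split_edit_segments` is hypothetical, and its verb list is shorter than the app's.

```python
import re

# Split on "and"/"e" only when the next word is an edit verb (lookahead, not consumed).
SEGMENT_SPLIT = re.compile(
    r"\s+(?:and|e)\s+(?=(?:add|remove|drop|rename|adicione|remova|renomeie)\b)",
    re.IGNORECASE,
)


def split_edit_segments(message: str) -> list[str]:
    """Split a compound edit request into one segment per edit verb (hypothetical helper)."""
    return [seg.strip() for seg in SEGMENT_SPLIT.split(message) if seg.strip()]


print(split_edit_segments("add email and phone and remove fax"))
# ['add email and phone', 'remove fax']
```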
+def edit_create_table_from_message(message, chat_history, active_schema):
+    if not is_table_edit_intent(message) and not is_rename_intent(message):
+        return ""
+    base_sql = last_create_table_from_history(chat_history) or create_table_from_schema(active_schema)
+    table_name, existing_columns = parse_create_table_schema(base_sql)
+    if not table_name:
+        return ""
+    added_columns, removed_columns_list, renamed_columns = parse_compound_edit(message)
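+    # Use only per-segment removals: scanning the whole message would also capture
+    # column names from later add/rename clauses (e.g. "remove fax and rename price to cost").
+    removed_set = set(removed_columns_list)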
+    if not added_columns and not removed_set and not renamed_columns:
+        return ""
+    rename_map = dict(renamed_columns)
+    kept_columns = [
+        (rename_map.get(col_name, col_name), col_type)
+        for col_name, col_type in existing_columns
+        if col_name not in removed_set
+    ]
+    return format_create_table(table_name, [*kept_columns, *added_columns])
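The edit is applied in a fixed order. Removals are checked against the original column names, surviving columns are renamed in place, and new columns are appended at the end. Below is a standalone sketch over plain `(name, type)` pairs. `apply_edits` is hypothetical; its de-duplication mirrors the "first occurrence wins" check in `format_create_table`.

```python
def apply_edits(columns, added, removed, renamed):
    """Apply removals, then renames, then additions to (name, type) pairs (hypothetical helper).

    Removals are matched against the ORIGINAL names, before any rename.
    """
    rename_map = dict(renamed)
    removed = set(removed)
    kept = [
        (rename_map.get(name, name), sql_type)
        for name, sql_type in columns
        if name not in removed
    ]
    result, seen = [], set()
    for name, sql_type in [*kept, *added]:
        # First occurrence wins, so an "added" column that already exists is ignored.
        if name not in seen:
            seen.add(name)
            result.append((name, sql_type))
    return result


cols = [("id", "INTEGER"), ("price", "NUMERIC"), ("fax", "TEXT")]
print(apply_edits(cols, [("email", "TEXT")], ["fax"], [("price", "cost")]))
# [('id', 'INTEGER'), ('cost', 'NUMERIC'), ('email', 'TEXT')]
```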
 def render_schema_context(schema=""):
     schema = (schema or "").strip()
     if not schema:
 def query_control_updates(can_generate):
     context_updates = [gr.update(interactive=True) for _ in range(6)]
+    # Keep submit button enabled - model requirement is checked in generate_response
+    return [*context_updates, gr.update(interactive=True), gr.update(interactive=True)]
 def render_message(message="", kind="error"):
     )
+def load_selected_model(selected_key=FINE_TUNED_MODEL_KEY):
+    # The live Space only serves the fine-tuned model, so any requested key is overridden.
+    selected_key = FINE_TUNED_MODEL_KEY
     model_def = model_by_key(selected_key)
+    print(
+        f"[LOAD_REQUEST] selected_key={selected_key} model_id={model_def['model_id']}",
+        flush=True,
+    )
     yield (
         None,
         render_status(selected_key, None, state="loading"),
     )
     started = time.time()
     try:
+        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
+        future = executor.submit(_run_model_load, model_def["model_id"])
+        try:
+            result = future.result(timeout=LOAD_TIMEOUT_SECONDS)
+        except concurrent.futures.TimeoutError:
+            # The timeout was reached, but a running thread cannot be cancelled.
+            # Wait for the load to finish so a late-completing load cannot race a newer request;
+            # the UI stays in the loading state until then.
+            print(f"[LOAD] Still loading after {LOAD_TIMEOUT_SECONDS}s; waiting for completion", flush=True)
+            result = future.result()
+            print(f"[LOAD] Completed after timeout warning ({int(time.time() - started)}s)", flush=True)
+        finally:
+            executor.shutdown(wait=False, cancel_futures=True)
     except Exception as exc:
         error = f"Load failed for {model_def['model_id']}: {type(exc).__name__}: {exc}"
+        print(f"[LOAD_ERROR] {error}", flush=True)
+        traceback.print_exc()
         yield (
             None,
             render_status(selected_key, None),
             render_loading_overlay(visible=False),
             model_metadata(selected_key),
+            gr.update(interactive=True, visible=True),
             *query_control_updates(False),
             "",
             EMPTY_VALIDATOR,
         render_status(selected_key, selected_key),
         render_loading_overlay(visible=False),
         model_metadata(selected_key),
+        gr.update(interactive=True, visible=True, value="Load fine-tuned model"),
         *query_control_updates(True),
         "",
         EMPTY_VALIDATOR,
     )
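The load path above uses a soft timeout. When the limit passes it does not abort, because Python threads cannot be cancelled; it keeps waiting for the result. The generation path, by contrast, gives up on the result. Below is a minimal sketch of the soft variant. `run_with_soft_timeout` and `slow_load` are hypothetical names, and `slow_load` stands in for a model load.

```python
import concurrent.futures
import time


def run_with_soft_timeout(fn, timeout_s):
    """Run fn() in a worker thread; return (result, overran) and never abandon the work."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s), False
    except concurrent.futures.TimeoutError:
        # The thread cannot be cancelled, so log and keep waiting instead of abandoning it.
        print(f"still running after {timeout_s}s; waiting for completion")
        return future.result(), True
    finally:
        executor.shutdown(wait=False)


def slow_load():
    time.sleep(0.2)  # stands in for a slow model download/load
    return "loaded"


print(run_with_soft_timeout(slow_load, timeout_s=0.05))  # ('loaded', True)
```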
+def deterministic_response(
+    chat_history,
+    message,
+    active_schema,
+    loaded_key,
+    saved_state,
+    assistant_content,
+    status_message,
+    *,
+    sql_text="",
+    validator=CHAT_VALIDATOR,
+    status_kind="ok",
+):
+    new_history = trim_chat_history(
+        [
+            *list(chat_history or []),
+            {"role": "user", "content": message},
+            {"role": "assistant", "content": assistant_content},
+        ]
+    )
+    # If sql_text is a CREATE TABLE, promote it to active_schema for subsequent queries
+    new_schema = active_schema
+    if sql_text and "CREATE TABLE" in sql_text.upper():
+        new_schema = sql_text
+    compare = comparison_updates(saved_state, sql_text, loaded_key)
+    return (
+        new_history,
+        "",
+        new_schema,
+        message,
+        sql_text,
+        validator,
+        gr.update(interactive=False, visible=False),
+        render_message(status_message, kind=status_kind),
+        *compare,
+    )
 def generate_response(message, chat_history, active_schema, loaded_key, saved_state):
     message = (message or "").strip()
     active_schema = (active_schema or "").strip()
     chat_history = list(chat_history or [])
+    if not message:
+        compare = comparison_updates(saved_state, "", loaded_key)
+        return (
+            chat_history,
+            "",
+            active_schema,
+            "",
+            "",
+            EMPTY_VALIDATOR,
+            gr.update(interactive=False, visible=False),
+            render_message("Type a message before sending."),
+            *compare,
+        )
+    # Routing debug log — shows which intent matched
+    _routing = []
+    edited_table = edit_create_table_from_message(message, chat_history, active_schema)
+    if edited_table:
+        _routing.append("edit_create_table")
+    elif is_table_edit_intent(message):
+        _routing.append("is_table_edit_intent")
+    elif is_create_table_intent(message):
+        _routing.append("is_create_table_intent")
+    elif is_sql_intent(message, active_schema):
+        _routing.append("is_sql_intent")
+    else:
+        _routing.append("no_match")
+    print(f"[ROUTING] \"{message[:60]}\" → {_routing}", flush=True)
+    if edited_table:
+        display_response = f"```sql\n{edited_table}\n```"
+        return deterministic_response(
+            chat_history,
+            message,
+            active_schema,
+            loaded_key,
+            saved_state,
+            display_response,
+            "Edited CREATE TABLE without calling the model.",
+            sql_text=edited_table,
+            validator=validate_sql(edited_table),
+        )
+    if is_table_edit_intent(message):
         compare = comparison_updates(saved_state, "", loaded_key)
         return (
             chat_history,
             "",
             EMPTY_VALIDATOR,
             gr.update(interactive=False, visible=False),
+            render_message("I need an existing CREATE TABLE in the chat or an active schema before editing columns."),
             *compare,
         )
+    if is_create_table_intent(message):
+        sql_text = create_table_from_message(message) or create_table_from_schema(active_schema)
+        if sql_text:
+            display_response = f"```sql\n{sql_text}\n```"
+            return deterministic_response(
+                chat_history,
+                message,
+                active_schema,
+                loaded_key,
+                saved_state,
+                display_response,
+                "Generated CREATE TABLE without calling the model.",
+                sql_text=sql_text,
+                validator=validate_sql(sql_text),
+            )
         compare = comparison_updates(saved_state, "", loaded_key)
         return (
             chat_history,
+            message,
+            active_schema,
+            "",
             "",
+            EMPTY_VALIDATOR,
+            gr.update(interactive=False, visible=False),
+            render_message("CREATE TABLE needs a table name and columns, or an active schema context."),
+            *compare,
+        )
+    if not is_sql_intent(message, active_schema):
+        fallback = safe_chat_fallback()
+        return deterministic_response(
+            chat_history,
+            message,
+            active_schema,
+            loaded_key,
+            saved_state,
+            fallback,
+            "No SQL intent or active schema detected.",
+        )
+    if not loaded_key or _model is None or _tokenizer is None:
+        compare = comparison_updates(saved_state, "", loaded_key)
+        return (
+            chat_history,
+            message,
             active_schema,
             "",
             "",
             EMPTY_VALIDATOR,
             gr.update(interactive=False, visible=False),
+            render_message("Load a model before generating SQL."),
             *compare,
         )
     started = time.time()
     try:
+        import_model_runtime()
         with _model_lock:
+            prompt = build_generation_prompt(active_schema, message, chat_history)
             inputs = _tokenizer(prompt, return_tensors="pt")
             input_length = inputs["input_ids"].shape[-1]
+            # Explicit None check: a valid pad_token_id of 0 must not fall through to EOS.
+            pad_token_id = _tokenizer.pad_token_id
+            if pad_token_id is None:
+                pad_token_id = _tokenizer.eos_token_id
+            gen_kwargs = {
+                "max_new_tokens": 80,
+                "max_time": GENERATION_MAX_TIME_SECONDS,
+                "do_sample": False,
+                "use_cache": False,
+                "repetition_penalty": 1.1,
+                "eos_token_id": getattr(_model.generation_config, "eos_token_id", _tokenizer.eos_token_id),
+                "pad_token_id": pad_token_id,
+            }
+            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
+            future = executor.submit(_run_generation, _model, inputs, gen_kwargs)
+            try:
+                output_ids = future.result(timeout=GENERATION_TIMEOUT_SECONDS)
+            except concurrent.futures.TimeoutError:
+                # Do NOT call future.result() again without a timeout: it could block indefinitely.
+                # The worker thread cannot be cancelled and may keep running in the background
+                # (the max_time generation setting should eventually stop it); report the timeout instead.
+                raise TimeoutError(f"Generation timed out after {GENERATION_TIMEOUT_SECONDS}s")
+            finally:
+                # Single shutdown point for both paths: don't wait for the worker.
+                executor.shutdown(wait=False, cancel_futures=True)
             generated_ids = output_ids[0][input_length:]
+            generated_text = _tokenizer.decode(generated_ids, skip_special_tokens=True)
     except Exception as exc:
         compare = comparison_updates(saved_state, "", loaded_key)
         return (
         message,
         str(sql_text),
         validator,
+        gr.update(interactive=False, visible=False),
         render_message(f"Generated {response_kind} with {model_def['model_id']} in {elapsed}s.", kind="ok"),
         *compare,
     )
     )
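`generate_response` tries its deterministic handlers before the model, in a fixed order: table edit, then CREATE TABLE, then the SQL-intent check that decides between the model and a canned fallback reply. Below is a minimal first-match dispatcher that illustrates this ordering. The toy predicates stand in for the app's intent functions, and all names here are hypothetical.

```python
from typing import Callable

# (route name, predicate) pairs, checked in order; the first match wins.
Route = tuple[str, Callable[[str], bool]]


def route(message: str, routes: list[Route], default: str = "fallback") -> str:
    """Return the name of the first route whose predicate accepts the message."""
    for name, predicate in routes:
        if predicate(message):
            return name
    return default


# Toy predicates standing in for the app's intent checks (assumptions, not the real logic).
ROUTES: list[Route] = [
    ("edit_table", lambda m: m.startswith(("add ", "remove ", "rename "))),
    ("create_table", lambda m: "create table" in m),
    ("model", lambda m: m.startswith(("list ", "count ", "show "))),
]

print(route("show all employees", ROUTES))  # model
```

Because the first match wins, the cheap deterministic handlers must come before the model route. Otherwise an edit request that also looks like a question would be sent to the model.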
+def sync_on_load():
+    if _model is not None and _current_model_id is not None:
+        loaded_key = model_key_by_id(_current_model_id)
+        if loaded_key:
+            return (
+                loaded_key,
+                render_status(loaded_key, loaded_key),
+                render_loading_overlay(visible=False),
+                model_metadata(loaded_key),
+                gr.update(interactive=True, visible=True, value="Load fine-tuned model"),
+                *query_control_updates(True),
+                "",
+                EMPTY_VALIDATOR,
+                gr.update(interactive=False, visible=False),
+                render_message(f"Model already loaded: {_current_model_id}", kind="ok"),
+                gr.update(visible=False),
+            )
+    return (
+        None,
+        render_status(DEFAULT_MODEL_KEY, None),
+        render_loading_overlay(visible=False),
+        model_metadata(DEFAULT_MODEL_KEY),
+        gr.update(interactive=True, visible=True),
+        *query_control_updates(False),
+        "",
+        EMPTY_VALIDATOR,
+        gr.update(interactive=False, visible=False),
+        render_message(),
+        gr.update(visible=False),
+    )
 CSS = """
 @import url('https://fonts.googleapis.com/css2?family=Space+Mono:wght@400;500;700&display=swap');
+/* Prevent Gradio dark theme from overriding text in light-bg components */
+[class*="badge"],
+[class*="validator-"],
+[class*="compare-head"],
+[class*="model-tag"],
+[class*="stat-card"] {
+  color: inherit !important;
+}
 :root {
   --bg-base: #0c0c0b;
   --bg-surface: #1a1a18;
 .badge-green,
 .validator-ok {
   background: var(--teal-soft);
+  color: var(--teal-text) !important;
 }
 .badge-cream,
 .validator-warn {
   background: var(--amber-soft);
+  color: var(--amber-text) !important;
 }
 .badge-light,
 .validator-empty {
   background: var(--bg-raised);
+  color: var(--text-secondary) !important;
   border: 0.5px solid var(--border);
 }
   background: var(--bg-surface);
   border: 0.5px solid var(--border);
   border-radius: 6px;
   min-height: 176px;
   padding: 16px;
   transition: border-color 160ms ease, background 160ms ease;
 }
 .model-card.selected {
   border: 1.5px solid var(--teal);
 }
 .model-tag {
   background: var(--amber-soft);
+  color: var(--amber-text) !important;
   margin-bottom: 18px;
 }
 .model-card.selected .model-tag {
   background: var(--teal-soft);
+  color: var(--teal-text) !important;
 }
 .model-card h3 {
   display: flex;
 }
+.evidence-panel {
+  background: var(--bg-surface);
+  border: 0.5px solid var(--border);
+  border-radius: 6px;
+  margin-top: 12px;
+  padding: 16px;
+}
+.evidence-copy h2 {
+  color: var(--text-primary);
+  font-size: 13px;
+  font-weight: 500;
+  line-height: 1.3;
+  margin: 0 0 6px;
+}
+.evidence-copy p {
+  color: var(--text-secondary);
+  font-size: 12px;
+  line-height: 1.45;
+  margin: 0;
+}
+.evidence-grid {
+  display: grid;
+  gap: 8px;
+  grid-template-columns: repeat(3, minmax(0, 1fr));
+  margin-top: 14px;
+}
+.evidence-card {
+  background: var(--bg-raised);
+  border: 0.5px solid var(--border);
+  border-radius: 6px;
+  padding: 10px;
+}
+.evidence-card.highlighted {
+  border-color: rgba(29, 158, 117, 0.5);
+}
+.evidence-card span,
+.evidence-card small {
+  color: var(--text-secondary);
+  display: block;
+  font-size: 10px;
+  line-height: 1.25;
+}
+.evidence-card strong {
+  color: var(--text-primary);
+  display: block;
+  font-size: 20px;
+  font-weight: 500;
+  line-height: 1.1;
+  margin: 5px 0;
+}
 #load-button,
 #generate-button,
 #save-button {
   width: 100% !important;
 }
+#generate-button button {
+  height: 42px !important;
+  min-height: 42px !important;
+}
 #load-button button:hover,
 #generate-button button:hover {
   background: var(--text-primary) !important;
 }
 .stat-card strong {
+  color: var(--text-primary) !important;
   display: block;
   font-size: 15px;
   font-weight: 500;
 }
 .stat-card span {
+  color: var(--text-secondary) !important;
   display: block;
   font-size: 11px;
   font-weight: 400;
 }
 .composer-row {
+  align-items: flex-end !important;
+  display: flex !important;
   gap: 8px !important;
 }
+.composer-row > div {
+  display: flex !important;
+  flex-direction: column !important;
+  justify-content: flex-end !important;
+}
 #message-input {
   flex: 1 1 auto;
 }
 #message-input textarea {
   min-height: 42px !important;
+  max-height: 120px !important;
+  height: 42px !important;
+  resize: none !important;
+  overflow-y: auto !important;
+}
+#generate-button {
+  align-self: flex-end !important;
+  margin-bottom: 0 !important;
 }
 #clear-schema-button button {
 }
 .validator-detail {
+  color: var(--text-secondary) !important;
   font-size: 11px;
   margin-left: 8px;
 }
 .compare-head {
   align-items: center;
   background: var(--amber-soft);
+  color: var(--amber-text) !important;
   display: flex;
   font-size: 11px;
   font-weight: 500;
 .compare-card.current .compare-head,
 .current-compare-head .compare-head {
   background: var(--teal-soft);
+  color: var(--teal-text) !important;
 }
 .compare-head strong {
 @media (max-width: 860px) {
   .top-panel,
   .model-grid,
+  .compare-grid,
+  .evidence-grid {
     grid-template-columns: 1fr;
   }
 }
 """
+with gr.Blocks(title="Phi-3 Mini SQL Generator") as demo:
     loaded_key_state = gr.State(value=None)
     saved_output = gr.State(value=None)
     active_schema = gr.State(value="")
         gr.HTML(render_step("01", "Model"))
         with gr.Row(elem_classes=["model-grid"]):
             fine_tuned_model_card = gr.HTML(render_model_card(FINE_TUNED_MODEL_KEY, DEFAULT_MODEL_KEY))
+        load_button = gr.Button("Load fine-tuned model", variant="primary", elem_id="load-button")
         model_status = gr.HTML(render_status(DEFAULT_MODEL_KEY, None))
         model_info = gr.HTML(model_metadata(DEFAULT_MODEL_KEY))
+        gr.HTML(render_baseline_evidence())
         with gr.Column(elem_id="query-section", elem_classes=["query-section"]):
             gr.HTML(render_step("02", "Chat"))
                 show_label=False,
             )
         save_button = gr.Button(
+            "Save output",
             interactive=False,
             visible=False,
             elem_id="save-button",
                     current_sql = gr.Code(label="", language="sql", lines=6, show_label=False)
     model_state_outputs = [
         fine_tuned_model_card,
         model_status,
         model_info,
         save_button,
         error_output,
     ]
     load_button.click(
         load_selected_model,
+        inputs=None,
         outputs=[
             loaded_key_state,
             model_status,
             error_output,
         ],
     )
+    demo.load(
+        sync_on_load,
+        outputs=[
+            loaded_key_state,
+            model_status,
+            loading_overlay,
+            model_info,
+            load_button,
+            employees_preset,
+            orders_preset,
+            students_preset,
+            products_preset,
+            sales_preset,
+            clear_schema_button,
+            message_input,
+            send_button,
+            sql_output,
+            validator_output,
+            save_button,
+            error_output,
+            comparison_panel,
+        ],
+    )
 queue_kwargs = {}
 if "default_concurrency_limit" in inspect.signature(demo.queue).parameters:
 if __name__ == "__main__":
+    demo.launch(css=CSS)
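The `queue_kwargs` lines above forward `default_concurrency_limit` only when the installed Gradio's `demo.queue` accepts it. That lets the same app run across Gradio releases. Below is a standalone sketch of this feature-detection pattern. `call_with_supported_kwargs`, `old_queue` and `new_queue` are hypothetical stand-ins, not Gradio APIs.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments its signature does not declare."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all accepts any keyword, so pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    supported = {name: value for name, value in kwargs.items() if name in params}
    return func(**supported)


# Hypothetical stand-ins for an older and a newer queue() signature.
def old_queue(max_size=None):
    return {"max_size": max_size}


def new_queue(max_size=None, default_concurrency_limit=1):
    return {"max_size": max_size, "default_concurrency_limit": default_concurrency_limit}


print(call_with_supported_kwargs(old_queue, max_size=8, default_concurrency_limit=2))
print(call_with_supported_kwargs(new_queue, max_size=8, default_concurrency_limit=2))
```

The filter checks the signature rather than a version string. The call therefore keeps working if a later release adds or removes the parameter.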

requirements.txt CHANGED

@@ -2,6 +2,6 @@ transformers>=4.44.0
 peft>=0.11.0
 accelerate>=0.30.0
 torch
-gradio>=4.0.0
+gradio>=6.0.0
 sqlparse
 huggingface_hub

tests/e2e_flow_test.py ADDED

@@ -0,0 +1,250 @@

+"""
+End-to-end flow tests for phi3-mini-sql-generator demo.
+Run with: python tests/e2e_flow_test.py
+Model must be loaded first. Call app.load_model(app.FINE_TUNED_MODEL_ID)
+before running these tests.
+"""
+import app
+import types
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def sql_out(result):
+    return result[4]
+def status(result):
+    return result[7]
+def reset_model_state():
+    app._model = None
+    app._tokenizer = None
+    app._current_model_id = None
+def check_sql(result, expected_fragments, description):
+    """Print and assert SQL output checks."""
+    sql = sql_out(result)
+    status_msg = status(result)
+    ok = True
+    for frag in expected_fragments:
+        if frag not in sql:
+            print(f"  FAIL: missing '{frag}' in output")
+            ok = False
+    if ok:
+        print(f"  OK: {description}")
+        print(f"  SQL: {sql[:200]}")
+    return ok
+# ---------------------------------------------------------------------------
+# Scenario 1: Parser still works (no model call)
+# ---------------------------------------------------------------------------
+def test_scenario1_parser_keeps_working():
+    print("\n=== Scenario 1: Parser — accented columns ===")
+    result = app.generate_response(
+        "criar tabela animal com nome nome cientifico e especie",
+        [], "", None, None
+    )
+    fragments = ["CREATE TABLE animal", "nome TEXT", "cientifico TEXT", "especie TEXT"]
+    return check_sql(result, fragments, "3 columns from Portuguese message")
+# ---------------------------------------------------------------------------
+# Scenario 2: SELECT all
+# ---------------------------------------------------------------------------
+def test_scenario2_select_all():
+    print("\n=== Scenario 2: SELECT all rows ===")
+    schema = app.PRESETS["employees"]
+    result = app.generate_response(
+        "liste todos os funcionarios",
+        [], schema, app.FINE_TUNED_MODEL_KEY, None
+    )
+    sql = sql_out(result)
+    status_msg = status(result)
+    ok = True
+    if "SELECT" not in sql.upper():
+        print(f"  FAIL: no SELECT in output")
+        ok = False
+    if "FROM" not in sql.upper():
+        print(f"  FAIL: no FROM in output")
+        ok = False
+    if ok:
+        print(f"  OK: generated SELECT")
+        print(f"  SQL: {sql}")
+    return ok
+# ---------------------------------------------------------------------------
+# Scenario 3: SELECT with WHERE filter
+# ---------------------------------------------------------------------------
+def test_scenario3_select_with_filter():
+    print("\n=== Scenario 3: SELECT with WHERE ===")
+    schema = app.PRESETS["employees"]
+    result = app.generate_response(
+        "mostre os funcionarios do departamento de vendas",
+        [], schema, app.FINE_TUNED_MODEL_KEY, None
+    )
+    sql = sql_out(result)
+    ok = True
+    if "SELECT" not in sql.upper():
+        print(f"  FAIL: no SELECT")
+        ok = False
+    if "WHERE" not in sql.upper():
+        print(f"  FAIL: no WHERE")
+        ok = False
+    if "department" in sql.lower() or "vendas" in sql.lower():
+        print(f"  OK: WHERE clause present")
+        print(f"  SQL: {sql}")
+    else:
+        print(f"  FAIL: filter condition missing")
+        ok = False
+    return ok
+# ---------------------------------------------------------------------------
+# Scenario 4: Aggregate (COUNT, AVG, GROUP BY)
+# ---------------------------------------------------------------------------
+def test_scenario4_aggregates():
+    print("\n=== Scenario 4: Aggregate query ===")
+    schema = app.PRESETS["employees"]
+    result = app.generate_response(
+        "qual a media de salarios por departamento",
+        [], schema, app.FINE_TUNED_MODEL_KEY, None
+    )
+    sql = sql_out(result)
+    ok = True
+    checks = ["SELECT", "AVG", "GROUP BY"]
+    for c in checks:
+        if c not in sql.upper():
+            print(f"  FAIL: missing '{c}'")
+            ok = False
+    if ok:
+        print(f"  OK: aggregate query generated")
+        print(f"  SQL: {sql}")
+    return ok
+# ---------------------------------------------------------------------------
+# Scenario 5: Natural language SQL (Issue 3)
+# ---------------------------------------------------------------------------
+def test_scenario5_natural_language():
+    print("\n=== Scenario 5: Natural language SQL (Issue 3) ===")
+    schema = app.PRESETS["products"]
+    result = app.generate_response(
+        "me diz qual o produto mais caro",
+        [], schema, app.FINE_TUNED_MODEL_KEY, None
+    )
+    sql = sql_out(result)
+    status_msg = status(result)
+    ok = True
+    if not sql.strip():
+        print(f"  FAIL: no SQL generated — model returned: {status_msg[:100]}")
+        ok = False
+    elif "SELECT" not in sql.upper():
+        print(f"  FAIL: output is not SQL: {sql[:100]}")
+        ok = False
+    else:
+        print(f"  OK: natural language produced SQL")
+        print(f"  SQL: {sql}")
+    return ok
+# ---------------------------------------------------------------------------
+# Scenario 6: Multi-turn flow (create → add → remove → query)
+# ---------------------------------------------------------------------------
+def test_scenario6_multiturn_flow():
+    print("\n=== Scenario 6: Multi-turn schema build + query ===")
+    ok = True
+    # Step 1: Create table
+    r1 = app.generate_response(
+        "crie tabela vendas com id produto quantidade total",
+        [], "", None, None
+    )
+    if not check_sql(r1, ["CREATE TABLE vendas", "id INTEGER", "produto TEXT", "quantidade INTEGER", "total NUMERIC"], "Step 1: CREATE TABLE"):
+        ok = False
+    # Step 2: Add column
+    r2 = app.generate_response("adicione desconto", r1[0], "", None, None)
+    if not check_sql(r2, ["desconto NUMERIC", "CREATE TABLE vendas"], "Step 2: ADD COLUMN"):
+        ok = False
+    # Step 3: Remove column
+    r3 = app.generate_response("remova quantidade", r2[0], "", None, None)
+    sql3 = sql_out(r3)
+    # CORRECT: quantidade should NOT be in SQL (it was removed)
+    if "quantidade" in sql3:
+        print(f"  FAIL: 'quantidade' still in table after remove (regression)")
+        ok = False
+    else:
+        print(f"  OK: Step 3: REMOVE COLUMN - 'quantidade' removed")
+    # Verify remaining columns still exist
+    for col in ["id", "produto", "desconto", "total"]:
+        if col not in sql3:
+            print(f"  FAIL: column '{col}' missing after remove")
+            ok = False
+    # Step 4: Query (model call)
+    final_schema = sql_out(r3)
+    r4 = app.generate_response(
+        "quanto vendemos no total",
+        r3[0], final_schema, app.FINE_TUNED_MODEL_KEY, None
+    )
+    sql4 = sql_out(r4)
+    if "SELECT" not in sql4.upper():
+        print(f"  FAIL: Step 4 no SELECT generated. Status: {status(r4)[:100]}")
+        ok = False
+    else:
+        print(f"  OK: Step 4: model generated SQL from multi-turn context")
+        print(f"  SQL: {sql4}")
+    return ok
+# ---------------------------------------------------------------------------
+# Run all
+# ---------------------------------------------------------------------------
+def run_all():
+    if app._model is None:
+        print("ERROR: model not loaded. Run app.load_model(app.FINE_TUNED_MODEL_ID) first.")
+        return
+    results = {}
+    results["s1_parser"] = test_scenario1_parser_keeps_working()
+    results["s2_select_all"] = test_scenario2_select_all()
+    results["s3_where"] = test_scenario3_select_with_filter()
+    results["s4_aggregates"] = test_scenario4_aggregates()
+    results["s5_natlang"] = test_scenario5_natural_language()
+    results["s6_multiturn"] = test_scenario6_multiturn_flow()
+    print("\n" + "=" * 50)
+    print("SUMMARY")
+    print("=" * 50)
+    passed = sum(1 for v in results.values() if v)
+    total = len(results)
+    for name, result in results.items():
+        mark = "PASS" if result else "FAIL"
+        print(f"  {mark}  {name}")
+    print(f"\n  Total: {passed}/{total} passed")
+    return passed == total
+if __name__ == "__main__":
+    # Check model loaded
+    if app._model is None:
+        print("Model not loaded. Call app.load_model(app.FINE_TUNED_MODEL_ID) then re-run.")
+        print("From python: python -c \"import app; app.load_model(app.FINE_TUNED_MODEL_ID); exec(open('tests/e2e_flow_test.py').read())\"")
+    else:
+        run_all()

tests/test_chatbot_behavior.py ADDED

@@ -0,0 +1,672 @@

+import types
+import pytest
+import app
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def reset_model_state():
+    app._model = None
+    app._tokenizer = None
+    app._current_model_id = None
+def assistant_text(result):
+    return result[0][-1]["content"]
+def sql_output(result):
+    return result[4]
+def status_html(result):
+    return result[7]
+@pytest.fixture(autouse=True)
+def clean_model_state():
+    reset_model_state()
+    yield
+    reset_model_state()
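Most of the tests below replace `app._run_generation` with a lambda that calls `pytest.fail`. Any accidental model call then fails the test, and the autouse fixture resets the module-level model globals around each test. `pytest.fail` raises an exception derived from `BaseException`, not `Exception`. That matters here: the app appears to catch generation errors and render them as a "Generation failed" status, and a guard raising an ordinary `Exception` could be swallowed by that handler. The sketch below reproduces the guard with only the standard library. `fake_app`, `handle` and `ModelCalled` are hypothetical.

```python
import types
from unittest import mock

# Hypothetical stand-in for the app module: only the function the guard targets.
fake_app = types.SimpleNamespace(_run_generation=lambda prompt: "SELECT 1")


class ModelCalled(BaseException):
    """Derives from BaseException so an `except Exception` handler cannot swallow it."""


def handle(message: str) -> str:
    """Deterministic parser path for CREATE TABLE, model path for everything else."""
    if message.lower().startswith("create table"):
        return "CREATE TABLE t (id INTEGER);"
    try:
        # Look the model function up at call time so a patch takes effect.
        return fake_app._run_generation(message)
    except Exception as err:  # errors are rendered, not raised
        return f"Generation failed: {type(err).__name__}: {err}"


def model_must_not_run(*args, **kwargs):
    raise ModelCalled("model should not run")


# Inside the block, any route to the model escapes the handler; the parser path is unaffected.
with mock.patch.object(fake_app, "_run_generation", model_must_not_run):
    print(handle("create table t"))

# The patch is undone on exit, as monkeypatch does at test teardown.
print(handle("how many rows?"))
```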
+# ---------------------------------------------------------------------------
+# CREATE TABLE: PT/EN verb forms
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(
+    ("message", "expected"),
+    [
+        (
+            "crie tabela pesquisadores com id nome artigo e curriculo",
+            ["CREATE TABLE pesquisadores", "id INTEGER", "nome TEXT", "artigo TEXT", "curriculo TEXT"],
+        ),
+        (
+            "cria tabela animal com nome tamanho peso especie",
+            ["CREATE TABLE animal", "nome TEXT", "tamanho TEXT", "peso NUMERIC", "especie TEXT"],
+        ),
+        (
+            "faça tabela clientes com id nome email",
+            ["CREATE TABLE clientes", "id INTEGER", "nome TEXT", "email TEXT"],
+        ),
+        (
+            "create table researchers with id, name, articles and cv",
+            ["CREATE TABLE researchers", "id INTEGER", "name TEXT", "articles TEXT", "cv TEXT"],
+        ),
+        (
+            "crie tabela alunos com id int nome text nota numeric",
+            ["CREATE TABLE alunos", "id INTEGER", "nome TEXT", "nota NUMERIC"],
+        ),
+        (
+            "crie tabela pesquisadores com id nome artigo curriculo",
+            ["CREATE TABLE pesquisadores", "id INTEGER", "nome TEXT", "artigo TEXT", "curriculo TEXT"],
+        ),
+    ],
+)
+def test_create_table_without_model(message, expected, monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    result = app.generate_response(message, [], "", None, None)
+    for fragment in expected:
+        assert fragment in sql_output(result), f"missing: {fragment!r}"
+    assert "validator-ok" in result[5]
+    assert "without calling the model" in status_html(result)
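The parametrized cases here and in scenario 6 imply a type convention based on column names. `id` and `quantidade` become INTEGER; `peso`, `total` and `preco` become NUMERIC; other names default to TEXT. The sketch below shows one way to reproduce that behaviour. The keyword sets and both function names are assumptions chosen to match these expectations. They are not the rules in app.py, which this diff does not show.

```python
# Hypothetical keyword sets; app.py's real inference rules are not part of this diff.
INTEGER_NAMES = {"id", "quantidade", "quantity", "estoque", "stock", "idade", "age"}
NUMERIC_NAMES = {
    "peso", "weight", "preco", "price", "total", "desconto", "discount",
    "salario", "salary", "nota", "grade", "bonus",
}


def infer_column_type(name: str) -> str:
    """Guess a SQL type from a column name; unknown names default to TEXT."""
    key = name.lower()
    if key in INTEGER_NAMES or key.endswith("_id"):
        return "INTEGER"
    if key in NUMERIC_NAMES:
        return "NUMERIC"
    return "TEXT"


def build_create_table(table: str, columns: list[str]) -> str:
    """Render one column per indented line, the layout used in the history fixtures."""
    body = ",\n".join(f"    {col} {infer_column_type(col)}" for col in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


print(build_create_table("vendas", ["id", "produto", "quantidade", "total"]))
```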
+# ---------------------------------------------------------------------------
+# CREATE TABLE: active preset as schema fallback
+# ---------------------------------------------------------------------------
+def test_create_table_from_active_preset(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    result = app.generate_response(
+        "gere esta tabela", [], app.PRESETS["employees"], None, None
+    )
+    assert "CREATE TABLE employees" in sql_output(result)
+    assert "without calling the model" in status_html(result)
+# ---------------------------------------------------------------------------
+# EDIT: add column (verb forms and patterns)
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(
+    ("message", "expected_col"),
+    [
+        ("adicione cpf",                   "cpf TEXT"),
+        ("add email",                      "email TEXT"),
+        ("inclua telefone",                "telefone TEXT"),
+        ("acrescente campo bonus numeric", "bonus NUMERIC"),
+        ("adicione: matricula",            "matricula TEXT"),
+    ],
+)
+def test_add_column_variants(message, expected_col, monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela funcionarios com id nome salario", [], "", None, None
+    )
+    result = app.generate_response(message, base[0], "", None, None)
+    assert expected_col in sql_output(result)
+    assert "CREATE TABLE funcionarios" in sql_output(result)
+    assert "without calling the model" in status_html(result)
+# ---------------------------------------------------------------------------
+# EDIT: remove column
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(
+    ("message", "removed_col"),
+    [
+        ("remova salario",  "salario"),
+        ("remove nome",     "nome"),
+        ("delete salary",   "salary"),
+        ("drop coluna id",  "id"),
+    ],
+)
+def test_remove_column_variants(message, removed_col, monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela funcionarios com id nome salario", [], "", None, None
+    )
+    result = app.generate_response(message, base[0], "", None, None)
+    assert "CREATE TABLE funcionarios" in sql_output(result)
+    assert removed_col not in sql_output(result)
+    assert "validator-ok" in result[5]
+# ---------------------------------------------------------------------------
+# EDIT: "altere" and "mude" (regression fix for is_table_edit_intent)
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(
+    "edit_message",
+    [
+        "altere para ter também email",
+        "mude adicionando telefone",
+    ],
+)
+def test_edit_intent_recognizes_pt_conjugations(edit_message, monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela x com id nome", [], "", None, None
+    )
+    result = app.generate_response(edit_message, base[0], "", None, None)
+    assert "CREATE TABLE x" in sql_output(result)
+    assert "without calling the model" in status_html(result)
+# ---------------------------------------------------------------------------
+# EDIT: multiple add/remove operations in the same turn
+# ---------------------------------------------------------------------------
+def test_add_multiple_columns(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela pesquisadores com id nome artigo e curriculo", [], "", None, None
+    )
+    result = app.generate_response("add email and phone", base[0], "", None, None)
+    assert "email TEXT" in sql_output(result)
+    assert "phone TEXT" in sql_output(result)
+def test_remove_column(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela pesquisadores com id nome artigo e curriculo", [], "", None, None
+    )
+    added = app.generate_response("adicione cpf", base[0], "", None, None)
+    result = app.generate_response("remover curriculo", added[0], "", None, None)
+    assert "curriculo TEXT" not in sql_output(result)
+    assert "cpf TEXT" in sql_output(result)
+    assert "id INTEGER" in sql_output(result)
+    assert "validator-ok" in result[5]
+# ---------------------------------------------------------------------------
+# EDIT: history with different content shapes
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(
+    "history",
+    [
+        [{"role": "assistant", "content": "```sql\nCREATE TABLE pesquisadores (\n    id INTEGER,\n    nome TEXT\n);\n```"}],
+        [{"role": "assistant", "content": [{"text": "```sql\nCREATE TABLE pesquisadores (\n    id INTEGER,\n    nome TEXT\n);\n```"}]}],
+        [{"role": "assistant", "content": "CREATE TABLE pesquisadores (\n    id INTEGER,\n    nome TEXT\n);"}],
+    ],
+)
+def test_edit_from_history_content_shapes(history, monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    result = app.generate_response(
+        "edita ela para ter um novo elemento: cpf", history, "", None, None
+    )
+    assert "CREATE TABLE pesquisadores" in sql_output(result)
+    assert "cpf TEXT" in sql_output(result)
+    assert "id INTEGER" in sql_output(result)
+    assert "validator-ok" in result[5]
+# ---------------------------------------------------------------------------
+# EDIT: with active_schema and empty history
+# ---------------------------------------------------------------------------
+def test_edit_from_active_schema_no_history(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    result = app.generate_response(
+        "adicione bonus", [], app.PRESETS["employees"], None, None
+    )
+    assert "CREATE TABLE employees" in sql_output(result)
+    assert "bonus" in sql_output(result)
+    assert "without calling the model" in status_html(result)
+# ---------------------------------------------------------------------------
+# EDIT: last_create_table_from_history returns the most recent table
+# ---------------------------------------------------------------------------
+def test_last_create_table_returns_most_recent():
+    history = [
+        {"role": "assistant", "content": "```sql\nCREATE TABLE old (x TEXT);\n```"},
+        {"role": "user",      "content": "adicione id"},
+        {"role": "assistant", "content": "```sql\nCREATE TABLE new (id INTEGER);\n```"},
+    ]
+    result = app.last_create_table_from_history(history)
+    assert "CREATE TABLE new" in result
+    assert "CREATE TABLE old" not in result
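`last_create_table_from_history` has to handle the content shapes exercised in the history tests above: a plain string, a fenced string, and a list of `{"text": ...}` parts. It also has to prefer the newest table. Below is a hypothetical sketch of that lookup. The names and the regex are assumptions, not the app's implementation.

```python
import re
from typing import Any, Optional

# Non-greedy: from "CREATE TABLE" up to the first semicolon after it.
CREATE_TABLE_RE = re.compile(r"CREATE\s+TABLE\b.*?;", re.IGNORECASE | re.DOTALL)


def content_to_text(content: Any) -> str:
    """Flatten chat content that is either a string or a list of {"text": ...} parts."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(part.get("text", "") for part in content if isinstance(part, dict))
    return ""


def last_create_table(history: list) -> Optional[str]:
    """Return the newest CREATE TABLE statement found in assistant messages."""
    for message in reversed(history):  # newest first
        if message.get("role") != "assistant":
            continue
        found = CREATE_TABLE_RE.findall(content_to_text(message.get("content")))
        if found:
            return found[-1]  # last statement within that message
    return None
```

The regex stops at the first semicolon, so it misses a statement without a trailing `;`. That is a limitation of this sketch, not necessarily of the app.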
+# ---------------------------------------------------------------------------
+# FULL multi-turn FLOW: create → add (two columns) → remove → SQL intent
+# ---------------------------------------------------------------------------
+def test_full_schema_build_flow(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    r1 = app.generate_response(
+        "crie tabela produtos com id nome preco", [], "", None, None
+    )
+    assert "CREATE TABLE produtos" in sql_output(r1)
+    assert "preco NUMERIC" in sql_output(r1)
+    r2 = app.generate_response("adicione categoria e estoque", r1[0], "", None, None)
+    assert "categoria TEXT" in sql_output(r2)
+    assert "estoque INTEGER" in sql_output(r2)
+    assert "id INTEGER" in sql_output(r2)
+    r3 = app.generate_response("remova preco", r2[0], "", None, None)
+    assert "preco" not in sql_output(r3)
+    assert "categoria TEXT" in sql_output(r3)
+    r4 = app.generate_response(
+        "qual o produto mais caro?", r3[0], sql_output(r3), None, None
+    )
+    assert "Load a model" in status_html(r4)
+# ---------------------------------------------------------------------------
+# SQL intent routing
+# ---------------------------------------------------------------------------
+def test_sql_prompt_uses_schema_template():
+    prompt = app.build_generation_prompt(
+        app.PRESETS["employees"],
+        "What is the average salary per department?",
+    )
+    assert "CREATE TABLE employees" in prompt
+    assert "<|user|>" in prompt
+    assert "<|assistant|>" in prompt
+def test_sql_prompt_fallback_schema_when_empty():
+    prompt = app.build_generation_prompt("", "select all rows")
+    assert "CREATE TABLE unknown (id INTEGER)" in prompt
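The two prompt tests require only three things: the schema appears in the prompt, Phi-3's `<|user|>` and `<|assistant|>` markers are present, and an empty schema falls back to `CREATE TABLE unknown (id INTEGER)`. Below is a sketch that meets those constraints. The instruction wording and the name `build_prompt` are assumptions. The layout follows Phi-3's chat format, where a user turn closed by `<|end|>` is followed by an open assistant turn. Like the calls in the tests, the sketch takes only a schema and a question.

```python
# Assumption: an empty schema falls back to this placeholder, as the test expects.
FALLBACK_SCHEMA = "CREATE TABLE unknown (id INTEGER)"


def build_prompt(schema: str, question: str) -> str:
    """Wrap schema and question in one Phi-3 user turn and open the assistant turn."""
    schema_text = schema.strip() or FALLBACK_SCHEMA
    user_turn = (
        "Write a SQL query that answers the question using this schema.\n\n"
        f"Schema:\n{schema_text}\n\n"
        f"Question: {question.strip()}"
    )
    # Phi-3 chat layout: <|user|> ... <|end|>, then an open <|assistant|> turn.
    return f"<|user|>\n{user_turn}<|end|>\n<|assistant|>\n"
```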
+def test_sql_intent_detected():
+    assert app.is_sql_intent("What is the average salary per department?", app.PRESETS["employees"])
+    assert app.is_sql_intent("liste todos os funcionários", app.PRESETS["employees"])
+    assert app.is_sql_intent("mostre os alunos com nota maior que 8", app.PRESETS["students"])
+def test_greeting_not_sql_intent():
+    assert not app.is_sql_intent("oi", app.PRESETS["employees"])
+    assert not app.is_sql_intent("hello", "")
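These intent tests mix English and accented Portuguese and require greetings to fall through. A simple keyword heuristic can satisfy them. The sketch below strips accents, looks for PT/EN query cue words, and also accepts a message that names a table or column from the active schema. The cue list and function names are hypothetical, and keyword heuristics like this will misfire on some phrasings.

```python
import re
import unicodedata

# Hypothetical PT/EN cue words; app.is_sql_intent's real rules are not in this diff.
SQL_CUES = {
    "select", "list", "show", "count", "average", "sum", "total", "what", "which", "how",
    "liste", "listar", "mostre", "mostrar", "qual", "quais", "quanto", "quantos", "media",
}
SCHEMA_NOISE = {"create", "table", "integer", "text", "numeric", "real", "boolean", "date"}


def tokens(text: str) -> set:
    """Lowercase, strip accents (funcionários -> funcionarios), split into words."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z0-9_]+", plain))


def looks_like_sql_question(message: str, schema: str = "") -> bool:
    words = tokens(message)
    if words & SQL_CUES:
        return True
    # Naming a table or column from the active schema also counts as a query.
    return bool(words & (tokens(schema) - SCHEMA_NOISE))
```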
+# ---------------------------------------------------------------------------
+# Output parsing: clean_generation and format_generation_result
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(("raw", "expected"), [
+    ("```sql\nSELECT * FROM x\n```", "SELECT * FROM x"),
+    ("SELECT id FROM t<|end|>",      "SELECT id FROM t"),
+    ("SQL: SELECT name FROM t",      "SELECT name FROM t"),
+    ("```\nSELECT 1\n```",           "SELECT 1"),
+])
+def test_clean_generation_strips_artifacts(raw, expected):
+    assert app.clean_generation(raw) == expected
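The four parametrized cases define what `clean_generation` must strip: Markdown fences with or without a language tag, a trailing `<|end|>` marker, and a leading `SQL:` label. Below is a sketch that handles those cases. The regexes and the name `clean_sql_text` are assumptions. `<|endoftext|>` is an extra end marker the sketch assumes may also appear.

```python
import re

# Assumption: end-of-turn markers that may leak into the decoded text.
END_MARKERS = ("<|end|>", "<|endoftext|>")
# A fence must put its optional language tag on its own line, as in the test cases.
FENCE_RE = re.compile(r"^```[A-Za-z]*[ \t]*\n(.*?)\s*```$", re.DOTALL)
PREFIX_RE = re.compile(r"^SQL\s*:\s*", re.IGNORECASE)


def clean_sql_text(raw: str) -> str:
    text = raw
    for marker in END_MARKERS:
        text = text.split(marker, 1)[0]  # drop everything after an end marker
    text = text.strip()
    fenced = FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)  # unwrap ```sql ... ``` or a bare ``` ... ```
    return PREFIX_RE.sub("", text.strip()).strip()
```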
+def test_format_generation_result_sql_path():
+    sql, chat, validator = app.format_generation_result("SELECT * FROM employees")
+    assert sql == "SELECT * FROM employees"
+    assert chat == ""
+    assert "validator-ok" in validator
+def test_format_generation_result_chat_path():
+    sql, chat, validator = app.format_generation_result("I don't know, try again.")
+    assert sql == ""
+    assert "I don't know" in chat
+    assert validator == app.CHAT_VALIDATOR
+# ---------------------------------------------------------------------------
+# validate_sql: statement starters beyond SELECT
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize("stmt", [
+    "SELECT * FROM employees",
+    "CREATE TABLE t (id INTEGER)",
+    "INSERT INTO t VALUES (1)",
+    "WITH cte AS (SELECT 1) SELECT * FROM cte",
+    "DROP TABLE t",
+    "UPDATE t SET x = 1 WHERE id = 1",
+])
+def test_validate_sql_valid_starters(stmt):
+    assert "validator-ok" in app.validate_sql(stmt)
+def test_validate_sql_garbage_returns_warn():
+    assert "validator-warn" in app.validate_sql("isto nao e sql %$#")
+def test_validate_sql_empty_returns_empty_badge():
+    assert app.validate_sql("") == app.EMPTY_VALIDATOR
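The app builds its validation badge with `sqlparse`. Below is a simplified stand-in that uses only the standard library. It checks just the leading keyword and maps the result to the `validator-ok`, `validator-warn` and `validator-empty` classes styled in the CSS. The starter set and the name `validator_class` are assumptions. A leading keyword is a weak signal, so treat this as an illustration of the test's contract, not a SQL validator.

```python
# Assumption: leading keywords treated as "looks like SQL".
SQL_STARTERS = {"SELECT", "WITH", "INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"}


def validator_class(text: str) -> str:
    """Map text to one of the badge classes used by the UI."""
    stripped = text.strip()
    if not stripped:
        return "validator-empty"
    first_word = stripped.split(None, 1)[0].upper()
    return "validator-ok" if first_word in SQL_STARTERS else "validator-warn"
```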
+# ---------------------------------------------------------------------------
+# Normalization of explicit types in the column parser
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(("raw", "expected_type"), [
+    ("price DECIMAL", "NUMERIC"),
+    ("active BOOL",   "BOOLEAN"),
+    ("qty INT",       "INTEGER"),
+    ("score REAL",    "REAL"),
+    # P2 fix: column name matches SQL type keyword (date DATE, int INTEGER)
+    # The parser now takes the last match as the type, not the first
+    ("date DATE",     "DATE"),
+    ("int INTEGER",   "INTEGER"),
+    ("name TEXT",     "TEXT"),
+])
+def test_parse_column_explicit_type_normalization(raw, expected_type):
+    parsed = app.parse_column_definition(raw)
+    assert parsed is not None
+    assert parsed[1] == expected_type
+    _, col_type = parsed
+    assert col_type == expected_type
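The "last match" fix matters when a column is named after a type keyword, as in `date DATE`. If the parser took the first match, it would read the name itself as the type. Below is a minimal sketch of that rule: the first token is always the name, and the type is the last recognised keyword after it. The alias table, the TEXT default and the name `parse_column` are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical alias table: user-typed type names mapped to canonical SQL types.
TYPE_ALIASES = {
    "INT": "INTEGER", "INTEGER": "INTEGER",
    "DECIMAL": "NUMERIC", "NUMERIC": "NUMERIC",
    "BOOL": "BOOLEAN", "BOOLEAN": "BOOLEAN",
    "REAL": "REAL", "TEXT": "TEXT", "DATE": "DATE",
}


def parse_column(raw: str) -> Optional[Tuple[str, str]]:
    """Split "name TYPE" into (name, canonical type); the first token is always the name."""
    tokens = raw.split()
    if not tokens:
        return None
    name, rest = tokens[0], tokens[1:]
    # Take the last recognised type keyword after the name.
    for token in reversed(rest):
        canonical = TYPE_ALIASES.get(token.upper())
        if canonical is not None:
            return name, canonical
    return name, "TEXT"  # assumption: untyped columns default to TEXT
```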
+# ---------------------------------------------------------------------------
+# trim_chat_history
+# ---------------------------------------------------------------------------
+def test_trim_chat_history_caps_at_max_exchanges():
+    history = [
+        {"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
+        for i in range(30)
+    ]
+    trimmed = app.trim_chat_history(history)
+    assert len(trimmed) == 20
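The test implies a cap of 20 messages for 30 alternating turns, which corresponds to 10 user/assistant exchanges. The minimal sketch below assumes the cap is applied by slicing from the end. `MAX_EXCHANGES` and `trim_history` are hypothetical names.

```python
MAX_EXCHANGES = 10  # assumption: 10 user/assistant pairs, i.e. 20 messages


def trim_history(history: list, max_exchanges: int = MAX_EXCHANGES) -> list:
    """Keep only the most recent max_exchanges user/assistant pairs."""
    limit = 2 * max_exchanges
    # Guard: history[-0:] would return the whole list, not an empty one.
    return history[-limit:] if limit > 0 else []
```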
+# ---------------------------------------------------------------------------
+# Errors and model state
+# ---------------------------------------------------------------------------
+def test_empty_input_returns_error():
+    result = app.generate_response("", [], "", None, None)
+    assert result[0] == []
+    assert "Type a message" in status_html(result)
+def test_malformed_create_table_returns_error():
+    result = app.generate_response("crie tabela", [], "", None, None)
+    assert sql_output(result) == ""
+    assert "CREATE TABLE needs" in status_html(result)
+def test_edit_without_existing_table_returns_error():
+    result = app.generate_response("adicione cpf", [], "", None, None)
+    assert sql_output(result) == ""
+    assert "existing CREATE TABLE" in status_html(result)
+def test_sql_intent_without_model_returns_load_error():
+    result = app.generate_response(
+        "What is the average salary?", [], app.PRESETS["employees"], None, None
+    )
+    assert "Load a model" in status_html(result)
+def test_model_id_mismatch_returns_inconsistency_error():
+    app._model = types.SimpleNamespace(
+        generation_config=types.SimpleNamespace(eos_token_id=0)
+    )
+    app._tokenizer = object()
+    app._current_model_id = app.BASE_MODEL_ID
+    try:
+        result = app.generate_response(
+            "select all", [], app.PRESETS["employees"], app.FINE_TUNED_MODEL_KEY, None
+        )
+        assert "inconsistent" in status_html(result)
+    finally:
+        reset_model_state()
+def test_busy_generation_lock_raises():
+    assert app._model_activity_lock.acquire(blocking=False)
+    try:
+        with pytest.raises(RuntimeError, match="Another model operation"):
+            app._run_generation(object(), {}, {})
+    finally:
+        app._model_activity_lock.release()
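`test_busy_generation_lock_raises` holds `app._model_activity_lock` and expects `_run_generation` to fail immediately with "Another model operation". A non-blocking `acquire` gives that fail-fast behaviour. Below is a sketch of the pattern. `activity_lock` and `run_exclusive` are hypothetical names, not the app's code.

```python
import threading
from typing import Any, Callable

# Hypothetical module-level guard, mirroring app._model_activity_lock in the test.
activity_lock = threading.Lock()


def run_exclusive(task: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run task only if no other model operation holds the lock; never wait."""
    if not activity_lock.acquire(blocking=False):
        # Fail fast so the UI can report "busy" instead of hanging behind a load.
        raise RuntimeError("Another model operation is already running.")
    try:
        return task(*args, **kwargs)
    finally:
        activity_lock.release()
```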
+def test_generation_exception_is_rendered_not_raised(monkeypatch):
+    class DummyTokenizer:
+        eos_token_id = 0
+        pad_token_id = 0
+        def __call__(self, prompt, return_tensors):
+            return {"input_ids": types.SimpleNamespace(shape=(1, 1))}
+    monkeypatch.setattr(app, "import_model_runtime", lambda: (object(), None, None, None))
+    monkeypatch.setattr(
+        app, "_run_generation",
+        lambda *a, **k: (_ for _ in ()).throw(RuntimeError("timeout"))
+    )
+    app._model = types.SimpleNamespace(
+        generation_config=types.SimpleNamespace(eos_token_id=0)
+    )
+    app._tokenizer = DummyTokenizer()
+    app._current_model_id = app.FINE_TUNED_MODEL_ID
+    result = app.generate_response(
+        "select all rows", [], "", app.FINE_TUNED_MODEL_KEY, None
+    )
+    assert sql_output(result) == ""
+    assert "Generation failed: RuntimeError: timeout" in status_html(result)
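The test stubs `_run_generation` with `lambda *a, **k: (_ for _ in ()).throw(RuntimeError("timeout"))`. Calling `.throw()` on a generator that has not started yet raises the exception straight out of the call, which lets a lambda raise. Below, that one-liner sits next to two more readable equivalents: a small closure and `unittest.mock.Mock(side_effect=...)`. `raiser` and `call_and_capture` are illustrative names, not part of the app.

```python
from typing import Any, Callable, NoReturn
from unittest import mock


def raiser(exc: BaseException) -> Callable[..., NoReturn]:
    """Return a stub that raises exc for any arguments."""
    def _raise(*args: Any, **kwargs: Any) -> NoReturn:
        raise exc
    return _raise


# The one-liner from the test: .throw() on a not-yet-started generator
# raises the exception straight out of the throw() call.
inline_raiser = lambda *a, **k: (_ for _ in ()).throw(RuntimeError("timeout"))

# Standard-library alternative: a Mock whose side_effect is an exception instance.
mock_raiser = mock.Mock(side_effect=RuntimeError("timeout"))


def call_and_capture(stub: Callable[..., Any]) -> str:
    """Call a stub the way a generation helper might and report what it raised."""
    try:
        stub("prompt", max_new_tokens=8)
    except RuntimeError as err:
        return f"{type(err).__name__}: {err}"
    return "no error"


for stub in (raiser(RuntimeError("timeout")), inline_raiser, mock_raiser):
    print(call_and_capture(stub))  # RuntimeError: timeout
```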
+# ---------------------------------------------------------------------------
+# Fallback for messages outside the SQL context
+# ---------------------------------------------------------------------------
+def test_off_topic_message_returns_fallback(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    result = app.generate_response("me conte uma piada", [], "", None, None)
+    assert sql_output(result) == ""
+    assert "schema" in assistant_text(result).lower() or "tabela" in assistant_text(result).lower()
+def test_greeting_returns_fallback(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    result = app.generate_response("oi", [], "", None, None)
+    assert sql_output(result) == ""
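Both fallback tests monkeypatch `_run_generation` to fail, so the off-topic check must short-circuit before any model call. Here is one way to express that gate, assuming a simple keyword heuristic. The keyword set and the fallback wording are illustrative, not the app's actual lists.

```python
import re

# Illustrative keyword set; the real app's intent detection may differ.
SQL_INTENT_WORDS = {
    "select", "list", "liste", "count", "sum", "where", "order",
    "group", "join", "create", "crie", "table", "tabela",
}

FALLBACK = (
    "I can only help with SQL: describe a table schema "
    "or ask a question about one."
)


def route(message: str, run_model) -> str:
    """Return the fallback unless the message looks like a SQL request."""
    words = set(re.findall(r"\w+", message.lower()))
    if not words & SQL_INTENT_WORDS:
        return FALLBACK  # the model is never called for greetings or jokes
    return run_model(message)
```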
+# ---------------------------------------------------------------------------
+# Stopwords are not turned into columns
+# ---------------------------------------------------------------------------
+def test_stopwords_not_treated_as_columns(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela animal com nome especie", [], "", None, None
+    )
+    result = app.generate_response("add peso", base[0], "", None, None)
+    schema = sql_output(result)
+    assert "peso NUMERIC" in schema
+    assert " to TEXT" not in schema
+    assert " as TEXT" not in schema
+    assert " from TEXT" not in schema
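The stopword test guards against filler words such as `to`, `as`, and `from` leaking into the schema as `TEXT` columns, and it expects `peso` to be typed `NUMERIC`. Here is a sketch of that extraction step. The stopword list and the name-based type hints are assumptions chosen to match the test expectations; they are not the app's real tables.

```python
from typing import List

# Hypothetical stopword and type-hint tables (English + Portuguese).
STOPWORDS = {"add", "adiciona", "and", "e", "to", "as", "from", "para",
             "com", "coluna", "column", "a", "o", "de"}
NUMERIC_HINTS = {"peso", "preco", "valor", "total", "salario", "price"}


def infer_type(column: str) -> str:
    """Guess a SQL type from the column name alone."""
    if column == "id" or column.endswith("_id"):
        return "INTEGER"
    if column in NUMERIC_HINTS:
        return "NUMERIC"
    return "TEXT"


def extract_columns(message: str) -> List[str]:
    """Return 'name TYPE' definitions for every non-stopword token."""
    tokens = [t for t in message.lower().split() if t.isidentifier()]
    return [f"{t} {infer_type(t)}" for t in tokens if t not in STOPWORDS]
```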
+# ---------------------------------------------------------------------------
+# Column rename
+# ---------------------------------------------------------------------------
+def test_rename_column_basic(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela animal com nome cientifico especie", [], "", None, None
+    )
+    result = app.generate_response(
+        "rename cientifico to nome_cientifico", base[0], "", None, None
+    )
+    schema = sql_output(result)
+    assert "nome_cientifico TEXT" in schema
+    assert "\n    cientifico TEXT" not in schema
+    assert "nome TEXT" in schema
+    assert "especie TEXT" in schema
+    assert "validator-ok" in result[5]
+def test_rename_column_pt(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela produto com id nome preco", [], "", None, None
+    )
+    result = app.generate_response(
+        "renomeie preco para valor", base[0], "", None, None
+    )
+    schema = sql_output(result)
+    assert "valor NUMERIC" in schema
+    assert "preco" not in schema
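The two rename tests accept both the English `rename X to Y` and the Portuguese `renomeie X para Y`. Here is a regex-based parser that covers those phrasings plus a couple of Portuguese verb variants. The pattern and the `parse_rename` name are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "rename a to b" or "renomeie/renomeia/renomear a para b",
# with an optional "column"/"coluna" before the old name.
RENAME_RE = re.compile(
    r"\b(?:rename|renomei[ae]|renomear)\s+(?:column\s+|coluna\s+)?"
    r"(?P<old>\w+)\s+(?:to|para)\s+(?P<new>\w+)",
    re.IGNORECASE,
)


def parse_rename(message: str) -> Optional[Tuple[str, str]]:
    """Return (old, new) column names, or None if no rename is requested."""
    match = RENAME_RE.search(message)
    if match is None:
        return None
    return match.group("old").lower(), match.group("new").lower()
```

Because it uses `search` rather than `match`, the parser also finds a rename embedded in a longer compound prompt.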
+# ---------------------------------------------------------------------------
+# Compound operations: add + rename / add + remove in the same prompt
+# ---------------------------------------------------------------------------
+def test_compound_add_and_rename(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela animal com nome cientifico especie", [], "", None, None
+    )
+    result = app.generate_response(
+        "add peso and rename cientifico to nome_cientifico", base[0], "", None, None
+    )
+    schema = sql_output(result)
+    assert "peso" in schema
+    assert "nome_cientifico TEXT" in schema
+    assert "\n    cientifico TEXT" not in schema
+    assert "edit TEXT" not in schema
+    assert " to TEXT" not in schema
+    assert "validator-ok" in result[5]
+def test_compound_add_and_remove(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela funcionarios com id nome salario departamento", [], "", None, None
+    )
+    result = app.generate_response(
+        "add email and remove salario", base[0], "", None, None
+    )
+    schema = sql_output(result)
+    assert "email TEXT" in schema
+    assert "salario" not in schema
+    assert "id INTEGER" in schema
+    assert "nome TEXT" in schema
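The compound tests chain two edits with `and` in one prompt. The negative assertions (`edit TEXT` and ` to TEXT` must be absent) suggest the failure being guarded against is treating the whole sentence as one column list. Here is a sketch that first splits the prompt into clauses, but only where a new operation verb follows the conjunction, and then classifies each clause. The verb sets and the `(op, args)` tuples are assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical verb sets; the real app may recognise more synonyms.
ADD_VERBS = {"add", "adiciona", "adicione"}
REMOVE_VERBS = {"remove", "remova", "drop"}
RENAME_VERBS = {"rename", "renomeie", "renomeia"}
ALL_VERBS = ADD_VERBS | REMOVE_VERBS | RENAME_VERBS

# Split on "and"/"e" only when the next word starts a new operation,
# so "add peso e altura" stays a single add clause.
SPLIT_RE = re.compile(r"\s+(?:and|e)\s+(?=(?:%s)\b)" % "|".join(sorted(ALL_VERBS)))


def split_operations(message: str) -> List[Tuple[str, List[str]]]:
    """Split 'add x and remove y' into [('add', ['x']), ('remove', ['y'])]."""
    ops = []
    for clause in SPLIT_RE.split(message.strip().lower()):
        words = clause.split()
        if not words:
            continue
        verb, args = words[0], words[1:]
        if verb in ADD_VERBS:
            ops.append(("add", args))
        elif verb in REMOVE_VERBS:
            ops.append(("remove", args))
        elif verb in RENAME_VERBS:
            # 'rename old to new' -> ['old', 'new']
            ops.append(("rename", [a for a in args if a not in {"to", "para"}]))
        else:
            ops.append(("unknown", words))
    return ops
```

Leftover connectives inside a clause, such as the `e` in `add peso e altura`, would still need a stopword pass before they become column names.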
+# ---------------------------------------------------------------------------
+# Rename preserves the column's original type
+# ---------------------------------------------------------------------------
+def test_rename_preserves_column_type(monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    base = app.generate_response(
+        "crie tabela vendas com id total date", [], "", None, None
+    )
+    result = app.generate_response(
+        "rename total to valor_total", base[0], "", None, None
+    )
+    schema = sql_output(result)
+    assert "valor_total NUMERIC" in schema
+    assert "\n    total NUMERIC" not in schema
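`test_rename_preserves_column_type` checks that renaming `total` keeps its `NUMERIC` type. Suppose the schema is held as an ordered name-to-type mapping (an assumption; `app.py` may store it differently). Then a rename only swaps the key in place and carries the type across, without re-running type inference on the new name. The four-space indentation in the rendered DDL mirrors the `"\n    total NUMERIC"` pattern the tests search for. `rename_column` and `render_create_table` are hypothetical helpers.

```python
from typing import Dict


def rename_column(columns: Dict[str, str], old: str, new: str) -> Dict[str, str]:
    """Return a new name->type mapping with `old` renamed to `new`.

    Column order and the original SQL type are preserved. The new name is
    not re-typed, so 'total NUMERIC' becomes 'valor_total NUMERIC' even if
    a name-based guess for 'valor_total' would have produced TEXT.
    """
    if old not in columns:
        raise KeyError(f"column {old!r} not found")
    if new in columns:
        raise ValueError(f"column {new!r} already exists")
    # dicts keep insertion order, so the renamed column stays in place
    return {(new if name == old else name): sql_type for name, sql_type in columns.items()}


def render_create_table(table: str, columns: Dict[str, str]) -> str:
    """Render the mapping as a multi-line CREATE TABLE statement."""
    body = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} (\n{body}\n);"
```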
+# ---------------------------------------------------------------------------
+# Edit terms → off-topic, not SQL intent (Fix 1: off_topic_patterns blocklist)
+# ---------------------------------------------------------------------------
+@pytest.mark.parametrize(
+    ("message", "schema"),
+    [
+        ("troca tipo por medida", "CREATE TABLE comida (id INTEGER)"),
+        ("renomeia nome para titulo", "CREATE TABLE livro (id INTEGER, nome TEXT)"),
+        ("muda preco para numeric", "CREATE TABLE produto (id INTEGER, preco TEXT)"),
+        ("altera coluna idade para integer", "CREATE TABLE pessoa (id INTEGER, idade TEXT)"),
+    ],
+)
+def test_edit_terms_routed_to_off_topic(message, schema, monkeypatch):
+    monkeypatch.setattr(app, "_run_generation", lambda *a, **k: pytest.fail("model should not run"))
+    # The result must NOT ask to load a model: edit terms are off-topic, not SQL intent.
+    result = app.generate_response(message, [], schema, None, None)
+    status = status_html(result)
+    assert "Load a model" not in status
+    # Expected outcome: either the edit-without-table error or the safe fallback, never the model path.
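Fix 1 keeps edit phrasings such as `troca tipo por medida` or `muda preco para numeric` off the model path when there is no prior table in the chat history (these tests pass an empty history). The important detail is ordering. The blocklist is consulted before SQL-intent detection, so words like `tipo` or `numeric` cannot pull the message into generation. Here is a sketch of that order. The patterns and route names are illustrative and are not the app's actual `off_topic_patterns`.

```python
import re

# Illustrative blocklist of Portuguese edit verbs at the start of a message.
OFF_TOPIC_PATTERNS = [
    re.compile(
        r"^\s*(?:troca|trocar|muda|mudar|altera|alterar|renomeia|renomear)\b",
        re.IGNORECASE,
    ),
]
# Illustrative SQL-intent detector; note it would match 'tipo' and 'numeric'.
SQL_INTENT_RE = re.compile(
    r"\b(?:select|liste|list|count|where|numeric|integer|coluna|tipo)\b",
    re.IGNORECASE,
)


def classify(message: str) -> str:
    """Return 'off_topic', 'sql', or 'fallback'; the blocklist is checked first."""
    if any(p.search(message) for p in OFF_TOPIC_PATTERNS):
        return "off_topic"
    if SQL_INTENT_RE.search(message):
        return "sql"
    return "fallback"
```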
+# ---------------------------------------------------------------------------
+# build_generation_prompt injects last 3 conversation exchanges (Fix 2)
+# ---------------------------------------------------------------------------
+def test_build_generation_prompt_injects_history():
+    schema = "CREATE TABLE comida (id INTEGER, nome TEXT, sabor TEXT)"
+    message = "liste tudo ordenado por nome"
+    chat_history = [
+        {"role": "user", "content": "crie tabela comida com nome sabor"},
+        {"role": "assistant", "content": "```sql\nCREATE TABLE comida (id INTEGER, nome TEXT, sabor TEXT)\n```"},
+        {"role": "user", "content": "adiciona coluna peso"},
+        {"role": "assistant", "content": "```sql\nALTER TABLE comida ADD COLUMN peso NUMERIC\n```"},
+    ]
+    prompt = app.build_generation_prompt(schema, message, chat_history)
+    assert "Previous conversation:" in prompt
+    assert "crie tabela comida" in prompt
+    assert "adiciona coluna peso" in prompt
+def test_build_generation_prompt_no_history_no_context():
+    schema = "CREATE TABLE comida (id INTEGER)"
+    message = "liste todos"
+    prompt = app.build_generation_prompt(schema, message, None)
+    # Should not include conversation context header
+    assert "Previous conversation:" not in prompt
+    # But should still include schema and question
+    assert "comida" in prompt
+    assert "liste" in prompt
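Fix 2 makes `build_generation_prompt` include up to the last three user/assistant exchanges under a `Previous conversation:` header, and omit the header entirely when there is no history. Here is a sketch of that behaviour, assuming `{"role", "content"}` message dicts as in the test fixtures. The surrounding prompt template is invented for illustration; it is not Phi-3's chat template or the app's actual prompt, and `build_prompt_sketch` is a hypothetical name.

```python
from typing import Dict, List, Optional

MAX_EXCHANGES = 3  # one exchange = one user message + one assistant reply


def build_prompt_sketch(schema: str, message: str,
                        chat_history: Optional[List[Dict[str, str]]]) -> str:
    """Assemble schema, optional recent history, and the question."""
    parts = [f"Schema:\n{schema}"]
    if chat_history:
        # Keep only the most recent exchanges so the prompt stays short.
        recent = chat_history[-2 * MAX_EXCHANGES:]
        lines = [f"{turn['role']}: {turn['content']}" for turn in recent]
        parts.append("Previous conversation:\n" + "\n".join(lines))
    parts.append(f"Question: {message}\nSQL:")
    return "\n\n".join(parts)
```

Slicing by message count assumes the history strictly alternates user and assistant turns. If it does not, grouping messages into exchanges before truncating would be more robust.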