---
license: apache-2.0
base_model: Qwen/Qwen3.5-27B
tags:
  - text-to-sql
  - sql
  - qwen3.5
  - fine-tuned
  - fsdp
  - nebius
datasets:
  - b-mc2/sql-create-context
  - gretelai/synthetic_text_to_sql
language:
  - en
pipeline_tag: text-generation
---

# Qwen3.5-27B-Text2SQL

Fine-tuned [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) for **Text-to-SQL** generation. Given a database schema and a natural-language question, the model outputs a clean SQL query.

## Key Results

| Metric | Base Model | This Model | Improvement |
|---|---|---|---|
| **Gretel Execution Accuracy (3,492 samples)** | 26.5% | **66.7%** | **+40.2 pp** |
| **Gretel Valid SQL** | 37.8% | **89.4%** | +51.6 pp |
| **Spider Execution Accuracy (1,032, gold std)** | 46.6% | **55.1%** | +8.5 pp |
| **Spider Exact Match** | 0.0% | **22.2%** | +22.2 pp |
| **Spider Keyword Score** | 45.5% | **85.4%** | +39.9 pp |

### Regression (standard benchmarks via lm-eval-harness)

| Benchmark | Base | This Model | Delta |
|---|---|---|---|
| **MMLU (humanities)** | 81.9% | 83.5% | +1.6 pp (no regression) |
| **MMLU (STEM)** | 87.2% | 86.7% | -0.5 pp (no regression) |
| **MMLU (social sciences)** | 92.0% | 92.0% | 0 pp (no regression) |
| **MMLU (other)** | 87.5% | 87.5% | 0 pp (no regression) |
| **GSM8K (math, strict)** | 60.4% | 35.4% | **-25.0 pp (regression)** |
| **HellaSwag (common sense)** | 79.6% | 84.1% | +4.5 pp (improved) |
| **ARC-Challenge (reasoning)** | 69.3% | 71.3% | +2.0 pp (improved) |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mahernaija/Qwen3.5-27B-Text2SQL",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "mahernaija/Qwen3.5-27B-Text2SQL",
    trust_remote_code=True,
)

schema = "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);"
question = "Find all employees in Engineering with salary above 90000."

prompt = (
    "<|im_start|>system\n"
    "You are a SQL expert. Given a database schema and a natural language question, "
    "write the correct SQL query.<|im_end|>\n"
    f"<|im_start|>user\nSchema: {schema}\nQuestion: {question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
# SELECT * FROM employees WHERE department = 'Engineering' AND salary > 90000
```

### With vLLM

```bash
vllm serve mahernaija/Qwen3.5-27B-Text2SQL --tensor-parallel-size 2
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mahernaija/Qwen3.5-27B-Text2SQL",
    messages=[
        {"role": "system", "content": "You are a SQL expert. Given a database schema and a natural language question, write the correct SQL query."},
        {"role": "user", "content": "Schema: CREATE TABLE products (id INT, name TEXT, price REAL);\nQuestion: What are the 3 most expensive products?"},
    ],
    max_tokens=256,
    temperature=0,
)
print(response.choices[0].message.content)
# SELECT name FROM products ORDER BY price DESC LIMIT 3
```

## Training Details

| Parameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3.5-27B (26.9B params) |
| **Method** | Full fine-tuning with FSDP |
| **Hardware** | 16× NVIDIA H200 (2 nodes, Nebius AI Cloud) |
| **Training time** | 3 hours 6 minutes |
| **Dataset** | sql-create-context (78K) + Gretel synthetic (100K) = 178K samples |
| **Epochs** | 1 |
| **Batch size** | 2 per GPU × 8 grad accum × 16 GPUs = 256 global |
| **Learning rate** | 2e-5 (cosine decay, 5% warmup) |
| **Sequence length** | 512 tokens (P99 SQL sample length = 373 tokens) |
| **Precision** | BF16 |
| **GPU utilization** | 98-100% |
| **Final train loss** | 0.144 |
| **Final eval loss** | 0.176 |

### Data Preprocessing

- Schemas stripped of INSERT data (the model learns SQL from schema structure, not memorized answers)
- Manual chat format (bypasses the default Qwen3.5 chat-template tag injection)
- Label masking: loss computed only on the SQL output; prompt tokens masked with -100
- Deduplication + contamination check between train and eval splits

### Evaluation

**Gretel Execution Accuracy** (gold standard — runs SQL in SQLite, compares results):

| Complexity | Base | This Model |
|---|---|---|
| Basic SQL | 23.8% | **71.4%** |
| Aggregation | 18.2% | **54.5%** |
| Single JOIN | 25.0% | **75.0%** |
| Window functions | 0.0% | **33.3%** |

**Spider Benchmark** (1,034 dev questions, public standard):

- Exact match: 22.2%
- Keyword score: 85.4%

### Known Limitations

**Catastrophic forgetting**: full fine-tuning on SQL-only data caused a regression in general capabilities. The model tends to answer non-SQL questions with SQL (56% SQL contamination on general prompts). For production use with mixed tasks, consider:

- LoRA fine-tuning instead of full FT
- Mixed training data (SQL + general chat)
- Using this model only for SQL-specific pipelines

### Regression Test

| Category | Base | This Model | SQL Contamination |
|---|---|---|---|
| General Knowledge | 84% | 44% | 2/5 |
| Math | 100% | 40% | 3/5 |
| Code | 48% | 47% | 5/5 |
| Language | 90% | 78% | 4/5 |
| Common Sense | 82% | 93% | 0/5 |

## Architecture

- **Architecture**: Qwen3_5ForConditionalGeneration (VLM with text + vision)
- **Text backbone**: 64 layers, hidden_size=5120, 24 attention heads, 4 KV heads
- **Attention**: Hybrid GDN (linear_attention + full_attention)
- **Context**: 262K tokens
- **Vision**: Built-in 27-layer ViT (weights from base model, not fine-tuned)

## Files

| File | Description |
|---|---|
| `model-00001-of-00003.safetensors` | Text backbone weights (shard 1/3) |
| `model-00002-of-00003.safetensors` | Text backbone weights (shard 2/3) |
| `model-00003-of-00003.safetensors` | Text backbone weights (shard 3/3) |
| `model-visual.safetensors` | Vision encoder weights (from base model) |
| `config.json` | Full VLM config (required by vLLM) |
| `tokenizer.json` | Tokenizer |

## Citation

```bibtex
@misc{naija2026qwen35text2sql,
  title={Qwen3.5-27B-Text2SQL: Fine-tuned Qwen 3.5 for Text-to-SQL},
  author={Maher Naija},
  year={2026},
  url={https://huggingface.co/mahernaija/Qwen3.5-27B-Text2SQL}
}
```

## License

Apache 2.0 (same as base model [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B))

## Acknowledgments

- Base model: [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) by Alibaba
- Training data: [sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) + [Gretel synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
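The manual ChatML-style prompt from the Usage section can be factored into a small helper so the same format is used consistently at inference time. This is a minimal sketch; `build_sql_prompt` is a hypothetical name, and the string layout simply mirrors the prompt shown in the Usage section:

```python
def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble the ChatML-style prompt from the Usage section by hand,
    instead of relying on the default Qwen3.5 chat template (which the
    Data Preprocessing notes say injects extra tags)."""
    system = ("You are a SQL expert. Given a database schema and a natural "
              "language question, write the correct SQL query.")
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\nSchema: {schema}\nQuestion: {question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_sql_prompt(
    "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);",
    "Find all employees in Engineering with salary above 90000.",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the generation cursor at the start of the assistant turn, so the model emits only the SQL query.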
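The label-masking step described under Data Preprocessing can be sketched in a few lines. The token IDs below are illustrative placeholders (not real Qwen3.5 vocabulary IDs), and -100 is the ignore index used by PyTorch's cross-entropy loss:

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy ignores label positions set to -100

def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion token IDs; mask the prompt in the
    labels so loss is computed only on the SQL output."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels

# Placeholder IDs: [101, 102, 103] = prompt, [201, 202] = SQL completion
input_ids, labels = build_labels([101, 102, 103], [201, 202])
print(input_ids)  # [101, 102, 103, 201, 202]
print(labels)     # [-100, -100, -100, 201, 202]
```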
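The card does not specify how the train/eval deduplication and contamination check were implemented. One minimal way to do it, assuming exact matching on case- and whitespace-normalized SQL (the function names here are hypothetical):

```python
import hashlib

def normalize_sql(sql: str) -> str:
    """Cheap normalization for dedup: lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())

def contamination(train_sqls, eval_sqls):
    """Return eval queries whose normalized form also appears in train."""
    train_hashes = {
        hashlib.sha256(normalize_sql(q).encode()).hexdigest() for q in train_sqls
    }
    return [
        q for q in eval_sqls
        if hashlib.sha256(normalize_sql(q).encode()).hexdigest() in train_hashes
    ]
```

Exact-match normalization only catches trivial duplicates; a stricter check might also normalize aliases and literals.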
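The Gretel execution-accuracy metric is described above as "runs SQL in SQLite, compares results". A rough approximation of that check, under the assumptions that comparison is order-insensitive and that invalid SQL counts as a miss (`execution_match` is a hypothetical helper, not the evaluator actually used):

```python
import sqlite3

def execution_match(schema_sql: str, gold_sql: str, pred_sql: str) -> bool:
    """Execute gold and predicted SQL against a fresh in-memory SQLite DB
    and compare result sets (sorted, so row order does not matter)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)  # create tables and insert fixture rows
    try:
        gold = sorted(conn.execute(gold_sql).fetchall())
        pred = sorted(conn.execute(pred_sql).fetchall())
        return gold == pred
    except sqlite3.Error:
        return False  # invalid predicted SQL counts as a failed match
    finally:
        conn.close()
```

Using a fresh `:memory:` database per sample keeps evaluations independent and avoids state leaking between queries.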