---
license: apache-2.0
base_model: Qwen/Qwen3.5-27B
tags:
  - text-to-sql
  - sql
  - qwen3.5
  - fine-tuned
  - fsdp
  - nebius
datasets:
  - b-mc2/sql-create-context
  - gretelai/synthetic_text_to_sql
language:
  - en
pipeline_tag: text-generation
---

# Qwen3.5-27B-Text2SQL

Fine-tuned [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) for **Text-to-SQL** generation. Given a database schema and a natural-language question, the model outputs a clean SQL query.

## Key Results

| Metric | Base Model | This Model | Improvement |
|---|---|---|---|
| **Gretel Execution Accuracy (3,492 samples)** | 26.5% | **66.7%** | **+40.2 pp** |
| **Gretel Valid SQL** | 37.8% | **89.4%** | +51.6 pp |
| **Spider Execution Accuracy (1,032, gold std)** | 46.6% | **55.1%** | +8.5 pp |
| **Spider Exact Match** | 0.0% | **22.2%** | +22.2 pp |
| **Spider Keyword Score** | 45.5% | **85.4%** | +39.9 pp |

### Regression (standard benchmarks via lm-eval-harness)

| Benchmark | Base | This Model | Delta |
|---|---|---|---|
| **MMLU (humanities)** | 81.9% | 83.5% | +1.6 pp (no regression) |
| **MMLU (STEM)** | 87.2% | 86.7% | -0.5 pp (no regression) |
| **MMLU (social sciences)** | 92.0% | 92.0% | 0 pp (no regression) |
| **MMLU (other)** | 87.5% | 87.5% | 0 pp (no regression) |
| **GSM8K (math, strict)** | 60.4% | 35.4% | **-25.0 pp (regression)** |
| **HellaSwag (common sense)** | 79.6% | 84.1% | +4.5 pp (improved) |
| **ARC-Challenge (reasoning)** | 69.3% | 71.3% | +2.0 pp (improved) |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mahernaija/Qwen3.5-27B-Text2SQL",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "mahernaija/Qwen3.5-27B-Text2SQL",
    trust_remote_code=True,
)

schema = "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);"
question = "Find all employees in Engineering with salary above 90000."

prompt = (
    "<|im_start|>system\n"
    "You are a SQL expert. Given a database schema and a natural language question, "
    "write the correct SQL query.<|im_end|>\n"
    f"<|im_start|>user\nSchema: {schema}\nQuestion: {question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
# SELECT * FROM employees WHERE department = 'Engineering' AND salary > 90000
```

### With vLLM

```bash
vllm serve mahernaija/Qwen3.5-27B-Text2SQL --tensor-parallel-size 2
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mahernaija/Qwen3.5-27B-Text2SQL",
    messages=[
        {"role": "system", "content": "You are a SQL expert. Given a database schema and a natural language question, write the correct SQL query."},
        {"role": "user", "content": "Schema: CREATE TABLE products (id INT, name TEXT, price REAL);\nQuestion: What are the 3 most expensive products?"},
    ],
    max_tokens=256,
    temperature=0,
)
print(response.choices[0].message.content)
# SELECT name FROM products ORDER BY price DESC LIMIT 3
```

## Training Details

| Parameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3.5-27B (26.9B params) |
| **Method** | Full fine-tuning with FSDP |
| **Hardware** | 16× NVIDIA H200 (2 nodes, Nebius AI Cloud) |
| **Training time** | 3 hours 6 minutes |
| **Dataset** | sql-create-context (78K) + Gretel synthetic (100K) = 178K samples |
| **Epochs** | 1 |
| **Batch size** | 2 per GPU × 8 grad accum × 16 GPUs = 256 global |
| **Learning rate** | 2e-5 (cosine decay, 5% warmup) |
| **Sequence length** | 512 tokens (P99 SQL sample length = 373 tokens) |
| **Precision** | BF16 |
| **GPU utilization** | 98-100% |
| **Final train loss** | 0.144 |
| **Final eval loss** | 0.176 |

### Data Preprocessing

- Schemas stripped of INSERT data (the model learns SQL from schema structure, not memorized answers)
- Manual chat format (bypasses the default Qwen3.5 chat-template tag injection)
- Label masking: loss computed only on the SQL output; prompt tokens masked with -100
- Deduplication + contamination check between train and eval splits

### Evaluation

**Gretel Execution Accuracy** (gold standard — runs SQL in SQLite, compares results):

| Complexity | Base | This Model |
|---|---|---|
| Basic SQL | 23.8% | **71.4%** |
| Aggregation | 18.2% | **54.5%** |
| Single JOIN | 25.0% | **75.0%** |
| Window functions | 0.0% | **33.3%** |

**Spider Benchmark** (1,034 dev questions, public standard):

- Exact match: 22.2%
- Keyword score: 85.4%

### Known Limitations

**Catastrophic forgetting**: full fine-tuning on SQL-only data caused a regression in general capabilities. The model tends to answer non-SQL questions with SQL (56% SQL contamination on general prompts). For production use with mixed tasks, consider:

- LoRA fine-tuning instead of full FT
- Mixed training data (SQL + general chat)
- Using this model only for SQL-specific pipelines

### Regression Test

| Category | Base | This Model | SQL Contamination |
|---|---|---|---|
| General Knowledge | 84% | 44% | 2/5 |
| Math | 100% | 40% | 3/5 |
| Code | 48% | 47% | 5/5 |
| Language | 90% | 78% | 4/5 |
| Common Sense | 82% | 93% | 0/5 |

## Architecture

- **Architecture**: Qwen3_5ForConditionalGeneration (VLM with text + vision)
- **Text backbone**: 64 layers, hidden_size=5120, 24 attention heads, 4 KV heads
- **Attention**: Hybrid GDN (linear_attention + full_attention)
- **Context**: 262K tokens
- **Vision**: Built-in 27-layer ViT (weights from base model, not fine-tuned)

## Files

| File | Description |
|---|---|
| `model-00001-of-00003.safetensors` | Text backbone weights (shard 1/3) |
| `model-00002-of-00003.safetensors` | Text backbone weights (shard 2/3) |
| `model-00003-of-00003.safetensors` | Text backbone weights (shard 3/3) |
| `model-visual.safetensors` | Vision encoder weights (from base model) |
| `config.json` | Full VLM config (required by vLLM) |
| `tokenizer.json` | Tokenizer |

## Citation

```bibtex
@misc{naija2026qwen35text2sql,
  title={Qwen3.5-27B-Text2SQL: Fine-tuned Qwen 3.5 for Text-to-SQL},
  author={Maher Naija},
  year={2026},
  url={https://huggingface.co/mahernaija/Qwen3.5-27B-Text2SQL}
}
```

## License

Apache 2.0 (same as base model [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B))

## Acknowledgments

- Base model: [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) by Alibaba
- Training data: [sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) + [Gretel synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
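The manual ChatML-style prompt from the Usage section can be factored into a small helper so the same format is used consistently at inference time. This is a minimal sketch; `build_sql_prompt` is a hypothetical name, and the string layout simply mirrors the prompt shown in the Usage section:

```python
def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble the ChatML-style prompt from the Usage section by hand,
    instead of relying on the default Qwen3.5 chat template (which the
    Data Preprocessing notes say injects extra tags)."""
    system = ("You are a SQL expert. Given a database schema and a natural "
              "language question, write the correct SQL query.")
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\nSchema: {schema}\nQuestion: {question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_sql_prompt(
    "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);",
    "Find all employees in Engineering with salary above 90000.",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the generation cursor at the start of the assistant turn, so the model emits only the SQL query.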
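The label-masking step described under Data Preprocessing can be sketched in a few lines. The token IDs below are illustrative placeholders (not real Qwen3.5 vocabulary IDs), and -100 is the ignore index used by PyTorch's cross-entropy loss:

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy ignores label positions set to -100

def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion token IDs; mask the prompt in the
    labels so loss is computed only on the SQL output."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels

# Placeholder IDs: [101, 102, 103] = prompt, [201, 202] = SQL completion
input_ids, labels = build_labels([101, 102, 103], [201, 202])
print(input_ids)  # [101, 102, 103, 201, 202]
print(labels)     # [-100, -100, -100, 201, 202]
```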
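The card does not specify how the train/eval deduplication and contamination check were implemented. One minimal way to do it, assuming exact matching on case- and whitespace-normalized SQL (the function names here are hypothetical):

```python
import hashlib

def normalize_sql(sql: str) -> str:
    """Cheap normalization for dedup: lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())

def contamination(train_sqls, eval_sqls):
    """Return eval queries whose normalized form also appears in train."""
    train_hashes = {
        hashlib.sha256(normalize_sql(q).encode()).hexdigest() for q in train_sqls
    }
    return [
        q for q in eval_sqls
        if hashlib.sha256(normalize_sql(q).encode()).hexdigest() in train_hashes
    ]
```

Exact-match normalization only catches trivial duplicates; a stricter check might also normalize aliases and literals.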
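The Gretel execution-accuracy metric is described above as "runs SQL in SQLite, compares results". A rough approximation of that check, under the assumptions that comparison is order-insensitive and that invalid SQL counts as a miss (`execution_match` is a hypothetical helper, not the evaluator actually used):

```python
import sqlite3

def execution_match(schema_sql: str, gold_sql: str, pred_sql: str) -> bool:
    """Execute gold and predicted SQL against a fresh in-memory SQLite DB
    and compare result sets (sorted, so row order does not matter)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)  # create tables and insert fixture rows
    try:
        gold = sorted(conn.execute(gold_sql).fetchall())
        pred = sorted(conn.execute(pred_sql).fetchall())
        return gold == pred
    except sqlite3.Error:
        return False  # invalid predicted SQL counts as a failed match
    finally:
        conn.close()
```

Using a fresh `:memory:` database per sample keeps evaluations independent and avoids state leaking between queries.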