Qwen3.5-4B Text-to-SQL Data Forge LoRA

This repository contains an MLX LoRA adapter for Qwen/Qwen3.5-4B, fine-tuned on synthetic Text-to-SQL data using the data-forge pipeline.

The goal of this checkpoint is narrow: improve SQLite Text-to-SQL behavior for compact specialist-model experiments. It is not a merged full model; load it as an adapter on top of the base model.

Training Summary

Base model: Qwen/Qwen3.5-4B
Fine-tune type: LoRA
Runtime: MLX LM
Train rows: 6,633
Validation rows: 369
Held-out synthetic test rows: 368
Iterations: 800
Max sequence length: 1,024 tokens
LoRA rank: 8
LoRA scale: 20
Learning rate: 2e-5
Prompt masking: enabled
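
For readers who want to reproduce a similar run, the settings above can be collected into one structure. This is a rough sketch, not the configuration file used for this checkpoint: the key names are assumed to follow mlx-lm's LoRA config layout and should be checked against the installed version. The split names `train`, `valid`, and `test` are illustrative.

```python
# Hypothetical reconstruction of the run settings listed in the summary.
# Key names are ASSUMED to match mlx-lm's LoRA config; verify before use.
lora_config = {
    "model": "Qwen/Qwen3.5-4B",
    "train": True,
    "fine_tune_type": "lora",
    "iters": 800,
    "max_seq_length": 1024,
    "learning_rate": 2e-5,
    "mask_prompt": True,  # prompt masking: loss on the completion only
    "lora_parameters": {"rank": 8, "scale": 20.0},
}

# Split sizes from the summary, with each split's share of all rows.
splits = {"train": 6633, "valid": 369, "test": 368}
total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(total, shares)
```

The three splits add up to 7,370 rows, which is roughly a 90/5/5 division.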

Local Held-Out Result

Measured on the first 100 held-out synthetic Text-to-SQL examples from the same generation run:

Model	Execution Accuracy	Exact SQL Match	Valid SQL Rate
Qwen/Qwen3.5-4B base	74%	11%	96%
Fine-tuned adapter	78%	28%	95%
Delta	+4 pts	+17 pts	-1 pt

The evaluation report is included as eval_synthetic_heldout_100.json.
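
The three columns measure different things, which is why exact match can rise sharply while execution accuracy moves only a little. The sketch below shows one plausible way to compute them with Python's built-in `sqlite3` module. It is not the evaluation harness behind the report: the helper names, the toy `singer` table, the order-insensitive row comparison, and the normalization rule are all assumptions made for illustration.

```python
import sqlite3

def run_sql(conn, sql):
    """Return result rows in a canonical order, or None if execution fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)

def normalize(sql):
    """Crude normalization (assumption): lowercase, collapse spaces, drop ';'."""
    return " ".join(sql.strip().rstrip(";").lower().split())

def score(conn, pairs):
    """pairs holds (predicted_sql, gold_sql); returns the three rates in percent."""
    valid = exact = correct = 0
    for pred, gold in pairs:
        pred_rows = run_sql(conn, pred)
        valid += pred_rows is not None
        exact += normalize(pred) == normalize(gold)
        correct += pred_rows is not None and pred_rows == run_sql(conn, gold)
    n = len(pairs)
    return {
        "execution_accuracy": 100 * correct / n,
        "exact_match": 100 * exact / n,
        "valid_sql": 100 * valid / n,
    }

# Toy database and predictions, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer(name TEXT, age INT);"
    "INSERT INTO singer VALUES ('Ann', 30), ('Bo', 41);"
)
pairs = [
    ("SELECT name FROM singer WHERE age > 35", "SELECT name FROM singer WHERE age > 35;"),
    ("SELECT s.name FROM singer AS s WHERE s.age >= 36", "SELECT name FROM singer WHERE age > 35"),
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
print(score(conn, pairs))
```

In this toy set the second prediction is written differently from the gold query but returns the same rows, so it counts toward execution accuracy and not toward exact match. The third fails to execute and counts against the valid SQL rate.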

Spider Dev Smoke Result

Measured on the first 100 examples from the official Spider dev set with SQLite execution against the official Spider databases:

Model	Execution Accuracy	Correct / Total
Qwen/Qwen3.5-4B base	41%	41 / 100
Fine-tuned adapter	51%	51 / 100
Delta	+10 pts	+10

The Spider smoke report is included as eval_spider_dev_100.json.

Full Spider Dev Result

Measured on all 1,034 examples from the official Spider dev set with SQLite execution against the official Spider databases. The primary number below uses deterministic SQL extraction from model output before execution, because both the base model and adapter sometimes emit reasoning text before the SQL query.

Model	Execution Accuracy	Valid SQL Rate	Correct / Total
Qwen/Qwen3.5-4B base	40.81%	55.42%	422 / 1,034
Fine-tuned adapter	57.83%	78.24%	598 / 1,034
Delta	+17.02 pts	+22.82 pts	+176

The full Spider dev report is included as eval_spider_dev_full.json.
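
Deterministic extraction matters here because any reasoning text emitted before the query would otherwise be executed as SQL and fail. The following is a minimal sketch of such an extractor, not the one used for the reported numbers. The rules it applies are assumptions: drop `<think>` blocks, prefer a fenced block, then take the first line-initial `SELECT` or `WITH` statement.

```python
import re

TICKS = "`" * 3  # built at runtime so this example can sit inside a fence
FENCE = re.compile(TICKS + r"(?:sqlite|sql)?\s*(.*?)" + TICKS, re.DOTALL | re.IGNORECASE)
THINK = re.compile(r"<think>.*?</think>", re.DOTALL)
# A statement must start a line, so prose such as "I should select ..." is skipped.
STATEMENT = re.compile(r"^\s*((?:SELECT|WITH)\b.*)", re.DOTALL | re.IGNORECASE | re.MULTILINE)

def extract_sql(text):
    """Return the first SQL statement found in model output, or '' if none."""
    text = THINK.sub("", text)
    fenced = FENCE.search(text)
    if fenced:
        text = fenced.group(1)
    match = STATEMENT.search(text)
    if not match:
        return ""
    # Stop at the first semicolon or blank line, then collapse whitespace.
    # Limitation: a ';' inside a string literal would cut the query short.
    statement = re.split(r";|\n\s*\n", match.group(1), maxsplit=1)[0]
    return " ".join(statement.split())

print(extract_sql("<think>I should select names.</think>\nSELECT name FROM singer;"))
```

The same inputs always yield the same query, so the base model and the adapter are scored under identical post-processing.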

Full Spider Dev With Execution-Result Voting

The strongest local result uses the same LoRA adapter with multiple deterministic prompt variants and a gold-free selector:

Generate candidate SQL with the fine-tuned adapter using legacy and hardened prompt styles.
Include base-model and gap-adapter candidates as additional fallbacks.
Execute candidates against the Spider SQLite database.
Select the executable result set with the most candidate agreement, breaking ties by prompt order.

This does not inspect the gold SQL or gold execution result.
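
The selection step can be sketched as follows. This is an illustrative implementation of execution-result voting under stated assumptions, not the selector shipped with the report: `result_key` and `vote` are hypothetical names, result sets are compared without regard to row order, and the toy table stands in for a Spider database.

```python
import sqlite3
from collections import Counter

def result_key(conn, sql):
    """Execute sql; return a hashable, order-insensitive key, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))

def vote(conn, candidates):
    """Return the candidate whose result set has the most agreement.

    candidates are listed in prompt order, so ties go to the earliest one.
    Returns None when no candidate executes.
    """
    keys = [result_key(conn, sql) for sql in candidates]
    counts = Counter(key for key in keys if key is not None)
    if not counts:
        return None
    best = max(counts.values())
    for sql, key in zip(candidates, keys):
        if key is not None and counts[key] == best:
            return sql

# Toy database and candidates, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer(name TEXT, age INT);"
    "INSERT INTO singer VALUES ('Ann', 30), ('Bo', 41), ('Cy', 52);"
)
candidates = [
    "SELECT name FROM singer WHERE age > 50",
    "SELECT name FROM singer WHERE age > 35",
    "SELECT nme FROM singer",  # fails to execute, so it gets no vote
    "SELECT name FROM singer WHERE age >= 36 ORDER BY name DESC",
]
print(vote(conn, candidates))
```

Here the second and fourth candidates return the same rows, so the second is selected. Only candidate outputs are compared, which is what keeps the selector gold-free.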

System	Execution Accuracy	Correct / Total
Qwen/Qwen3.5-4B base	40.81%	422 / 1,034
Fine-tuned adapter, single-pass	57.83%	598 / 1,034
Fine-tuned adapter, fixed extractor + hardened prompt	66.44%	687 / 1,034
Fine-tuned adapter + result voting	71.47%	739 / 1,034

The result-vote report is included as eval_spider_dev_full_result_vote.json, and the selector metadata is included as selector_report_result_vote.json.

Usage

python -m pip install mlx-lm
python -m mlx_lm.generate \
  --model Qwen/Qwen3.5-4B \
  --adapter-path sahilsangwan/qwen35-4b-text-to-sql-data-forge-lora \
  --prompt "Return SQLite SQL only: ..."

For programmatic use:

from mlx_lm import generate, load

# Load the base model and attach the LoRA adapter; the weights are not merged.
model, tokenizer = load(
    "Qwen/Qwen3.5-4B",
    adapter_path="sahilsangwan/qwen35-4b-text-to-sql-data-forge-lora",
)

# Replace the "..." with your own prompt text.
prompt = "Return SQLite SQL only: ..."
print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=False))

Limitations

The full Spider dev score is a local execution-accuracy run with deterministic SQL extraction, not an official leaderboard submission.
The 71.47% result uses execution-result voting over multiple local candidate generations, not a single greedy generation.
The adapter was trained for SQLite-style Text-to-SQL, not general chat.
Manual review/signoff was skipped for this first proof run.
Public Claude Sonnet numbers use different harnesses/splits and are included only as external reference points.

Provenance

Generated and trained with data-forge from run t2sql_v4pro_10k_012.

Public code repository: https://github.com/sangwansahil/data-forge
