Qwen3.5-4B Text-to-SQL Data Forge LoRA

This repository contains an MLX LoRA adapter for Qwen/Qwen3.5-4B, fine-tuned with the data-forge Text-to-SQL synthetic dataset pipeline.

The goal of this checkpoint is narrow: improve SQLite Text-to-SQL behavior for compact specialist-model experiments. It is not a merged full model; load it as an adapter on top of the base model.

Training Summary

  • Base model: Qwen/Qwen3.5-4B
  • Fine-tune type: LoRA
  • Runtime: MLX LM
  • Train rows: 6,633
  • Validation rows: 369
  • Held-out synthetic test rows: 368
  • Iterations: 800
  • Max sequence length: 1,024
  • LoRA rank: 8
  • LoRA scale: 20
  • Learning rate: 2e-5
  • Prompt masking: enabled

Local Held-Out Result

Measured on the first 100 held-out synthetic Text-to-SQL examples from the same generation run:

| Model | Execution Accuracy | Exact SQL Match | Valid SQL Rate |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-4B base | 74% | 11% | 96% |
| Fine-tuned adapter | 78% | 28% | 95% |
| Delta | +4 pts | +17 pts | -1 pt |

The evaluation report is included as eval_synthetic_heldout_100.json.
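As a rough illustration of what the three columns above measure, the sketch below scores (predicted, gold) SQL pairs against a SQLite connection. It is not the harness that produced the report: the function names, the order-insensitive row comparison, and the lowercase/whitespace normalization used for exact match are all assumptions made for this example.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return all rows for sql, or None if SQLite rejects the query."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def normalize_sql(sql):
    """Assumed exact-match normalization: lowercase, collapse whitespace,
    drop a trailing semicolon. Note that this also lowercases string literals."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def score(conn, pairs):
    """Score an iterable of (predicted_sql, gold_sql) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no examples to score")
    valid = exec_ok = exact = 0
    for predicted, gold in pairs:
        pred_rows = run_query(conn, predicted)
        gold_rows = run_query(conn, gold)
        if pred_rows is not None:
            valid += 1
            # Compare rows as multisets so that row order does not matter.
            if gold_rows is not None and Counter(pred_rows) == Counter(gold_rows):
                exec_ok += 1
        if normalize_sql(predicted) == normalize_sql(gold):
            exact += 1
    total = len(pairs)
    return {
        "execution_accuracy": exec_ok / total,
        "exact_sql_match": exact / total,
        "valid_sql_rate": valid / total,
    }


# Toy database and pairs, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES (1, 'Ann', 25), (2, 'Bo', 35), (3, 'Cy', 40);"
)
pairs = [
    ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age > 30;"),
    ("SELECT name FROM singer WHERE age >= 31", "SELECT name FROM singer WHERE age > 30"),
    ("SELEC name FROM singer", "SELECT name FROM singer"),
]
report = score(conn, pairs)
print(report)
```

The second toy pair shows why execution accuracy can sit well above exact match: the two queries differ as text but return the same rows on this data.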

Spider Dev Smoke Result

Measured on the first 100 examples from the official Spider dev set with SQLite execution against the official Spider databases:

| Model | Execution Accuracy | Correct / Total |
| --- | --- | --- |
| Qwen/Qwen3.5-4B base | 41% | 41 / 100 |
| Fine-tuned adapter | 51% | 51 / 100 |
| Delta | +10 pts | +10 |

The Spider smoke report is included as eval_spider_dev_100.json.

Full Spider Dev Result

Measured on all 1,034 examples from the official Spider dev set with SQLite execution against the official Spider databases. The primary number below uses deterministic SQL extraction from model output before execution, because both the base model and adapter sometimes emit reasoning text before the SQL query.

| Model | Execution Accuracy | Valid SQL Rate | Correct / Total |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-4B base | 40.81% | 55.42% | 422 / 1,034 |
| Fine-tuned adapter | 57.83% | 78.24% | 598 / 1,034 |
| Delta | +17.02 pts | +22.82 pts | +176 |

The full Spider dev report is included as eval_spider_dev_full.json.

Full Spider Dev With Execution-Result Voting

The strongest local result uses the same LoRA adapter with multiple deterministic prompt variants and a gold-free selector:

  1. Generate candidate SQL with the fine-tuned adapter using legacy and hardened prompt styles.
  2. Include base-model and gap-adapter candidates as additional fallbacks.
  3. Execute candidates against the Spider SQLite database.
  4. Select the executable result set with the most candidate agreement, breaking ties by prompt order.

This does not inspect the gold SQL or gold execution result.
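Steps 3 and 4 can be sketched as follows. This is an illustrative approximation, not the selector shipped with the reports: the function names are hypothetical, and treating result sets as order-insensitive is an assumption that a real selector may not make for queries with ORDER BY.

```python
import sqlite3
from collections import defaultdict


def result_key(rows):
    """Order-insensitive, hashable fingerprint of a result set (an assumption)."""
    return tuple(sorted(repr(row) for row in rows))


def vote(conn, candidates):
    """Pick one SQL string from candidates listed in prompt-priority order.

    Only execution results are compared; no gold SQL or gold result is used.
    """
    groups = defaultdict(list)  # fingerprint -> indices of agreeing candidates
    for index, sql in enumerate(candidates):
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # non-executable candidates cast no vote
        groups[result_key(rows)].append(index)
    if not groups:
        return None
    # Largest group wins; ties go to the group whose first member came earliest.
    best = max(groups.values(), key=lambda idxs: (len(idxs), -idxs[0]))
    return candidates[best[0]]


# Toy database and candidates, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES (1, 'Ann', 25), (2, 'Bo', 35), (3, 'Cy', 40);"
)
candidates = [
    "SELECT name FROM singer",
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singr",
    "SELECT name FROM singer WHERE age >= 31",
]
chosen = vote(conn, candidates)
print(chosen)
```

In the toy run, the second and fourth candidates return the same rows and outvote the first, while the misspelled third candidate fails to execute and is ignored.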

| System | Execution Accuracy | Correct / Total |
| --- | --- | --- |
| Qwen/Qwen3.5-4B base | 40.81% | 422 / 1,034 |
| Fine-tuned adapter, single-pass | 57.83% | 598 / 1,034 |
| Fine-tuned adapter, fixed extractor + hardened prompt | 66.44% | 687 / 1,034 |
| Fine-tuned adapter + result voting | 71.47% | 739 / 1,034 |

The result-vote report is included as eval_spider_dev_full_result_vote.json, and the selector metadata is included as selector_report_result_vote.json.
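The percentages in the table follow directly from the correct counts. The short check below recomputes them, along with each system's gain over the base model in percentage points. The dictionary keys are shortened labels chosen for this example.

```python
TOTAL = 1034  # size of the Spider dev set used above

correct = {
    "base": 422,
    "adapter, single-pass": 598,
    "adapter, fixed extractor + hardened prompt": 687,
    "adapter + result voting": 739,
}

baseline = correct["base"]
lines = []
for name, n in correct.items():
    accuracy = 100 * n / TOTAL            # execution accuracy in percent
    gain = 100 * (n - baseline) / TOTAL   # percentage points over the base model
    lines.append(f"{name}: {accuracy:.2f}% ({gain:+.2f} pts vs. base)")

print("\n".join(lines))
```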

Usage

python -m pip install mlx-lm
python -m mlx_lm.generate \
  --model Qwen/Qwen3.5-4B \
  --adapter-path sahilsangwan/qwen35-4b-text-to-sql-data-forge-lora \
  --prompt "Return SQLite SQL only: ..."

For programmatic use:

from mlx_lm import generate, load

# Load the base model and apply the LoRA adapter on top of it.
model, tokenizer = load(
    "Qwen/Qwen3.5-4B",
    adapter_path="sahilsangwan/qwen35-4b-text-to-sql-data-forge-lora",
)

# Generate up to 128 new tokens and print the completion.
prompt = "Return SQLite SQL only: ..."
print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=False))

Limitations

  • The full Spider dev score is a local execution-accuracy run with deterministic SQL extraction, not an official leaderboard submission.
  • The 71.47% result uses execution-result voting over multiple local candidate generations, not a single greedy generation.
  • The adapter was trained for SQLite-style Text-to-SQL, not general chat.
  • Manual review/signoff was skipped for this first proof run.
  • Public Claude Sonnet numbers use different harnesses/splits and are included only as external reference points.

Provenance

Generated and trained with data-forge from run t2sql_v4pro_10k_012.

Public code repository: https://github.com/sangwansahil/data-forge
