Qwen3.5-4B Text-to-SQL Data Forge LoRA
This repository contains an MLX LoRA adapter for Qwen/Qwen3.5-4B, fine-tuned with the data-forge Text-to-SQL synthetic dataset pipeline.
The goal of this checkpoint is narrow: improve SQLite Text-to-SQL behavior for compact specialist-model experiments. It is not a merged full model; load it as an adapter on top of the base model.
Training Summary
- Base model: Qwen/Qwen3.5-4B
- Fine-tune type: LoRA
- Runtime: MLX LM
- Train rows: 6,633
- Validation rows: 369
- Held-out synthetic test rows: 368
- Iterations: 800
- Max sequence length: 1,024
- LoRA rank: 8
- LoRA scale: 20
- Learning rate: 2e-5
- Prompt masking: enabled
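For readers who want to set up a comparable run, the hyperparameters above might map onto an mlx-lm LoRA configuration roughly as follows. This is a sketch only: the config file actually used for this adapter is not part of this card, the key names are assumed to follow mlx-lm's LoRA config format and should be checked against the installed version, and values not listed above (dataset path, batch size, number of adapted layers) are deliberately left out.

```python
import json

# Assumed mapping of the listed hyperparameters onto mlx-lm style config keys.
# Only values stated in the Training Summary are filled in.
lora_config = {
    "model": "Qwen/Qwen3.5-4B",
    "fine_tune_type": "lora",
    "train": True,
    "iters": 800,
    "max_seq_length": 1024,
    "learning_rate": 2e-5,
    "mask_prompt": True,  # "Prompt masking: enabled"
    "lora_parameters": {"rank": 8, "scale": 20.0},
}

print(json.dumps(lora_config, indent=2))
```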
Local Held-Out Result
Measured on the first 100 held-out synthetic Text-to-SQL examples from the same generation run:
| Model | Execution Accuracy | Exact SQL Match | Valid SQL Rate |
|---|---|---|---|
| Qwen/Qwen3.5-4B base | 74% | 11% | 96% |
| Fine-tuned adapter | 78% | 28% | 95% |
| Delta | +4 pts | +17 pts | -1 pt |
The evaluation report is included as eval_synthetic_heldout_100.json.
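The Exact SQL Match and Valid SQL Rate columns can be illustrated with a minimal sketch. It assumes that "exact match" means string equality after light normalization and that "valid" means SQLite executes the query without raising; the harness behind the report may define both differently. The function names and the toy `singer` table are hypothetical.

```python
import re
import sqlite3


def normalize_sql(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and lowercase.

    Lowercasing also changes string literals, so this is a loose comparison.
    """
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").strip().lower()


def is_exact_match(predicted: str, gold: str) -> bool:
    return normalize_sql(predicted) == normalize_sql(gold)


def is_valid_sql(conn: sqlite3.Connection, sql: str) -> bool:
    """Treat a query as valid if SQLite can execute it without raising."""
    try:
        conn.execute(sql).fetchall()
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False


# Toy in-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")

exact = is_exact_match("SELECT name FROM singer;", "select  name\nfrom singer")
valid = is_valid_sql(conn, "SELECT name FROM singer")
invalid = is_valid_sql(conn, "SELECT nme FROM singer")  # misspelled column
print(exact, valid, invalid)
```

Under these definitions a query can be valid and execution-correct without being an exact match, which is consistent with exact match being the lowest of the three columns.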
Spider Dev Smoke Result
Measured on the first 100 examples from the official Spider dev set with SQLite execution against the official Spider databases:
| Model | Execution Accuracy | Correct / Total |
|---|---|---|
| Qwen/Qwen3.5-4B base | 41% | 41 / 100 |
| Fine-tuned adapter | 51% | 51 / 100 |
| Delta | +10 pts | +10 |
The Spider smoke report is included as eval_spider_dev_100.json.
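Execution accuracy of this kind can be sketched as follows: run the predicted and gold queries against the same SQLite database and count a prediction as correct when the result sets agree. This is an assumption-laden illustration, not the harness used for the report. In particular it compares rows as an unordered multiset, which ignores `ORDER BY`, and the helper names and toy data are hypothetical.

```python
import sqlite3
from collections import Counter


def result_multiset(conn, sql):
    """Return the query's rows as a multiset, or None if execution fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None
    return Counter(rows)


def execution_match(conn, predicted, gold):
    gold_rows = result_multiset(conn, gold)
    pred_rows = result_multiset(conn, predicted)
    return gold_rows is not None and pred_rows == gold_rows


def execution_accuracy(conn, pairs):
    """pairs: iterable of (predicted_sql, gold_sql). Returns (correct, total, percent)."""
    pairs = list(pairs)
    correct = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return correct, len(pairs), 100.0 * correct / len(pairs)


# Toy database standing in for a Spider SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [(1, "Ann", 30), (2, "Bob", 41), (3, "Cy", 41)],
)

pairs = [
    ("SELECT name FROM singer WHERE age > 35", "SELECT name FROM singer WHERE age >= 41"),
    ("SELECT count(*) FROM singer", "SELECT count(*) FROM singer WHERE age = 41"),
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
    ("SELECT max(age) FROM singer", "SELECT age FROM singer ORDER BY age DESC LIMIT 1"),
]
correct, total, accuracy = execution_accuracy(conn, pairs)
print(f"{correct} / {total} = {accuracy:.2f}%")
```

When running model-generated SQL against real database files, opening them read-only is a sensible precaution, since a generated statement could otherwise modify the data.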
Full Spider Dev Result
Measured on all 1,034 examples from the official Spider dev set with SQLite execution against the official Spider databases. The primary number below uses deterministic SQL extraction from model output before execution, because both the base model and adapter sometimes emit reasoning text before the SQL query.
| Model | Execution Accuracy | Valid SQL Rate | Correct / Total |
|---|---|---|---|
| Qwen/Qwen3.5-4B base | 40.81% | 55.42% | 422 / 1,034 |
| Fine-tuned adapter | 57.83% | 78.24% | 598 / 1,034 |
| Delta | +17.02 pts | +22.82 pts | +176 |
The full Spider dev report is included as eval_spider_dev_full.json.
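The deterministic SQL extraction mentioned above is not shown in this card. One plausible heuristic, given purely as a hypothetical sketch, is to drop any `<think>` block, prefer the contents of the last fenced code block, and then keep the first statement that starts a line with `SELECT` or `WITH`. The tag name, the fence handling, and the `extract_sql` function are all assumptions.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this block readable
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
FENCE_RE = re.compile(
    FENCE + r"(?:sqlite|sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE
)
STATEMENT_RE = re.compile(
    r"^\s*((?:SELECT|WITH)\b.*)", re.DOTALL | re.IGNORECASE | re.MULTILINE
)


def extract_sql(text: str) -> str:
    """Return a single-line SQL statement, or "" if none is found."""
    text = THINK_RE.sub("", text)           # remove reasoning wrapped in <think> tags
    fenced = FENCE_RE.findall(text)
    if fenced:
        text = fenced[-1]                   # prefer the last fenced block
    match = STATEMENT_RE.search(text)
    if match is None:
        return ""
    sql = match.group(1).split(";", 1)[0]   # keep only the first statement
    return " ".join(sql.split())            # collapse whitespace


raw_fenced = (
    "<think>The user wants names.</think>\n"
    + FENCE + "sql\nSELECT name\nFROM singer;\n" + FENCE
)
raw_plain = "I will count the rows.\nSELECT count(*) FROM singer"

print(extract_sql(raw_fenced))
print(extract_sql(raw_plain))
print(repr(extract_sql("I cannot answer.")))
```

The heuristic has known gaps: a semicolon inside a string literal truncates the query, and a prose line that happens to begin with "With" or "Select" would be picked up as SQL. Because the same extractor is applied to both the base model and the adapter, such errors affect both rows of the table.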
Full Spider Dev With Execution-Result Voting
The strongest local result uses the same LoRA adapter with multiple deterministic prompt variants and a gold-free selector:
- Generate candidate SQL with the fine-tuned adapter using legacy and hardened prompt styles.
- Include base-model and gap-adapter candidates as additional fallbacks.
- Execute candidates against the Spider SQLite database.
- Select the executable result set with the most candidate agreement, breaking ties by prompt order.
This does not inspect the gold SQL or gold execution result.
| System | Execution Accuracy | Correct / Total |
|---|---|---|
| Qwen/Qwen3.5-4B base | 40.81% | 422 / 1,034 |
| Fine-tuned adapter, single-pass | 57.83% | 598 / 1,034 |
| Fine-tuned adapter, fixed extractor + hardened prompt | 66.44% | 687 / 1,034 |
| Fine-tuned adapter + result voting | 71.47% | 739 / 1,034 |
The result-vote report is included as eval_spider_dev_full_result_vote.json, and the selector metadata is included as selector_report_result_vote.json.
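The selection steps listed above can be sketched as follows. This is a minimal illustration of execution-result voting under stated assumptions, not the selector used for the report: candidates are assumed to arrive as a list of SQL strings in prompt order, result sets are compared without regard to row order, and the function names and toy table are hypothetical. No gold SQL or gold result is involved.

```python
import sqlite3


def execute_or_none(conn, sql):
    """Return an order-insensitive, hashable result key, or None on failure."""
    try:
        rows = conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None
    return tuple(sorted(rows, key=repr))


def vote(conn, candidates):
    """Pick the candidate whose result set is shared by the most candidates.

    Ties are broken by prompt order (the earliest candidate wins).
    Returns None if no candidate executes.
    """
    groups = {}  # result key -> indices of candidates producing it
    for index, sql in enumerate(candidates):
        key = execute_or_none(conn, sql)
        if key is not None:
            groups.setdefault(key, []).append(index)
    if not groups:
        return None
    best = max(groups.values(), key=lambda idxs: (len(idxs), -idxs[0]))
    return candidates[best[0]]


# Toy database standing in for a Spider SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [(1, "Ann", 30), (2, "Bob", 41), (3, "Cy", 41)],
)

candidates = [
    "SELECT name FROM singer WHERE age > 40",  # agrees with the third candidate
    "SELECT nme FROM singer",                  # fails to execute, so it gets no vote
    "SELECT name FROM singer WHERE age = 41",
    "SELECT name FROM singer",                 # different result, one vote
]
chosen = vote(conn, candidates)
print(chosen)
```

Agreement is only a proxy for correctness: several candidates can agree on the same wrong result, so the voted score still depends on the quality of the underlying generations.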
Usage
python -m pip install mlx-lm
python -m mlx_lm.generate \
    --model Qwen/Qwen3.5-4B \
    --adapter-path sahilsangwan/qwen35-4b-text-to-sql-data-forge-lora \
    --prompt "Return SQLite SQL only: ..."
For programmatic use:
from mlx_lm import generate, load

model, tokenizer = load(
    "Qwen/Qwen3.5-4B",
    adapter_path="sahilsangwan/qwen35-4b-text-to-sql-data-forge-lora",
)
prompt = "Return SQLite SQL only: ..."
print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=False))
Limitations
- The full Spider dev scores come from a local execution-accuracy run with deterministic SQL extraction; they are not an official leaderboard submission.
- The 71.47% result uses execution-result voting over multiple local candidate generations, not a single greedy generation.
- The adapter was trained for SQLite-style Text-to-SQL, not general chat.
- Manual review/signoff was skipped for this first proof run.
- Public Claude Sonnet numbers use different harnesses/splits and are included only as external reference points.
Provenance
Generated and trained with data-forge from run t2sql_v4pro_10k_012.
Public code repository: https://github.com/sangwansahil/data-forge