---
base_model:
- griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer
license: apache-2.0
language:
- en
tags:
- text-to-sql
- spider
- grpo
- finer-sql
- code
library_name: transformers
pipeline_tag: text-generation
---

# FINER-SQL-0.5B-Spider

A small but capable 0.5 B-parameter Text-to-SQL model fine-tuned from [`griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer`](https://huggingface.co/griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer) with GRPO and the FINER-SQL dense rewards (Memory + Atomic).

✅ **75.0% Execution Accuracy on Spider Dev** (n=30, value-aware voting). Runs on a 4-8 GB GPU.

📄 Other models in the collection: https://huggingface.co/collections/griffith-bigdata/finer-sql

📄 GitHub: https://github.com/thanhdath/finer-sql/tree/main

---

## FINER-SQL Model Family — Comparison Across All Sizes

| Model | Params | BIRD Dev (n=30, vav) | Spider Dev (n=30, vav, +agg_hint) |
|-------|--------|----------------------|-----------------------------------|
| [FINER-SQL-3B-BIRD](https://huggingface.co/griffith-bigdata/FINER-SQL-3B-BIRD) | 3 B | **67.54%** ✅ | 83.8% |
| [FINER-SQL-3B-Spider](https://huggingface.co/griffith-bigdata/FINER-SQL-3B-Spider) | 3 B | 63.04% | **85.10%** ✅ |
| [FINER-SQL-0.5B-BIRD](https://huggingface.co/griffith-bigdata/FINER-SQL-0.5B-BIRD) | 0.5 B | **50.85%** ✅ | 68.6% |
| **FINER-SQL-0.5B-Spider** *(this model)* | 0.5 B | TBD | **75.0%** ✅ |

The 0.5 B Spider model is **6.4 pp better** than the 0.5 B BIRD model on Spider Dev — confirming that dataset-specific specialisation matters even at small scale.

---

## Inference

### Quick start (vLLM)

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="griffith-bigdata/FINER-SQL-0.5B-Spider",
    dtype="bfloat16",
    max_model_len=4096,
    gpu_memory_utilization=0.7,
)

system_prompt = """You are a meticulous SQL expert. Generate a single, correct SQL query for the user question and the provided database schema.
Follow this exact response format:

Rules:
- Output exactly one SQL statement.
- The SQL must be executable on SQLite.
- Do not include any explanatory text.
- Output one SQL statement only. Do not include any extra text, tags, or code fences."""

sampling = SamplingParams(n=30, temperature=1.0, max_tokens=2048)

# `schema` is the database DDL and `question` the natural-language question.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Database Schema:\n{schema}\n\nQuestion: {question}"},
]

output = llm.chat(messages, sampling)
candidate_sqls = [c.text.strip() for c in output[0].outputs]
# Apply value-aware majority voting (vav) — see the GitHub repo
```

### Recommended evaluation pipeline

1. Generate n=30 candidates with temperature=1.0
2. Execute each candidate and group candidates by execution result
3. Pick a candidate from the largest non-empty success group (value-aware voting, "vav")
4. Score with the official Spider evaluator (`test_suite_sql_eval`)

This pipeline gives **75.0% Spider Dev EX** (75.44% MV).

---

## Detailed Spider Dev results (n=30, vav)

| Hardness | Count | Execution Accuracy |
|----------|-------|--------------------|
| Easy | 248 | 91.9% |
| Medium | 446 | 82.5% |
| Hard | 174 | 62.6% |
| Extra Hard | 166 | 42.8% |
| **All** | **1034** | **75.0%** |

Recall@30: **85.11%** (any-correct rate among the 30 candidates).

---

## Training

| Parameter | Value |
|-----------|-------|
| Base model | `griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer` |
| Algorithm | GRPO |
| Train data | Spider train (8,659 samples) |
| Total steps | 2000 (this checkpoint = step 2000) |
| Learning rate | 8e-6 |
| Generations per prompt | 32 |
| Gradient accumulation | 32 |
| Max completion length | 2048 |
| Max prompt length | 1500 |
| Rollout temperature | 1.0 |
| Selection during eval | vav (value-aware voting) |
| Rewards | Execution + Atomic + Memory + Format |

---

## License

Inherits the base model's license (Apache 2.0).

---

## Citation

```bibtex
@article{finer-sql-2026,
  title  = {FINER-SQL: Fine-grained reasoning rewards for small Text-to-SQL models},
  author = {Thanh Dat and others},
  year   = {2026},
}
```
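---

As a rough illustration of the value-aware voting ("vav") step in the evaluation pipeline above, the sketch below groups candidate SQLs by the multiset of rows they return against a SQLite database and keeps one query from the largest successfully-executing group. This is a minimal, hypothetical implementation written for this card — the `value_aware_vote` name and signature are not from the repository; see the GitHub repo for the official pipeline.

```python
import sqlite3
from collections import Counter

def value_aware_vote(candidate_sqls, db_path):
    """Return one SQL from the largest group of candidates that executed
    successfully and produced identical result rows (order-insensitive)."""
    groups = {}  # result signature -> list of candidate SQLs
    for sql in candidate_sqls:
        try:
            conn = sqlite3.connect(db_path)
            rows = conn.execute(sql).fetchall()
            conn.close()
        except Exception:
            continue  # failed candidates never vote
        # Multiset of result rows as a hashable, order-insensitive signature
        key = frozenset(Counter(rows).items())
        groups.setdefault(key, []).append(sql)
    if not groups:
        return None  # every candidate failed to execute
    return max(groups.values(), key=len)[0]
```

Because the signature ignores row order, two candidates that differ only in `ORDER BY`-free row ordering land in the same group, which is the point of value-aware (rather than string-level) voting.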