---
license: cc-by-4.0
metrics:
  - exact_match
language:
  - en
pipeline_tag: text-generation
tags:
  - text-to-sql
  - knowledge-distillation
  - struct-sql
  - qwen
  - generated_from_trainer
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - bird-bench/bird
arxiv: 2512.17053
---

# Struct-SQL-8B: Knowledge Distillation with Structured Chain-of-Thought

Struct-SQL is a specialized Text-to-SQL model based on Qwen3-4B-Instruct. It was trained using a novel Knowledge Distillation (KD) framework that transfers structured reasoning (Query Execution Plans) from a state-of-the-art teacher LLM (GPT-4o) to a smaller student model.

Unlike standard distillation methods that rely on unstructured Chain-of-Thought (CoT), Struct-SQL learns to generate a formal, logical blueprint (a query plan) before generating the final SQL. This approach significantly reduces syntactic errors and schema hallucinations.

📄 **Paper:** [Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL](https://arxiv.org/abs/2512.17053)

## Performance

On the BIRD mini-dev benchmark, Struct-SQL achieves an Execution Accuracy (EX) of 45.0%, outperforming standard unstructured CoT distillation baselines by 8.1 points.

| Model | Distillation Method | Execution Accuracy (EX) |
|---|---|---|
| **Struct-SQL (Ours)** | Structured QP-CoT | **45.0%** |
| ReasonSQL Baseline | Unstructured CoT | 36.9% |
| FN-Gold Baseline | No Reasoning (SQL Only) | 34.3% |
| Base Student (Zero-shot) | None | 17.0% |

## Methodology

The model was trained on a curated dataset of 1,000 samples generated by GPT-4o. The training data consists of:

  1. Input: Natural Language Question + Database Schema.
  2. Output: A structured Query Execution Plan (Reasoning) + Final SQL Query.

By forcing the model to explicitly plan the query execution (e.g., "Scan Table", "Filter by...", "Join with..."), the student learns the logical structure of SQL generation rather than merely memorizing surface patterns.
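To make the input/output structure concrete, the sketch below builds one training example. The plan wording and prompt template here are illustrative assumptions for exposition, not the exact format from the paper or dataset:

```python
# Illustrative sketch of one (input, target) training pair.
# The exact plan vocabulary and prompt template are assumptions.

question = "Which customers placed more than 5 orders in 2024?"
schema = (
    "CREATE TABLE customers(id INTEGER, name TEXT);\n"
    "CREATE TABLE orders(id INTEGER, customer_id INTEGER, order_date TEXT);"
)

# Input: natural-language question + database schema.
model_input = f"Schema:\n{schema}\n\nQuestion: {question}"

# Target: a structured query execution plan, then the final SQL.
model_target = """Query Execution Plan:
1. Scan Table: orders
2. Filter by: order_date between '2024-01-01' and '2024-12-31'
3. Join with: customers on orders.customer_id = customers.id
4. Group by: customers.id
5. Filter groups: COUNT(orders.id) > 5
6. Select: customers.name

Final SQL:
SELECT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY c.id, c.name
HAVING COUNT(o.id) > 5;"""

print(model_input)
print(model_target)
```

The key property is that the plan steps mirror the relational operators in the final query, so an error in the plan is visible before any SQL is emitted.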

## Usage

You can use this model with the `transformers` library. To elicit the query plan, format the input with the expected structure (database schema followed by the question).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "craterlabs/Struct-SQL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Build a prompt containing the database schema and the question.
prompt = (
    "Schema:\n"
    "CREATE TABLE employees(id INTEGER, name TEXT, salary REAL);\n\n"
    "Question: What is the average salary of all employees?"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
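Downstream code usually needs only the final SQL, not the reasoning. A minimal post-processing helper, assuming the generation contains a marker such as `Final SQL:` before the query (adjust the marker to whatever your generations actually contain):

```python
import re

def extract_sql(generated: str) -> str:
    """Pull the SQL statement out of a plan-then-SQL generation.

    Assumes a 'Final SQL:' marker precedes the query; this marker is an
    assumption about the output format, not a documented contract.
    """
    marker = "Final SQL:"
    if marker in generated:
        sql = generated.split(marker, 1)[1]
    else:
        # Fall back to the last SELECT ...; statement in the text.
        matches = re.findall(r"SELECT\b.*?;", generated, flags=re.S | re.I)
        sql = matches[-1] if matches else generated
    return sql.strip()

text = (
    "Query Execution Plan:\n1. Scan Table: employees\n\n"
    "Final SQL:\nSELECT AVG(salary) FROM employees;"
)
print(extract_sql(text))  # SELECT AVG(salary) FROM employees;
```

Separating extraction from generation also lets you validate the SQL (e.g., with a dry-run `EXPLAIN`) before executing it against a live database.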



## Citation

If you use this model or method in your research, please cite our paper:

```bibtex
@article{thaker2025knowledge,
  title={Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL},
  author={Thaker, Khushboo and Bresler, Yony},
  journal={arXiv preprint arXiv:2512.17053},
  year={2025}
}
```