---
base_model: Qwen/Qwen3-4B
library_name: transformers
tags:
- generated_from_trainer
- open-r1
- Text2SQL
- Reasoning
license: apache-2.0
language:
- en
---

# Model Information

This model is the reasoning model for the Text-to-SQL task introduced in [Think2SQL: Blueprinting Reward Density and Advantage Scaling for Effective Text-to-SQL Reasoning]().

This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) with thinking disabled, trained on the [BIRD](https://bird-bench.github.io/) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

The model performs best with its System and User prompts below.
It is intended to be used with three inputs: the question, the evidence, and the database schema.

Qwen3 support requires `transformers >= 4.51.0`. Make sure to update your installation via `pip install --upgrade transformers`.

```python
import transformers
import torch

model_id = "anonymous-2321/Think2SQL-4B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

system_message = """
You are a data science expert that provides well-reasoned and detailed responses. Your task is to understand the schema and generate a valid SQL query to answer the question.
You first think about the reasoning process as an internal monologue and then provide the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
""".strip()

user_message = """
Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema.
Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.

Database Engine:
SQLite

Question:
Return the product name, sorted alphabetically and by price in descending order.

Evidence:

Database Schema:
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price REAL NOT NULL
);

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);
"""

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]

outputs = pipeline(
    messages,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(outputs[0]["generated_text"][-1])
```
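
Because the model is prompted to wrap its final SQL in `<answer>` tags, the raw generation usually needs a small post-processing step. The snippet below is an illustrative sketch (the `extract_sql` helper is not part of the official pipeline), using a regular expression to pull out only the SQL script:

```python
import re

def extract_sql(generated_text: str):
    """Return the SQL inside the <answer>...</answer> tags, or None if absent."""
    match = re.search(r"<answer>(.*?)</answer>", generated_text, re.DOTALL)
    return match.group(1).strip() if match else None

# Example output following the format requested in the system prompt.
example = (
    "<reasoning>Sort by name, then break ties by price descending.</reasoning>\n"
    "<answer>\nSELECT name FROM products ORDER BY name ASC, price DESC;\n</answer>"
)
print(extract_sql(example))  # SELECT name FROM products ORDER BY name ASC, price DESC;
```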

## 📖 Overview

Think2SQL is a systematic study on injecting reasoning capabilities into Text-to-SQL through Reinforcement Learning with Verifiable Rewards (RLVR). We uncover the critical interplay between reward density, advantage scaling, and model capacity, proposing novel execution-guided dense rewards and optimal scaling strategies. Our 4B-parameter model achieves reasoning capabilities competitive with state-of-the-art models, while providing a comprehensive analysis for optimizing Text-to-SQL reasoning under computational constraints.

**Key Contributions:**

- Execution-guided dense reward function that outperforms binary signals
- Analysis of advantage scaling mechanics for models of different sizes
- Evaluation of cold start effects and supervised fine-tuning impact
- Pareto frontier mapping for training efficiency optimization
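
To give intuition for why reward density matters, the sketch below contrasts a sparse binary reward with a denser execution-guided one. This is NOT the paper's exact reward: the row-level F1 overlap between executed result sets is only an assumed proxy used here for illustration:

```python
def binary_reward(pred_rows, gold_rows):
    # Sparse signal: reward is 1.0 only on an exact execution match.
    return 1.0 if set(pred_rows) == set(gold_rows) else 0.0

def dense_reward(pred_rows, gold_rows):
    # Denser signal (illustrative): F1 overlap between the executed
    # result sets, so near-miss queries still receive partial credit.
    pred, gold = set(pred_rows), set(gold_rows)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# A predicted query that gets one of two gold rows right:
pred = [("Widget", 9.99), ("Gadget", 4.50)]
gold = [("Widget", 9.99), ("Gizmo", 2.00)]
print(binary_reward(pred, gold))  # 0.0 -- no learning signal
print(dense_reward(pred, gold))   # 0.5 -- partial credit
```

The dense variant turns an all-or-nothing outcome into a graded signal, which is the general property the execution-guided reward exploits during RLVR training.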