---
base_model: Qwen/Qwen3-8B
library_name: transformers
tags:
- generated_from_trainer
- open-r1
- Text2SQL
- Reasoning
license: apache-2.0
language:
- en
---

# Model Information

This model is the reasoning model for the Text-to-SQL task introduced in [Think2SQL: Blueprinting Reward Density and Advantage Scaling for Effective Text-to-SQL Reasoning]()

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), with thinking disabled, trained on the [BIRD](https://bird-bench.github.io/) dataset. It has been trained using [TRL](https://github.com/huggingface/trl).
21
+
22
+
23
+
24
+ ## Quick start
25
+
26
+ The best model performance is given with its System and User prompts.
27
+ The model is intended to be used with three inputs: question, evidence, and the database schema.
28
+
29
+
30
+ Required `transformers > 4.51.0` to have Qwen3. Make sure to update your transformers installation via `pip install --upgrade transformers`.
31
+
```python
import torch
import transformers

model_id = "anonymous-2321/Think2SQL-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

system_message = """
You are a data science expert that provides well-reasoned and detailed responses. Your task is to understand the schema and generate a valid SQL query to answer the question.
You first think about the reasoning process as an internal monologue and then provide the user with the answer.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
""".strip()

user_message = """
Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema.
Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.

Database Engine:
SQLite

Question:
Return the product name, sorted alphabetically and by price in descending order.

Evidence:

Database Schema:
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price REAL NOT NULL
);

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);
"""

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]

outputs = pipeline(
    messages,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(outputs[0]["generated_text"][-1])
```
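
The system prompt asks the model to wrap its final SQL in `<answer>` tags, so the query can be recovered from the generation with a small parser. The helper below is a sketch, not part of the model's API; the `extract_sql` name and the sample generation are illustrative assumptions.

```python
import re
from typing import Optional

def extract_sql(generation: str) -> Optional[str]:
    """Return the SQL inside the last <answer>...</answer> block, or None."""
    matches = re.findall(r"<answer>(.*?)</answer>", generation, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

# Illustrative generation following the format requested by the system prompt.
sample = (
    "<reasoning>\nOnly the products table is needed; sort by name, "
    "then by price descending.\n</reasoning>\n"
    "<answer>\nSELECT name FROM products ORDER BY name ASC, price DESC;\n</answer>"
)
print(extract_sql(sample))  # SELECT name FROM products ORDER BY name ASC, price DESC;
```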

## 📖 Overview

Think2SQL is a systematic study on injecting reasoning capabilities into Text-to-SQL through Reinforcement Learning with Verifiable Rewards (RLVR). We uncover the critical interplay between reward density, advantage scaling, and model capacity, proposing novel execution-guided dense rewards and optimal scaling strategies. Our 4B-parameter model achieves reasoning capabilities competitive with state-of-the-art models, while providing a comprehensive analysis for optimizing Text-to-SQL reasoning under computational constraints.

**Key Contributions:**
- Execution-guided dense reward function that outperforms binary signals
- Analysis of advantage scaling mechanics for models of different sizes
- Evaluation of cold start effects and supervised fine-tuning impact
- Pareto frontier mapping for training efficiency optimization
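
The paper's exact reward function is not reproduced on this card. As a hedged illustration of the execution-guided dense-reward idea only, the sketch below grants partial credit for row overlap between the predicted and gold query results instead of a binary match/no-match signal; the `dense_execution_reward` name, its scoring weights, and the toy database are assumptions, not the paper's implementation.

```python
import os
import sqlite3
import tempfile

def dense_execution_reward(pred_sql: str, gold_sql: str, db_path: str) -> float:
    """Illustrative dense reward: execute both queries and grade the overlap.
    (Hypothetical scoring scheme, not the paper's exact formulation.)"""
    conn = sqlite3.connect(db_path)
    try:
        try:
            pred_rows = conn.execute(pred_sql).fetchall()
        except sqlite3.Error:
            return 0.0  # invalid SQL earns nothing
        gold_rows = conn.execute(gold_sql).fetchall()
    finally:
        conn.close()
    if pred_rows == gold_rows:
        return 1.0  # exact execution match, order included
    pred_set, gold_set = set(pred_rows), set(gold_rows)
    if not gold_set:
        return 0.0
    # Partial credit: Jaccard overlap of result rows, capped below exact match.
    return 0.5 * len(pred_set & gold_set) / len(pred_set | gold_set)

# Toy database for demonstration.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "apple", 3.0), (2, "banana", 1.5)])
conn.commit()
conn.close()

gold = "SELECT name FROM products ORDER BY name"
r_exact = dense_execution_reward(gold, gold, path)
r_partial = dense_execution_reward("SELECT name FROM products WHERE id = 1", gold, path)
r_invalid = dense_execution_reward("SELEC nonsense", gold, path)
print(r_exact, r_partial, r_invalid)  # 1.0 0.25 0.0
os.remove(path)
```

The dense signal gives the policy a gradient even when the predicted query is close but not identical, which is the intuition behind replacing sparse binary execution rewards.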