irisaparina committed · Commit 10d6f82 · verified · 1 Parent(s): 5280ebb

Update README.md

Files changed (1): README.md +39 -46
README.md CHANGED
@@ -1,69 +1,62 @@
  ---
- base_model: Qwen/Qwen3-4B-Instruct-2507
- datasets: ambrosia
- library_name: transformers
  tags:
- - generated_from_trainer
- - trl
- - open-r1
  - grpo
- licence: license
  ---

- # Model Card for None

- This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on the [ambrosia](https://huggingface.co/datasets/ambrosia) dataset.
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/saparina/cscsql_ambrosia_grpo/runs/mbpj7wgz)

- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

- ### Framework versions

- - TRL: 0.23.0
- - Transformers: 4.56.2
- - Pytorch: 2.8.0
- - Datasets: 4.1.1
- - Tokenizers: 0.22.1

- ## Citations

- Cite GRPO as:

- ```bibtex
- @article{shao2024deepseekmath,
-   title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-   author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-   year   = 2024,
-   eprint = {arXiv:2402.03300},
- }
- ```

- Cite TRL as:
-
  ```bibtex
- @misc{vonwerra2022trl,
-   title        = {{TRL: Transformer Reinforcement Learning}},
-   author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-   year         = 2020,
-   journal      = {GitHub repository},
-   publisher    = {GitHub},
-   howpublished = {\url{https://github.com/huggingface/trl}}
  }
  ```
 
  ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen3-4B-Instruct-2507
  tags:
+ - text-to-sql
+ - ambiguity
+ - reinforcement-learning
  - grpo
  ---

+ # IntentRL-Ambig-Text2SQL-4B

+ This model is trained to handle **ambiguous text-to-SQL requests** by explicitly reasoning about user intent and producing multiple interpretation–answer pairs rather than silently committing to a single interpretation.

+ It is based on [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), fine-tuned with **RL (DAPO/GRPO)** using a custom reward that encourages recall (covering more valid interpretations) for ambiguous questions and precision for unambiguous ones.

+ ## Example

+ Given a schema and an ambiguous question:

+ > **Schema:** `CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, Pref_Years INTEGER, Position TEXT, Salary REAL);`
+ >
+ > **Question:** Show the required experience for the best-paid role.

+ The model produces multiple interpretation–answer pairs:

+ 1. **Minimum years of experience required** → `SELECT Min_Years ...`
+ 2. **Preferred years of experience** → `SELECT Pref_Years ...`
+ 3. **Both minimum and preferred years** → `SELECT Min_Years, Pref_Years ...`
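To see why the three readings genuinely differ, they can be executed against the example schema. The completed `SELECT` statements and the sample rows below are illustrative (the README elides the full queries); any sensible "best-paid role" completion behaves the same way:

```python
import sqlite3

# In-memory database with the Jobs schema from the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, "
    "Pref_Years INTEGER, Position TEXT, Salary REAL)"
)
# Made-up rows purely for illustration.
conn.executemany(
    "INSERT INTO Jobs VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 4, "Analyst", 70000.0), (2, 5, 8, "Engineer", 120000.0)],
)

# The three interpretation-answer pairs as runnable SQL
# (hypothetical completions of the elided queries).
interpretations = {
    "minimum": "SELECT Min_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
    "preferred": "SELECT Pref_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
    "both": "SELECT Min_Years, Pref_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
}
for reading, sql in interpretations.items():
    print(reading, "->", conn.execute(sql).fetchone())
```

Each reading returns a different answer for the same question, which is exactly the ambiguity the model is trained to surface instead of resolving silently.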

+ ## Paper

+ [Reasoning about Intent for Ambiguous Requests](https://arxiv.org/abs/2511.10453)

+ **Authors:** Irina Saparina, Mirella Lapata

+ ## Training Details

+ - **Base model:** Qwen3-4B-Instruct-2507
+ - **Method:** RL with DAPO/GRPO and a custom recall/precision reward
+ - **Training data:** [Ambrosia](https://ambrosia-benchmark.github.io/) text-to-SQL benchmark
+ - **Data balance:** ambiguous examples are upsampled during training
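The paper's exact reward is not reproduced here, but the stated idea (reward recall of valid interpretations when the question is ambiguous, precision when it is not) can be sketched as a toy function over sets of execution results. Everything below, including the name `interpretation_reward`, is a hypothetical illustration:

```python
def interpretation_reward(predicted: set, gold: set, ambiguous: bool) -> float:
    """Toy recall/precision-style reward (illustrative sketch only).

    predicted / gold are sets of hashable execution results;
    `ambiguous` flags whether the question has multiple valid readings.
    """
    if not predicted:
        return 0.0
    correct = len(predicted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted)
    # Ambiguous question: reward covering more of the valid readings.
    # Unambiguous question: reward committing to the single right answer.
    return recall if ambiguous else precision

# Covering 2 of 3 valid readings of an ambiguous question scores 2/3;
# padding an unambiguous answer with a spurious extra query is penalized.
print(interpretation_reward({"a", "b"}, {"a", "b", "c"}, ambiguous=True))
print(interpretation_reward({"a", "x"}, {"a"}, ambiguous=False))
```

Under such a reward, producing multiple interpretation–answer pairs only pays off when the question actually admits multiple readings.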

+ ## Code

+ Training and evaluation code: [https://github.com/saparina/intentRL](https://github.com/saparina/intentRL)

+ ## Citation

  ```bibtex
+ @misc{saparina2025reasoningintentambiguousrequests,
+   title={Reasoning about Intent for Ambiguous Requests},
+   author={Irina Saparina and Mirella Lapata},
+   year={2025},
+   eprint={2511.10453},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2511.10453},
  }
  ```