irisaparina committed · Commit 10d6f82 · verified · 1 Parent(s): 5280ebb

Update README.md

Files changed (1): README.md +39 -46
README.md CHANGED
@@ -1,69 +1,62 @@
  ---
- base_model: Qwen/Qwen3-4B-Instruct-2507
- datasets: ambrosia
- library_name: transformers
  tags:
- - generated_from_trainer
- - trl
- - open-r1
  - grpo
- licence: license
  ---

- # Model Card for None

- This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on the [ambrosia](https://huggingface.co/datasets/ambrosia) dataset.
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/saparina/cscsql_ambrosia_grpo/runs/mbpj7wgz)

- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

- ### Framework versions

- - TRL: 0.23.0
- - Transformers: 4.56.2
- - Pytorch: 2.8.0
- - Datasets: 4.1.1
- - Tokenizers: 0.22.1

- ## Citations

- Cite GRPO as:

- ```bibtex
- @article{shao2024deepseekmath,
-   title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-   author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-   year   = 2024,
-   eprint = {arXiv:2402.03300},
- }
- ```

- Cite TRL as:
-
  ```bibtex
- @misc{vonwerra2022trl,
-   title        = {{TRL: Transformer Reinforcement Learning}},
-   author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-   year         = 2020,
-   journal      = {GitHub repository},
-   publisher    = {GitHub},
-   howpublished = {\url{https://github.com/huggingface/trl}}
  }
  ```
 
  ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen3-4B-Instruct-2507
  tags:
+ - text-to-sql
+ - ambiguity
+ - reinforcement-learning
  - grpo
  ---

+ # IntentRL-Ambig-Text2SQL-4B

+ This model is trained to handle **ambiguous text-to-SQL requests** by explicitly reasoning about user intent and producing multiple interpretation–answer pairs rather than silently committing to a single interpretation.

+ It is based on [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), fine-tuned with **RL (DAPO/GRPO)** using a custom reward that encourages recall (covering more valid interpretations) for ambiguous questions and precision for unambiguous ones.

+ ## Example

+ Given a schema and an ambiguous question:

+ > **Schema:** `CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, Pref_Years INTEGER, Position TEXT, Salary REAL);`
+ >
+ > **Question:** Show the required experience for the best-paid role.

+ The model produces multiple interpretation–answer pairs:

+ 1. **Minimum years of experience required** → `SELECT Min_Years ...`
+ 2. **Preferred years of experience** → `SELECT Pref_Years ...`
+ 3. **Both minimum and preferred years** → `SELECT Min_Years, Pref_Years ...`
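To see why the three readings genuinely differ, they can be executed against the example schema. The completed `SELECT` statements and the sample rows below are illustrative (the README elides the full queries); any sensible "best-paid role" completion behaves the same way:

```python
import sqlite3

# In-memory database with the Jobs schema from the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, "
    "Pref_Years INTEGER, Position TEXT, Salary REAL)"
)
# Made-up rows purely for illustration.
conn.executemany(
    "INSERT INTO Jobs VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 4, "Analyst", 70000.0), (2, 5, 8, "Engineer", 120000.0)],
)

# The three interpretation-answer pairs as runnable SQL
# (hypothetical completions of the elided queries).
interpretations = {
    "minimum": "SELECT Min_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
    "preferred": "SELECT Pref_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
    "both": "SELECT Min_Years, Pref_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
}
for reading, sql in interpretations.items():
    print(reading, "->", conn.execute(sql).fetchone())
```

Each reading returns a different answer for the same question, which is exactly the ambiguity the model is trained to surface instead of resolving silently.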

+ ## Paper

+ [Reasoning about Intent for Ambiguous Requests](https://arxiv.org/abs/2511.10453)

+ **Authors:** Irina Saparina, Mirella Lapata

+ ## Training Details

+ - **Base model:** Qwen3-4B-Instruct-2507
+ - **Method:** RL with DAPO/GRPO and a custom recall/precision reward
+ - **Training data:** [Ambrosia](https://ambrosia-benchmark.github.io/) text-to-SQL benchmark
+ - **Data balance:** ambiguous examples are upsampled during training
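The paper's exact reward is not reproduced here, but the stated idea (reward recall of valid interpretations when the question is ambiguous, precision when it is not) can be sketched as a toy function over sets of execution results. Everything below, including the name `interpretation_reward`, is a hypothetical illustration:

```python
def interpretation_reward(predicted: set, gold: set, ambiguous: bool) -> float:
    """Toy recall/precision-style reward (illustrative sketch only).

    predicted / gold are sets of hashable execution results;
    `ambiguous` flags whether the question has multiple valid readings.
    """
    if not predicted:
        return 0.0
    correct = len(predicted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted)
    # Ambiguous question: reward covering more of the valid readings.
    # Unambiguous question: reward committing to the single right answer.
    return recall if ambiguous else precision

# Covering 2 of 3 valid readings of an ambiguous question scores 2/3;
# padding an unambiguous answer with a spurious extra query is penalized.
print(interpretation_reward({"a", "b"}, {"a", "b", "c"}, ambiguous=True))
print(interpretation_reward({"a", "x"}, {"a"}, ambiguous=False))
```

Under such a reward, producing multiple interpretation–answer pairs only pays off when the question actually admits multiple readings.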

+ ## Code

+ Training and evaluation code: [https://github.com/saparina/intentRL](https://github.com/saparina/intentRL)

+ ## Citation

  ```bibtex
+ @misc{saparina2025reasoningintentambiguousrequests,
+   title={Reasoning about Intent for Ambiguous Requests},
+   author={Irina Saparina and Mirella Lapata},
+   year={2025},
+   eprint={2511.10453},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2511.10453},
  }
  ```