---
license: mit
language:
- en
base_model:
- Qwen/Qwen3-4B-Instruct-2507
tags:
- text-to-sql
- ambiguity
- reinforcement-learning
- grpo
---

# IntentRL-Ambig-Text2SQL-4B

This model is trained to handle **ambiguous text-to-SQL requests** by explicitly reasoning about user intent and producing multiple interpretation–answer pairs rather than silently committing to a single interpretation.

It is based on [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), fine-tuned with **RL (DAPO/GRPO)** using a custom reward that encourages recall (covering more valid interpretations) for ambiguous questions and precision for unambiguous ones.

## Example

Given a schema and an ambiguous question:

> **Schema:** `CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, Pref_Years INTEGER, Position TEXT, Salary REAL);`
>
> **Question:** Show the required experience for the best-paid role.

The model produces multiple interpretation–answer pairs:

1. **Minimum years of experience required** → `SELECT Min_Years ...`
2. **Preferred years of experience** → `SELECT Pref_Years ...`
3. **Both minimum and preferred years** → `SELECT Min_Years, Pref_Years ...`

## Paper

[Reasoning about Intent for Ambiguous Requests](https://arxiv.org/abs/2511.10453)

**Authors:** Irina Saparina, Mirella Lapata

## Training Details

- **Base model:** Qwen3-4B-Instruct-2507
- **Method:** RL with DAPO/GRPO and a custom recall/precision reward
- **Training data:** [Ambrosia](https://ambrosia-benchmark.github.io/) text-to-SQL benchmark
- **Ambiguous examples** are upsampled to balance training

## Code

Training and evaluation code: [https://github.com/saparina/intentRL](https://github.com/saparina/intentRL)

## Citation

```bibtex
@misc{saparina2025reasoningintentambiguousrequests,
  title={Reasoning about Intent for Ambiguous Requests},
  author={Irina Saparina and Mirella Lapata},
  year={2025},
  eprint={2511.10453},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.10453},
}
```
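
## Illustrative sketch: scoring interpretation coverage

The recall/precision idea behind the reward can be made concrete with a small, self-contained sketch built on the `Jobs` example above. Everything here is an assumption made for illustration: the table rows are invented, the three queries are one plausible completion of the elided `SELECT ... ` statements, and the helper names (`make_db`, `run`, `coverage_scores`, `toy_reward`) are hypothetical. It is not the reward used in the paper or in the training code; it only shows how execution results of predicted queries might be compared with a set of valid interpretations.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, "
    "Pref_Years INTEGER, Position TEXT, Salary REAL);"
)

# Hypothetical rows, invented for this illustration only.
ROWS = [
    (1, 2, 4, "Analyst", 60000.0),
    (2, 5, 8, "Engineer", 120000.0),
    (3, 3, 6, "Designer", 90000.0),
]

# Assumed completions of the three interpretations (the model card elides them).
GOLD = [
    "SELECT Min_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
    "SELECT Pref_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
    "SELECT Min_Years, Pref_Years FROM Jobs ORDER BY Salary DESC LIMIT 1",
]


def make_db():
    """Create an in-memory SQLite database holding the example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run(conn, sql):
    """Return the query's rows as an order-insensitive tuple, or None on SQL error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def coverage_scores(conn, predicted, gold):
    """Compare distinct execution results of predicted queries with gold ones."""
    gold_results = {run(conn, q) for q in gold} - {None}
    pred_results = {run(conn, q) for q in predicted} - {None}
    matched = gold_results & pred_results
    recall = len(matched) / len(gold_results) if gold_results else 0.0
    precision = len(matched) / len(pred_results) if pred_results else 0.0
    return recall, precision


def toy_reward(conn, predicted, gold, is_ambiguous):
    """Toy stand-in: recall for ambiguous questions, precision otherwise."""
    recall, precision = coverage_scores(conn, predicted, gold)
    return recall if is_ambiguous else precision


if __name__ == "__main__":
    conn = make_db()
    # A prediction that covers only two of the three assumed interpretations.
    print(coverage_scores(conn, GOLD[:2], GOLD))
```

Matching on execution results rather than on SQL strings is a design choice of this sketch: two differently written queries that return the same rows count as the same interpretation, and a query that fails to execute simply contributes nothing.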