IntentRL-Ambig-CoQA-4B
This model is trained to handle ambiguous conversational questions by explicitly reasoning about user intent and producing multiple interpretation–answer pairs rather than silently committing to a single interpretation.
It is based on Qwen/Qwen3-4B-Instruct-2507 and fine-tuned with reinforcement learning (DAPO) using a custom reward that encourages recall (covering more valid interpretations) on ambiguous questions and precision on unambiguous ones.
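To make the recall/precision idea concrete, here is a minimal sketch of how such a reward could be computed. It is an illustration only, not the reward used in the paper: the function names (`sketch_reward`, `matches`), the normalized exact-match criterion, and the plain switch between recall and precision are all assumptions. See the training code for the actual definition.

```python
from typing import List


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace (assumed matching criterion).
    return " ".join(text.lower().split())


def matches(pred: str, gold: str) -> bool:
    # Hypothetical matcher: normalized exact match.
    return normalize(pred) == normalize(gold)


def recall(preds: List[str], golds: List[str]) -> float:
    # Fraction of gold answers covered by at least one prediction.
    if not golds:
        return 0.0
    covered = sum(any(matches(p, g) for p in preds) for g in golds)
    return covered / len(golds)


def precision(preds: List[str], golds: List[str]) -> float:
    # Fraction of predictions that match some gold answer.
    if not preds:
        return 0.0
    correct = sum(any(matches(p, g) for g in golds) for p in preds)
    return correct / len(preds)


def sketch_reward(preds: List[str], golds: List[str], is_ambiguous: bool) -> float:
    # Ambiguous question: reward coverage of valid interpretations (recall).
    # Unambiguous question: reward not adding spurious answers (precision).
    return recall(preds, golds) if is_ambiguous else precision(preds, golds)
```

Under these assumptions, answering only "Mike Comrie" to the ambiguous question in the example below would cover one of two valid answers, while adding a spurious second answer to an unambiguous question would lower precision.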
Example
Given a passage and a conversational context that ends in an ambiguous question:
Hilary Duff says her new album is ... "a lot heavier and a lot darker" because of the separation from her husband, Mike Comrie. Duff married Comrie ... in 2010 after dating for three years. Their son, Luca, was born in 2012...
- How long were they married before they had a child? — 2 years
- What is his name?
The model produces multiple interpretation–answer pairs:
- The question refers to the husband's name → His name is Mike Comrie.
- The question refers to the son's name → His name is Luca.
Paper
Reasoning about Intent for Ambiguous Requests
Authors: Irina Saparina, Mirella Lapata
Training Details
- Base model: Qwen3-4B-Instruct-2507
- Method: RL with DAPO and a custom recall/precision reward
- Training data: Abg-CoQA conversational QA benchmark
- Ambiguous examples are upsampled to balance training
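The upsampling step in the last bullet can be sketched as follows. This is a hedged illustration, not the project's data pipeline: the `ambiguous` field name, the 1:1 target ratio, and sampling with replacement are assumptions made for the example.

```python
import random
from typing import Dict, List


def upsample_ambiguous(examples: List[Dict], seed: int = 0) -> List[Dict]:
    # Split by the (assumed) boolean "ambiguous" field.
    rng = random.Random(seed)
    ambiguous = [ex for ex in examples if ex["ambiguous"]]
    unambiguous = [ex for ex in examples if not ex["ambiguous"]]

    # Nothing to do if there are no ambiguous examples or they are not the minority.
    if not ambiguous or len(ambiguous) >= len(unambiguous):
        return list(examples)

    # Draw extra ambiguous examples with replacement until both groups are equal in size.
    extra = rng.choices(ambiguous, k=len(unambiguous) - len(ambiguous))
    balanced = unambiguous + ambiguous + extra
    rng.shuffle(balanced)
    return balanced
```

A fixed seed keeps the resampled set reproducible across runs; the actual ratio used in training may differ from the 1:1 balance assumed here.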
Code
Training and evaluation code: https://github.com/saparina/intentRL
Citation
@misc{saparina2025reasoningintentambiguousrequests,
  title={Reasoning about Intent for Ambiguous Requests},
  author={Irina Saparina and Mirella Lapata},
  year={2025},
  eprint={2511.10453},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.10453},
}