IntentRL-Ambig-CoQA-4B

This model is trained to handle ambiguous conversational questions by explicitly reasoning about user intent and producing multiple interpretation–answer pairs rather than silently committing to a single interpretation.

It is based on Qwen/Qwen3-4B-Instruct-2507 and fine-tuned with reinforcement learning (DAPO) using a custom reward that encourages recall (covering more valid interpretations) on ambiguous questions and precision on unambiguous ones.
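The recall/precision idea can be illustrated with a toy scoring function. This is only a sketch and not the reward used in training: the name `toy_reward`, the exact-match comparison after lowercasing, and the `is_ambiguous` switch are assumptions made for illustration.

```python
def toy_reward(predicted, gold, is_ambiguous):
    """Hypothetical reward: recall for ambiguous questions, precision otherwise."""
    # Normalise answers so that trivial casing/spacing differences do not matter.
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    if not pred or not ref:
        return 0.0
    hits = len(pred & ref)
    if is_ambiguous:
        # Recall: fraction of valid (gold) answers that were covered.
        return hits / len(ref)
    # Precision: fraction of predicted answers that are valid.
    return hits / len(pred)


gold = ["Mike Comrie", "Luca"]
full = toy_reward(["Mike Comrie", "Luca"], gold, is_ambiguous=True)
partial = toy_reward(["Mike Comrie"], gold, is_ambiguous=True)
overgen = toy_reward(["2 years", "3 years"], ["2 years"], is_ambiguous=False)
```

Under these assumptions, covering only one of two valid interpretations halves the score on an ambiguous question, while adding a spurious second answer halves the score on an unambiguous one.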

Example

Given a passage and conversational context:

Hilary Duff says her new album is ... "a lot heavier and a lot darker" because of the separation from her husband, Mike Comrie. Duff married Comrie ... in 2010 after dating for three years. Their son, Luca, was born in 2012...

Previous turn: How long were they married before they had a child? Answer: 2 years

Current question: What is his name?

The model produces multiple interpretation–answer pairs:

The question refers to the husband's name → His name is Mike Comrie.
The question refers to the son's name → His name is Luca.
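For downstream use, the pairs shown above can be split into structured records. The sketch below assumes the output is rendered one pair per line with `→` separating interpretation and answer, as in this example; the raw model output may use a different layout, and `parse_pairs` is a hypothetical helper, not part of the released code.

```python
def parse_pairs(output):
    """Split 'interpretation → answer' lines into (interpretation, answer) tuples."""
    pairs = []
    for line in output.splitlines():
        if "→" not in line:
            continue  # skip blank lines or lines without the separator
        interpretation, answer = line.split("→", 1)
        pairs.append((interpretation.strip(), answer.strip()))
    return pairs


example_output = (
    "The question refers to the husband's name → His name is Mike Comrie.\n"
    "The question refers to the son's name → His name is Luca."
)
pairs = parse_pairs(example_output)
```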

Paper

Reasoning about Intent for Ambiguous Requests

Authors: Irina Saparina, Mirella Lapata

Training Details

Base model: Qwen3-4B-Instruct-2507
Method: RL with DAPO and a custom recall/precision reward
Training data: Abg-CoQA conversational QA benchmark
Ambiguous examples are upsampled to balance training
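
Upsampling of the kind mentioned above can be sketched as follows. The card does not state the target ratio or the data schema, so the 1:1 balance, the `ambiguous` field name, and the function `upsample_ambiguous` are assumptions for illustration only.

```python
import random


def upsample_ambiguous(examples, seed=0):
    """Repeat ambiguous examples (sampled with replacement) until classes are 1:1."""
    rng = random.Random(seed)
    ambiguous = [e for e in examples if e["ambiguous"]]
    clear = [e for e in examples if not e["ambiguous"]]
    # Nothing to do if there are no ambiguous examples or they are not the minority.
    if not ambiguous or len(ambiguous) >= len(clear):
        return list(examples)
    extra = rng.choices(ambiguous, k=len(clear) - len(ambiguous))
    balanced = clear + ambiguous + extra
    rng.shuffle(balanced)
    return balanced


# Toy dataset: 2 ambiguous and 6 unambiguous examples.
data = [{"id": i, "ambiguous": i < 2} for i in range(8)]
balanced = upsample_ambiguous(data)
```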

Code

Training and evaluation code: https://github.com/saparina/intentRL

Citation

@misc{saparina2025reasoningintentambiguousrequests,
      title={Reasoning about Intent for Ambiguous Requests},
      author={Irina Saparina and Mirella Lapata},
      year={2025},
      eprint={2511.10453},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.10453},
}