---
title: Proactive Interactive Reasoning (PIR)
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
pinned: false
license: apache-2.0
short_description: Enables reasoning LLMs to ask clarification questions
---
# Reasoning While Asking: Transforming Reasoning LLMs into Proactive Inquirers (PIR)
This organization hosts the official models and datasets for the paper "Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers".
## Motivation
Current reasoning LLMs (e.g., GPT-o1, DeepSeek-R1) suffer from blind self-thinking: they perform extensive internal reasoning even when critical information is missing or user intent is ambiguous. This leads to overthinking, hallucinations, and misaligned conclusions.
PIR (Proactive Interactive Reasoning) is a new paradigm that transforms reasoning LLMs from passive solvers into proactive inquirers. Instead of guessing, PIR-enabled models detect uncertainty during reasoning and actively ask users for clarification before proceeding.
(Note: if the image above does not load, please view it in our GitHub repository.)
## Key Features
- User-Intent Alignment: Optimizes interaction through US-GRPO with composite rewards balancing accuracy, efficiency, and helpfulness.
- Significant Improvements: Up to 32.70% higher accuracy, 22.90% higher pass rate, and a 41.36-point BLEU improvement over baselines.
- Reduced Computation: Nearly halves unnecessary reasoning tokens and interaction turns.
## Models
We provide the following models trained with the PIR paradigm:
| Model Name | Description | Link |
|---|---|---|
| Proactive-Interactive-R1-Math-7B | The core model optimized for mathematical reasoning with clarification capabilities. | View Model |
| Proactive-Interactive-R1-Math-7B-Pro | An enhanced version of the Math-7B model. | View Model |
| Proactive-Interactive-R1-SFT-7B | The base SFT model before Reinforcement Learning alignment. | View Model |
## Datasets
The datasets used to train and evaluate PIR are available here:
- Reasoning-While-Asking-SFT-Dataset: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
- DeepSeek-R1-Distill-Data-5k: Distilled data used for training.
## Method
PIR consists of two phases:
**Interactive Capability Activation (Phase I):**
- Detects uncertainty via Predictive Entropy at each reasoning step.
- Injects clarification questions at high-uncertainty points using instruction-following LLMs.
- Performs Supervised Fine-Tuning to teach models the "think-ask-respond" pattern.
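The uncertainty check in Phase I can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names and the entropy threshold are assumptions, and the probabilities would in practice come from the model's step-level answer distribution.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a step-level predictive distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_ask(step_probs, threshold=1.0):
    """Flag a reasoning step as high-uncertainty when entropy exceeds a
    threshold; at such points PIR injects a clarification question.

    `threshold` is an illustrative hyperparameter, not a value from the paper.
    """
    return predictive_entropy(step_probs) > threshold

# A peaked distribution: the model is confident, so it keeps reasoning.
confident = [0.97, 0.01, 0.01, 0.01]
# A near-uniform distribution: the model is uncertain, so it asks the user.
uncertain = [0.25, 0.25, 0.25, 0.25]
```

High entropy means the model spreads probability over many continuations, which is exactly where blind self-thinking tends to guess; asking instead is the core of the "think-ask-respond" pattern taught during SFT.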
**User-Intent Alignment (Phase II):**
- US-GRPO: Group Relative Policy Optimization with a dynamic User Simulator.
- Composite Reward: Combines output accuracy (extrinsic) with reasoning efficiency and helpfulness (intrinsic).
- Aligns model behavior with user intent while minimizing unnecessary interactions.
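The composite reward can be sketched as below. The weights, budgets, and the shape of the efficiency and helpfulness terms are illustrative assumptions; only the decomposition into extrinsic accuracy plus intrinsic efficiency and helpfulness follows the description above.

```python
def composite_reward(correct, n_reason_tokens, n_turns, helpful_score,
                     w_acc=1.0, w_eff=0.1, w_help=0.5,
                     token_budget=2048, turn_budget=4):
    """Composite reward for US-GRPO rollouts against the user simulator.

    correct        -- whether the final answer matched the reference (extrinsic)
    n_reason_tokens -- reasoning tokens spent; fewer is better (intrinsic)
    n_turns        -- clarification turns used; fewer is better (intrinsic)
    helpful_score  -- quality of clarification questions in [0, 1] (intrinsic)

    All weights and budgets here are illustrative, not the paper's values.
    """
    r_acc = 1.0 if correct else 0.0
    # Efficiency: reward staying under the token and turn budgets,
    # discouraging both overthinking and unnecessary interactions.
    r_eff = (max(0.0, 1.0 - n_reason_tokens / token_budget)
             + max(0.0, 1.0 - n_turns / turn_budget))
    r_help = helpful_score
    return w_acc * r_acc + w_eff * r_eff + w_help * r_help
```

Under this decomposition a correct answer reached with fewer reasoning tokens and fewer turns scores strictly higher than the same answer reached verbosely, which is the incentive that halves unnecessary reasoning tokens and interaction turns.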
## Citation
If you find this work useful, please cite our paper:
```bibtex
@misc{chen2026reasoningaskingtransformingreasoning,
  title={Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers},
  author={Xin Chen and Feng Jiang and Yiqian Zhang and Hardy Chen and Shuo Yan and Wenya Xie and Min Yang and Shujian Huang},
  year={2026},
  eprint={2601.22139},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.22139},
}
```