The official repository for the paper ["CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation"](https://arxiv.org/pdf/2602.01660)

## 💡 Introduction
Large Reasoning Models (LRMs) benefit substantially from training on challenging, competition-level questions. However, existing automated synthesis methods struggle with **"fake hard"** questions—problems that are complex but unsolvable or ill-defined.

**CoDiQ (Controllable Difficult Question Generation)** is a novel framework that enables fine-grained difficulty control via **test-time scaling** while ensuring solvability.

Key innovations include:

1. **Test-Time Scaling Tendency**: We identify that extending the reasoning token budget boosts difficulty but can reduce solvability.
2. **CoDiQ-Generator**: A specialized model (finetuned from Qwen3-8B) that improves the upper bound of valid, high-difficulty question generation.
3. **CoDiQ-Corpus**: A dataset of **44K** competition-grade math and coding question sequences, whose questions are significantly more challenging than those in LiveCodeBench and AIME.
Training LRMs on CoDiQ-Corpus substantially enhances downstream reasoning performance. The [CoDiQ-Generator](https://huggingface.co/AleXGroup/CoDiQ-Gen-8B) and [CoDiQ-Corpus](https://huggingface.co/datasets/AleXGroup/CoDiQ-Corpus) are released.
## 📖 Citation

If you find **CoDiQ** useful for your research, please consider citing our paper:
```bibtex
@article{codiq2026,
  title={CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation},
  author={Zhongyuan Peng and Caijun Xu and Changyi Xiao and Shibo Hong and Eli Zhang and Stephen Huang and Yixin Cao},
  journal={arXiv preprint arXiv:2602.01660},
  year={2026}
}
```