Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
Paper: arXiv:2511.07328
Q-RAG is a resource-efficient method for multi-step retrieval trained with reinforcement learning (RL) directly in the latent space of text-chunk embeddings. Instead of expensive LLM fine-tuning, Q-RAG trains only a lightweight embedder agent using value-based RL (temporal difference learning), keeping the LLM frozen.
Q-RAG achieves state-of-the-art results on long-context benchmarks (BABILong, RULER) for contexts up to 10M tokens and competitive performance on open-domain multi-hop QA (HotpotQA, MuSiQue).
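The value-based retrieval idea described above can be illustrated with a minimal Python sketch. It is not the Q-RAG implementation. It assumes that the Q-value of retrieving a chunk is the dot product between a state embedding (the query plus whatever has been retrieved so far) and a chunk embedding, and it swaps in a plain one-step Q-learning target for whatever temporal difference variant the paper actually uses. The names `q_value`, `greedy_step`, and `td_target` are hypothetical, and the embeddings are hand-written toy vectors rather than outputs of a trained embedder.

```python
# Illustrative sketch only; not the Q-RAG codebase.
# Assumption: Q(state, chunk) = dot(state_embedding, chunk_embedding).
from typing import Sequence, Set

Vector = Sequence[float]


def q_value(state_emb: Vector, chunk_emb: Vector) -> float:
    """Score one candidate chunk for the current retrieval state."""
    return sum(s * c for s, c in zip(state_emb, chunk_emb))


def greedy_step(state_emb: Vector, chunk_embs: Sequence[Vector], taken: Set[int]) -> int:
    """Return the index of the not-yet-retrieved chunk with the highest Q-value."""
    candidates = [i for i in range(len(chunk_embs)) if i not in taken]
    return max(candidates, key=lambda i: q_value(state_emb, chunk_embs[i]))


def td_target(reward: float, next_state_emb: Vector, chunk_embs: Sequence[Vector],
              taken: Set[int], gamma: float = 0.99, terminal: bool = False) -> float:
    """One-step TD target: r + gamma * max over remaining chunks of Q(s', a')."""
    candidates = [i for i in range(len(chunk_embs)) if i not in taken]
    if terminal or not candidates:
        return reward
    best_next = max(q_value(next_state_emb, chunk_embs[i]) for i in candidates)
    return reward + gamma * best_next


# Toy example with three 2-dimensional chunk embeddings.
chunks = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [0.2, 0.9]                      # hypothetical embedding of the query
first = greedy_step(state, chunks, taken=set())   # picks chunk 1

next_state = [1.0, 0.1]                 # hypothetical embedding after adding chunk 1
target = td_target(0.0, next_state, chunks, taken={first}, gamma=0.5)   # 0.5

# The embedder would be trained to shrink this TD error; the LLM is never updated.
td_error = target - q_value(state, chunks[first])
```

Only the embeddings enter the update in this sketch, which is why the approach can leave the LLM frozen: the retrieval policy is defined entirely by the scores between state and chunk vectors.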
# Create conda environment
conda create -n qrag python=3.12 -y
conda activate qrag
# Install dependencies
python -m pip install pip==26.0.1 wheel==0.46.3
python -m pip install vllm==0.18.0
python -m pip install hydra-core==1.3.2 tensorboard==2.20.0 rotary-embedding-torch==0.8.9 pandas==3.0.1 nltk==3.9.4 sortedcontainers==2.4.0 accelerate==1.13.0 datasets==4.8.4
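After installation, it can be useful to confirm that the active environment really contains the pinned versions. The following is a small optional check, not part of the Q-RAG repository. It uses only the standard-library `importlib.metadata` module; the `PINNED` table copies the versions from the commands above, and the function name `find_mismatches` is hypothetical.

```python
# Optional sanity check; not part of the Q-RAG repository.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the install commands above.
PINNED = {
    "vllm": "0.18.0",
    "hydra-core": "1.3.2",
    "tensorboard": "2.20.0",
    "rotary-embedding-torch": "0.8.9",
    "pandas": "3.0.1",
    "nltk": "3.9.4",
    "sortedcontainers": "2.4.0",
    "accelerate": "1.13.0",
    "datasets": "4.8.4",
}


def find_mismatches(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatching package to (expected, installed); installed is None if missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Run it inside the activated `qrag` environment; it prints one line per package whose installed version differs from the pinned one and prints nothing when everything matches.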
If you find Q-RAG useful, please cite the paper:
@inproceedings{sorokin2026qrag,
  title = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year = {2026}
}

@article{sorokin2025qrag,
  title = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  journal = {arXiv preprint arXiv:2511.07328},
  year = {2025}
}