Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training

Q-RAG is a resource-efficient method for multi-step retrieval trained with reinforcement learning (RL) directly in the latent space of text-chunk embeddings. Instead of expensive LLM fine-tuning, Q-RAG trains only a lightweight embedder agent using value-based RL (temporal difference learning), keeping the LLM frozen.
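The idea of value-based multi-step retrieval can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes that Q(s, a) is scored as a dot product between a state embedding and each chunk embedding, which this README does not state, and every name in it (`q_values`, `greedy_retrieve`, `td_target`, `toy_state`, the toy vectors) is hypothetical. The state embedder is a hand-written stand-in for the trained embedder agent.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def q_values(state_vec: Vector, chunk_vecs: Sequence[Vector], taken: set) -> Dict[int, float]:
    """Score every chunk not yet selected. ASSUMPTION: Q(s, a) = <state, chunk>."""
    return {i: dot(state_vec, c) for i, c in enumerate(chunk_vecs) if i not in taken}


def greedy_retrieve(
    embed_state: Callable[[List[int]], Vector],
    chunk_vecs: Sequence[Vector],
    steps: int,
) -> List[int]:
    """Pick one chunk per step, re-embedding the state after each pick."""
    selected: List[int] = []
    for _ in range(steps):
        scores = q_values(embed_state(selected), chunk_vecs, set(selected))
        if not scores:  # nothing left to retrieve
            break
        selected.append(max(scores, key=scores.get))
    return selected


def td_target(reward: float, next_scores: Dict[int, float],
              gamma: float = 0.99, terminal: bool = False) -> float:
    """One-step temporal difference target: r + gamma * max_a' Q(s', a')."""
    if terminal or not next_scores:
        return reward
    return reward + gamma * max(next_scores.values())


# Toy data (hypothetical): a 2-D query and three chunk embeddings.
QUERY = [1.0, 0.2]
CHUNKS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]


def toy_state(selected: List[int]) -> List[float]:
    """Stand-in state embedder: the query minus what was already retrieved."""
    state = list(QUERY)
    for i in selected:
        state = [s - c for s, c in zip(state, CHUNKS[i])]
    return state


if __name__ == "__main__":
    print(greedy_retrieve(toy_state, CHUNKS, steps=2))  # prints [0, 1]
```

Because the state is re-embedded after every pick, the second step selects chunk 1 rather than chunk 2, which a single-shot similarity ranking against the query alone would have preferred. In training, targets of the form computed by `td_target` would be regressed against the predicted Q-values, with only the embedder receiving gradients.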

Q-RAG achieves state-of-the-art results on long-context benchmarks (BABILong, RULER) for contexts up to 10M tokens, and competitive performance on open-domain multi-hop QA (HotpotQA, MuSiQue).

Paper: Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
Repository: https://github.com/griver/Q-RAG

Key Results

RULER: Near-perfect retrieval on all needle-in-a-haystack (NIAH) subtasks, generalizing to contexts of up to 1M tokens despite training only on 4K-length documents.
BABILong: Highest average performance across tasks QA1–QA5 at context lengths from 1M to 10M tokens.
Efficiency: Each model can be trained on a single A100-80GB GPU within 12 hours.

Installation

# Create conda environment
conda create -n qrag python=3.12 -y
conda activate qrag

# Install dependencies
python -m pip install pip==26.0.1 wheel==0.46.3
pip install vllm==0.18.0
pip install hydra-core==1.3.2 tensorboard==2.20.0 rotary-embedding-torch==0.8.9 \
    pandas==3.0.1 nltk==3.9.4 sortedcontainers==2.4.0 accelerate==1.13.0 datasets==4.8.4
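
After installation, it can be useful to confirm that the environment matches the pinned versions above. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and the `PINNED` table are illustrative and not part of the Q-RAG repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the install commands above.
PINNED = {
    "vllm": "0.18.0",
    "hydra-core": "1.3.2",
    "tensorboard": "2.20.0",
    "rotary-embedding-torch": "0.8.9",
    "pandas": "3.0.1",
    "nltk": "3.9.4",
    "sortedcontainers": "2.4.0",
    "accelerate": "1.13.0",
    "datasets": "4.8.4",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for each package that differs."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Run it inside the activated `qrag` environment; no output means every pinned package is present at the expected version.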

Citation

If you find Q-RAG useful, please cite the paper:

@inproceedings{sorokin2026qrag,
  title     = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author    = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

@article{sorokin2025qrag,
  title   = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author  = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  journal = {arXiv preprint arXiv:2511.07328},
  year    = {2025}
}
