Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
Abstract
Parallel Context-of-Experts Decoding enables multi-document reasoning in retrieval-augmented generation by treating retrieved documents as isolated experts and synchronizing predictions through a retrieval-aware contrastive decoding rule.
Retrieval Augmented Generation faces a trade-off: concatenating documents into one long prompt enables multi-document reasoning but creates prefill bottlenecks, while encoding document KV caches separately offers speed but breaks cross-document interaction. We propose Parallel Context-of-Experts Decoding (Pced), a training-free framework that shifts evidence aggregation from the attention mechanism to the decoding stage. Pced treats retrieved documents as isolated "experts", synchronizing their predictions via a novel retrieval-aware contrastive decoding rule that weighs expert logits against the model prior. This approach recovers cross-document reasoning capabilities without constructing shared attention across documents.
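The abstract describes combining per-document expert predictions with a contrastive correction against the model's no-context prior. The paper itself does not spell out the rule here, so the following is only a minimal sketch of one plausible form: each document yields its own next-token logits from an isolated KV cache, the expert distributions are pooled with retrieval-score weights, and the pooled log-probabilities are contrasted against the prior. The function name `pced_step`, the softmax pooling, and the `alpha` contrast strength are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def pced_step(expert_logits, prior_logits, retrieval_scores, alpha=0.5):
    """One hypothetical decoding step in the Pced spirit.

    expert_logits: (n_docs, vocab) next-token logits, one row per
                   document "expert" decoded from its own KV cache.
    prior_logits:  (vocab,) logits from the model with no retrieved context.
    retrieval_scores: (n_docs,) relevance scores used to weight experts.
    alpha: strength of the contrastive penalty on the prior (assumed knob).
    """
    expert_lp = log_softmax(expert_logits, axis=-1)   # per-expert log-probs
    prior_lp = log_softmax(prior_logits)              # model-prior log-probs
    w = np.exp(retrieval_scores - retrieval_scores.max())
    w = w / w.sum()                                   # retrieval-aware weights
    pooled = (w[:, None] * expert_lp).sum(axis=0)     # expert consensus
    score = pooled - alpha * prior_lp                 # contrastive rule:
    return int(score.argmax())                        # favor tokens the experts
                                                      # support beyond the prior
```

In this sketch, a token that only the prior favors (e.g. a memorized but unsupported continuation) is penalized, while tokens jointly supported by high-scoring documents are promoted, which is the intended recovery of cross-document evidence without shared attention.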
Community
Parallel Context-of-Experts Decoding (Pced) speeds up RAG by decoding in parallel from per-document KV-cache “experts” and selecting retrieval-supported tokens to recover cross-document reasoning.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning (2025)
- CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering (2026)
- ARK: Answer-Centric Retriever Tuning via KG-augmented Curriculum Learning (2025)
- Context-Picker: Dynamic context selection using multi-stage reinforcement learning (2025)
- Disco-RAG: Discourse-Aware Retrieval-Augmented Generation (2026)
- Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval (2026)
- RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning (2025)