DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs
Abstract
DarkForest is a controlled-communication framework that enhances multi-agent LLM reasoning by clustering semantic candidates and using calibrated belief distributions to reduce error propagation and communication overhead.
Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error propagation and high communication overhead. When agents exchange raw responses or reasoning traces, incorrect intermediate reasoning may be adopted and amplified, leading to confident but wrong consensus; multi-round communication also increases token consumption, latency, and inference cost. In this paper, we propose a controlled-communication coordination framework named DarkForest. DarkForest first keeps agents independent, so each agent produces an answer without seeing the others' outputs. It then parses the raw responses into structured candidate records, groups semantically equivalent candidates into clusters, and estimates a calibrated belief distribution over these clusters using agent reliability, confidence, parse quality, support-pattern reliability, and independence corrections. Under controlled communication, a coordinator receives only policy-permitted evidence from this belief state. Experiments on six reasoning benchmarks show that DarkForest achieves leading overall quality, improves on the strongest baseline by up to 30.7% on benchmark metrics, and reduces token consumption by up to 6.5× compared with communication-heavy baselines.
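The clustering and belief-estimation steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Candidate` fields, the string-normalization equivalence test, and the scoring rule (reliability times confidence) are all assumptions chosen for brevity, and the parse-quality, support-pattern, and independence terms named in the abstract are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    agent: str          # which agent produced the answer
    answer: str         # parsed final answer
    confidence: float   # self-reported confidence in [0, 1]
    reliability: float  # assumed prior reliability of the agent in [0, 1]


def normalize(answer: str) -> str:
    """Toy equivalence key: case- and whitespace-insensitive match."""
    return " ".join(answer.lower().split())


def belief_over_clusters(candidates):
    """Group equivalent answers and return a normalized belief per cluster."""
    scores = defaultdict(float)
    for c in candidates:
        # Hypothetical scoring rule: each vote counts reliability * confidence.
        scores[normalize(c.answer)] += c.reliability * c.confidence
    total = sum(scores.values())
    if total == 0:
        # No usable signal: fall back to a uniform belief over clusters.
        return {key: 1.0 / len(scores) for key in scores}
    return {key: score / total for key, score in scores.items()}


# Each agent answers independently; no agent sees another's output.
candidates = [
    Candidate("a1", "42", confidence=0.9, reliability=0.8),
    Candidate("a2", " 42 ", confidence=0.6, reliability=0.5),
    Candidate("a3", "41", confidence=0.7, reliability=0.5),
]
belief = belief_over_clusters(candidates)
```

Here the two "42" responses fall into one cluster and receive more belief mass than the lone "41". A real system would likely replace `normalize` with a semantic equivalence check and calibrate the weights on held-out data.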
Community
DarkForest is a controlled-communication framework for multi-agent LLM reasoning. Instead of letting agents exchange raw reasoning traces, it keeps agents independent, clusters their candidate answers, estimates a calibrated belief distribution, and only passes policy-permitted evidence to the coordinator.
The goal is to reduce error propagation while preserving useful diversity. Experiments across six reasoning benchmarks report gains of up to 30.7% over the strongest baseline on benchmark metrics and up to 6.5× lower token consumption than communication-heavy multi-agent baselines.
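As a rough illustration of passing only policy-permitted evidence to the coordinator, the sketch below filters a belief distribution down to a few cluster summaries. The policy itself (a `top_k` cap and a `min_mass` threshold) and the function name `permitted_evidence` are hypothetical; the source does not specify what the policy permits.

```python
def permitted_evidence(belief, min_mass=0.15, top_k=2):
    """Return only the cluster summaries a hypothetical policy exposes."""
    # Rank clusters by belief mass, highest first.
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    # Keep at most top_k clusters, and only those above the mass threshold.
    # No raw responses or reasoning traces are included in the output.
    return [
        {"answer": answer, "belief": mass}
        for answer, mass in ranked[:top_k]
        if mass >= min_mass
    ]


belief = {"42": 0.7, "41": 0.2, "40": 0.1}
evidence = permitted_evidence(belief)
```

With these example values the coordinator would see summaries for "42" and "41" only. The design point is that the coordinator works from aggregated cluster evidence rather than from individual agents' reasoning text.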