arxiv:2605.25188

DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

Published on May 24

Abstract

DarkForest is a controlled-communication framework that enhances multi-agent LLM reasoning by clustering semantic candidates and using calibrated belief distributions to reduce error propagation and communication overhead.

AI-generated summary

Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error propagation and high communication overhead. When agents exchange raw responses or reasoning traces, incorrect intermediate reasoning may be adopted and amplified, leading to confident but wrong consensus; multi-round communication also increases token consumption, latency, and inference cost. In this paper, we propose a controlled-communication coordination framework named DarkForest. DarkForest first keeps agents independent, so each agent produces an answer without seeing the others' outputs. It then parses the raw responses into structured candidate records, groups semantically equivalent candidates into clusters, and estimates a calibrated belief distribution over these clusters using agent reliability, confidence, parse quality, support-pattern reliability, and independence corrections. A coordinator receives only policy-permitted evidence from this belief state with controlled communication. Experiments on six reasoning benchmarks show that DarkForest achieves leading overall quality, improves the strongest baseline by up to 30.7% on benchmark metrics, and reduces token consumption by up to 6.5× compared with communication-heavy baselines.
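As a rough illustration of the clustering and belief-estimation steps, the sketch below groups equivalent answers and turns per-cluster support into a normalized distribution. It is not the paper's method: the `Candidate` record, the `normalize` equivalence key (plain string normalization standing in for semantic clustering), and the reliability-times-confidence scoring rule are all assumptions made for this example, and the parse-quality, support-pattern, and independence terms named in the abstract are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    agent: str         # which agent produced the answer
    answer: str        # parsed final answer
    confidence: float  # self-reported confidence in [0, 1]


def normalize(answer: str) -> str:
    """Hypothetical equivalence key: case- and whitespace-insensitive match."""
    return " ".join(answer.lower().split())


def belief_over_clusters(candidates, reliability):
    """Group equivalent answers and return normalized cluster weights."""
    scores = defaultdict(float)
    for c in candidates:
        # Assumed scoring rule: reliability times confidence, summed per cluster.
        # Agents missing from the reliability table get a neutral 0.5.
        scores[normalize(c.answer)] += reliability.get(c.agent, 0.5) * c.confidence
    total = sum(scores.values())
    if total == 0:
        return {}
    return {key: score / total for key, score in scores.items()}


# Three independent agents; two of them agree up to whitespace.
candidates = [
    Candidate("a1", "42", 0.9),
    Candidate("a2", " 42 ", 0.6),
    Candidate("a3", "41", 0.8),
]
reliability = {"a1": 0.8, "a2": 0.7, "a3": 0.5}
belief = belief_over_clusters(candidates, reliability)
```

In this toy run the two agreeing agents form one cluster whose combined weight exceeds that of the lone dissenting answer, even though the dissenting agent reports high confidence.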

Community

dj220001

Paper submitter about 17 hours ago

DarkForest is a controlled-communication framework for multi-agent LLM reasoning. Instead of letting agents exchange raw reasoning traces, it keeps agents independent, clusters their candidate answers, estimates a calibrated belief distribution, and only passes policy-permitted evidence to the coordinator.

The goal is to reduce error propagation while preserving useful diversity. Experiments across six reasoning benchmarks show stronger accuracy and much lower token consumption than communication-heavy multi-agent baselines.
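One way to picture "policy-permitted evidence" is as a filter between the belief state and the coordinator. The sketch below is only an assumption about what such a policy could look like: the `permitted_evidence` function, its `top_k` and `min_mass` parameters, and the example belief values are hypothetical and do not come from the paper.

```python
def permitted_evidence(belief, top_k=2, min_mass=0.1):
    """Return only the cluster summaries a hypothetical policy allows through.

    The coordinator sees answer labels and belief mass, never raw
    reasoning traces from the individual agents.
    """
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {"answer": answer, "belief": round(mass, 3)}
        for answer, mass in ranked[:top_k]
        if mass >= min_mass
    ]


# Made-up belief distribution over three answer clusters.
belief = {"42": 0.74, "41": 0.21, "7": 0.05}
evidence = permitted_evidence(belief)
```

Here the low-mass cluster is withheld, so the coordinator receives two short records instead of every agent's full response, which is the kind of reduction in exchanged text the comment above describes.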
