Papers
arxiv:2604.18576

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

Published on May 4
Authors:

Abstract

The Bayesian Linguistic Forecaster combines linguistic belief states, hierarchical multi-trial aggregation, and hierarchical calibration to achieve superior binary forecasting performance on the ForecastBench benchmark.

We present the Bayesian Linguistic Forecaster (BLF), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas. (1) Linguistic belief state: a semi-structured representation combining numerical probability estimates with natural-language evidence summaries, updated by the LLM at each step of an iterative tool-use loop. This contrasts with the common approach of appending all retrieved evidence to an ever-growing, unstructured context. (2) Hierarchical multi-trial aggregation: running K independent trials and combining them using logit-space averaging shrinkage with a data-dependent prior. (3) Hierarchical calibration: Platt scaling with a hierarchical prior, which avoids over-shrinking extreme predictions for sources with skewed base rates. On 400 questions from the ForecastBench leaderboard, BLF outperforms all the top public methods, including Cassi, GPT-5, Grok~4.20, and Foresight-32B. Careful ablation studies, using mixed effects analysis to control for question variability (which accounts for 62\% of the variance in performance), reveals that all 3 components contribute to the overall gains, but some components matter more than others, depending on the base LLM, and the setting (e.g.\ with or without a crowd prior). All our experiments are based on a robust back-testing framework which we develop, which has a leakage rate below 1.5\%, and may be of independent interest.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2604.18576
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2604.18576 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2604.18576 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2604.18576 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.