Papers
arxiv:2601.05039

FinDeepForecast: A Live Multi-Agent System for Benchmarking Deep Research Agents in Financial Forecasting

Published on Jan 8
Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

A live multi-agent system enables continuous evaluation of deep research agents in financial forecasting tasks through a dual-track taxonomy generating corporate and macro-level recurrent and non-recurrent challenges.

AI-generated summary

Deep Research (DR) Agents powered by advanced Large Language Models (LLMs) have fundamentally shifted the paradigm for completing complex research tasks. Yet, a comprehensive and live evaluation of their forecasting performance on real-world, research-oriented tasks in high-stakes domains (e.g., finance) remains underexplored. We introduce FinDeepForecast, the first live, end-to-end multi-agent system for automatically evaluating DR agents by continuously generating research-oriented financial forecasting tasks. This system is equipped with a dual-track taxonomy, enabling the dynamic generation of recurrent and non-recurrent forecasting tasks at both corporate and macro levels. With this system, we generate FinDeepForecastBench, a weekly evaluation benchmark over a ten-week horizon, encompassing 8 global economies and 1,314 listed companies, and evaluate 13 representative methods. Extensive experiments show that, while DR agents consistently outperform strong baselines, their performance still falls short of genuine forward-looking financial reasoning. We expect the proposed FinDeepForecast system to consistently facilitate future advancements of DR agents in research-oriented financial forecasting tasks. The benchmark and leaderboard are publicly available on the OpenFinArena Platform.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.05039 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.05039 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.