Papers
arxiv:2601.17332

TheoremForge: Scaling up Formal Data Synthesis with Low-Budget Agentic Workflow

Published on Jan 24
Authors:

Abstract

TheoremForge is a cost-effective formal data synthesis framework that decomposes mathematical formalization into sub-tasks and uses a decoupled extraction strategy to recover valid training signals from failed trajectories, achieving higher data yield and lower costs than traditional approaches.

AI-generated summary

The high cost of agentic workflows in formal mathematics hinders large-scale data synthesis, exacerbating the scarcity of open-source corpora. To address this, we introduce TheoremForge, a cost-effective formal data synthesis pipeline that decomposes the formalization process into five sub-tasks, which are statement formalization, proof generation, premise selection, proof correction and proof sketching. By implementing a Decoupled Extraction Strategy, the workflow recovers valid training signals from globally failed trajectories, effectively utilizing wasted computation. Experiments on a 2,000-problem benchmark demonstrate that TheoremForge achieves a Verified Rate of 12.6\%, surpassing the 8.6\% baseline, at an average cost of only \0.481 per successful trajectory using Gemini-3-Flash. Crucially, our strategy increases data yield by 1.6\times$ for proof generation compared to standard filtering. These results establish TheoremForge as a scalable framework for constructing a data flywheel to train future expert models. Our code is available https://github.com/timechess/TheoremForge{here}.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.17332 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.17332 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.