## Introduction
We introduce **LongCat-Flash-Prover**, a flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances Native Formal Reasoning in Lean4 through agentic tool-integrated reasoning (TIR).
We decompose the native formal reasoning task into three independent formal capabilities: auto-formalization, sketching, and proving.
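As an illustration (a hypothetical toy example, not taken from the model's outputs), auto-formalization maps an informal problem to a Lean4 statement, which the proving capability then discharges:

```lean
-- Informal problem: "Show that n + 0 = n for every natural number n."
-- Auto-formalized Lean4 statement, with a proof supplied by the prover:
theorem add_zero_nat (n : Nat) : n + 0 = n := by
  rfl
```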
To facilitate these capabilities, we propose a Hybrid-Experts Iteration Framework to expand high-quality task trajectories: generating a formal statement from a given informal problem, producing a whole proof directly from the statement, or producing a lemma-style sketch.
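A lemma-style sketch, in contrast to a whole proof, decomposes the goal into intermediate steps whose sub-proofs are deferred. A minimal hand-written illustration (the theorem and its decomposition are hypothetical, not drawn from the training data):

```lean
-- Lemma-style sketch: state intermediate `have` steps and defer each
-- sub-proof with `sorry`, to be filled in by the proving capability.
theorem add_rev (a b c : Nat) : a + b + c = c + (b + a) := by
  have h1 : a + b + c = a + (b + c) := by sorry
  have h2 : a + (b + c) = (b + c) + a := by sorry
  have h3 : (b + c) + a = c + (b + a) := by sorry
  sorry
```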
During agentic RL, we present Hierarchical Importance Sampling Policy Optimization (HisPO), an algorithm that stabilizes MoE model training on such long-horizon tasks. It employs a gradient-masking strategy that accounts for policy staleness and the inherent train-inference engine discrepancies at both the sequence and token levels.
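The exact HisPO formulation is given in the technical report; the Python sketch below only illustrates the general idea of hierarchical (sequence- and token-level) gradient masking over importance ratios. The thresholds, the geometric-mean staleness proxy, and the function name are all assumptions for illustration, not the official implementation:

```python
import math

def hispo_masks(logp_new, logp_old, tok_eps=0.2, seq_eps=0.1):
    """Illustrative hierarchical gradient masking (assumed form).

    logp_new / logp_old: per-token log-probabilities of a sampled sequence
    under the current policy and the (possibly stale) behavior policy.
    Returns a per-token keep mask and a sequence-level keep flag.
    """
    # Token-level importance ratios between the two policies.
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    # Token level: keep only tokens whose ratio stays inside [1-eps, 1+eps].
    token_mask = [1.0 - tok_eps <= r <= 1.0 + tok_eps for r in ratios]
    # Sequence level: geometric-mean ratio as a staleness / engine-mismatch
    # proxy; mask the whole sequence if it drifts outside a tighter band.
    seq_ratio = math.exp((sum(logp_new) - sum(logp_old)) / len(ratios))
    seq_keep = 1.0 - seq_eps <= seq_ratio <= 1.0 + seq_eps
    # A token contributes gradient only if both levels keep it.
    return [t and seq_keep for t in token_mask], seq_keep
```

Masked tokens simply contribute zero gradient; the hierarchy means one badly drifted sequence is dropped wholesale rather than token by token.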