Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
Abstract
Sci-CoE is a two-stage scientific co-evolving framework that enables large language models to self-evolve as both solver and verifier through sparse-to-unsupervised learning transitions, improving scientific reasoning capabilities and evaluation system robustness.
Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scientific reasoning tasks, these models remain fragile due to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness judgment anchors for the Verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Code is available at https://github.com/InternScience/Sci-CoE.
Community
We're excited to share our new work, Sci-CoE!
In this project, we tackle a fundamental challenge: how can we train LLMs with RL when there is no explicit final answer and no way to compute outcome rewards via unit tests or exact matching?
This setting is common in scientific reasoning and mathematical proofs, where correct solutions are often non-unique and expressed as structured natural language rather than fixed outputs. Traditional reward signals simply do not work in these scenarios.
To address this, we propose a geometric consensus reward that models agreement, reliability, and diversity, enabling RL training without ground-truth final answers. Starting from sparse supervision, Sci-CoE transitions to large-scale self-evolution, allowing models to improve as both solver and verifier in open-ended scientific tasks.
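To make the idea concrete, here is a minimal sketch of how a geometric consensus reward could be computed. This is an illustrative assumption, not the paper's actual implementation: the function name, the specific proxies for consensus (majority-vote agreement), reliability (accuracies estimated during the sparse-supervision stage), and diversity (distinct solution strings), and the equal-weight geometric mean are all hypothetical.

```python
import math
from collections import Counter

def geometric_consensus_reward(verdicts, verifier_accuracies, solutions):
    """Hypothetical sketch: fuse consensus, reliability, and diversity
    into one scalar reward via a geometric mean, so a near-zero score
    on any single axis collapses the whole reward.

    verdicts: per-verifier labels (e.g. 1 = accept, 0 = reject)
    verifier_accuracies: per-verifier accuracy estimates in [0, 1],
        e.g. measured on the small annotated set from stage one
    solutions: the candidate solution strings being scored
    """
    # Consensus: fraction of verifiers agreeing with the majority verdict.
    majority, count = Counter(verdicts).most_common(1)[0]
    consensus = count / len(verdicts)

    # Reliability: mean estimated accuracy of the verifiers
    # that voted with the majority.
    agreeing = [a for v, a in zip(verdicts, verifier_accuracies) if v == majority]
    reliability = sum(agreeing) / len(agreeing)

    # Diversity: fraction of distinct solutions among the candidates.
    diversity = len(set(solutions)) / len(solutions)

    # Equal-weight geometric mean of the three factors.
    return (consensus * reliability * diversity) ** (1 / 3)
```

The multiplicative form is the key design choice: unlike an arithmetic mean, a verifier pool that is confident but homogeneous (low diversity), or diverse but unreliable, cannot earn a high reward by excelling on the other axes.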