Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
Abstract
Sci-CoE is a two-stage scientific co-evolving framework that enables large language models to self-evolve as both solver and verifier through sparse-to-unsupervised learning transitions, improving scientific reasoning capabilities and evaluation system robustness.
Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scientific reasoning tasks, these models remain fragile due to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness judgment anchors for the Verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Code is available at https://github.com/InternScience/Sci-CoE.
Community
We're excited to share our new work, Sci-CoE!
In this project, we tackle a fundamental challenge: how can we train LLMs with RL when there is no explicit final answer and no way to compute outcome rewards via unit tests or exact matching?
This setting is common in scientific reasoning and mathematical proofs, where correct solutions are often non-unique and expressed as structured natural language rather than fixed outputs. Traditional reward signals simply do not work in these scenarios.
To address this, we propose a geometric consensus reward that models agreement, reliability, and diversity, enabling RL training without ground-truth final answers. Starting from sparse supervision, Sci-CoE transitions to large-scale self-evolution, allowing models to improve as both solver and verifier in open-ended scientific tasks.
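To make the idea concrete, here is a minimal sketch of how a geometric consensus reward could be computed. This is an illustrative assumption, not the paper's actual implementation: the function name, the specific proxies for consensus (majority-vote agreement), reliability (accuracies estimated during the sparse-supervision stage), and diversity (distinct solution strings), and the equal-weight geometric mean are all hypothetical.

```python
import math
from collections import Counter

def geometric_consensus_reward(verdicts, verifier_accuracies, solutions):
    """Hypothetical sketch: fuse consensus, reliability, and diversity
    into one scalar reward via a geometric mean, so a near-zero score
    on any single axis collapses the whole reward.

    verdicts: per-verifier labels (e.g. 1 = accept, 0 = reject)
    verifier_accuracies: per-verifier accuracy estimates in [0, 1],
        e.g. measured on the small annotated set from stage one
    solutions: the candidate solution strings being scored
    """
    # Consensus: fraction of verifiers agreeing with the majority verdict.
    majority, count = Counter(verdicts).most_common(1)[0]
    consensus = count / len(verdicts)

    # Reliability: mean estimated accuracy of the verifiers
    # that voted with the majority.
    agreeing = [a for v, a in zip(verdicts, verifier_accuracies) if v == majority]
    reliability = sum(agreeing) / len(agreeing)

    # Diversity: fraction of distinct solutions among the candidates.
    diversity = len(set(solutions)) / len(solutions)

    # Equal-weight geometric mean of the three factors.
    return (consensus * reliability * diversity) ** (1 / 3)
```

The multiplicative form is the key design choice: unlike an arithmetic mean, a verifier pool that is confident but homogeneous (low diversity), or diverse but unreliable, cannot earn a high reward by excelling on the other axes.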