Sovereign-Physics-V1: Research-Level Physics Reasoning

Honest Targets (Based on Actual SOTA)

Benchmark            Best Model (2025)           Best Score   3B Target
CritPt challenges    GPT-5 (high, code & web)    12.6%        1-3%
CritPt checkpoints   GPT-5 (high, code & web)    24.5%        5-10%
HLE                  o3-mini                     26%          2-5%
PHYBench             Frontier models             ~60%         15-25%

Architecture

Phase 1: GRPO with Symbolic Execution Reward
  R₁ [0.50] Symbolic verification (Python sandbox + sympy + numeric ±2%)
  R₂ [0.15] Code execution (rewards executable verification code)
  R₃ [0.25] Physics depth (reasoning patterns + derivation steps)
  R₄ [0.10] Length penalty
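
The weighted reward above can be sketched roughly as follows. This is a minimal illustration, not the code in physics_reasoning.py: the names `WEIGHTS`, `numeric_match`, `symbolic_match`, and `total_reward` are hypothetical, each component is assumed to be a score in [0, 1] (with the length term equal to 1 when no penalty applies), and the ±2% tolerance is assumed to be relative to the reference value. sympy is treated as optional here so the sketch still runs without it.

```python
try:
    import sympy as sp  # optional: enables the symbolic comparison
except ImportError:
    sp = None  # fall back to numeric-only checking

# Weights copied from the Phase 1 list; the key names are illustrative.
WEIGHTS = {"symbolic": 0.50, "code_exec": 0.15, "depth": 0.25, "length": 0.10}


def numeric_match(predicted: float, reference: float, rel_tol: float = 0.02) -> bool:
    """True when predicted lies within rel_tol (assumed relative) of reference."""
    if reference == 0:
        return abs(predicted) <= 1e-12
    return abs(predicted - reference) <= rel_tol * abs(reference)


def symbolic_match(predicted: str, reference: str) -> bool:
    """Compare two answer strings symbolically, then numerically as a fallback.

    sympify evaluates its input, so untrusted model output should only be
    parsed inside a sandbox, as the Phase 1 description suggests.
    """
    if sp is not None:
        try:
            p, r = sp.sympify(predicted), sp.sympify(reference)
            if sp.simplify(p - r) == 0:
                return True  # expressions are symbolically identical
            if not p.free_symbols and not r.free_symbols:
                return numeric_match(float(p), float(r))
        except Exception:
            return False  # unparsable or non-comparable answer
        return False
    try:
        return numeric_match(float(predicted), float(reference))
    except ValueError:
        return False


def total_reward(components: dict) -> float:
    """Weighted sum of the four component scores."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, `total_reward({"symbolic": 1.0, "code_exec": 0.0, "depth": 0.0, "length": 0.0})` yields the symbolic weight alone, so a correct answer with no supporting code or derivation earns half of the maximum reward.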

Phase 2: Failure-Mode Self-Repair
  → Classify failures: algebraic, conceptual, setup, computation, incomplete
  → When threshold reached: auto-generate Mini-SFT repair dataset
  → SFT on repair data to prevent recurrence
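
The threshold-triggered repair step might look something like the sketch below. The five failure modes come from the list above; everything else is an assumption made for illustration: the `FailureTracker` class, the record layout (`prompt`, `completion`, `mode`), and the default threshold of 3, since the source does not state a value. How a failure gets its label is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Failure modes named in Phase 2.
FAILURE_MODES = ("algebraic", "conceptual", "setup", "computation", "incomplete")


@dataclass
class FailureTracker:
    threshold: int = 3  # assumed value; not given in the source
    buckets: dict = field(default_factory=lambda: defaultdict(list), repr=False)

    def record(self, mode: str, problem: str, reference_solution: str) -> list:
        """Store one failure; return a repair dataset once the bucket is full, else []."""
        if mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {mode}")
        bucket = self.buckets[mode]
        bucket.append(
            {"prompt": problem, "completion": reference_solution, "mode": mode}
        )
        if len(bucket) < self.threshold:
            return []
        repair_set = list(bucket)  # copy before clearing
        bucket.clear()  # start counting again after emitting a repair set
        return repair_set
```

Keeping one bucket per failure mode means each Mini-SFT set targets a single kind of error, which seems to match the intent of repairing specific failure modes rather than retraining on all mistakes at once.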

Phase 3: Loop back to Phase 1 with repaired model
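
The three phases taken together could be orchestrated along these lines. This is only a sketch under stated assumptions: `train_loop`, `grpo_phase`, and `repair_phase` are hypothetical names, the two phase functions are passed in as callables rather than implemented, and the stopping rule (stop when a GRPO round produces no repair data, or after `max_rounds`) is an assumption, as the source does not say when the loop ends.

```python
from typing import Any, Callable, List, Tuple


def train_loop(
    model: Any,
    grpo_phase: Callable[[Any], Tuple[Any, list]],
    repair_phase: Callable[[Any, list], Any],
    max_rounds: int = 3,
) -> Tuple[Any, List[Tuple[int, int]]]:
    """Alternate Phase 1 (GRPO) and Phase 2 (repair SFT), feeding the result back in."""
    history = []
    for round_idx in range(max_rounds):
        # Phase 1: GRPO returns the updated model plus any repair data collected.
        model, repair_data = grpo_phase(model)
        history.append((round_idx, len(repair_data)))
        if not repair_data:
            break  # assumed stopping rule: nothing left to repair
        # Phase 2: SFT on the repair data; Phase 3 is the next loop iteration.
        model = repair_phase(model, repair_data)
    return model, history
```

The returned `history` records how many repair examples each round produced, which gives a simple way to see whether failures are shrinking across rounds.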

Launch

python physics_reasoning.py

Base model

Qwen/Qwen2.5-3B