Sovereign-Physics-V1: Research-Level Physics Reasoning
Honest Targets (Based on Actual SOTA)
| Benchmark | Best Model (2025) | Best Score | 3B Target |
|---|---|---|---|
| CritPt challenges | GPT-5 (high, code & web) | 12.6% | 1-3% |
| CritPt checkpoints | GPT-5 (high, code & web) | 24.5% | 5-10% |
| HLE | o3-mini | 26% | 2-5% |
| PHYBench | Frontier models | ~60% | 15-25% |
Architecture
Phase 1: GRPO with Symbolic Execution Reward
R₁ [0.50] Symbolic verification (Python sandbox + sympy + numeric ±2%)
R₂ [0.15] Code execution (rewards executable verification code)
R₃ [0.25] Physics depth (reasoning patterns + derivation steps)
R₄ [0.10] Length penalty
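As a rough illustration of how the four weighted terms above could be combined and fed to GRPO, here is a minimal sketch. It is not the code in physics_reasoning.py: the function names, the component keys, the assumption that each component score lies in [0, 1], and the handling of a zero reference value are all hypothetical. The symbolic sympy check is omitted; only the ±2% numeric comparison is shown.

```python
from statistics import mean, pstdev

# Weights copied from the reward list above; they sum to 1.0.
# The key names are illustrative, not taken from the training script.
WEIGHTS = {"symbolic": 0.50, "code_exec": 0.15, "depth": 0.25, "length": 0.10}


def within_tolerance(predicted: float, reference: float, rel_tol: float = 0.02) -> bool:
    """Numeric check: accept answers within ±2% of the reference value."""
    if reference == 0.0:
        # Assumption: fall back to an absolute tolerance when the reference is zero.
        return abs(predicted) <= rel_tol
    return abs(predicted - reference) <= rel_tol * abs(reference)


def composite_reward(components: dict[str, float]) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this sketch, a completion whose final number passes `within_tolerance` would receive `symbolic = 1.0`, and the advantages are computed across all completions sampled for the same prompt, so only relative quality within the group drives the update.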
Phase 2: Failure-Mode Self-Repair
- Classify failures: algebraic, conceptual, setup, computation, incomplete
- When threshold reached: auto-generate Mini-SFT repair dataset
- SFT on repair data to prevent recurrence
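The bookkeeping behind these steps can be sketched as below. This is a hypothetical illustration only: the class name `FailureTracker`, the default threshold of 3, and the prompt/completion record format are assumptions, and the sketch expects each failure to arrive already labelled with one of the five modes (how the classification itself is done is not specified here).

```python
from collections import Counter

# The five failure modes named in the list above.
FAILURE_MODES = ("algebraic", "conceptual", "setup", "computation", "incomplete")


class FailureTracker:
    """Counts failures per mode and emits a small repair dataset at a threshold."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed value; the source does not state one
        self.counts = Counter()
        self.examples = {mode: [] for mode in FAILURE_MODES}

    def record(self, mode: str, problem: str, reference_solution: str):
        """Store one failure; return a repair dataset once the mode hits the threshold."""
        if mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {mode}")
        self.counts[mode] += 1
        self.examples[mode].append({"prompt": problem, "completion": reference_solution})
        if self.counts[mode] >= self.threshold:
            return self.flush(mode)
        return None

    def flush(self, mode: str) -> list[dict]:
        """Hand back the collected examples for this mode and reset its counter."""
        dataset = self.examples[mode]
        self.examples[mode] = []
        self.counts[mode] = 0
        return dataset
```

A returned dataset would then be passed to whatever SFT routine the pipeline uses. Resetting the counter after each flush means a mode has to recur before it triggers another repair round.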
Phase 3: Loop back to Phase 1 with repaired model
Launch
python physics_reasoning.py