PRIME-RL/P1-30B-A3B

@@ -2,6 +2,8 @@
 language:
 - en
 - multilingual
 tags:
 - physics
 - reinforcement-learning
@@ -9,8 +11,7 @@ tags:
 - reasoning
 - competition
 - education
-license: apache-2.0
-pipeline_tag: text-generation
 ---
 <div align="center">
@@ -18,6 +19,8 @@ pipeline_tag: text-generation
 </div>
 <p align="center">
   <a href="https://prime-rl.github.io/P1/"><b>🌐 P1 Project Page</b></a> |
   <a href="https://phyarena.github.io/"><b>🏆 HiPhO Leaderboard</b></a>
 </p>
@@ -47,7 +50,7 @@ pipeline_tag: text-generation
 <div align="center">
 | Model | Score | Medal |
-|:-----:|:-----:|:-----:|
 | **P1-30B-A3B** | **18.5** | **🥈 Silver** |
 | DeepSeek-R1 | 18.5 | **🥈 Silver** |
 | Qwen3-235B-A22B-Thinking-2507 | 17.1 | **🥈 Silver** |
@@ -59,7 +62,7 @@ pipeline_tag: text-generation
 <div align="center">
 | Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
-|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
 | **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 |
 | Gold Medals (🥇) | 8 | 10 | 9 | 6 |
 | Silver Medals (🥈) | 4 | 3 | 3 | 6 |
@@ -74,8 +77,8 @@ Beyond physics reasoning, P1 improves across multiple domains. As shown below, P
 <div align="center">
-| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
-|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
 | Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
 | **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** |
@@ -138,8 +141,6 @@ We are grateful to the open-source community for their invaluable contributions.
     title={P1: Mastering Physics Olympiads with Reinforcement Learning},
     author={P1 Team},
     year={2025},
-    url={https://prime-rl.github.io/P1/}
 }
-```
-</div>

 language:
 - en
 - multilingual
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - physics
 - reinforcement-learning
 - reasoning
 - competition
 - education
+library_name: transformers
 ---
 <div align="center">
 </div>
 <p align="center">
+  <a href="https://huggingface.co/papers/2511.13612"><b>📚 Paper</b></a> |
+  <a href="https://github.com/PRIME-RL/P1"><b>💻 Code</b></a> |
   <a href="https://prime-rl.github.io/P1/"><b>🌐 P1 Project Page</b></a> |
   <a href="https://phyarena.github.io/"><b>🏆 HiPhO Leaderboard</b></a>
 </p>
 <div align="center">
 | Model | Score | Medal |
+|:-----:|:-----:|:-----:|
 | **P1-30B-A3B** | **18.5** | **🥈 Silver** |
 | DeepSeek-R1 | 18.5 | **🥈 Silver** |
 | Qwen3-235B-A22B-Thinking-2507 | 17.1 | **🥈 Silver** |
 <div align="center">
 | Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
+|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
 | **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 |
 | Gold Medals (🥇) | 8 | 10 | 9 | 6 |
 | Silver Medals (🥈) | 4 | 3 | 3 | 6 |
 <div align="center">
+| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
+|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
 | Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
 | **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** |
     title={P1: Mastering Physics Olympiads with Reinforcement Learning},
     author={P1 Team},
     year={2025},
+    url={https://huggingface.co/papers/2511.13612}
 }
+```