| | --- |
| | language: |
| | - en |
| | - multilingual |
| | tags: |
| | - physics |
| | - reinforcement-learning |
| | - olympiad |
| | - reasoning |
| | - competition |
| | - education |
| | license: apache-2.0 |
| | pipeline_tag: text-generation |
| | --- |
| | |
| | <div align="center"> |
| | <h1 style="font-size: 2em; font-weight: bold;">P1: Mastering Physics Olympiads with Reinforcement Learning</h1> |
| | </div> |
| |
|
| | <p align="center"> |
| | <a href="https://prime-rl.github.io/P1/"><b>π P1 Project Page</b></a> | |
| | <a href="https://phyarena.github.io/"><b>π HiPhO Leaderboard</b></a> |
| | </p> |
| |
|
| | <p align="center"> |
| | <img src="https://raw.githubusercontent.com/PRIME-RL/P1/main/docs/imgs/Score_IPhO_2025_P1_v2.jpg" style="width: 800px" align=center> |
| | </p> |
| |
|
| | <p align="center"> |
| | <i>High-performance mid-scale model for physics reasoning</i> |
| | </p> |
| |
|
| | ## Model Description |
| |
|
| | **P1-30B-A3B** is the mid-size variant of the P1 series, a high-performance open-source language model specialized in physics reasoning. Built on *Qwen3-30B-A3B-Thinking-2507* and refined through multi-stage reinforcement learning on curated physics competition data, P1-30B-A3B achieves impressive results while maintaining reasonable computational requirements, making it accessible for researchers working with physics problems. |
| |
|
| | ### Key Highlights |
| |
|
| | - π₯ **IPhO 2025 Silver-tier Performance**: Strong competitive showing at international physics olympiad (18.5/30 points) |
| | - π₯ **HiPhO Excellence**: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests |
| |
|
| |
|
| | ## Performance Benchmarks |
| |
|
| | ### IPhO 2025 Results |
| |
|
| | <div align="center"> |
| |
|
| | | Model | Score | Medal | |
| | |:-----:|:-----:|:-----:| |
| | | **P1-30B-A3B** | **18.5** | **π₯ Silver** | |
| | | DeepSeek-R1 | 18.5 | **π₯ Silver** | |
| | | Qwen3-235B-A22B-Thinking-2507 | 17.1 | **π₯ Silver** | |
| | | Qwen3-30B-A3B-Thinking-2507 | 15.6 | **π₯ Silver** | |
| | </div> |
| |
|
| | ### HiPhO Comprehensive Results |
| |
|
| | <div align="center"> |
| |
|
| | | Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) | |
| | |:--------:|:----------:|:---------------:|:-----------:|:--------------------:| |
| | | **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 | |
| | | Gold Medals (π₯) | 8 | 10 | 9 | 6 | |
| | | Silver Medals (π₯) | 4 | 3 | 3 | 6 | |
| | | Bronze Medals (π₯) | 1 | 0 | 1 | 1 | |
| | | Total Contests | 13 | 13 | 13 | 13 | |
| |
|
| | </div> |
| |
|
| | ### Generalization to STEM Tasks |
| |
|
| | Beyond physics reasoning, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and STEM benchmarks, demonstrating strong generalization of physics reasoning. |
| |
|
| | <div align="center"> |
| |
|
| | | Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench | |
| | |:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:| |
| | | Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 | |
| | | **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** | |
| |
|
| | </div> |
| |
|
| |
|
| | ## Usage |
| |
|
| | ### Basic Inference |
| |
|
| | ```python |
| | from transformers import AutoModelForCausalLM, AutoTokenizer |
| | import torch |
| | |
| | # Load model and tokenizer |
| | model_name = "P1-30B-A3B" |
| | tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) |
| | model = AutoModelForCausalLM.from_pretrained( |
| | model_name, |
| | torch_dtype=torch.bfloat16, |
| | device_map="auto", |
| | trust_remote_code=True |
| | ) |
| | |
| | # Physics problem solving |
| | prompt = """Solve this physics problem: |
| | |
| | A pendulum of length L = 1.0 m swings with small amplitude. |
| | Calculate the period of oscillation and explain your reasoning. |
| | |
| | Use g = 9.8 m/sΒ²""" |
| | |
| | inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| | outputs = model.generate( |
| | **inputs, |
| | max_length=81920, |
| | temperature=0.6, |
| | top_p=0.9, |
| | do_sample=True |
| | ) |
| | |
| | solution = tokenizer.decode(outputs[0], skip_special_tokens=True) |
| | print(solution) |
| | ``` |
| |
|
| | ## π Acknowledgements |
| |
|
| | We are grateful to the open-source community for their invaluable contributions. Special thanks to: |
| |
|
| | - **[Qwen3](https://huggingface.co/collections/Qwen/qwen3)** - for providing the foundational base models that powered our research |
| | - **[slime](https://github.com/THUDM/slime)** - for their innovative work on efficient reinforcement learning framework that powered our training pipeline |
| | - **[verl](https://github.com/volcengine/verl)** - for the versatile reinforcement learning framework that enabled our training pipeline |
| | - **[sglang](https://github.com/sgl-project/sglang)** - for the efficient LLM serving and inference infrastructure |
| | - **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)** - for the large-scale model training framework |
| |
|
| | ## Citation |
| |
|
| | ```bibtex |
| | @misc{p1-2025, |
| | title={P1: Mastering Physics Olympiads with Reinforcement Learning}, |
| | author={P1 Team}, |
| | year={2025}, |
| | url={https://prime-rl.github.io/P1/} |
| | } |
| | ``` |
| |
|
| | </div> |
| |
|