| license: apache-2.0 | |
| library_name: transformers | |
| base_model: | |
| - deepseek-ai/DeepSeek-Math-V2 | |
| <!-- markdownlint-disable first-line-h1 --> | |
| <!-- markdownlint-disable html --> | |
| <!-- markdownlint-disable no-duplicate-header --> | |
| <div align="center"> | |
| <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" /> | |
| </div> | |
| <hr> | |
| <div align="center" style="line-height: 1;"> | |
| <a href="https://www.deepseek.com/"><img alt="Homepage" | |
| src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true"/></a> | |
| <a href="https://chat.deepseek.com/"><img alt="Chat" | |
| src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white"/></a> | |
| <a href="https://huggingface.co/deepseek-ai"><img alt="Hugging Face" | |
| src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white"/></a> | |
| <br> | |
| <a href="https://discord.gg/Tc7c45Zzu5"><img alt="Discord" | |
| src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da"/></a> | |
| <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true"><img alt="Wechat" | |
| src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white"/></a> | |
| <a href="https://twitter.com/deepseek_ai"><img alt="Twitter Follow" | |
| src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white"/></a> | |
| <br> | |
| <a href="LICENSE" style="margin: 2px;"> | |
| <img alt="License" src="https://img.shields.io/badge/License-Apache 2.0-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> | |
| </a> | |
| <br> | |
| </div> | |
| # DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning | |
| ## 1. Introduction | |
| Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. | |
| By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have gone from poor performance to saturating quantitative reasoning competitions like AIME and HMMT within a year. | |
| However, this approach faces fundamental limitations. | |
| Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. | |
| Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. | |
| To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. | |
| Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. | |
| Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. | |
| We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in its own proofs before finalizing them. | |
| To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. | |
| Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute. | |
| While much work remains, these results suggest that self-verifiable mathematical reasoning is a feasible research direction that may help develop more capable mathematical AI systems. | |
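The verify-then-refine idea described above can be illustrated with a toy loop. This is only a minimal sketch under loud assumptions: `Verdict`, `toy_verify`, `toy_refine`, and `refine_until_verified` are hypothetical names invented for illustration, the "verifier" is a string check rather than an LLM, and nothing here reflects the actual DeepSeekMath-V2 training or inference code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Toy verifier output: a score in [0, 1] plus the issues it found."""
    score: float
    issues: List[str] = field(default_factory=list)


def toy_verify(proof: str) -> Verdict:
    """Hypothetical stand-in for an LLM verifier.

    Flags every non-empty step that still contains "TODO" as unjustified.
    """
    steps = [s for s in proof.split("\n") if s.strip()]
    issues = [
        f"step {i + 1} is unjustified"
        for i, s in enumerate(steps)
        if "TODO" in s
    ]
    score = 1.0 - len(issues) / max(len(steps), 1)
    return Verdict(score=score, issues=issues)


def toy_refine(proof: str, verdict: Verdict) -> str:
    """Hypothetical stand-in for the generator's repair step.

    Resolves the first flagged step per call (the verdict is unused here).
    """
    return proof.replace("TODO", "justified", 1)


def refine_until_verified(
    proof: str,
    verify: Callable[[str], Verdict],
    refine: Callable[[str, Verdict], str],
    max_rounds: int = 4,
) -> Tuple[str, List[float]]:
    """Alternate verification and refinement.

    Stops when the verifier reports no issues or after max_rounds
    verification calls. Returns the last proof and the score history.
    """
    history: List[float] = []
    for _ in range(max_rounds):
        verdict = verify(proof)
        history.append(verdict.score)
        if not verdict.issues:
            break
        proof = refine(proof, verdict)
    return proof, history


draft = "A: TODO\nB: given\nC: TODO"
final, history = refine_until_verified(draft, toy_verify, toy_refine)
print(final)
print(len(history))
```

In this toy run the draft has two flagged steps, so the loop needs three verification calls before the verifier reports no remaining issues. Spending more rounds (a larger `max_rounds`) is a very rough analogue of scaling test-time compute.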
| ## 2. Evaluation Results | |
| Below are evaluation results on [IMO-ProofBench](https://github.com/google-deepmind/superhuman/tree/main/imobench) (developed by the DeepMind team behind DeepThink IMO-Gold) and recent mathematics competitions including IMO 2025, CMO 2024, and Putnam 2024. | |
| **IMO-ProofBench** | |
| <p align="center"> | |
| <img width="100%" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/IMO-ProofBench.png"> | |
| </p> | |
| --- | |
| **Mathematics Competitions** | |
| <p align="center"> | |
| <img width="41%" src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math-V2/refs/heads/main/figures/Competitions.png"> | |
| </p> | |
| ## 4. Quick Start | |
| DeepSeekMath-V2 is built on top of DeepSeek-V3.2-Exp-Base. | |
| For inference support, please refer to [the DeepSeek-V3.2-Exp GitHub repository](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp). | |
| ## 6. License | |
| This repository and the model weights are licensed under [the Apache License, Version 2.0 (Apache 2.0)](LICENSE). | |
| ## 7. Citation | |
| ``` | |
| @misc{deepseek-math-v2, | |
| author = {Zhihong Shao and Yuxiang Luo and Chengda Lu and Z.Z. Ren and Jiewen Hu and Tian Ye and Zhibin Gou and Shirong Ma and Xiaokang Zhang}, | |
| title = {DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning}, | |
| year = {2025}, | |
| } | |
| ``` | |
| ## 8. Contact | |
| If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com). | |