# Mobile-ReasoningLLM-v0-1.5B

## Model Description
Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of Qwen2.5-1.5B, optimized for reasoning tasks in mathematics and code generation. It supports up to 64K output tokens for math problems and 65K tokens for code generation, and is available for both commercial and non-commercial research use. This repository contains the evaluation code for Mobile-ReasoningLLM-v0, which begins updating the reference model during reinforcement learning after R1-style reinforcement learning and its variants, including curriculum learning. In this work, we systematically explore unfreezing the weights of the reference model during the continued training of reasoning LLMs that have already been trained with R1-style reinforcement learning and its variants. In this version zero, we further demonstrate that our reinforcement-learning design enhances the reasoning ability of small language models, with state-of-the-art results on five reasoning benchmarks for Mobile-ReasoningLLM-v0-1.5B. Training Mobile-ReasoningLLM-v0 on 1T tokens took 30 days on 8 NVIDIA A800 80GB GPUs, covering pre-training, R1 reinforcement learning, R1 curriculum reinforcement learning, and continued R1 reinforcement learning with reference-model updates.
- Architecture: Dense decoder-only Transformer
- Base Model: Qwen2.5-1.5B
- Parameters: 1.5 billion
- Version: v0 (released September 29, 2025)
## Intended Use
- Primary Use: Solving complex math problems and generating correct code solutions.
- Applications: Research, education, software development, and math reasoning tasks.
- Limitations: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.
## Benchmarks
The model was post-trained on a hybrid dataset (automated, human-annotated, and synthetic data) and evaluated on the following benchmarks:
- Math datasets: AIME 2024, AIME 2025, MATH-500, GSM8k.
- Code dataset: LiveCodeBench V6 (date range: 2408–2505).
## Evaluation
The model was evaluated on the following benchmarks, achieving strong performance:
| Model | AIME24 | AIME25 | MATH-500 | GSM8k | LiveCodeBench* |
|---|---|---|---|---|---|
| Qwen3-0.6B-base | 11.3 | 17.0 | 73.0 | 79.2 | 14.9 |
| MobileLLM-R1-1B | 15.5 | 16.3 | 74.0 | 67.5 | 19.9 |
| DeepSeek-Qwen-1.5B | 29.1 | 23.4 | 83.4 | 77.3 | 19.9 |
| FastCurl-1.5B-V3 | 49.6 | 32.9 | 90.5 | --- | --- |
| Open-Nemotron-1.5B | 49.7 | 40.4 | 83.4 | 76.7 | 28.3 |
| Mobile-ReasoningLLM-v0-1.5B | 63.1 | 49.6 | 88.0 | 80.2 | 30.7 |
| Qwen3-1.7B | 47.0 | 37.0 | 89.4 | 90.3 | 29.8 |

\* LiveCodeBench V6, date range 2408–2505.
## How to Use

### Requirements

- Libraries: `transformers`, `torch`, and `vLLM` or `TensorRT-LLM`
- Hardware: tested on 8× NVIDIA A800-80GB GPUs
- Environment: Python 3.10+ (e.g., a Conda environment)
### Inference Example

```python
import transformers
import torch

model_id = "deepgo/Mobile-ReasoningLLM-v0-1.5B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt
prompt = """Solve the following math problem. Make sure to put the answer (and only the answer) inside \\boxed{}."""
```

For math problems, `temperature=0.6` and `max_length=64000` are recommended.
For code generation, it is advisable to include a directive in your prompt such as:

```python
# Code generation prompt
prompt = """You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests."""
```

For code generation, `temperature=0.6` and `max_length=65536` are recommended.
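A complete code-generation prompt can then be assembled by combining that directive with a concrete problem statement. The helper and problem text below are illustrative placeholders, not the repository's prompting code:

```python
SYSTEM_DIRECTIVE = (
    "You are an expert Python programmer. You will be given a question "
    "(problem specification) and will generate a correct Python program "
    "that matches the specification and passes all tests."
)

def build_code_prompt(problem: str) -> str:
    # Directive first, then the problem specification as its own paragraph.
    return f"{SYSTEM_DIRECTIVE}\n\nQuestion:\n{problem}"

prompt = build_code_prompt("Given a list of integers, return the sum of the even ones.")
print(prompt.startswith("You are an expert Python programmer"))  # -> True
```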
## Evaluation Results

- Pass@1 (avg16) on AIME 2024 (self-reported): 63.1
- Pass@1 (avg16) on AIME 2025 (self-reported): 49.6
- Pass@1 (avg16) on MATH-500 (self-reported): 88.0
- Pass@1 (avg16) on GSM8k (self-reported): 80.2
- Pass@1 (avg16) on LiveCodeBench V6 (self-reported): 30.7
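For reference, Pass@1 (avg16) averages the per-problem pass rate over 16 independent samples. A minimal sketch of the computation, using synthetic correctness data rather than the repository's evaluation code:

```python
def pass_at_1_avg(results: list[list[bool]]) -> float:
    """results[i][j] = whether sample j for problem i was correct.

    Pass@1 (avgN) = mean over problems of the fraction of correct
    samples, reported as a percentage.
    """
    per_problem = [sum(samples) / len(samples) for samples in results]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two toy problems with 4 samples each (the card uses 16 samples).
demo = [
    [True, True, False, True],   # 3/4 correct
    [False, False, True, False], # 1/4 correct
]
print(round(pass_at_1_avg(demo), 1))  # -> 50.0
```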