---
license: cc-by-4.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- DeepMiddleGo
- code
- math-reasoning
- fine-tuned
- qwen
model-index:
- name: Mobile-ReasoningLLM-v0-1.5B
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2024
      type: aime-2024
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 63.1
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2025
      type: aime-2025
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 49.6
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: MATH-500
      type: math-500
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 88.0
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: GSM8k
      type: gsm8k
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 80.2
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: LiveCodeBench V6
      type: livecodebench-v6
      args: date_range=2408-2505
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 30.7
---

# Mobile-ReasoningLLM-v0-1.5B

## Model Description

Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for reasoning tasks in mathematics and code generation. It supports up to 64K output tokens for math problems and 65K tokens for code generation. This model is designed for both commercial and non-commercial research use.

This repository contains the evaluation code of Mobile-ReasoningLLM-v0, which begins updating the reference model during reinforcement learning after R1-like reinforcement learning and its variants, including curriculum learning, have been applied. In this work, we explore unfreezing the weights of the reference model during the continued training of reasoning LLMs that have already been trained with R1-like reinforcement learning and its variants.
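The idea of "unfreezing" the reference model can be sketched as follows. This is a toy, hypothetical illustration only (the class name, the pull-toward-reference term standing in for a KL penalty, and the refresh schedule are all assumptions, not the released training code): in KL-regularized RL the policy is penalized toward a frozen reference; here the reference is periodically re-synced with the current policy instead of staying fixed.

```python
import copy


class RefUpdatingRL:
    """Toy sketch: policy updates with a periodically refreshed reference model.

    Hypothetical illustration of the idea only; the real training code uses a
    full KL-regularized RL objective over an LLM, not scalar "weights".
    """

    def __init__(self, params, refresh_every=100):
        self.policy = list(params)                 # current policy "weights"
        self.reference = copy.deepcopy(self.policy)  # reference copy
        self.refresh_every = refresh_every
        self.step_count = 0

    def step(self, grad, lr=0.01, beta=0.1):
        # Gradient step plus a pull toward the reference
        # (a scalar stand-in for the KL penalty term).
        self.policy = [
            p - lr * (g + beta * (p - r))
            for p, g, r in zip(self.policy, grad, self.reference)
        ]
        self.step_count += 1
        if self.step_count % self.refresh_every == 0:
            # "Free" the reference: re-sync it with the current policy.
            self.reference = copy.deepcopy(self.policy)
```

With a fixed reference, the penalty keeps pulling the policy back toward its starting point; re-syncing lets continued training move further while still being locally regularized.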
In version zero, we further demonstrate that our reinforcement-learning design enhances the reasoning ability of small language models, with Mobile-ReasoningLLM-v0-1.5B achieving state-of-the-art results on five reasoning benchmarks. Training Mobile-ReasoningLLM-v0 took 30 days on 1T tokens using 8 NVIDIA A800 80GB GPUs, following pre-training, R1 reinforcement learning, R1 curriculum reinforcement learning, and reference-model updates during continued R1 reinforcement learning.

- **Architecture**: Dense decoder-only Transformer
- **Base Model**: Qwen2.5-1.5B
- **Parameters**: 1.5 billion
- **Version**: v0 (released September 29, 2025)

## Intended Use

- **Primary Use**: Solving complex math problems and generating correct code solutions.
- **Applications**: Research, education, software development, and math reasoning tasks.
- **Limitations**: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.

## Benchmarks

The model was post-trained on a hybrid dataset (automated, human, and synthetic) and benchmarked on:

- Math datasets: AIME 2024, AIME 2025, MATH-500, GSM8k.
- Code dataset: LiveCodeBench V6 (date range: 2408–2505).
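All scores in this card are reported as Pass@1 (avg16). Assuming the common convention of averaging correctness over 16 samples per problem (the helper name is illustrative, not part of this repo's harness), the metric can be sketched as:

```python
def pass_at_1_avg(results):
    """Pass@1 averaged over samples, as a percentage.

    results: list of per-problem lists of boolean correctness flags
    (16 samples per problem for the avg16 setting).
    """
    per_problem = [sum(r) / len(r) for r in results]
    return 100.0 * sum(per_problem) / len(per_problem)


# Two problems: one solved in 8/16 samples, one in 16/16 -> 75.0
print(pass_at_1_avg([[True] * 8 + [False] * 8, [True] * 16]))  # -> 75.0
```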
## Evaluation

The model was evaluated on the following benchmarks, achieving strong performance:

| Model | AIME24 | AIME25 | MATH-500 | GSM8k | LiveCodeBench* |
|--------------------------|--------|--------|----------|-------|----------------|
| Qwen3-0.6B-base | 11.3 | 17.0 | 73.0 | 79.2 | 14.9 |
| MobileLLM-R1-1B | 15.5 | 16.3 | 74.0 | 67.5 | 19.9 |
| DeepSeek-Qwen-1.5B | 29.1 | 23.4 | 83.4 | 77.3 | 19.9 |
| FastCurl-1.5B-V3 | 49.6 | 32.9 | **90.5** | --- | --- |
| Open-Nemotron-1.5B | 49.7 | 40.4 | 83.4 | 76.7 | 28.3 |
| **Mobile-ReasoningLLM-v0-1.5B** | **63.1** | **49.6** | 88.0 | 80.2 | **30.7** |
| Qwen3-1.7B | 47.0 | 37.0 | 89.4 | **90.3** | 29.8 |

## How to Use

### Requirements

- **Library**: `transformers`, `torch`, `vLLM` or `TensorRT-LLM`
- **Hardware**: Tested on NVIDIA 8xA800-80GB GPUs
- **Environment**: Python 3.10+ (e.g., Conda `hug` environment)

### Inference Example

```python
import transformers
import torch

model_id = "deepgo/Mobile-ReasoningLLM-v0-1.5B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt (recommended: temperature=0.6, max_length=64000)
math_prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}."""
print(pipeline(math_prompt, do_sample=True, temperature=0.6, max_length=64000)[0]["generated_text"])

# Code generation prompt (recommended: temperature=0.6, max_length=65536).
# It is advisable to include a directive in your prompt such as:
code_prompt = """You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests."""
print(pipeline(code_prompt, do_sample=True, temperature=0.6, max_length=65536)[0]["generated_text"])
```
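The math prompt above instructs the model to place its final answer inside `\boxed{}`. A minimal sketch for pulling that answer out of the generated text (a common convention; the helper name and regex are illustrative, not part of this repo's evaluation code):

```python
import re


def extract_boxed_answer(text):
    """Return the contents of the last \\boxed{...} in a model output, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


output = r"Adding the two cases gives 17 + 25, so the answer is \boxed{42}."
print(extract_boxed_answer(output))  # -> 42
```

Taking the last match is deliberate: long chain-of-thought outputs may mention intermediate boxed values before the final answer.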