---
license: cc-by-4.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- DeepMiddleGo
- math-reasoning
- fine-tuned
- qwen
model-index:
- name: Mobile-ReasoningLLM-v0-1.5B
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2024
      type: aime-2024
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 73.7
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2025
      type: aime-2025
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 63.8
---

# Mobile-Flash-ReasoningLLM-v0-1.5B

## Model Description

Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for mathematical reasoning. It supports up to 48K output tokens for math problems. The model is intended for both commercial and non-commercial research use.

This repository contains the evaluation code for Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which **starts to explore experience learning** in addition to **sparse reward** learning, as a reinforcement learning stage that follows R1-like reinforcement learning and its variants, including curriculum learning.

In this work, I start to explore RL training algorithms that come after pre-training, R1-style reinforcement learning, and R1-style curriculum reinforcement learning, with the goal of reducing the difficulty posed by sparse rewards in the RL post-training stage.

It takes about 4 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80G GPUs.

- **Architecture**: Dense decoder-only Transformer
- **Base Model**: Qwen2.5-1.5B
- **Parameters**: 1.5 billion
- **Version**: v0 (released October 29, 2025)

## Intended Use

- **Primary Use**: Solving complex math problems.
- **Applications**: Research, education, software development, and math reasoning tasks.
- **Limitations**: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.

## Benchmarks

The model was post-trained on a hybrid dataset (automated, human, synthetic) including:

- Math datasets: AIME 2024, AIME 2025

## Evaluation

The model was evaluated on the following benchmarks. Scores are Pass@1 (avg16):

| Model | AIME24 | AIME25 |
|---------------------------------------|----------|----------|
| Qwen3-0.6B-base | 11.3 | 17.0 |
| MobileLLM-R1-1B | 15.5 | 16.3 |
| DeepSeek-Qwen-1.5B | 29.1 | 23.4 |
| FastCurl-1.5B-V3 | 49.6 | 32.9 |
| Open-Nemotron-1.5B | 49.7 | 40.4 |
| **Mobile-ReasoningLLM-v0-1.5B** | **63.1** | **49.6** |
| **Mobile-Flash-ReasoningLLM-v0-1.5B** | **73.7** | **63.8** |
| Qwen3-1.7B | 47.0 | 37.0 |

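
The avg16 Pass@1 scores in the table above are most naturally read as the per-problem accuracy over 16 sampled completions, averaged across all problems in the benchmark. The sketch below illustrates that computation under this assumption. The function name `pass_at_1_avg_k` and the toy correctness flags are hypothetical and are not taken from this repository's evaluation code.

```python
from typing import Sequence


def pass_at_1_avg_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean per-problem accuracy over k samples, as a percentage.

    correct[i][j] is True when sample j for problem i was graded correct.
    """
    if not correct:
        raise ValueError("need at least one problem")
    # Fraction of correct samples for each problem.
    per_problem = [sum(samples) / len(samples) for samples in correct]
    # Average those fractions over all problems.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (hypothetical): 2 problems with 4 samples each instead of 16.
flags = [
    [True, True, False, True],    # 3 of 4 correct
    [False, False, True, False],  # 1 of 4 correct
]
score = pass_at_1_avg_k(flags)
print(f"{score:.1f}")
```

Averaging over several samples per problem reduces the variance that a single sampled completion would introduce on small benchmarks such as AIME.
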
## How to Use

### Requirements

- **Library**: `transformers`, `torch`, `vLLM` or `TensorRT-LLM`
- **Hardware**: Tested on 8x NVIDIA A800 80GB GPUs
- **Environment**: Python 3.10+ (e.g., Conda `hug` environment)

### Inference Example

```python
import torch
import transformers

model_id = "deepgo/Mobile-ReasoningLLM-v0.1-1.5B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt; append the problem statement after this instruction.
prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}."""

# Sample with the recommended settings: temperature 0.7, up to 48,000 tokens.
outputs = pipeline(prompt, do_sample=True, temperature=0.7, max_new_tokens=48000)
print(outputs[0]["generated_text"])
```

A temperature of 0.7 and a maximum length of 48,000 tokens are recommended.
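
Because the prompt asks for the final answer inside `\boxed{}`, the answer can be pulled out of the generated text with a small parser. The following is a minimal sketch, not part of this repository: the helper name `extract_boxed_answer` and the sample string are hypothetical. It takes the last `\boxed{...}` occurrence and balances nested braces, so answers such as fractions are returned intact.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces, e.g. output truncated mid-answer


# Hypothetical model output used only for illustration.
sample = "So the total is \\boxed{\\frac{3}{4}}."
answer = extract_boxed_answer(sample)
print(answer)
```

Taking the last occurrence is a design choice: long reasoning traces may box intermediate values before the final answer.
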