---
license: cc-by-4.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- DeepMiddleGo
- code
- math-reasoning
- fine-tuned
- qwen
model-index:
- name: Mobile-ReasoningLLM-v0-1.5B
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2024
      type: aime-2024
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 63.1
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2025
      type: aime-2025
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 49.6
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: MATH-500
      type: math-500
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 88.0
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: GSM8k
      type: gsm8k
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 80.2
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: LiveCodeBench V6
      type: livecodebench-v6
      args: date_range=2408-2505
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 30.7
---

# Mobile-ReasoningLLM-v0-1.5B

## Model Description

Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for reasoning tasks in mathematics and code generation. It supports up to 64K output tokens for math problems and 65K tokens for code generation. This model is designed for both commercial and non-commercial research use.

This repository contains the evaluation code of Mobile-ReasoningLLM-v0, which begins updating the reference model during reinforcement learning after R1-like reinforcement learning and its variants, including curriculum learning, have been applied. In this work, we explore unfreezing the weights of the reference model during the continued training of reasoning LLMs that have already been trained with R1-like reinforcement learning and its variants.
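The idea of "unfreezing" the reference model can be sketched as follows. This is a toy, hypothetical illustration only (the class name, the pull-toward-reference term standing in for a KL penalty, and the refresh schedule are all assumptions, not the released training code): in KL-regularized RL the policy is penalized toward a frozen reference; here the reference is periodically re-synced with the current policy instead of staying fixed.

```python
import copy


class RefUpdatingRL:
    """Toy sketch: policy updates with a periodically refreshed reference model.

    Hypothetical illustration of the idea only; the real training code uses a
    full KL-regularized RL objective over an LLM, not scalar "weights".
    """

    def __init__(self, params, refresh_every=100):
        self.policy = list(params)                 # current policy "weights"
        self.reference = copy.deepcopy(self.policy)  # reference copy
        self.refresh_every = refresh_every
        self.step_count = 0

    def step(self, grad, lr=0.01, beta=0.1):
        # Gradient step plus a pull toward the reference
        # (a scalar stand-in for the KL penalty term).
        self.policy = [
            p - lr * (g + beta * (p - r))
            for p, g, r in zip(self.policy, grad, self.reference)
        ]
        self.step_count += 1
        if self.step_count % self.refresh_every == 0:
            # "Free" the reference: re-sync it with the current policy.
            self.reference = copy.deepcopy(self.policy)
```

With a fixed reference, the penalty keeps pulling the policy back toward its starting point; re-syncing lets continued training move further while still being locally regularized.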
In version zero, we further demonstrate that our reinforcement-learning design enhances the reasoning ability of small language models, with Mobile-ReasoningLLM-v0-1.5B achieving state-of-the-art results on five reasoning benchmarks. Training Mobile-ReasoningLLM-v0 took 30 days on 1T tokens using 8 NVIDIA A800 80GB GPUs, following pre-training, R1 reinforcement learning, R1 curriculum reinforcement learning, and reference-model updates during continued R1 reinforcement learning.

- **Architecture**: Dense decoder-only Transformer
- **Base Model**: Qwen2.5-1.5B
- **Parameters**: 1.5 billion
- **Version**: v0 (released September 29, 2025)

## Intended Use

- **Primary Use**: Solving complex math problems and generating correct code solutions.
- **Applications**: Research, education, software development, and math reasoning tasks.
- **Limitations**: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.

## Benchmarks

The model was post-trained on a hybrid dataset (automated, human, and synthetic) and benchmarked on:

- Math datasets: AIME 2024, AIME 2025, MATH-500, GSM8k.
- Code dataset: LiveCodeBench V6 (date range: 2408–2505).
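All scores in this card are reported as Pass@1 (avg16). Assuming the common convention of averaging correctness over 16 samples per problem (the helper name is illustrative, not part of this repo's harness), the metric can be sketched as:

```python
def pass_at_1_avg(results):
    """Pass@1 averaged over samples, as a percentage.

    results: list of per-problem lists of boolean correctness flags
    (16 samples per problem for the avg16 setting).
    """
    per_problem = [sum(r) / len(r) for r in results]
    return 100.0 * sum(per_problem) / len(per_problem)


# Two problems: one solved in 8/16 samples, one in 16/16 -> 75.0
print(pass_at_1_avg([[True] * 8 + [False] * 8, [True] * 16]))  # -> 75.0
```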
## Evaluation

The model was evaluated on the following benchmarks, achieving strong performance:

| Model | AIME24 | AIME25 | MATH-500 | GSM8k | LiveCodeBench* |
|--------------------------|--------|--------|----------|-------|----------------|
| Qwen3-0.6B-base | 11.3 | 17.0 | 73.0 | 79.2 | 14.9 |
| MobileLLM-R1-1B | 15.5 | 16.3 | 74.0 | 67.5 | 19.9 |
| DeepSeek-Qwen-1.5B | 29.1 | 23.4 | 83.4 | 77.3 | 19.9 |
| FastCurl-1.5B-V3 | 49.6 | 32.9 | **90.5** | --- | --- |
| Open-Nemotron-1.5B | 49.7 | 40.4 | 83.4 | 76.7 | 28.3 |
| **Mobile-ReasoningLLM-v0-1.5B** | **63.1** | **49.6** | 88.0 | 80.2 | **30.7** |
| Qwen3-1.7B | 47.0 | 37.0 | 89.4 | **90.3** | 29.8 |

## How to Use

### Requirements

- **Library**: `transformers`, `torch`, `vLLM` or `TensorRT-LLM`
- **Hardware**: Tested on NVIDIA 8xA800-80GB GPUs
- **Environment**: Python 3.10+ (e.g., Conda `hug` environment)

### Inference Example

```python
import transformers
import torch

model_id = "deepgo/Mobile-ReasoningLLM-v0-1.5B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt (recommended: temperature=0.6, max_length=64000)
math_prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}."""
print(pipeline(math_prompt, do_sample=True, temperature=0.6, max_length=64000)[0]["generated_text"])

# Code generation prompt (recommended: temperature=0.6, max_length=65536).
# It is advisable to include a directive in your prompt such as:
code_prompt = """You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests."""
print(pipeline(code_prompt, do_sample=True, temperature=0.6, max_length=65536)[0]["generated_text"])
```
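The math prompt above instructs the model to place its final answer inside `\boxed{}`. A minimal sketch for pulling that answer out of the generated text (a common convention; the helper name and regex are illustrative, not part of this repo's evaluation code):

```python
import re


def extract_boxed_answer(text):
    """Return the contents of the last \\boxed{...} in a model output, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


output = r"Adding the two cases gives 17 + 25, so the answer is \boxed{42}."
print(extract_boxed_answer(output))  # -> 42
```

Taking the last match is deliberate: long chain-of-thought outputs may mention intermediate boxed values before the final answer.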