Mobile-ReasoningLLM-v0-1.5B

Model Description

Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of Qwen2.5-1.5B, optimized for reasoning tasks in mathematics and code generation. It supports up to 64K output tokens for math problems and 65K tokens for code generation, and is released for both commercial and non-commercial research use.

This repository contains the evaluation code for Mobile-ReasoningLLM-v0, which begins updating the reference model during reinforcement learning after R1-like reinforcement learning and its variants, including curriculum learning. In this work, we comprehensively explore unfreezing the weights of the reference model during the continued training of reasoning LLMs that have already been trained with R1-like reinforcement learning and its variants. In this initial version (v0), we further demonstrate that our reinforcement learning design enhances the reasoning ability of small language models, with Mobile-ReasoningLLM-v0-1.5B achieving state-of-the-art results on five reasoning benchmarks. Training Mobile-ReasoningLLM-v0 on 1T tokens took 30 days on 8 NVIDIA A800 80GB GPUs, following a pipeline of pre-training, R1 reinforcement learning, R1 curriculum reinforcement learning, and continued R1 reinforcement learning with reference-model updates.

  • Architecture: Dense decoder-only Transformer
  • Base Model: Qwen2.5-1.5B
  • Parameters: 1.5 billion
  • Version: v0 (released September 29, 2025)

Intended Use

  • Primary Use: Solving complex math problems and generating correct code solutions.
  • Applications: Research, education, software development, and math reasoning tasks.
  • Limitations: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.

Benchmarks

The model was post-trained on a hybrid dataset (automated, human, and synthetic data) and evaluated on the following benchmarks:

  • Math datasets: AIME 2024, AIME 2025, MATH-500, GSM8k.
  • Code dataset: LiveCodeBench V6 (date range: 2408–2505).

Evaluation

The model was evaluated on the following benchmarks, achieving strong performance:

Model                        AIME24  AIME25  MATH-500  GSM8k  LiveCodeBench*
Qwen3-0.6B-base              11.3    17.0    73.0      79.2   14.9
MobileLLM-R1-1B              15.5    16.3    74.0      67.5   19.9
DeepSeek-Qwen-1.5B           29.1    23.4    83.4      77.3   19.9
FastCurl-1.5B-V3             49.6    32.9    90.5      ---    ---
Open-Nemotron-1.5B           49.7    40.4    83.4      76.7   28.3
Mobile-ReasoningLLM-v0-1.5B  63.1    49.6    88.0      80.2   30.7
Qwen3-1.7B                   47.0    37.0    89.4      90.3   29.8

* LiveCodeBench V6 (date range: 2408–2505).

How to Use

Requirements

  • Libraries: transformers, torch, and vLLM or TensorRT-LLM
  • Hardware: Tested on 8x NVIDIA A800-80GB GPUs
  • Environment: Python 3.10+ (e.g., a Conda environment named "hug")
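The libraries above can be installed with pip (package names are assumed to match their PyPI distributions; vLLM is only needed if you use it as the serving backend):

```shell
pip install transformers torch
pip install vllm  # optional, for high-throughput inference
```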

Inference Example

import transformers
import torch

model_id = "deepgo/Mobile-ReasoningLLM-v0-1.5B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt (temperature=0.6, max length 64,000 recommended)
prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}."""

outputs = pipeline(prompt, do_sample=True, temperature=0.6, max_new_tokens=64000)
print(outputs[0]["generated_text"])

# Code generation (temperature=0.6, max length 65,536 recommended). It is
# advisable to include a directive in your prompt such as:
prompt = """You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests."""
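Since the math prompt instructs the model to place its final answer inside \boxed{}, a small post-processing helper can recover it from the generated text. This is our own sketch (the helper name `extract_boxed` is not part of the model's tooling); it handles nested braces such as \frac{1}{2}:

```python
def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Walks the string character by character, tracking brace depth so that
    nested LaTeX braces (e.g. \\boxed{\\frac{1}{2}}) are kept intact.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    out = []
    while i < len(text):
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(c)
        i += 1
    return None  # unbalanced braces


print(extract_boxed("The answer is \\boxed{\\frac{1}{2}}."))
```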
