# Mobile-ReasoningLLM-v0-1.5B

## Model Description
Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of Qwen2.5-1.5B, optimized for reasoning tasks in mathematics and code generation. It supports up to 64K output tokens for math problems and 65K tokens for code generation, and is available for both commercial and non-commercial research use. This repository contains the evaluation code for Mobile-ReasoningLLM-v0, which begins updating the reference model during reinforcement learning after R1-style reinforcement learning and its variants, including curriculum learning. In this work, we systematically explore unfreezing the weights of the reference model during the continued training of reasoning LLMs that have already been trained with R1-style reinforcement learning and its variants. In this version zero, we further demonstrate that our reinforcement-learning design enhances the reasoning ability of small language models, with state-of-the-art results on five reasoning benchmarks for Mobile-ReasoningLLM-v0-1.5B. Training Mobile-ReasoningLLM-v0 on 1T tokens took 30 days on 8 NVIDIA A800 80GB GPUs, covering pre-training, R1 reinforcement learning, R1 curriculum reinforcement learning, and continued R1 reinforcement learning with reference-model updates.
- Architecture: Dense decoder-only Transformer
- Base Model: Qwen2.5-1.5B
- Parameters: 1.5 billion
- Version: v0 (released September 29, 2025)
## Intended Use
- Primary Use: Solving complex math problems and generating correct code solutions.
- Applications: Research, education, software development, and math reasoning tasks.
- Limitations: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.
## Benchmarks
The model was post-trained on a hybrid dataset (automated, human-annotated, and synthetic data) and evaluated on the following benchmarks:
- Math datasets: AIME 2024, AIME 2025, MATH-500, GSM8k.
- Code dataset: LiveCodeBench V6 (date range: 2408–2505).
## Evaluation
The model was evaluated on the following benchmarks, achieving strong performance:
| Model | AIME24 | AIME25 | MATH-500 | GSM8k | LiveCodeBench* |
|---|---|---|---|---|---|
| Qwen3-0.6B-base | 11.3 | 17.0 | 73.0 | 79.2 | 14.9 |
| MobileLLM-R1-1B | 15.5 | 16.3 | 74.0 | 67.5 | 19.9 |
| DeepSeek-Qwen-1.5B | 29.1 | 23.4 | 83.4 | 77.3 | 19.9 |
| FastCurl-1.5B-V3 | 49.6 | 32.9 | 90.5 | --- | --- |
| Open-Nemotron-1.5B | 49.7 | 40.4 | 83.4 | 76.7 | 28.3 |
| Mobile-ReasoningLLM-v0-1.5B | 63.1 | 49.6 | 88.0 | 80.2 | 30.7 |
| Qwen3-1.7B | 47.0 | 37.0 | 89.4 | 90.3 | 29.8 |

\* LiveCodeBench V6, date range 2408–2505.
## How to Use

### Requirements

- Libraries: `transformers`, `torch`, and `vLLM` or `TensorRT-LLM`
- Hardware: tested on 8× NVIDIA A800-80GB GPUs
- Environment: Python 3.10+ (e.g., a Conda environment)
### Inference Example

```python
import transformers
import torch

model_id = "deepgo/Mobile-ReasoningLLM-v0-1.5B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt
prompt = """Solve the following math problem. Make sure to put the answer (and only the answer) inside \\boxed{}."""
```

For math problems, `temperature=0.6` and `max_length=64000` are recommended.
For code generation, it is advisable to include a directive in your prompt such as:

```python
# Code generation prompt
prompt = """You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests."""
```

For code generation, `temperature=0.6` and `max_length=65536` are recommended.
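A complete code-generation prompt can then be assembled by combining that directive with a concrete problem statement. The helper and problem text below are illustrative placeholders, not the repository's prompting code:

```python
SYSTEM_DIRECTIVE = (
    "You are an expert Python programmer. You will be given a question "
    "(problem specification) and will generate a correct Python program "
    "that matches the specification and passes all tests."
)

def build_code_prompt(problem: str) -> str:
    # Directive first, then the problem specification as its own paragraph.
    return f"{SYSTEM_DIRECTIVE}\n\nQuestion:\n{problem}"

prompt = build_code_prompt("Given a list of integers, return the sum of the even ones.")
print(prompt.startswith("You are an expert Python programmer"))  # -> True
```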
## Evaluation Results

- Pass@1 (avg16) on AIME 2024 (self-reported): 63.1
- Pass@1 (avg16) on AIME 2025 (self-reported): 49.6
- Pass@1 (avg16) on MATH-500 (self-reported): 88.0
- Pass@1 (avg16) on GSM8k (self-reported): 80.2
- Pass@1 (avg16) on LiveCodeBench V6 (self-reported): 30.7
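For reference, Pass@1 (avg16) averages the per-problem pass rate over 16 independent samples. A minimal sketch of the computation, using synthetic correctness data rather than the repository's evaluation code:

```python
def pass_at_1_avg(results: list[list[bool]]) -> float:
    """results[i][j] = whether sample j for problem i was correct.

    Pass@1 (avgN) = mean over problems of the fraction of correct
    samples, reported as a percentage.
    """
    per_problem = [sum(samples) / len(samples) for samples in results]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two toy problems with 4 samples each (the card uses 16 samples).
demo = [
    [True, True, False, True],   # 3/4 correct
    [False, False, True, False], # 1/4 correct
]
print(round(pass_at_1_avg(demo), 1))  # -> 50.0
```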