---
license: cc-by-4.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- DeepMiddleGo
- math-reasoning
- fine-tuned
- qwen
model-index:
- name: Mobile-ReasoningLLM-v0-1.5B
results:
- task:
type: text-generation
name: Math Reasoning
dataset:
name: AIME 2024
type: aime-2024
metrics:
- name: Pass@1 (avg16)
type: pass@1
value: 73.7
- task:
type: text-generation
name: Math Reasoning
dataset:
name: AIME 2025
type: aime-2025
metrics:
- name: Pass@1 (avg16)
type: pass@1
value: 63.8
---
# Mobile-Flash-ReasoningLLM-v0-1.5B
## Model Description
Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for mathematical reasoning. It supports up to 48K output tokens for math problems and is intended for both commercial and non-commercial research use.
This repository contains the evaluation code for Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which **starts to explore experience learning** in addition to **sparse-reward** learning, building on R1-like reinforcement learning and its variants, including curriculum learning.
In this work, I begin to explore an RL training algorithm that follows pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, with the goal of reducing the difficulty of sparse rewards in the RL post-training stage.
Updating Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B takes about 4 days on 8 NVIDIA A800 80GB GPUs.
- **Architecture**: Dense decoder-only Transformer
- **Base Model**: Qwen2.5-1.5B
- **Parameters**: 1.5 billion
- **Version**: v0 (released October 29, 2025)
## Intended Use
- **Primary Use**: Solving complex math problems.
- **Applications**: Research, education, software development, and math reasoning tasks.
- **Limitations**: May not handle ambiguous or poorly formatted inputs well. Users are encouraged to apply the model ethically and avoid harmful applications.
## Benchmarks
The model was post-trained on a hybrid dataset (automated, human, and synthetic data), including:
- Math datasets: AIME 2024, AIME 2025
## Evaluation
The model was evaluated on the following benchmarks; scores are Pass@1 averaged over 16 samples (avg@16). A sketch of how this metric can be computed follows the table.
| Model | AIME24 | AIME25 |
|--------------------------|--------|--------|
| Qwen3-0.6B-base | 11.3 | 17.0 |
| MobileLLM-R1-1B | 15.5 | 16.3 |
| DeepSeek-Qwen-1.5B | 29.1 | 23.4 |
| FastCurl-1.5B-V3 | 49.6 | 32.9 |
| Open-Nemotron-1.5B | 49.7 | 40.4 |
| **Mobile-ReasoningLLM-v0-1.5B** | **63.1** | **49.6** |
| **Mobile-Flash-ReasoningLLM-v0-1.5B** | **73.7** | **63.8** |
| Qwen3-1.7B | 47.0 | 37.0 |
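The following is a minimal sketch of how Pass@1 averaged over 16 sampled solutions per problem (avg@16) can be computed, assuming a boolean correctness matrix produced by a separate answer checker; the function name and data below are illustrative and not part of this repository.

```python
from typing import List

def pass1_avg(correct: List[List[bool]]) -> float:
    """Pass@1 averaged over k samples per problem (avg@k), as a percentage."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_problem) / len(per_problem)

# Example: 2 problems, 16 samples each (values are illustrative)
correct = [
    [True] * 12 + [False] * 4,  # problem 1: 12/16 samples correct
    [True] * 10 + [False] * 6,  # problem 2: 10/16 samples correct
]
print(f"Pass@1 (avg@16): {pass1_avg(correct):.2f}")  # 68.75
```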
## How to Use
### Requirements
- **Libraries**: `transformers`, `torch`, and `vLLM` or `TensorRT-LLM` (a vLLM sketch follows this list)
- **Hardware**: Tested on NVIDIA 8xA800-80GB GPUs
- **Environment**: Python 3.10+ (e.g., Conda `hug` environment)
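For batched or higher-throughput inference, the model can also be loaded with vLLM instead of `transformers`. Below is a minimal sketch assuming vLLM's offline `LLM` API; the model id, prompt, and sampling settings are illustrative and should be adjusted to your setup.

```python
from vllm import LLM, SamplingParams

# Load the model with vLLM (model id and dtype are illustrative).
llm = LLM(model="deepgo/Mobile-ReasoningLLM-v0.1-1.5B", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=48000)

prompts = [
    "Solve the following math problem. Make sure to put the answer "
    "(and only answer) inside \\boxed{}.\n\n<problem statement here>"
]
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```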
### Inference Example
```python
import transformers
import torch
model_id = "deepgo/Mobile-ReasoningLLM-v0.1-1.5B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
# Math problem prompt (append the problem statement after the instruction)
prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}."""

outputs = pipeline(
    prompt,
    max_new_tokens=48000,  # the model supports up to 48K output tokens
    temperature=0.7,
    do_sample=True,
)
print(outputs[0]["generated_text"])
```
A sampling temperature of 0.7 and a maximum generation length of 48,000 tokens are recommended. A small helper for extracting the final boxed answer is sketched below.
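Since the prompt asks the model to put its final answer inside `\boxed{}`, the answer can be pulled from the generated text with a small regex. This is a minimal sketch; the helper name is illustrative and not part of this repository, and it does not handle nested braces.

```python
import re

def extract_boxed_answer(text: str) -> str | None:
    """Return the content of the last \\boxed{...} in the generation, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)  # no nested braces
    return matches[-1] if matches else None

print(extract_boxed_answer(r"... so the final answer is \boxed{42}."))  # prints: 42
```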