---
license: cc-by-4.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- DeepMiddleGo
- math-reasoning
- fine-tuned
- qwen
model-index:
- name: Mobile-Flash-v1-1.5B
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2024
      type: aime-2024
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 90.0
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: AIME 2025
      type: aime-2025
    metrics:
    - name: Pass@1 (avg16)
      type: pass@1
      value: 76.7
---

# Mobile-Flash-v1-1.5B
## Model Description
Mobile-Flash-v1-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for mathematical reasoning. It supports up to 40K output tokens for math problems and is designed for both commercial and non-commercial research use.

This repository contains the evaluation code for Mobile-Flash-v1-1.5B, which **begins to explore self-RL learning** in addition to **sparse-reward** learning in reinforcement learning. In this work, I explore a self-RL training algorithm applied after pre-training, R1-style reinforcement learning, and R1-style curriculum reinforcement learning, in order to reduce the difficulty of sparse rewards and the inefficiency of the RL post-training stage.
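The curriculum idea above — presenting easier problems first so the policy receives reward frequently before moving to harder, sparser-reward problems — can be sketched as follows. The problem pool, difficulty scores, and sampling window are illustrative assumptions, not the released training code:

```python
import random

# Hypothetical problem pool: (problem_id, difficulty in [0, 1]).
# Difficulty scores here are an illustrative assumption.
problems = [(i, i / 99) for i in range(100)]

def curriculum_batch(pool, progress, batch_size=8, window=0.2):
    """Sample problems whose difficulty is close to the current training
    progress: early batches are easy (reward arrives often, so the signal
    is dense), later batches shift toward harder, sparser-reward problems."""
    eligible = [p for p in pool if abs(p[1] - progress) <= window]
    if not eligible:  # fall back to the full pool if the window is empty
        eligible = pool
    return random.sample(eligible, min(batch_size, len(eligible)))

# Early in training: near-trivial problems; late in training: hard ones.
early_batch = curriculum_batch(problems, progress=0.1)
late_batch = curriculum_batch(problems, progress=0.9)
```

As training progresses, advancing `progress` from 0 toward 1 shifts the sampled batches from dense-reward to sparse-reward problems.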
- **Architecture**: Dense decoder-only Transformer
- **Base Model**: Qwen2.5-1.5B
- **Parameters**: 1.5 billion
- **Version**: v1 (released Feb 12, 2026)
## Intended Use
- **Primary Use**: Solving complex math problems.
- **Applications**: Research, education, software development, and math reasoning tasks.
- **Limitations**: May not handle ambiguous or poorly formatted inputs well. Ethical use is encouraged to avoid harmful applications.
## Training Data
The model was post-trained on a hybrid dataset (automated, human, and synthetic) including:
- Math datasets: AIME 2024, AIME 2025
## Evaluation
The model was evaluated on the following benchmarks, achieving strong Pass@1 (avg16) performance:
| Model (1.5B) | AIME24 | AIME25 |
|--------------|--------|--------|
| Mobile-ReasoningLLM-v0-1.5B | 60.0 | 45.0 |
| Mobile-Flash-ReasoningLLM-v0-1.5B | 70.0 | 60.0 |
| Viber-Thinker-1.5B | 78.0 | 70.0 |
| **Mobile-Flash-v1-1.5B** | **90.0** | **76.7** |

| Model (>235B) | AIME24 | AIME25 |
|---------------|--------|--------|
| GPT-5.2 | 97.0+ | 97.0+ |
| Grok-4 | 97.0+ | 97.0+ |
| Gemini-3-Pro | 97.0+ | 97.0+ |
| GPT-OSS-120B | 96.6 | 97.9 |
| GPT-OSS-20B | 96.0 | 98.7 |
| Grok 3 Mini | 95.8 | 93.3 |
| o4-mini | 93.4 | 92.7 |
| o3 | 91.6 | 86.5 |
| DeepSeek-R1-0528 (671B) | 91.4 | 87.5 |
| Qwen-3 (235B) | 85.7 | 81.5 |
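Pass@1 (avg16), the metric reported above, draws 16 samples per problem and averages the per-problem fraction of correct samples over all problems. A minimal sketch of the computation (grading of individual samples is assumed to happen elsewhere):

```python
def pass_at_1_avg(sample_grades):
    """sample_grades: one list per problem, each containing a boolean per
    sampled completion (True = graded correct). Returns the per-problem
    fraction of correct samples, averaged over problems, as a percentage."""
    per_problem = [sum(g) / len(g) for g in sample_grades]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, 16 samples each: 16/16 correct and 8/16 correct.
score = pass_at_1_avg([[True] * 16, [True] * 8 + [False] * 8])
print(score)  # 75.0
```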
## How to Use

### Requirements
- **Libraries**: `transformers`, `torch`, and `vLLM` or `TensorRT-LLM`
- **Hardware**: Trained and tested on 8x NVIDIA A100-80GB GPUs
- **Environment**: Python 3.10+ (e.g., a Conda `hug` environment)
### Inference Example
```python
import transformers
import torch

model_id = "deepgo/Mobile-Flash-v1-1.5B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math problem prompt; append the problem statement after the instruction.
prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}."""

outputs = pipeline(prompt, max_new_tokens=40000)
print(outputs[0]["generated_text"])
```

A maximum generation length of 40,000 tokens is recommended (reduced from 48,000).
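Because the prompt instructs the model to put its final answer inside `\boxed{}`, the answer can be recovered from the generated text with a small parser. `extract_boxed` below is a hypothetical helper for illustration, not part of the released evaluation code:

```python
import re

def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in `text`, allowing one
    level of nested braces (e.g. \\frac{1}{2}); None if no box is found."""
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed(r"Therefore the answer is \boxed{204}."))  # 204
print(extract_boxed(r"\boxed{\frac{1}{2}} concludes the proof."))  # \frac{1}{2}
```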