deepgo committed on
Commit e8f0b4f · verified · 1 Parent(s): 73daaf9

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -38,9 +38,9 @@ model-index:
 
 ## Model Description
 Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for reasoning tasks in mathematics generation. It supports up to 48K output tokens for math problems. This model is designed for both commercial and non-commercial research use.
-This repository contains the evluation code of Mobile-ReasoningLLM-v0.1(Mobile-Flash-ReasoningLLM-v0-1.5B) which start to explore experience learning instead of sparse reward learning in the reinforcement learning after R1-Like reinforcement learning and it's variants including curriculumn learning.
-In this work, I start to explore the rl training algorithm after pre-training, r1-reinforcement learning, r1-curriculumn reinforcement learning to reduce the difficulity of sparse reward in the RL-Post training stage.
-It takes 6 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80G GPUs.
+This repository contains the evaluation code for Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which **starts to explore experience learning** in addition to **sparse reward** learning in reinforcement learning, following R1-like reinforcement learning and its variants, including curriculum learning.
+In this work, I explore RL training algorithms applied after pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, in order to reduce the difficulty of sparse rewards in the RL post-training stage.
+It takes about 4 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80GB GPUs.
 
 - **Architecture**: Dense decoder-only Transformer
 - **Base Model**: Qwen2.5-1.5B