deepgo committed
Commit 73daaf9 · verified · 1 Parent(s): d8ed96d

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -39,7 +39,7 @@ model-index:

  ## Model Description

  Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for reasoning tasks in mathematics generation. It supports up to 48K output tokens for math problems. This model is designed for both commercial and non-commercial research use.

  This repository contains the evaluation code of Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which starts to explore experience learning instead of sparse-reward learning in reinforcement learning, after R1-like reinforcement learning and its variants, including curriculum learning.

- In this work, we start to explore the RL training algorithm after pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, to reduce the difficulty of sparse reward in the RL post-training stage.
+ In this work, I start to explore the RL training algorithm after pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, to reduce the difficulty of sparse reward in the RL post-training stage.

  It takes 6 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80G GPUs.

  - **Architecture**: Dense decoder-only Transformer
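The README's motivation, the difficulty of sparse reward in RL post-training, can be illustrated with a minimal sketch. This is not the repository's actual code; the function names and the step-matching heuristic are hypothetical, used only to contrast an R1-like sparse terminal reward with a denser, experience-style signal that credits intermediate reasoning steps.

```python
def sparse_reward(final_answer: str, gold_answer: str) -> float:
    """R1-like sparse reward: 1.0 only if the final answer is exactly correct,
    otherwise 0.0 -- a partially correct trace gets no learning signal."""
    return 1.0 if final_answer.strip() == gold_answer.strip() else 0.0


def experience_reward(steps: list[str], gold_steps: list[str]) -> float:
    """Denser, experience-style signal (hypothetical): credit each intermediate
    step that matches a reference trace, so partially correct reasoning still
    contributes gradient during RL post-training."""
    if not gold_steps:
        return 0.0
    matched = sum(1 for s, g in zip(steps, gold_steps) if s.strip() == g.strip())
    return matched / len(gold_steps)


# A rollout with a wrong final answer scores 0 under the sparse reward,
# but a half-correct reasoning trace still scores under the denser signal.
print(sparse_reward("42", "41"))                  # 0.0
print(experience_reward(["a", "x"], ["a", "b"]))  # 0.5
```

The point of the contrast: with only a terminal correctness reward, most rollouts on hard math problems return zero, while a step-level signal keeps the policy gradient informative.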