deepgo committed
Commit 73daaf9 · verified · 1 Parent(s): d8ed96d

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -39,7 +39,7 @@ model-index:

  ## Model Description

  Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for reasoning tasks in mathematics generation. It supports up to 48K output tokens for math problems. This model is designed for both commercial and non-commercial research use.

  This repository contains the evaluation code of Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which starts to explore experience learning instead of sparse-reward learning in reinforcement learning, after R1-like reinforcement learning and its variants, including curriculum learning.

- In this work, we start to explore the RL training algorithm after pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, to reduce the difficulty of sparse reward in the RL post-training stage.
+ In this work, I start to explore the RL training algorithm after pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, to reduce the difficulty of sparse reward in the RL post-training stage.

  It takes 6 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80G GPUs.

  - **Architecture**: Dense decoder-only Transformer
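The README's motivation, the difficulty of sparse reward in RL post-training, can be illustrated with a minimal sketch. This is not the repository's actual code; the function names and the step-matching heuristic are hypothetical, used only to contrast an R1-like sparse terminal reward with a denser, experience-style signal that credits intermediate reasoning steps.

```python
def sparse_reward(final_answer: str, gold_answer: str) -> float:
    """R1-like sparse reward: 1.0 only if the final answer is exactly correct,
    otherwise 0.0 -- a partially correct trace gets no learning signal."""
    return 1.0 if final_answer.strip() == gold_answer.strip() else 0.0


def experience_reward(steps: list[str], gold_steps: list[str]) -> float:
    """Denser, experience-style signal (hypothetical): credit each intermediate
    step that matches a reference trace, so partially correct reasoning still
    contributes gradient during RL post-training."""
    if not gold_steps:
        return 0.0
    matched = sum(1 for s, g in zip(steps, gold_steps) if s.strip() == g.strip())
    return matched / len(gold_steps)


# A rollout with a wrong final answer scores 0 under the sparse reward,
# but a half-correct reasoning trace still scores under the denser signal.
print(sparse_reward("42", "41"))                  # 0.0
print(experience_reward(["a", "x"], ["a", "b"]))  # 0.5
```

The point of the contrast: with only a terminal correctness reward, most rollouts on hard math problems return zero, while a step-level signal keeps the policy gradient informative.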