deepgo committed on
Commit e8f0b4f · verified · 1 Parent(s): 73daaf9

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -38,9 +38,9 @@ model-index:
 
 ## Model Description
 Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for reasoning tasks in mathematics generation. It supports up to 48K output tokens for math problems. This model is designed for both commercial and non-commercial research use.
-This repository contains the evluation code of Mobile-ReasoningLLM-v0.1(Mobile-Flash-ReasoningLLM-v0-1.5B) which start to explore experience learning instead of sparse reward learning in the reinforcement learning after R1-Like reinforcement learning and it's variants including curriculumn learning.
-In this work, I start to explore the rl training algorithm after pre-training, r1-reinforcement learning, r1-curriculumn reinforcement learning to reduce the difficulity of sparse reward in the RL-Post training stage.
-It takes 6 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80G GPUs.
+This repository contains the evaluation code for Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which **starts to explore experience learning** in addition to **sparse reward** learning in reinforcement learning, following R1-like reinforcement learning and its variants, including curriculum learning.
+In this work, I explore RL training algorithms applied after pre-training, R1 reinforcement learning, and R1 curriculum reinforcement learning, in order to reduce the difficulty of sparse rewards in the RL post-training stage.
+It takes about 4 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80GB GPUs.
 
 - **Architecture**: Dense decoder-only Transformer
 - **Base Model**: Qwen2.5-1.5B