## Model Description
Mobile-ReasoningLLM-v0-1.5B is a fine-tuned derivative of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), optimized for mathematical reasoning. It supports up to 48K output tokens for math problems and is available for both commercial and non-commercial research use.

This repository contains the evaluation code for Mobile-ReasoningLLM-v0.1 (Mobile-Flash-ReasoningLLM-v0-1.5B), which **starts to explore experience learning** in addition to **sparse-reward** learning, building on R1-like reinforcement learning and its variants, including curriculum learning.

In this work, I explore RL training algorithms applied after pre-training (R1 reinforcement learning and R1 curriculum reinforcement learning) to reduce the difficulty of sparse rewards in the RL post-training stage.

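The two ideas above can be illustrated with a toy sketch. This is not the repository's actual training code: the function names, the example problems, and the difficulty scores are all invented for illustration, assuming a setup where the reward is 1 only for an exactly correct final answer and the curriculum simply orders problems from easy to hard.

```python
# Toy sketch of R1-style sparse reward plus curriculum ordering.
# All names and difficulty scores here are illustrative assumptions,
# not the repository's actual implementation.

def sparse_reward(model_answer: str, gold_answer: str) -> float:
    """R1-style sparse reward: 1.0 only when the final answer is exactly correct."""
    return 1.0 if model_answer.strip() == gold_answer.strip() else 0.0

def curriculum_schedule(problems, difficulty):
    """Order training problems by an assumed difficulty score, easiest first,
    so the sparse reward fires more often early in training."""
    return sorted(problems, key=difficulty)

# (question, gold answer, assumed difficulty score)
problems = [
    ("2+2", "4", 0.1),
    ("17*23", "391", 0.6),
    ("sum of 1..100", "5050", 0.3),
]

ordered = curriculum_schedule(problems, difficulty=lambda p: p[2])
print([p[0] for p in ordered])        # ['2+2', 'sum of 1..100', '17*23']
print(sparse_reward("5050", "5050"))  # 1.0
print(sparse_reward("5049", "5050"))  # 0.0
```

The point of the curriculum is visible in the ordering: on easy problems the exact-match reward is nonzero often enough to provide a learning signal, which is then gradually extended to harder problems where a purely sparse reward would rarely fire.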
It took about 4 days to update Mobile-ReasoningLLM-v0 to Mobile-Flash-ReasoningLLM-v0-1.5B on 8 NVIDIA A800 80GB GPUs.

- **Architecture**: Dense decoder-only Transformer
- **Base Model**: Qwen2.5-1.5B