Update README.md

README.md CHANGED
@@ -49,7 +49,7 @@ language:
 <a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue" alt="license"></a>
 <a href="https://arxiv.org/abs/2412.17743" target="_blank"><img src=https://img.shields.io/badge/arXiv-b5212f.svg?logo=arxiv></a>
 <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3"><img alt="Static Badge" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue?color=8A2BE2"></a>
-<a><img src="https://img.shields.io/github/stars/RUC-GSAI/YuLan-Mini"></a>
+<a href="https://github.com/RUC-GSAI/YuLan-Mini" target="_blank"><img src="https://img.shields.io/github/stars/RUC-GSAI/YuLan-Mini"></a>
 </div>
 
 YuLan-Mini is a lightweight language model with 2.4 billion parameters. It achieves performance comparable to industry-leading models trained on significantly more data, despite being pre-trained on only 1.08T tokens. The model excels particularly in the domains of **mathematics** and **code**. To facilitate reproducibility, we will open-source the relevant pre-training resources.
@@ -144,7 +144,7 @@ Optimizer states before annealing will be released in a future update.
 
 <details><summary>5. Data Distribution for every phase</summary>
 
-<a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/final.pdf">High-
+<a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/final.pdf">High-resolution version</a>
 
 <a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/final.pdf">
 <div align=center>
@@ -171,6 +171,14 @@ The synthetic data we are using is released in <a href="https://huggingface.co/c
 Intermediate optimizer states will be released in a future update.
 </details>
 
+### What you can do with these pre-training resources
+
+1. **Pre-train** your own LLM. You can use our data and curriculum to train a model that is just as powerful as YuLan-Mini.
+2. Perform your own **learning rate annealing**. During the annealing phase, YuLan-Mini's learning ability is at its peak. You can resume training from the checkpoint before annealing and use your own dataset for learning rate annealing.
+3. **Fine-tune** your own Instruct version. You can use the YuLan-Mini base model to train your own Instruct version of the LLM.
+4. Research **training dynamics**. You can use YuLan-Mini's intermediate checkpoints to explore internal changes during the pre-training process.
+5. **Synthesize** your own data. You can use YuLan-Mini's data pipeline to clean and generate your own dataset.
+
 ---
 
 ## Quick Start 💻
@@ -200,6 +208,10 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 vllm serve yulan-team/YuLan-Mini --dtype bfloat16
 ```
 
+**SGLang Serve Example**
+```bash
+python -m sglang.launch_server --model-path yulan-team/YuLan-Mini --port 30000 --host 0.0.0.0
+```
 
 ---
 
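The learning-rate annealing described in the pre-training resources above (resume from the checkpoint saved before annealing, then decay the learning rate while training on your own data) can be sketched as a simple schedule. This is only an illustration: the linear decay shape, the peak learning rate of `1e-3`, the final learning rate of `1e-5`, and the step count are placeholder assumptions, not YuLan-Mini's actual training hyperparameters.

```python
# Hypothetical linear annealing schedule: decay from a peak learning rate to a
# final learning rate over a fixed number of annealing steps. All constants
# below are illustrative placeholders, not YuLan-Mini's real hyperparameters.

def annealed_lr(step: int, total_steps: int, peak_lr: float, final_lr: float) -> float:
    """Learning rate at `step` of a linear anneal from peak_lr down to final_lr."""
    if step >= total_steps:
        return final_lr
    frac = step / total_steps  # fraction of the annealing phase completed
    return peak_lr + (final_lr - peak_lr) * frac

# Anneal from 1e-3 down to 1e-5 over 1000 steps.
for step in (0, 500, 1000):
    print(step, annealed_lr(step, 1000, 1e-3, 1e-5))
```

In an actual run you would call `annealed_lr` once per optimizer step and write the result into each parameter group's `lr` before stepping.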
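Both `vllm serve` and `python -m sglang.launch_server` shown in the Quick Start expose an OpenAI-compatible HTTP API, so either server can be queried with the same request shape. The sketch below only builds the JSON request body; the ports (8000 for vLLM's default, 30000 for the SGLang command above), the prompt, and the helper name are assumptions for illustration.

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> bytes:
    """Encode an OpenAI-style /v1/completions request body as JSON bytes."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return json.dumps(payload).encode("utf-8")

body = build_completion_request("yulan-team/YuLan-Mini", "1 + 1 =")
# POST this body to http://localhost:8000/v1/completions (vLLM's default port)
# or http://localhost:30000/v1/completions (the SGLang command above), with
# the header Content-Type: application/json.
print(json.loads(body)["model"])  # → yulan-team/YuLan-Mini
```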