---
datasets:
- cerebras/SlimPajama-627B
language:
- en
---

<div align="center">

# TinyLlama-1.1B-v1.1

</div>

https://github.com/jzhang38/TinyLlama

<div align="center">
<img src="https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-240k-503b/resolve/main/TinyLlama_logo.png" width="300"/>
</div>

We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama. Moreover, TinyLlama is compact, with only 1.1B parameters, so it can serve the many applications that demand a restricted computation and memory footprint.
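As a rough check on the 1.1B figure, the parameter count falls out of the Llama-style hyperparameters commonly published for TinyLlama (hidden size 2048, 22 layers, 32 query / 4 key-value heads, MLP size 5632, 32k vocabulary) — treat these values as assumptions here, not as an official spec:

```python
# Back-of-the-envelope parameter count for TinyLlama's Llama-style
# architecture. The hyperparameters below are the commonly published
# TinyLlama values (assumed, not taken from this card).
vocab, d_model, n_layers = 32_000, 2048, 22
n_heads, n_kv_heads, d_mlp = 32, 4, 5632

head_dim = d_model // n_heads              # 64
kv_dim = n_kv_heads * head_dim             # 256, grouped-query attention

attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # q/o plus k/v projections
mlp = 3 * d_model * d_mlp                            # gate, up, down
embeddings = 2 * vocab * d_model                     # input embed + lm head

total = n_layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")    # ≈ 1.10B
```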

### Overview

In this project, rather than training only a single TinyLlama model, we first train TinyLlama on a corpus of 1.5 trillion tokens to obtain foundational language capabilities. We then turn this model into three different models by continual pre-training with three distinct data sampling strategies. For a visual representation of this process, please refer to the figure below.


### Pretraining

Due to these issues ([bug1](https://whimsical-aphid-86d.notion.site/Release-of-TinyLlama-1-5T-Checkpoints-Postponed-01b266998c1c47f78f5ae1520196d194?pvs=4), [bug2](https://whimsical-aphid-86d.notion.site/2023-12-18-Updates-from-TinyLlama-Team-7d30c01fff794da28ccc952f327c8d4f)), we retrained TinyLlama to provide a better model. We trained on 2T tokens and divided pretraining into three stages: 1) basic pretraining, 2) continual pretraining with specific domains, and 3) cooldown.
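One way to picture the final cooldown stage is a learning-rate schedule with a warmup, a long constant phase, and a linear decay at the end. The peak/final rates and phase fractions below are illustrative assumptions, not TinyLlama's actual hyperparameters:

```python
# Illustrative three-phase learning-rate schedule: warmup, constant phase,
# then a final linear "cooldown". All rates and phase fractions are
# assumptions for the sketch, not TinyLlama's actual settings.
def learning_rate(step, total_steps, peak=4.0e-4, final=4.0e-5,
                  warmup=0.01, cooldown=0.1):
    if step < warmup * total_steps:            # linear warmup from 0
        return peak * step / (warmup * total_steps)
    if step > (1 - cooldown) * total_steps:    # linear decay to `final`
        frac = (total_steps - step) / (cooldown * total_steps)
        return final + (peak - final) * frac
    return peak                                # constant middle phase

# Sample the schedule over a 1000-step run.
schedule = [learning_rate(s, 1000) for s in (0, 10, 500, 950, 1000)]
```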

### How to use

You will need `transformers>=4.31`.
Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "TinyLlama/TinyLlama-1.1B-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

### Eval

| Model                                     | Pretrain Tokens | HellaSwag | Obqa      | WinoGrande | ARC_c     | ARC_e     | boolq | piqa      | avg       |
| ----------------------------------------- | --------------- | --------- | --------- | ---------- | --------- | --------- | ----- | --------- | --------- |
| Pythia-1.0B                               | 300B            | 47.16     | 31.40     | 53.43      | 27.05     | 48.99     | 60.83 | 69.21     | 48.30     |
| TinyLlama-1.1B-intermediate-step-1431k-3T | 3T              | 59.20     | 36.00     | 59.12      | 30.12     | 55.25     | 57.83 | 73.29     | 52.99     |
| TinyLlama-1.1B-v1.1                       | 2T              | **61.47** | **36.80** | **59.43**  | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |
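
As a sanity check, the `avg` column is the plain mean of the seven task scores. A small tolerance is needed because the per-task scores are themselves rounded to two decimals:

```python
# Verify that the `avg` column in the table above is the mean of the seven
# task scores, to within rounding of the individual scores.
rows = {
    "Pythia-1.0B":
        ([47.16, 31.40, 53.43, 27.05, 48.99, 60.83, 69.21], 48.30),
    "TinyLlama-1.1B-intermediate-step-1431k-3T":
        ([59.20, 36.00, 59.12, 30.12, 55.25, 57.83, 73.29], 52.99),
    "TinyLlama-1.1B-v1.1":
        ([61.47, 36.80, 59.43, 32.68, 55.47, 55.99, 73.56], 53.63),
}
for name, (scores, reported) in rows.items():
    mean = sum(scores) / len(scores)
    assert abs(mean - reported) < 0.05, f"{name}: {mean:.2f} vs {reported}"
```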
|