Update README.md
Miscalculated model size
tags:
- babylm
- tinyllama
- tiny
---
## Tiny-LM-8M

A nano-sized language model (8M parameters) that demonstrates the power of high-quality synthetic data.
Despite its tiny size, it achieves a significant portion of GPT-2's (124M) performance by training on distilled and
simplified English datasets.
## Performance Comparison

This model was evaluated using the `lm-evaluation-harness` against OpenAI's GPT-2 (124M). The results show that **Tiny-LM-8M** punches far above its weight class:

| Task | Tiny-LM (8M) | GPT-2 (124M) | % of GPT-2 Perf. |
| --- | --- | --- | --- |
| **ARC-Easy** (acc_norm) | **31.73%** | 39.48% | **80.4%** |
| **HellaSwag** (acc_norm) | **27.00%** | 31.14% | **86.7%** |

> **Key Takeaway:** With only **6.4% of the parameters**, this model achieves over **80% of the reasoning performance** of GPT-2, proving that modern architectures combined with curated data can drastically reduce model size.
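
The table above can in principle be reproduced with the harness's Python API. The sketch below is illustrative only, assuming a recent `lm-eval` release (v0.4 or later), zero-shot settings, and the task names `arc_easy` and `hellaswag`; the card does not state the exact harness version or evaluation flags behind these numbers.

```python
# Illustrative reproduction sketch using EleutherAI's lm-evaluation-harness.
# Assumes lm-eval >= 0.4; the exact version and settings used for the table
# above are not stated in this card.
import lm_eval

scores = {}
for checkpoint in ["sixf0ur/tiny-lm-8M", "gpt2"]:
    out = lm_eval.simple_evaluate(
        model="hf",                           # Hugging Face transformers backend
        model_args=f"pretrained={checkpoint}",
        tasks=["arc_easy", "hellaswag"],      # assumed task names
        batch_size=8,
    )
    scores[checkpoint] = out["results"]       # per-task metrics, incl. acc_norm

for checkpoint, result in scores.items():
    print(checkpoint, result)
```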
## Model Architecture

The model is based on the **Llama-2 architecture** with several modern optimizations:

* **Parameters:** 8.4 Million
* **Layers:** 6
* **Attention Heads:** 6
* **Hidden Dimension:** 288
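
Since the full configuration is not listed here, the quickest way to confirm these numbers is to read them from the released checkpoint itself. A minimal sketch, assuming the checkpoint exposes a standard Llama-style `transformers` config:

```python
# Minimal sketch: read the architecture and count parameters directly from
# the released checkpoint, rather than relying on the rounded figures above.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "sixf0ur/tiny-lm-8M"

config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)

model = AutoModelForCausalLM.from_pretrained(model_id)
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.1f}M parameters")
```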
You can use this model directly with the Hugging Face `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sixf0ur/tiny-lm-8M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
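A minimal generation example, continuing from the snippet above; the prompt and sampling settings are illustrative choices, not recommendations from the model card:

```python
# Continues from the loading snippet above (tokenizer and model are defined).
# Prompt and sampling settings are illustrative, not taken from the card.
prompt = "Once upon a time, there was a tiny robot who"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=64,   # short completion for a quick smoke test
    do_sample=True,      # sample rather than greedy decode
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```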