Update README.md
Miscalculated model size
tags:
- babylm
- tinyllama
- tiny
---
## Tiny-LM-8M

A nano-sized language model (8M parameters) that demonstrates the power of high-quality synthetic data.
Despite its tiny size, it achieves a significant portion of GPT-2's (124M) performance by training on distilled and
simplified English datasets.
## Performance Comparison

This model was evaluated using the `lm-evaluation-harness` against OpenAI's GPT-2 (124M). The results show that **Tiny-LM-8M** punches far above its weight class:

| Task | Tiny-LM (8M) | GPT-2 (124M) | % of GPT-2 Perf. |
| --- | --- | --- | --- |
| **ARC-Easy** (acc_norm) | **31.73%** | 39.48% | **80.4%** |
| **HellaSwag** (acc_norm) | **27.00%** | 31.14% | **86.7%** |

> **Key Takeaway:** With only **6.4% of the parameters**, this model achieves over **80% of the reasoning performance** of GPT-2, proving that modern architectures combined with curated data can drastically reduce model size.
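
The table above can in principle be reproduced with the harness's Python API. The sketch below is illustrative only, assuming a recent `lm-eval` release (v0.4 or later), zero-shot settings, and the task names `arc_easy` and `hellaswag`; the card does not state the exact harness version or evaluation flags behind these numbers.

```python
# Illustrative reproduction sketch using EleutherAI's lm-evaluation-harness.
# Assumes lm-eval >= 0.4; the exact version and settings used for the table
# above are not stated in this card.
import lm_eval

scores = {}
for checkpoint in ["sixf0ur/tiny-lm-8M", "gpt2"]:
    out = lm_eval.simple_evaluate(
        model="hf",                           # Hugging Face transformers backend
        model_args=f"pretrained={checkpoint}",
        tasks=["arc_easy", "hellaswag"],      # assumed task names
        batch_size=8,
    )
    scores[checkpoint] = out["results"]       # per-task metrics, incl. acc_norm

for checkpoint, result in scores.items():
    print(checkpoint, result)
```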
## Model Architecture

The model is based on the **Llama-2 architecture** with several modern optimizations:

* **Parameters:** 8.4 Million
* **Layers:** 6
* **Attention Heads:** 6
* **Hidden Dimension:** 288
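
Since the full configuration is not listed here, the quickest way to confirm these numbers is to read them from the released checkpoint itself. A minimal sketch, assuming the checkpoint exposes a standard Llama-style `transformers` config:

```python
# Minimal sketch: read the architecture and count parameters directly from
# the released checkpoint, rather than relying on the rounded figures above.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "sixf0ur/tiny-lm-8M"

config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.num_attention_heads, config.hidden_size)

model = AutoModelForCausalLM.from_pretrained(model_id)
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.1f}M parameters")
```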
You can use this model directly with the Hugging Face `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sixf0ur/tiny-lm-8M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
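A minimal generation example, continuing from the snippet above; the prompt and sampling settings are illustrative choices, not recommendations from the model card:

```python
# Continues from the loading snippet above (tokenizer and model are defined).
# Prompt and sampling settings are illustrative, not taken from the card.
prompt = "Once upon a time, there was a tiny robot who"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=64,   # short completion for a quick smoke test
    do_sample=True,      # sample rather than greedy decode
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```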