sixf0ur committed · Commit c0a67b8 · verified · 1 Parent(s): ab8b558

Update README.md


Miscalculated model size

Files changed (1): README.md (+6 −7)
README.md CHANGED

@@ -9,33 +9,32 @@ tags:
  - babylm
  - tinyllama
  - tiny
- - 15M
  ---
 
  ## Tiny-LM-15M
- A nano-sized language model (15M parameters) that demonstrates the power of high-quality synthetic data.
+ A nano-sized language model (8M parameters) that demonstrates the power of high-quality synthetic data.
  Despite its tiny size, it achieves a significant portion of GPT-2's (124M) performance by training on distilled and
  simplified English datasets.
 
  This model was evaluated using the lm-evaluation-harness against OpenAI's GPT-2 (124M).
- The results show that Tiny-LM-15M punches far above its weight class:
+ The results show that Tiny-LM-8M punches far above its weight class:
 
  ## Performance Comparison
 
  This model was evaluated using the `lm-evaluation-harness` against OpenAI's GPT-2 (124M). The results show that **Tiny-LM-15M** punches far above its weight class:
 
- | Task | Tiny-LM (15M) | GPT-2 (124M) | % of GPT-2 Perf. |
+ | Task | Tiny-LM (8M) | GPT-2 (124M) | % of GPT-2 Perf. |
  | --- | --- | --- | --- |
  | **ARC-Easy** (acc_norm) | **31.73%** | 39.48% | **80.4%** |
  | **HellaSwag** (acc_norm) | **27.00%** | 31.14% | **86.7%** |
 
- > **Key Takeaway:** With only **12% of the parameters**, this model achieves over **80% of the reasoning performance** of GPT-2, proving that modern architectures combined with curated data can drastically reduce model size.
+ > **Key Takeaway:** With only **6.4% of the parameters**, this model achieves over **80% of the reasoning performance** of GPT-2, proving that modern architectures combined with curated data can drastically reduce model size.
 
  ## Model Architecture
 
  The model is based on the **Llama-2 architecture** with several modern optimizations:
 
- * **Parameters:** 15.2 Million
+ * **Parameters:** 8.4 Million
  * **Layers:** 6
  * **Attention Heads:** 6
  * **Hidden Dimension:** 288
@@ -59,7 +58,7 @@ You can use this model directly with the Hugging Face `transformers` library:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model_id = "sixf0ur/tiny-lm-15M"
+ model_id = "sixf0ur/tiny-lm-8M"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)