Fu01978/TinyLM

@@ -2,29 +2,33 @@
 language: en
 license: mit
 tags:
-  - tiny
-  - language-model
-  - causal-lm
-  - from-scratch
-  - pytorch
 ---
 # TinyLM
-A ~1M parameter causal language model trained from scratch, for fun and experimentation.
 ## Architecture
 | Hyperparameter | Value |
 |---|---|
-| Parameters | ~1M |
 | Layers | 4 |
 | Hidden size | 64 |
 | Attention heads | 4 |
 | FFN dim | 192 |
 | Embedding rank | 32 |
 | Context length | 256 |
-| Tokenizer | GPT-2 (50,257 vocab) |
 Uses a **factored (low-rank) embedding** to keep the vocab projection from eating the entire parameter budget, with weight tying on the output head.
@@ -36,7 +40,7 @@ Uses a **factored (low-rank) embedding** to keep the vocab projection from eatin
 | Optimizer | AdamW (lr=3e-3, weight_decay=0.01) |
 | Scheduler | Cosine annealing with warm restarts |
 | Mixed precision | fp16 (torch.cuda.amp) |
-| Hardware | Nvidia P100 (Kaggle) |
 ## Usage
 ```python
@@ -44,10 +48,10 @@ from huggingface_hub import snapshot_download
 import importlib.util
 import torch
-# Download all files
 snapshot_download(repo_id="Fu01978/TinyLM", local_dir="./tinylm")
-# Load via included script
 spec   = importlib.util.spec_from_file_location("modeling_tinylm", "./tinylm/modeling_tinylm.py")
 module = importlib.util.module_from_spec(spec)
 spec.loader.exec_module(module)
@@ -56,11 +60,6 @@ model, tokenizer, config = module.load_tinylm("./tinylm")
 model.eval()
 # Generate
-output = module.generate(model, tokenizer, "Once upon a time")
 print(output)
-```
-## Example Outputs
-**Prompt:** Once upon a time
-**Output:** Once upon a time there was a little girl named Mrs. She decided to go and be a little girl in the park. One day she had to go on a bed. From then on a lot of bread. She said, "What are you doing?" ...

 language: en
 license: mit
 tags:
+- tiny
+- language-model
+- causal-lm
+- pytorch
+datasets:
+- roneneldan/TinyStories
+- Skylion007/openwebtext
+pipeline_tag: text-generation
+library_name: transformers
 ---
 # TinyLM
+A 3.4M-parameter causal language model trained from scratch for experimentation.
 ## Architecture
 | Hyperparameter | Value |
 |---|---|
+| Parameters | 3,403,968 |
 | Layers | 4 |
 | Hidden size | 64 |
 | Attention heads | 4 |
 | FFN dim | 192 |
 | Embedding rank | 32 |
 | Context length | 256 |
+| Tokenizer | GPT-2 (50257 vocab) |
 Uses a **factored (low-rank) embedding** to keep the vocab projection from eating the entire parameter budget, with weight tying on the output head.
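The parameter saving from the factored embedding can be checked with simple arithmetic, using the vocabulary size, hidden size, and embedding rank from the table above. This is a sketch that assumes the rank-to-hidden projection is a plain matrix with no bias term. The helper names are illustrative only and are not part of `modeling_tinylm.py`.

```python
# Parameter arithmetic for a factored (low-rank) token embedding.
# Sizes come from the Architecture table; the rank -> hidden projection
# is ASSUMED to be a bias-free matrix (not confirmed by the card).

def full_embedding_params(vocab: int, hidden: int) -> int:
    """Parameters in a standard vocab x hidden embedding matrix."""
    return vocab * hidden


def factored_embedding_params(vocab: int, rank: int, hidden: int) -> int:
    """Parameters in a vocab x rank table plus a rank x hidden projection."""
    return vocab * rank + rank * hidden


VOCAB, HIDDEN, RANK = 50257, 64, 32

full = full_embedding_params(VOCAB, HIDDEN)
factored = factored_embedding_params(VOCAB, RANK, HIDDEN)

print(f"full embedding:     {full:,}")             # 3,216,448
print(f"factored embedding: {factored:,}")         # 1,610,272
print(f"saved:              {full - factored:,}")  # 1,606,176
```

Under that assumption the factored table needs roughly half the parameters of a full 50,257 × 64 embedding. Tying the output head to the embedding means the logits reuse these same weights instead of adding another vocabulary-sized matrix.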
 | Optimizer | AdamW (lr=3e-3, weight_decay=0.01) |
 | Scheduler | Cosine annealing with warm restarts |
 | Mixed precision | fp16 (torch.cuda.amp) |
+| Hardware | Nvidia P100 |
 ## Usage
 ```python
 from huggingface_hub import snapshot_download
 import importlib.util
 import torch
+# Download files
 snapshot_download(repo_id="Fu01978/TinyLM", local_dir="./tinylm")
+# Load via script
 spec   = importlib.util.spec_from_file_location("modeling_tinylm", "./tinylm/modeling_tinylm.py")
 module = importlib.util.module_from_spec(spec)
 spec.loader.exec_module(module)
 model, tokenizer, config = module.load_tinylm("./tinylm")
 model.eval()
 # Generate
+output = module.generate(model, tokenizer, "Once upon a time, ")
 print(output)
+```