---
language: en
license: mit
tags:
- tiny
- language-model
- causal-lm
- pytorch
datasets:
- roneneldan/TinyStories
- Skylion007/openwebtext
pipeline_tag: text-generation
library_name: transformers
---

# TinyLM

A 3.4M parameter causal language model trained from scratch, for experimentation.

## Architecture

| Hyperparameter | Value |
|---|---|
| Parameters | 3,403,968 |
| Layers | 4 |
| Hidden size | 64 |
| Attention heads | 4 |
| FFN dim | 192 |
| Embedding rank | 32 |
| Context length | 256 |
| Tokenizer | GPT-2 (50257 vocab) |

Uses a **factored (low-rank) embedding** to keep the vocab projection from eating the entire parameter budget, with weight tying on the output head.

## Training

| Setting | Value |
|---|---|
| Datasets | Skylion007/openwebtext (10k samples), roneneldan/TinyStories (10k samples) |
| Optimizer | AdamW (lr=3e-3, weight_decay=0.01) |
| Scheduler | Cosine annealing with warm restarts |
| Mixed precision | fp16 (torch.cuda.amp) |
| Hardware | NVIDIA P100 |

## Usage

```python
from huggingface_hub import snapshot_download
import importlib.util
import torch

# Download files
snapshot_download(repo_id="Fu01978/TinyLM", local_dir="./tinylm")

# Load via script
spec = importlib.util.spec_from_file_location("modeling_tinylm", "./tinylm/modeling_tinylm.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

model, tokenizer, config = module.load_tinylm("./tinylm")
model.eval()

# Generate
output = module.generate(model, tokenizer, "Once upon a time, ")
print(output)
```
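
To give a rough sense of why the factored embedding mentioned under Architecture matters at this scale, the sketch below compares the size of a full embedding table with a low-rank one, using the vocab size, hidden size, and embedding rank from the table. The layout is an assumption (a `vocab × rank` table followed by a bias-free `rank × hidden` projection), and the helper names are hypothetical; the actual implementation in `modeling_tinylm.py` may differ.

```python
# Values taken from the Architecture table.
VOCAB_SIZE = 50257
HIDDEN_SIZE = 64
EMBEDDING_RANK = 32


def full_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a standard vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def factored_embedding_params(vocab_size: int, rank: int, hidden_size: int) -> int:
    """Parameters in a low-rank embedding (assumed, bias-free layout):
    a vocab_size x rank table followed by a rank x hidden_size projection."""
    return vocab_size * rank + rank * hidden_size


full = full_embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factored = factored_embedding_params(VOCAB_SIZE, EMBEDDING_RANK, HIDDEN_SIZE)

print(f"full embedding:     {full:,}")      # 3,216,448
print(f"factored embedding: {factored:,}")  # 1,610,272
```

Under these assumptions the factored table is roughly half the size of the full one, since the rank (32) is half the hidden size (64) and the projection adds only 2,048 parameters.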