---
language: en
tags:
- causal-lm
- gqa
- rope
- swiglu
license: apache-2.0
---

# Exp-1

A small causal language model (9.9M parameters) trained from scratch.

## Architecture

| Property | Value |
|---|---|
| Layers | 11 |
| Hidden size | 256 |
| Intermediate (MLP) size | 704 |
| Attention heads | 8 query, 2 key/value (GQA) |
| Max sequence length | 1024 tokens |
| Vocab size | 8192 |
| Tied embeddings | Yes |
| Total parameters | 9.853M |

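
The total in the table can be reproduced by hand. The sketch below assumes a Llama-style decoder layout that this card does not spell out: no bias terms, RMSNorm with a single weight vector (two per layer plus one final norm), a three-matrix SwiGLU MLP, head dimension equal to hidden size divided by attention heads, and the tied embedding matrix counted once. Under those assumptions the count comes to 9,852,672, which matches the 9.853M above. The function name `count_params` is hypothetical.

```python
# Hypothetical parameter-count sketch. Assumed layout (not confirmed by the card):
# Llama-style decoder, no biases, RMSNorm weights only, SwiGLU MLP, tied embeddings.

def count_params(
    n_layers: int = 11,
    hidden: int = 256,
    intermediate: int = 704,
    n_heads: int = 8,
    n_kv_heads: int = 2,
    vocab: int = 8192,
) -> int:
    head_dim = hidden // n_heads        # 256 / 8 = 32
    kv_dim = n_kv_heads * head_dim      # GQA: K and V project to only 2 heads (64)

    attn = (
        hidden * hidden                 # q_proj
        + hidden * kv_dim               # k_proj
        + hidden * kv_dim               # v_proj
        + hidden * hidden               # o_proj
    )
    mlp = 3 * hidden * intermediate     # gate, up, and down projections (SwiGLU)
    norms = 2 * hidden                  # pre-attention and pre-MLP RMSNorm weights

    per_layer = attn + mlp + norms
    embeddings = vocab * hidden         # tied: shared with the LM head, counted once
    final_norm = hidden
    return n_layers * per_layer + embeddings + final_norm


total = count_params()
print(f"{total:,} parameters ({total / 1e6:.3f}M)")
```

Under the same assumptions, `count_params(n_kv_heads=8)` (plain multi-head attention) returns 10,934,016. With 2 KV heads, the K and V projections and the per-token KV cache are 4× smaller than they would be with 8.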
## Training

- Tokens seen: 6,038,089,728 (~6.04B)
- Validation loss: 2.4005
- Validation perplexity: 11.03

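
The two validation figures agree under the usual convention that the loss is mean per-token cross-entropy in nats, so perplexity = exp(loss). That convention is an assumption here, but the numbers are consistent with it. The check below also relates the token count to model size, using the 9.853M total from the Architecture table.

```python
import math

# Figures copied from this model card.
val_loss = 2.4005              # assumed: mean per-token cross-entropy in nats
tokens_seen = 6_038_089_728
total_params = 9.853e6

# Perplexity is the exponential of the mean per-token cross-entropy.
val_ppl = math.exp(val_loss)

# Training tokens seen per model parameter.
tokens_per_param = tokens_seen / total_params

print(f"val PPL ~ {val_ppl:.2f}")                      # 11.03
print(f"tokens per parameter ~ {tokens_per_param:.0f}")  # 613
```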
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("GODELEV/Exp-1")
model = AutoModelForCausalLM.from_pretrained("GODELEV/Exp-1")

# Encode the prompt; prompt + generated tokens should fit in the 1024-token context.
inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)  # up to 50 new tokens
print(tokenizer.decode(output[0], skip_special_tokens=True))
```