---
language: en
tags:
- causal-lm
- gqa
- rope
- swiglu
license: apache-2.0
---

# Exp-1

Small language model (9.9M parameters) trained from scratch.

## Architecture

| Property | Value |
|---|---|
| Layers | 11 |
| Hidden size | 256 |
| Intermediate size | 704 |
| Attention heads | 8 (GQA kv=2) |
| Max sequence length | 1024 |
| Vocab size | 8192 |
| Tied embeddings | True |
| Total parameters | 9.853M |

## Training

- Tokens seen: 6,038,089,728
- Val loss: 2.4005
- Val PPL: 11.03

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("GODELEV/Exp-1")
model = AutoModelForCausalLM.from_pretrained("GODELEV/Exp-1")

inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
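
The parameter total and perplexity reported above can be sanity-checked with a short calculation. The sketch below is not taken from the model's source: it assumes a Llama-style decoder block (no bias terms, two RMSNorm weight vectors per layer, one final norm, SwiGLU with three projection matrices, and no learned parameters for RoPE). The helper name `count_parameters` is hypothetical. Under those assumptions the count agrees with the 9.853M figure in the table, which is consistent with, but does not confirm, that layout.

```python
import math


def count_parameters(layers, hidden, intermediate, heads, kv_heads, vocab, tied=True):
    """Estimate parameters for an assumed bias-free, Llama-style decoder."""
    head_dim = hidden // heads                    # 256 // 8 = 32
    q_and_o = 2 * hidden * hidden                 # query and output projections
    k_and_v = 2 * hidden * kv_heads * head_dim    # GQA: keys/values use only kv_heads
    mlp = 3 * hidden * intermediate               # SwiGLU: gate, up, and down projections
    norms = 2 * hidden                            # assumed: two RMSNorm weights per layer
    per_layer = q_and_o + k_and_v + mlp + norms
    embeddings = vocab * hidden
    lm_head = 0 if tied else vocab * hidden       # tied embeddings reuse the input matrix
    final_norm = hidden                           # assumed: one RMSNorm before the head
    return layers * per_layer + embeddings + lm_head + final_norm


total = count_parameters(
    layers=11, hidden=256, intermediate=704, heads=8, kv_heads=2, vocab=8192
)
print(f"{total:,} parameters")                    # 9,852,672 parameters

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
val_ppl = math.exp(2.4005)
print(f"val perplexity = {val_ppl:.2f}")          # val perplexity = 11.03
```

Tying the embeddings matters at this scale: the 8192 x 256 embedding matrix alone is about 2.1M parameters, so an untied output head would add roughly 21% to the total.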