# miniLLM-0.1B
A small (~109M parameters) causal language model pretrained from scratch on OpenWebText.
## Model Details
| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
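The ~109M figure can be reproduced from the hyperparameters above. A minimal back-of-the-envelope sketch, assuming the standard Llama layout (tied embeddings, no biases, four attention projections, a three-matrix SiLU-gated MLP, and two RMSNorm weight vectors per layer):

```python
# Parameter-count sanity check from the table's hyperparameters.
# Assumes the standard Llama layout: tied embeddings, no biases.
hidden = 768
layers = 10
intermediate = 2048
vocab = 50257

embed = vocab * hidden           # token embeddings (tied with the LM head)
attn = 4 * hidden * hidden       # q, k, v, o projections per layer
mlp = 3 * hidden * intermediate  # gate, up, down projections per layer
norms = 2 * hidden               # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")      # ≈ 109.4M, matching the table
```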
## Limitations
This is a small-scale pretrained model intended for research and educational purposes. It is not suitable for production use. Outputs may be incoherent, biased, or factually incorrect.