---
license: mit
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- pretrained
model-index:
- name: miniLLM-0.1B
  results: []
---
| |
# miniLLM-0.1B
|
|
A small (~109M parameter) causal language model pretrained from scratch on [OpenWebText](https://huggingface.co/datasets/Skylion007/openwebtext).
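## Usage

The model can be loaded with the 🤗 Transformers `Auto` classes. A minimal generation sketch follows; the repo ID `your-org/miniLLM-0.1B` is a placeholder, substitute the actual Hub path of this repository.

```python
# Minimal generation sketch. NOTE: "your-org/miniLLM-0.1B" is a
# hypothetical placeholder repo ID, not a published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/miniLLM-0.1B"  # placeholder Hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the context window is 1024 tokens, inputs longer than that must be truncated before generation.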
|
|
## Model Details
|
|
| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
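As a sanity check, the ~109M figure follows directly from the table, assuming the standard Llama layout: no attention or MLP biases, RMSNorm layers (weight only), and tied input/output embeddings so the LM head adds no extra parameters.

```python
# Parameter count implied by the configuration table above,
# assuming standard Llama conventions (no biases, RMSNorm, tied embeddings).
hidden, layers, intermediate, vocab = 768, 10, 2048, 50257

embed = vocab * hidden            # token embeddings (shared with the LM head)
attn = 4 * hidden * hidden        # q, k, v, o projections per layer
mlp = 3 * hidden * intermediate   # gate, up, down projections per layer
norms = 2 * hidden                # two RMSNorm weights per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# → 109,392,384 parameters (~109M)
```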
## Limitations
|
|
This is a small-scale pretrained model intended for research and educational purposes. It is **not** suitable for production use. Outputs may be incoherent, biased, or factually incorrect.
|
|