---
license: mit
datasets:
  - Skylion007/openwebtext
language:
  - en
pipeline_tag: text-generation
tags:
  - llama
  - causal-lm
  - pretrained
model-index:
  - name: miniLLM-0.1B
    results: []
---

# miniLLM-0.1B

A small (~109M parameters) causal language model pretrained from scratch on OpenWebText.
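A minimal generation sketch using 🤗 Transformers. The repo id `Hippocrene/MiniLLM-0.1B` is an assumption inferred from this card's author and title; substitute the actual Hub id if it differs.

```python
# Sketch only: the repo id below is assumed, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Hippocrene/MiniLLM-0.1B"  # hypothetical Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)  # GPT-2 BPE tokenizer
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
# Keep generations short: a ~109M-parameter model drifts quickly.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```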

## Model Details

| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
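As a sanity check, the ~109M figure follows from the hyperparameters above. A sketch assuming the standard Llama layout (tied embeddings, bias-free linear layers, a gated SiLU MLP with gate/up/down projections, and RMSNorm):

```python
# Parameter count implied by the table, assuming the standard Llama layout:
# tied embeddings, no linear biases, gated MLP, RMSNorm.
vocab, hidden, layers, inter = 50257, 768, 10, 2048

embed = vocab * hidden       # token embeddings (shared with lm_head when tied)
attn = 4 * hidden * hidden   # q, k, v, o projections per layer (12 full heads)
mlp = 3 * hidden * inter     # gate, up, down projections per layer
norms = 2 * hidden           # input + post-attention RMSNorm per layer
total = embed + layers * (attn + mlp + norms) + hidden  # + final RMSNorm

print(f"{total:,}")  # 109,392,384 ≈ 109M
```

Because the word embeddings are tied to the output head, they are counted once; they still account for about a third of the total.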

## Limitations

This is a small-scale pretrained base model intended for research and educational purposes; it has not been instruction-tuned or aligned, and it is not suitable for production use. Outputs may be incoherent, biased, or factually incorrect.