# miniLLM-0.1B

A small (~109M parameters) causal language model pretrained from scratch on OpenWebText.
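
A minimal usage sketch with the `transformers` library, assuming the repo ships the tokenizer alongside the weights (the repo id comes from this card; generation settings are illustrative, not the authors' recommendation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Hippocrene/MiniLLM-0.1B")
model = AutoModelForCausalLM.from_pretrained("Hippocrene/MiniLLM-0.1B")

# GPT-2's BPE tokenizer has no pad token, so reuse EOS for generation.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```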

## Model Details

| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
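
As a sketch, the table maps onto a Hugging Face `LlamaConfig` roughly as follows (field names are `transformers` conventions; it is an assumption that the checkpoint uses library defaults for anything not listed, e.g. `num_key_value_heads`). Instantiating it also lets you check that the listed hyperparameters really add up to ~109M parameters:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Config reconstructed from the table above (assumed, not the official file).
config = LlamaConfig(
    vocab_size=50257,
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=10,
    num_attention_heads=12,
    max_position_embeddings=1024,
    rope_theta=10000.0,
    hidden_act="silu",
    tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)

# Rough parameter budget implied by the table:
#   embeddings:          50257 * 768   ~= 38.6M  (counted once; lm_head is tied)
#   attention per layer: 4 * 768 * 768 ~=  2.36M (q, k, v, o projections)
#   MLP per layer:       3 * 768 * 2048 ~= 4.72M (SwiGLU: gate, up, down)
#   10 layers + norms                  ~= 70.8M
#   total                              ~= 109.4M
print(f"{model.num_parameters():,}")  # ~109M, matching the card
```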

## Limitations

This is a small-scale pretrained model intended for research and educational purposes. It is not suitable for production use. Outputs may be incoherent, biased, or factually incorrect.
