# miniLLM-0.1B
A small (~109M parameters) causal language model pretrained from scratch on OpenWebText.
## Model Details
| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
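The ~109M figure can be reproduced from the hyperparameters above. A minimal back-of-the-envelope sketch, assuming the standard Llama layout (tied embeddings, no biases, four attention projections, a three-matrix SiLU-gated MLP, and two RMSNorm weight vectors per layer):

```python
# Parameter-count sanity check from the table's hyperparameters.
# Assumes the standard Llama layout: tied embeddings, no biases.
hidden = 768
layers = 10
intermediate = 2048
vocab = 50257

embed = vocab * hidden           # token embeddings (tied with the LM head)
attn = 4 * hidden * hidden       # q, k, v, o projections per layer
mlp = 3 * hidden * intermediate  # gate, up, down projections per layer
norms = 2 * hidden               # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")      # ≈ 109.4M, matching the table
```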
## Limitations
This is a small-scale pretrained model intended for research and educational purposes. It is not suitable for production use. Outputs may be incoherent, biased, or factually incorrect.