---
license: mit
datasets:
  - Skylion007/openwebtext
language:
  - en
pipeline_tag: text-generation
tags:
  - llama
  - causal-lm
  - pretrained
model-index:
  - name: miniLLM-0.1B
    results: []
---

# miniLLM-0.1B

A small (~109M parameters) causal language model pretrained from scratch on OpenWebText.
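A minimal generation sketch using 🤗 Transformers. The repo id `Hippocrene/MiniLLM-0.1B` is an assumption inferred from this card's author and title; substitute the actual Hub id if it differs.

```python
# Sketch only: the repo id below is assumed, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Hippocrene/MiniLLM-0.1B"  # hypothetical Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)  # GPT-2 BPE tokenizer
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
# Keep generations short: a ~109M-parameter model drifts quickly.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```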

## Model Details

| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
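As a sanity check, the ~109M figure follows from the hyperparameters above. A sketch assuming the standard Llama layout (tied embeddings, bias-free linear layers, a gated SiLU MLP with gate/up/down projections, and RMSNorm):

```python
# Parameter count implied by the table, assuming the standard Llama layout:
# tied embeddings, no linear biases, gated MLP, RMSNorm.
vocab, hidden, layers, inter = 50257, 768, 10, 2048

embed = vocab * hidden       # token embeddings (shared with lm_head when tied)
attn = 4 * hidden * hidden   # q, k, v, o projections per layer (12 full heads)
mlp = 3 * hidden * inter     # gate, up, down projections per layer
norms = 2 * hidden           # input + post-attention RMSNorm per layer
total = embed + layers * (attn + mlp + norms) + hidden  # + final RMSNorm

print(f"{total:,}")  # 109,392,384 ≈ 109M
```

Because the word embeddings are tied to the output head, they are counted once; they still account for about a third of the total.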

## Limitations

This is a small-scale pretrained base model intended for research and educational purposes; it has not been instruction-tuned or aligned, and it is not suitable for production use. Outputs may be incoherent, biased, or factually incorrect.