---
license: mit
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- pretrained
model-index:
- name: miniLLM-0.1B
  results: []
---
| |
# miniLLM-0.1B
|
|
A small (~109M parameter) causal language model pretrained from scratch on [OpenWebText](https://huggingface.co/datasets/Skylion007/openwebtext).
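## Usage

The model can be loaded with the 🤗 Transformers `Auto` classes. A minimal generation sketch follows; the repo ID `your-org/miniLLM-0.1B` is a placeholder, substitute the actual Hub path of this repository.

```python
# Minimal generation sketch. NOTE: "your-org/miniLLM-0.1B" is a
# hypothetical placeholder repo ID, not a published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/miniLLM-0.1B"  # placeholder Hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the context window is 1024 tokens, inputs longer than that must be truncated before generation.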
|
|
## Model Details
|
|
| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~109M |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Layers | 10 |
| Intermediate Size | 2048 |
| Max Sequence Length | 1024 |
| Vocabulary Size | 50257 |
| Tokenizer | GPT-2 (BPE) |
| Positional Encoding | RoPE (θ=10000) |
| Activation | SiLU |
| Tie Word Embeddings | Yes |
| Precision (training) | bfloat16 |
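As a sanity check, the ~109M figure follows directly from the table, assuming the standard Llama layout: no attention or MLP biases, RMSNorm layers (weight only), and tied input/output embeddings so the LM head adds no extra parameters.

```python
# Parameter count implied by the configuration table above,
# assuming standard Llama conventions (no biases, RMSNorm, tied embeddings).
hidden, layers, intermediate, vocab = 768, 10, 2048, 50257

embed = vocab * hidden            # token embeddings (shared with the LM head)
attn = 4 * hidden * hidden        # q, k, v, o projections per layer
mlp = 3 * hidden * intermediate   # gate, up, down projections per layer
norms = 2 * hidden                # two RMSNorm weights per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + hidden  # + final RMSNorm
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# → 109,392,384 parameters (~109M)
```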
## Limitations
|
|
This is a small-scale pretrained model intended for research and educational purposes. It is **not** suitable for production use. Outputs may be incoherent, biased, or factually incorrect.
|
|