---
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- tiny |
|
|
- from-scratch |
|
|
- educational |
|
|
- causal-lm |
|
|
- personal-llm |
|
|
model-index: |
|
|
- name: tiny-llm-54m |
|
|
  results: []
|
|
--- |
|
|
|
|
|
# Tiny-LLM 54M |
|
|
|
|
|
A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets. |
|
|
|
|
|
### Architecture |
|
|
|
|
|
| Component | Value | |
|
|
|-----------|-------| |
|
|
| Parameters | **54.93M** | |
|
|
| Layers | 12 | |
|
|
| Hidden Size | 512 | |
|
|
| Attention Heads | 8 | |
|
|
| Intermediate (FFN) | 1408 | |
|
|
| Vocab Size | 32,000 | |
|
|
| Max Sequence Length | 512 | |
|
|
| Position Encoding | RoPE | |
|
|
| Normalization | RMSNorm | |
|
|
| Activation | SwiGLU | |
|
|
| Weight Tying | Yes | |
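
The parameter count in the table can be reproduced from these dimensions. A minimal sketch, assuming biasless linear layers, a three-projection SwiGLU FFN (gate, up, down), and a tied output head, as is standard for LLaMA-style models:

```python
# Architecture dimensions from the table above
vocab, d_model, n_layers, d_ff = 32_000, 512, 12, 1408

embedding = vocab * d_model        # input embedding (output head is tied, so counted once)
attention = 4 * d_model * d_model  # Q, K, V, O projections, no biases
ffn       = 3 * d_model * d_ff     # SwiGLU: gate, up, and down projections
norms     = 2 * d_model            # two RMSNorm weight vectors per layer
per_layer = attention + ffn + norms

total = embedding + n_layers * per_layer + d_model  # + final RMSNorm
print(f"{total / 1e6:.2f}M parameters")  # 54.93M
```

The count lands exactly on the advertised 54.93M, which is a good sanity check that the table is internally consistent.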
|
|
|
|
|
### Training Details |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| Training Steps | 50,000 | |
|
|
| Tokens | ~100M | |
|
|
| Batch Size | 32 | |
|
|
| Learning Rate | 3e-4 | |
|
|
| Warmup Steps | 2,000 | |
|
|
| Weight Decay | 0.1 | |
|
|
| Hardware | NVIDIA RTX 5090 (32GB) | |
|
|
| Training Time | ~3 hours | |
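
The card specifies 2,000 warmup steps but not the decay schedule. Assuming linear warmup to the 3e-4 peak followed by cosine decay to zero (a common choice for runs like this, not confirmed by the card), the schedule looks like:

```python
import math

peak_lr, warmup, total_steps = 3e-4, 2_000, 50_000

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed schedule)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1_000))   # mid-warmup: 1.5e-4
print(lr_at(2_000))   # peak: 3e-4
print(lr_at(50_000))  # end of training: ~0
```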
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
from transformers import AutoTokenizer

# Load the tokenizer (a 32K-vocabulary SentencePiece tokenizer shipped with the repo)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model itself uses a custom architecture and cannot be loaded with
# AutoModelForCausalLM; see scripts/ in the repository for model loading
# and inference code.
```
|
|
|
|
|
### Generation Example |
|
|
|
|
|
```python
# This model uses a custom architecture; full inference code is in the repository.
prompt = "The history of artificial intelligence"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Pass input_ids to the model's generation loop (see scripts/) to sample a
# continuation based on the patterns learned from Wikipedia.
```
|
|
|
|
|
## Intended Use |
|
|
|
|
|
- **Educational**: Understanding transformer training from scratch |
|
|
- **Experimental**: Testing fine-tuning approaches on small models |
|
|
- **Personal LLM**: Base for personal voice/style fine-tuning |
|
|
- **Research**: Lightweight model for NLP experiments |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Small model size limits knowledge and capabilities |
|
|
- Trained only on Wikipedia - limited domain coverage |
|
|
- Not suitable for production use cases requiring high quality |
|
|
- May generate factually incorrect information |
|
|
- No RLHF or instruction tuning |
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Source**: Wikipedia (English) |
|
|
- **Processing**: Tokenized with 32K vocabulary SentencePiece tokenizer |
|
|
- **Format**: Standard causal language modeling (next token prediction) |
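
The causal-LM format simply shifts the token stream by one position, so each token's training target is the token that follows it. A minimal sketch of how training pairs are formed (the token IDs below are made up for illustration, not output of the real tokenizer):

```python
# A tokenized Wikipedia passage (illustrative IDs)
tokens = [101, 874, 22, 5903, 17, 3004]

inputs  = tokens[:-1]  # what the model sees at each position
targets = tokens[1:]   # what it must predict at each position

for x, y in zip(inputs, targets):
    print(f"given ...{x} -> predict {y}")
```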
|
|
|
|
|
## Future Work |
|
|
|
|
|
This model is intended as a base for: |
|
|
1. **Personal Fine-tuning**: Adapt to individual writing style using personal data |
|
|
2. **Domain Adaptation**: Specialize for specific topics or tasks |
|
|
3. **Instruction Tuning**: Add instruction-following capabilities |
|
|
|
|
|
## Hardware Requirements |
|
|
|
|
|
- **Inference**: ~300MB GPU memory, runs on any modern GPU or Apple Silicon |
|
|
- **Fine-tuning**: ~2GB GPU memory recommended |
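
A back-of-the-envelope check on the inference figure (a sketch only; actual usage depends on precision, batch size, and sequence length):

```python
params = 54.93e6  # parameter count from the architecture table

for name, bytes_per in [("fp32", 4), ("fp16/bf16", 2)]:
    mb = params * bytes_per / 1e6
    print(f"{name}: ~{mb:.0f} MB of weights")

# fp32 weights alone come to ~220 MB, so the ~300 MB figure leaves
# headroom for activations and the KV cache.
```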
|
|
|
|
|
## Related Work |
|
|
|
|
|
Inspired by: |
|
|
- Andrej Karpathy's nanoGPT |
|
|
- Geddy Duke's small LLM experiments |
|
|
- LLaMA architecture design choices |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{tiny-llm-54m,
  author    = {jonmabe},
  title     = {Tiny-LLM: A 54M Parameter Language Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
|
|
``` |
|
|
|