# MiniGPT-30M-Wikipedia
A Small Language Model (SLM) with ~30M parameters, built from scratch. The model was trained as part of an architectural study of pre-LLM structures.
## Model Details
- Architecture: GPT-style decoder-only Transformer (see the configuration sketch after this list)
- Parameters: ~30 Million
- Layers: 6
- Attention Heads: 8
- Embedding Dimension: 384
- Context Size: 512 tokens
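
For readers who want to reproduce the architecture, here is a minimal PyTorch sketch that matches the numbers above. The vocabulary size (GPT-2's 50,257-token BPE vocabulary) and the tied input/output embeddings are assumptions needed to land near the quoted ~30M parameters; the card does not state the tokenizer or whether weights are tied.

```python
import torch
import torch.nn as nn

# Hyperparameters from the list above. VOCAB_SIZE is an assumption
# (GPT-2's BPE vocabulary); the card does not state the tokenizer.
N_LAYERS, N_HEADS, D_MODEL, CTX_LEN, VOCAB_SIZE = 6, 8, 384, 512, 50257

class Block(nn.Module):
    """Pre-norm decoder block: causal self-attention followed by a 4x MLP."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(D_MODEL)
        self.attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ln2 = nn.LayerNorm(D_MODEL)
        self.mlp = nn.Sequential(
            nn.Linear(D_MODEL, 4 * D_MODEL), nn.GELU(), nn.Linear(4 * D_MODEL, D_MODEL)
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True marks positions a token may NOT attend to.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=mask, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class MiniGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos_emb = nn.Embedding(CTX_LEN, D_MODEL)
        self.blocks = nn.ModuleList(Block() for _ in range(N_LAYERS))
        self.ln_f = nn.LayerNorm(D_MODEL)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)
        # Tying input and output embeddings keeps the total near ~30M params;
        # untied, a GPT-2-sized vocabulary would push it to roughly 50M.
        self.head.weight = self.tok_emb.weight

    def forward(self, idx):  # idx: (batch, seq) of token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits over the vocabulary

print(sum(p.numel() for p in MiniGPT().parameters()) / 1e6)  # ~30.1M
```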
## Training Configuration
- Dataset: WikiText-103
- Hardware: 2x NVIDIA T4 GPUs (Kaggle)
- Epochs: 1 (a single pass over the training split; see the sketch below)
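
A hedged sketch of what the one-epoch run could look like. Only the dataset (WikiText-103), the hardware (2x T4), and the epoch count come from the card; the GPT-2 tokenizer, AdamW optimizer, learning rate, batch size, and `torch.nn.DataParallel` for the two GPUs are illustrative assumptions, not the authors' recorded setup.

```python
import torch
from datasets import load_dataset
from transformers import GPT2TokenizerFast

# Assumptions: GPT-2 BPE tokenizer, AdamW with lr=3e-4, batch size 16.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
dataset = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def make_chunks(batch):
    # Concatenate articles, then slice into fixed 512-token training windows.
    ids = tokenizer("\n".join(batch["text"]))["input_ids"]
    return {"input_ids": [ids[i:i + 512] for i in range(0, len(ids) - 512, 512)]}

chunks = dataset.map(make_chunks, batched=True, remove_columns=["text"])
loader = torch.utils.data.DataLoader(
    chunks.with_format("torch"), batch_size=16, shuffle=True
)

device = "cuda"
model = torch.nn.DataParallel(MiniGPT()).to(device)  # splits batches across the 2x T4s
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for batch in loader:  # a single pass over the data = 1 epoch
    x = batch["input_ids"].to(device)
    logits = model(x[:, :-1])  # predict each next token from its prefix
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), x[:, 1:].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```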