---
title: Tiny-LLM Text Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
python_version: "3.11"
app_file: app.py
pinned: false
license: apache-2.0
---
# Tiny-LLM Text Generator

A **54 million parameter** language model trained **from scratch** on Wikipedia.

## About
This Space demonstrates that a small but coherent language model can be trained from scratch on a single consumer GPU with a modest compute budget.
## Architecture

| Component | Value |
|-----------|-------|
| Parameters | 54.93M |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) Size | 1408 |
| Vocabulary Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
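
The combination of RoPE, RMSNorm, and SwiGLU is the standard Llama-style decoder recipe. As a minimal sketch (assuming the checkpoint follows that layout, which is not confirmed by this README), the table above maps onto a `transformers` `LlamaConfig` roughly like this:

```python
from transformers import LlamaConfig

# Hypothetical mapping of the architecture table onto a Llama-style config;
# the actual training code may differ in details such as norm epsilon.
config = LlamaConfig(
    vocab_size=32_000,
    hidden_size=512,
    num_hidden_layers=12,
    num_attention_heads=8,
    intermediate_size=1408,        # SwiGLU FFN width
    max_position_embeddings=512,   # max sequence length
    tie_word_embeddings=True,      # ~55M params only if embeddings are tied
)
```

With tied input/output embeddings this comes out to roughly 55M parameters (16.4M for the embedding table plus about 3.2M per layer across 12 layers), consistent with the 54.93M figure above.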
## Training

- **Training Steps**: 50,000
- **Tokens**: ~100M
- **Hardware**: NVIDIA RTX 5090 (32 GB)
- **Training Time**: ~3 hours
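
As a back-of-the-envelope check (the actual batch size and gradient accumulation are not stated), these numbers imply about 2,000 tokens per optimizer step:

```python
# Rough throughput implied by the figures above; batch size is an inference.
tokens, steps, seq_len = 100_000_000, 50_000, 512
tokens_per_step = tokens / steps           # 2,000 tokens per step
seqs_per_step = tokens_per_step / seq_len  # ~3.9 sequences of length 512
print(f"{tokens_per_step:.0f} tokens/step ≈ {seqs_per_step:.1f} seqs/step")
```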
## Model

[jonmabe/tiny-llm-54m](https://huggingface.co/jonmabe/tiny-llm-54m)
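
A minimal sketch for trying the model outside this Space, assuming the hub repo ships a tokenizer and a `transformers`-compatible checkpoint (not verified here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jonmabe/tiny-llm-54m"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Base-model usage: give it a prefix to continue, not an instruction.
inputs = tokenizer("The history of the Roman Empire", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```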
## Limitations

- Small model size limits knowledge and capabilities
- Trained only on Wikipedia, so domain coverage is limited
- May generate factually incorrect information
- Not instruction-tuned: it continues text rather than following instructions