---
tags:
- transformer
- language-model
- educational
license: mit
---
# ScholarSage - Tiny Transformer LM

A tiny transformer language model built from scratch for educational purposes.
## Model Details

- **Architecture**: Decoder-only transformer (GPT-style); see the sketch after this list
- **Parameters**:
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512
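
The actual implementation is not included in this card, but a minimal PyTorch sketch consistent with the configuration above could look like the following. The names `TinyTransformerLM` and `Block` are illustrative, not the real ones, and details such as dropout and weight tying are guesses:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm decoder block: causal self-attention + feed-forward."""
    def __init__(self, d_model=256, n_heads=4, d_ffn=1024):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn),
            nn.GELU(),
            nn.Linear(d_ffn, d_model),
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask,
                                need_weights=False)
        x = x + attn_out
        x = x + self.ffn(self.ln2(x))
        return x

class TinyTransformerLM(nn.Module):
    """Decoder-only LM matching the card's configuration (illustrative)."""
    def __init__(self, vocab_size=50257, d_model=256, n_layers=4,
                 n_heads=4, d_ffn=1024, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(
            Block(d_model, n_heads, d_ffn) for _ in range(n_layers)
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying is a common choice; whether the original does this
        # is not stated in the card.
        self.head.weight = self.tok_emb.weight

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        seq_len = idx.size(1)
        pos = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Upper-triangular -inf mask: each position attends only to the past.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=idx.device),
            diagonal=1,
        )
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab_size)
```

Instantiated this way, the sketch comes to roughly 16M parameters, about 13M of which sit in the (tied) token embedding; the actual model's count may differ with implementation choices.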
## Training

- **Dataset**: WikiText-2
- **Optimizer**: AdamW
- **Learning rate**: 3e-4
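
Batch size, schedule, and epoch count are not documented, so the following is only a hedged sketch of a training loop with the listed optimizer and learning rate. It reuses the hypothetical `TinyTransformerLM` from above, and the block-packing preprocessing is an assumption:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # same 50,257-token vocab
# "raw" variant assumed; the card only says WikiText-2.
train = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# Tokenize the whole corpus into one long id sequence (one possible scheme;
# the card does not document the actual preprocessing).
text = "\n".join(row["text"] for row in train)
ids = torch.tensor(tokenizer(text)["input_ids"])

model = TinyTransformerLM()  # hypothetical class from the sketch above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

block, batch = 512, 8
for step in range(1000):  # illustrative step count
    # Sample random windows; targets are the inputs shifted by one token.
    starts = torch.randint(0, ids.size(0) - block - 1, (batch,))
    x = torch.stack([ids[s : s + block] for s in starts])
    y = torch.stack([ids[s + 1 : s + block + 1] for s in starts])
    loss = F.cross_entropy(model(x).view(-1, 50257), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```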
## Usage

```python
from transformers import AutoTokenizer

# The tokenizer loads through the standard transformers API
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# The model itself is a custom architecture, not a standard transformers
# model, so its weights have to be loaded separately (see below).
```
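
Because `AutoModel` cannot instantiate a custom architecture, the weights need to be fetched and loaded manually. A hedged sketch, assuming the repository stores a plain PyTorch state dict (the filename `pytorch_model.bin` is a guess about the repo layout) and reusing the hypothetical `TinyTransformerLM` class from above:

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint; the filename is an assumption about the repo layout.
weights_path = hf_hub_download("TheCodeKat/scholar-sage", "pytorch_model.bin")

model = TinyTransformerLM()  # hypothetical class from the Model Details sketch
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Greedy decoding, one token at a time
ids = tokenizer("The transformer architecture", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids[:, -512:])  # stay within the 512-token context
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0]))
```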
## Purpose

This model was built for educational purposes: to understand the transformer architecture by implementing it from scratch. It is not intended for production use.