---
tags:
  - transformer
  - language-model
  - educational
license: mit
---

# ScholarSage - Tiny Transformer LM

A tiny transformer language model built from scratch for educational purposes.

## Model Details

- **Architecture**: Decoder-only transformer (GPT-style)
- **Parameters**:
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512

## Training

- **Dataset**: WikiText-2
- **Optimizer**: AdamW
- **Learning rate**: 3e-4

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# Load model (you'll need to load the architecture separately)
# This is a custom model, not a standard transformers model
```

## Purpose

This model is built for educational purposes to understand transformer architecture from scratch.
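Because the checkpoint is a custom model rather than a standard `transformers` architecture, you need to define the network yourself before loading weights. The sketch below is one plausible PyTorch implementation matching the hyperparameters listed above; the class name `TinyGPT` is illustrative, and it uses `nn.TransformerEncoderLayer` with a causal mask as a stand-in for a hand-written GPT block, so it may not match the actual checkpoint's module layout.

```python
import torch
import torch.nn as nn


class TinyGPT(nn.Module):
    """Minimal GPT-style decoder-only LM matching the listed hyperparameters.

    Note: an assumed sketch, not the actual ScholarSage module layout.
    """

    def __init__(self, vocab_size=50257, d_model=256, n_layers=4,
                 n_heads=4, d_ffn=1024, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ffn,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq) token ids
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: -inf above the diagonal so each position only
        # attends to itself and earlier positions (decoder-only behavior)
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))  # (batch, seq, vocab) logits


model = TinyGPT()
logits = model(torch.randint(0, 50257, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 50257])
```

With a definition like this you could restore trained weights via `model.load_state_dict(torch.load(...))`, provided the parameter names match how the checkpoint was saved.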