# ScholarSage - Tiny Transformer LM
A tiny transformer language model built from scratch for educational purposes.
## Model Details
- Architecture: Decoder-only transformer (GPT-style); see the reference sketch after this list
- Parameters:
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512
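Because this is a custom model rather than a standard `transformers` class, here is a minimal PyTorch sketch of a decoder-only transformer with the hyperparameters listed above. The class names (`Block`, `ScholarSage`), the pre-norm layout, and the learned positional embeddings are illustrative assumptions, not necessarily how this repository's code is organized.

```python
# Minimal sketch of a GPT-style decoder-only LM with the listed
# hyperparameters. Pre-norm blocks and learned positional embeddings
# are assumptions; the repo's actual implementation may differ.
import torch
import torch.nn as nn

VOCAB_SIZE = 50_257   # GPT-2 tokenizer vocabulary
EMBED_DIM = 256
N_LAYERS = 4
N_HEADS = 4
FFN_DIM = 1024
MAX_SEQ_LEN = 512

class Block(nn.Module):
    """One decoder block: masked self-attention followed by a feed-forward net."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(EMBED_DIM)
        self.attn = nn.MultiheadAttention(EMBED_DIM, N_HEADS, batch_first=True)
        self.ln2 = nn.LayerNorm(EMBED_DIM)
        self.ffn = nn.Sequential(
            nn.Linear(EMBED_DIM, FFN_DIM),
            nn.GELU(),
            nn.Linear(FFN_DIM, EMBED_DIM),
        )

    def forward(self, x):
        # Causal mask: True entries are blocked, so each position only
        # attends to itself and earlier positions.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        return x + self.ffn(self.ln2(x))

class ScholarSage(nn.Module):
    """Token + position embeddings -> 4 decoder blocks -> LM head."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.pos_emb = nn.Embedding(MAX_SEQ_LEN, EMBED_DIM)
        self.blocks = nn.ModuleList([Block() for _ in range(N_LAYERS)])
        self.ln_f = nn.LayerNorm(EMBED_DIM)
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids, seq_len <= MAX_SEQ_LEN
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab) logits
```

As written (untied output head) this sketch has roughly 29M parameters, most of them in the token embedding and LM head; tying `head.weight` to the embedding, a common choice for small models, would bring it down to roughly 16M.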
## Training
- Dataset: WikiText-2
- Optimizer: AdamW
- Learning rate: 3e-4
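To illustrate how these settings fit together, here is a minimal next-token-prediction training step reusing the `ScholarSage` sketch from the Model Details section. The batch layout and the absence of a learning-rate scheduler and gradient clipping are assumptions made for brevity.

```python
# Minimal training step: AdamW at lr 3e-4 with plain next-token
# cross-entropy. Scheduler and clipping omitted (assumptions).
import torch
import torch.nn.functional as F

model = ScholarSage()  # architecture sketch from the Model Details section
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch):
    """One step on a (batch, seq_len) tensor of WikiText-2 token ids."""
    inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one token
    logits = model(inputs)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (B*T, vocab)
        targets.reshape(-1),                  # (B*T,)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```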
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer (GPT-2 vocabulary)
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# Load the model: this is a custom model, not a standard transformers
# architecture, so the model class must be defined and loaded separately.
```
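Putting the pieces together, greedy decoding with the tokenizer above and the `ScholarSage` sketch might look like the following. The 20-token budget and greedy argmax decoding are arbitrary illustrative choices, and without first loading trained weights (which depends on how this repo stores its checkpoint) the output will be random.

```python
# Hypothetical end-to-end generation with the architecture sketch above.
# Note: load trained weights first; a fresh ScholarSage() is random.
import torch

model = ScholarSage()
model.eval()

ids = tokenizer("The transformer architecture", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                         # generate 20 tokens greedily
        logits = model(ids[:, -MAX_SEQ_LEN:])   # respect the 512-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```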
## Purpose
This model was built from scratch for educational purposes, as a way to understand the transformer architecture end to end.