# ScholarSage - Tiny Transformer LM
A tiny transformer language model built from scratch for educational purposes.
## Model Details
- Architecture: Decoder-only transformer (GPT-style); see the reference sketch after this list
- Parameters:
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512
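Because this is a custom model rather than a standard `transformers` class, here is a minimal PyTorch sketch of a decoder-only transformer with the hyperparameters listed above. The class names (`Block`, `ScholarSage`), the pre-norm layout, and the learned positional embeddings are illustrative assumptions, not necessarily how this repository's code is organized.

```python
# Minimal sketch of a GPT-style decoder-only LM with the listed
# hyperparameters. Pre-norm blocks and learned positional embeddings
# are assumptions; the repo's actual implementation may differ.
import torch
import torch.nn as nn

VOCAB_SIZE = 50_257   # GPT-2 tokenizer vocabulary
EMBED_DIM = 256
N_LAYERS = 4
N_HEADS = 4
FFN_DIM = 1024
MAX_SEQ_LEN = 512

class Block(nn.Module):
    """One decoder block: masked self-attention followed by a feed-forward net."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(EMBED_DIM)
        self.attn = nn.MultiheadAttention(EMBED_DIM, N_HEADS, batch_first=True)
        self.ln2 = nn.LayerNorm(EMBED_DIM)
        self.ffn = nn.Sequential(
            nn.Linear(EMBED_DIM, FFN_DIM),
            nn.GELU(),
            nn.Linear(FFN_DIM, EMBED_DIM),
        )

    def forward(self, x):
        # Causal mask: True entries are blocked, so each position only
        # attends to itself and earlier positions.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        return x + self.ffn(self.ln2(x))

class ScholarSage(nn.Module):
    """Token + position embeddings -> 4 decoder blocks -> LM head."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.pos_emb = nn.Embedding(MAX_SEQ_LEN, EMBED_DIM)
        self.blocks = nn.ModuleList([Block() for _ in range(N_LAYERS)])
        self.ln_f = nn.LayerNorm(EMBED_DIM)
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids, seq_len <= MAX_SEQ_LEN
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab) logits
```

As written (untied output head) this sketch has roughly 29M parameters, most of them in the token embedding and LM head; tying `head.weight` to the embedding, a common choice for small models, would bring it down to roughly 16M.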
## Training
- Dataset: WikiText-2
- Optimizer: AdamW
- Learning rate: 3e-4
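To illustrate how these settings fit together, here is a minimal next-token-prediction training step reusing the `ScholarSage` sketch from the Model Details section. The batch layout and the absence of a learning-rate scheduler and gradient clipping are assumptions made for brevity.

```python
# Minimal training step: AdamW at lr 3e-4 with plain next-token
# cross-entropy. Scheduler and clipping omitted (assumptions).
import torch
import torch.nn.functional as F

model = ScholarSage()  # architecture sketch from the Model Details section
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch):
    """One step on a (batch, seq_len) tensor of WikiText-2 token ids."""
    inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one token
    logits = model(inputs)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (B*T, vocab)
        targets.reshape(-1),                  # (B*T,)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```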
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer (GPT-2 vocabulary)
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# Load the model: this is a custom model, not a standard transformers
# architecture, so the model class must be defined and loaded separately.
```
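Putting the pieces together, greedy decoding with the tokenizer above and the `ScholarSage` sketch might look like the following. The 20-token budget and greedy argmax decoding are arbitrary illustrative choices, and without first loading trained weights (which depends on how this repo stores its checkpoint) the output will be random.

```python
# Hypothetical end-to-end generation with the architecture sketch above.
# Note: load trained weights first; a fresh ScholarSage() is random.
import torch

model = ScholarSage()
model.eval()

ids = tokenizer("The transformer architecture", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                         # generate 20 tokens greedily
        logits = model(ids[:, -MAX_SEQ_LEN:])   # respect the 512-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```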
## Purpose
This model was built from scratch for educational purposes, as a way to understand the transformer architecture end to end.