---
tags:
  - transformer
  - language-model
  - educational
license: mit
---

# ScholarSage - Tiny Transformer LM

A tiny transformer language model built from scratch for educational purposes.

## Model Details

- **Architecture:** Decoder-only transformer (GPT-style)
- **Hyperparameters:**
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512
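As a sanity check, the hyperparameters above imply a model of roughly 16M parameters. A back-of-the-envelope estimate (weight matrices only; biases and layer norms are omitted, so the real count is slightly higher):

```python
# Rough parameter count from the hyperparameters listed above.
vocab, d_model, n_layers, d_ffn, max_len = 50257, 256, 4, 1024, 512

tok_emb = vocab * d_model        # token embedding table
pos_emb = max_len * d_model      # learned positional embeddings (assumed)
attn = 4 * d_model * d_model     # Q, K, V, and output projections per layer
ffn = 2 * d_model * d_ffn        # up- and down-projection per layer
total = tok_emb + pos_emb + n_layers * (attn + ffn)

print(f"~{total / 1e6:.1f}M parameters")  # ~16.1M
```

Note that the 12.9M-parameter embedding table dominates at this scale; the four transformer blocks together account for only about 3.1M parameters.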

## Training

- **Dataset:** WikiText-2
- **Optimizer:** AdamW
- **Learning rate:** 3e-4
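The setup above corresponds to a standard next-token-prediction training loop. A minimal sketch, where `model` is a placeholder `nn.Linear` standing in for the actual ScholarSage decoder (the real training code is not part of this card):

```python
import torch
from torch.optim import AdamW

# Placeholder for the real transformer; maps embeddings to vocab logits.
model = torch.nn.Linear(256, 50257)
optimizer = AdamW(model.parameters(), lr=3e-4)  # lr from the card above

def train_step(inputs, targets):
    """One step of next-token cross-entropy training."""
    logits = model(inputs)                        # (batch, seq, vocab)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the real loop, `inputs` would be token embeddings from WikiText-2 batches and `targets` the same token ids shifted left by one position.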

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# Load model (you'll need to load the architecture separately):
# this is a custom model, not a standard transformers model.
```
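Since the weights belong to a custom architecture, a matching model class must be defined before they can be loaded. A hypothetical sketch built from the hyperparameters listed above — the class name, layer choices, and weight layout are assumptions, not this repo's actual code:

```python
import torch
import torch.nn as nn

class ScholarSage(nn.Module):
    """GPT-style decoder-only LM; dimensions from the model card."""

    def __init__(self, vocab=50257, d_model=256, n_layers=4,
                 n_heads=4, d_ffn=1024, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions (assumed)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, d_ffn, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab, bias=False)

    def forward(self, ids):
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (batch, seq, vocab) logits
```

With a class like this, the checkpoint could be restored via `model.load_state_dict(torch.load(...))`, provided the state-dict keys match the actual repo's layer names.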

## Purpose

This model exists purely for educational purposes: to understand the transformer architecture by implementing and training one from scratch. It is not intended for production use.