---
tags:
  - transformer
  - language-model
  - educational
license: mit
---

# ScholarSage - Tiny Transformer LM

A tiny transformer language model built from scratch for educational purposes.

## Model Details

- **Architecture:** Decoder-only transformer (GPT-style)
- **Hyperparameters:**
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512
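As a sanity check, the hyperparameters above imply a model of roughly 16M parameters. A back-of-the-envelope estimate (weight matrices only; biases and layer norms are omitted, so the real count is slightly higher):

```python
# Rough parameter count from the hyperparameters listed above.
vocab, d_model, n_layers, d_ffn, max_len = 50257, 256, 4, 1024, 512

tok_emb = vocab * d_model        # token embedding table
pos_emb = max_len * d_model      # learned positional embeddings (assumed)
attn = 4 * d_model * d_model     # Q, K, V, and output projections per layer
ffn = 2 * d_model * d_ffn        # up- and down-projection per layer
total = tok_emb + pos_emb + n_layers * (attn + ffn)

print(f"~{total / 1e6:.1f}M parameters")  # ~16.1M
```

Note that the 12.9M-parameter embedding table dominates at this scale; the four transformer blocks together account for only about 3.1M parameters.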

## Training

- **Dataset:** WikiText-2
- **Optimizer:** AdamW
- **Learning rate:** 3e-4
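The setup above corresponds to a standard next-token-prediction training loop. A minimal sketch, where `model` is a placeholder `nn.Linear` standing in for the actual ScholarSage decoder (the real training code is not part of this card):

```python
import torch
from torch.optim import AdamW

# Placeholder for the real transformer; maps embeddings to vocab logits.
model = torch.nn.Linear(256, 50257)
optimizer = AdamW(model.parameters(), lr=3e-4)  # lr from the card above

def train_step(inputs, targets):
    """One step of next-token cross-entropy training."""
    logits = model(inputs)                        # (batch, seq, vocab)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the real loop, `inputs` would be token embeddings from WikiText-2 batches and `targets` the same token ids shifted left by one position.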

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# Load model (you'll need to load the architecture separately):
# this is a custom model, not a standard transformers model.
```
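Since the weights belong to a custom architecture, a matching model class must be defined before they can be loaded. A hypothetical sketch built from the hyperparameters listed above — the class name, layer choices, and weight layout are assumptions, not this repo's actual code:

```python
import torch
import torch.nn as nn

class ScholarSage(nn.Module):
    """GPT-style decoder-only LM; dimensions from the model card."""

    def __init__(self, vocab=50257, d_model=256, n_layers=4,
                 n_heads=4, d_ffn=1024, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions (assumed)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, d_ffn, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab, bias=False)

    def forward(self, ids):
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (batch, seq, vocab) logits
```

With a class like this, the checkpoint could be restored via `model.load_state_dict(torch.load(...))`, provided the state-dict keys match the actual repo's layer names.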

## Purpose

This model exists purely for educational purposes: to understand the transformer architecture by implementing and training one from scratch. It is not intended for production use.