---
tags:
- transformer
- language-model
- educational
license: mit
---
# ScholarSage - Tiny Transformer LM

A tiny transformer language model built from scratch for educational purposes.
## Model Details

- **Architecture**: Decoder-only transformer (GPT-style); see the sketch after this list
- **Parameters**:
  - Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  - Embedding dimension: 256
  - Layers: 4
  - Attention heads: 4
  - FFN dimension: 1024
  - Max sequence length: 512
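
The actual implementation is not included in this card, but a minimal PyTorch sketch consistent with the configuration above could look like the following. The names `TinyTransformerLM` and `Block` are illustrative, not the real ones, and details such as dropout and weight tying are guesses:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm decoder block: causal self-attention + feed-forward."""
    def __init__(self, d_model=256, n_heads=4, d_ffn=1024):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn),
            nn.GELU(),
            nn.Linear(d_ffn, d_model),
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask,
                                need_weights=False)
        x = x + attn_out
        x = x + self.ffn(self.ln2(x))
        return x

class TinyTransformerLM(nn.Module):
    """Decoder-only LM matching the card's configuration (illustrative)."""
    def __init__(self, vocab_size=50257, d_model=256, n_layers=4,
                 n_heads=4, d_ffn=1024, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(
            Block(d_model, n_heads, d_ffn) for _ in range(n_layers)
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying is a common choice; whether the original does this
        # is not stated in the card.
        self.head.weight = self.tok_emb.weight

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        seq_len = idx.size(1)
        pos = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Upper-triangular -inf mask: each position attends only to the past.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=idx.device),
            diagonal=1,
        )
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab_size)
```

Instantiated this way, the sketch comes to roughly 16M parameters, about 13M of which sit in the (tied) token embedding; the actual model's count may differ with implementation choices.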
## Training

- **Dataset**: WikiText-2
- **Optimizer**: AdamW
- **Learning rate**: 3e-4
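
Batch size, schedule, and epoch count are not documented, so the following is only a hedged sketch of a training loop with the listed optimizer and learning rate. It reuses the hypothetical `TinyTransformerLM` from above, and the block-packing preprocessing is an assumption:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # same 50,257-token vocab
# "raw" variant assumed; the card only says WikiText-2.
train = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# Tokenize the whole corpus into one long id sequence (one possible scheme;
# the card does not document the actual preprocessing).
text = "\n".join(row["text"] for row in train)
ids = torch.tensor(tokenizer(text)["input_ids"])

model = TinyTransformerLM()  # hypothetical class from the sketch above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

block, batch = 512, 8
for step in range(1000):  # illustrative step count
    # Sample random windows; targets are the inputs shifted by one token.
    starts = torch.randint(0, ids.size(0) - block - 1, (batch,))
    x = torch.stack([ids[s : s + block] for s in starts])
    y = torch.stack([ids[s + 1 : s + block + 1] for s in starts])
    loss = F.cross_entropy(model(x).view(-1, 50257), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```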
## Usage

```python
from transformers import AutoTokenizer

# The tokenizer loads through the standard transformers API
tokenizer = AutoTokenizer.from_pretrained("TheCodeKat/scholar-sage")

# The model itself is a custom architecture, not a standard transformers
# model, so its weights have to be loaded separately (see below).
```
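
Because `AutoModel` cannot instantiate a custom architecture, the weights need to be fetched and loaded manually. A hedged sketch, assuming the repository stores a plain PyTorch state dict (the filename `pytorch_model.bin` is a guess about the repo layout) and reusing the hypothetical `TinyTransformerLM` class from above:

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint; the filename is an assumption about the repo layout.
weights_path = hf_hub_download("TheCodeKat/scholar-sage", "pytorch_model.bin")

model = TinyTransformerLM()  # hypothetical class from the Model Details sketch
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Greedy decoding, one token at a time
ids = tokenizer("The transformer architecture", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids[:, -512:])  # stay within the 512-token context
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0]))
```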
## Purpose

This model was built for educational purposes: to understand the transformer architecture by implementing it from scratch. It is not intended for production use.