# auto-g-nano-2
This is a modernized, "Grok-style" decoder-only Transformer (nanoGPT evolution) trained on the FineWeb-Edu dataset.
## Key Features
- Modern Architecture: Llama-style implementation with RoPE, RMSNorm, and SwiGLU.
- Grouped-Query Attention (GQA): Optimized for inference efficiency.
- BPE Tokenization: Uses OpenAI's `tiktoken` with the GPT-2 encoding.
- Parameters: ~152.7M
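As a rough illustration of two of these components, here is a pure-Python sketch of RMSNorm and the SwiGLU gate (illustrative only; the model's actual implementation lives in `model.py` in this repository and may differ in detail):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square.
    # Unlike LayerNorm, there is no mean subtraction and no bias.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu(gate, up):
    # SwiGLU: SiLU(gate branch) multiplied elementwise by the up branch.
    silu = lambda v: v / (1.0 + math.exp(-v))
    return [silu(g) * u for g, u in zip(gate, up)]
```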
## Model Details
- Architecture: Decoder-only Transformer
- Vocab Size: 50257
- Embedding Dimension: 768
- Heads: 12
- KV Heads: 4
- Layers: 12
- Block Size: 1024
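With 12 query heads sharing 4 KV heads, each KV head serves a group of 3 query heads, and the per-head dimension is 768 / 12 = 64. A minimal NumPy sketch of grouped-query attention at these dimensions (illustrative only; causal masking and the actual repository code are omitted):

```python
import numpy as np

n_head, n_kv_head, head_dim = 12, 4, 64  # from the Model Details above
group = n_head // n_kv_head              # 3 query heads per KV head

T = 5  # short sequence length, just for the sketch
rng = np.random.default_rng(0)
q = rng.standard_normal((n_head, T, head_dim))
k = rng.standard_normal((n_kv_head, T, head_dim))
v = rng.standard_normal((n_kv_head, T, head_dim))

# Expand each KV head so it serves its whole group of query heads
k_full = np.repeat(k, group, axis=0)  # (n_head, T, head_dim)
v_full = np.repeat(v, group, axis=0)

# Scaled dot-product attention (causal mask omitted for brevity)
scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_full  # (n_head, T, head_dim)
```

The payoff is a smaller KV cache at inference time: only 4 heads' worth of keys and values are stored instead of 12.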
## How to Use
You can use this model directly with the `GPT` class from this repository.
```python
import torch
import tiktoken

from model import GPT

# Load the pretrained weights from the Hub
model = GPT.from_pretrained("geoffsee/auto-g-nano-2")
model.eval()

enc = tiktoken.get_encoding("gpt2")

# Generate text
prompt = "The future of AI is"
idx = torch.tensor(enc.encode(prompt), dtype=torch.long).unsqueeze(0)
with torch.no_grad():
    completion = model.generate(idx, max_new_tokens=50)
print(enc.decode(completion[0].tolist()))
```
## Training Data
Trained on a sample of FineWeb-Edu.