# auto-g-nano-2
This is a modernized, "Grok-style" decoder-only Transformer (nanoGPT evolution) trained on the FineWeb-Edu dataset.
## Key Features
- Modern Architecture: Llama-style implementation with RoPE, RMSNorm, and SwiGLU.
- Grouped-Query Attention (GQA): Optimized for inference efficiency.
- BPE Tokenization: Uses OpenAI's `tiktoken` with the GPT-2 encoding.
- Parameters: ~152.7M
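As a rough illustration of two of these components, here is a pure-Python sketch of RMSNorm and the SwiGLU gate (illustrative only; the model's actual implementation lives in `model.py` in this repository and may differ in detail):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square.
    # Unlike LayerNorm, there is no mean subtraction and no bias.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu(gate, up):
    # SwiGLU: SiLU(gate branch) multiplied elementwise by the up branch.
    silu = lambda v: v / (1.0 + math.exp(-v))
    return [silu(g) * u for g, u in zip(gate, up)]
```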
## Model Details
- Architecture: Decoder-only Transformer
- Vocab Size: 50257
- Embedding Dimension: 768
- Heads: 12
- KV Heads: 4
- Layers: 12
- Block Size: 1024
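With 12 query heads sharing 4 KV heads, each KV head serves a group of 3 query heads, and the per-head dimension is 768 / 12 = 64. A minimal NumPy sketch of grouped-query attention at these dimensions (illustrative only; causal masking and the actual repository code are omitted):

```python
import numpy as np

n_head, n_kv_head, head_dim = 12, 4, 64  # from the Model Details above
group = n_head // n_kv_head              # 3 query heads per KV head

T = 5  # short sequence length, just for the sketch
rng = np.random.default_rng(0)
q = rng.standard_normal((n_head, T, head_dim))
k = rng.standard_normal((n_kv_head, T, head_dim))
v = rng.standard_normal((n_kv_head, T, head_dim))

# Expand each KV head so it serves its whole group of query heads
k_full = np.repeat(k, group, axis=0)  # (n_head, T, head_dim)
v_full = np.repeat(v, group, axis=0)

# Scaled dot-product attention (causal mask omitted for brevity)
scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_full  # (n_head, T, head_dim)
```

The payoff is a smaller KV cache at inference time: only 4 heads' worth of keys and values are stored instead of 12.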
## How to Use
You can use this model directly with the `GPT` class from this repository.
```python
import torch
import tiktoken

from model import GPT

# Load the pretrained weights from the Hub
model = GPT.from_pretrained("geoffsee/auto-g-nano-2")
model.eval()

enc = tiktoken.get_encoding("gpt2")

# Generate text
prompt = "The future of AI is"
idx = torch.tensor(enc.encode(prompt), dtype=torch.long).unsqueeze(0)
with torch.no_grad():
    completion = model.generate(idx, max_new_tokens=50)
print(enc.decode(completion[0].tolist()))
```
## Training Data
Trained on a sample of FineWeb-Edu.