auto-g-nano-2

auto-g-nano-2 is a modernized, "Grok-style" decoder-only Transformer (an evolution of nanoGPT) trained on the FineWeb-Edu dataset.

Key Features

  • Modern Architecture: Llama-style implementation with RoPE, RMSNorm, and SwiGLU.
  • Grouped-Query Attention (GQA): Optimized for inference efficiency.
  • BPE Tokenization: Uses OpenAI's tiktoken (GPT-2).
  • Parameters: ~152.7M
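
Grouped-Query Attention shares each key/value head across several query heads, shrinking the KV cache at inference time. Below is a minimal sketch of the idea for the 12-query-head / 4-KV-head split used here; the function name and weight shapes are illustrative, not this repository's API:

```python
import torch

def gqa(x, wq, wk, wv, n_heads=12, n_kv_heads=4):
    # Grouped-Query Attention: queries use n_heads, while keys/values use only
    # n_kv_heads; each KV head serves n_heads // n_kv_heads query heads.
    B, T, C = x.shape
    hd = C // n_heads                                   # per-head dimension
    q = (x @ wq).view(B, T, n_heads, hd).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, hd).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, hd).transpose(1, 2)
    # Expand each KV head across its group of query heads
    g = n_heads // n_kv_heads
    k = k.repeat_interleave(g, dim=1)
    v = v.repeat_interleave(g, dim=1)
    # Causal scaled dot-product attention
    att = (q @ k.transpose(-2, -1)) / hd ** 0.5
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return (att @ v).transpose(1, 2).reshape(B, T, C)
```

With 4 KV heads instead of 12, the KV projections (and the KV cache) are a third of the full multi-head size, while query capacity is unchanged.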

Model Details

  • Architecture: Decoder-only Transformer
  • Vocab Size: 50257
  • Embedding Dimension: 768
  • Heads: 12
  • KV Heads: 4
  • Layers: 12
  • Block Size: 1024

How to Use

You can use this model directly with the GPT class from this repository.

from model import GPT
import tiktoken
import torch

model = GPT.from_pretrained("geoffsee/auto-g-nano-2")
model.eval()  # disable dropout for inference
enc = tiktoken.get_encoding("gpt2")

# Generate text
prompt = "The future of AI is"
idx = torch.tensor(enc.encode(prompt), dtype=torch.long).unsqueeze(0)
with torch.no_grad():
    completion = model.generate(idx, max_new_tokens=50)
print(enc.decode(completion[0].tolist()))
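
The generate call above uses whatever sampling the repository implements. If you want to control randomness yourself, a standalone temperature/top-k sampler over a logits vector might look like the sketch below; the function name and defaults are illustrative, not part of this model's API:

```python
import torch

def sample_top_k(logits, k=50, temperature=0.8):
    # Keep only the k most likely tokens, then sample from the
    # renormalized (temperature-scaled) distribution.
    logits = logits / temperature
    topv, topi = torch.topk(logits, k)
    probs = torch.softmax(topv, dim=-1)
    return topi[torch.multinomial(probs, num_samples=1)]
```

Lower temperature sharpens the distribution toward greedy decoding; smaller k prunes the long tail of unlikely tokens.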

Training Data

Trained on a sample of FineWeb-Edu.
