---
language: en
tags:
- gpt
- language-model
- gpu
- cuda
- ai-systems
- pytorch
license: mit
---
# KernelGPT — GPU/AI Systems Performance
A GPT-style decoder-only transformer trained from scratch on GPU/AI systems performance engineering text.
## Model Specs
| Property | Value |
|----------|-------|
| Parameters | ~125M |
| Architecture | Decoder-only Transformer |
| Embedding dim | 768 |
| Attention heads | 12 |
| Layers | 8 |
| Context length | 512 tokens |
| Vocab size | 32,000 (SentencePiece BPE) |
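The specs above can be collected into a config object. This is a hypothetical sketch mirroring the table; the actual TinyGPT config class may use different field names.

```python
from dataclasses import dataclass

# Hypothetical config mirroring the spec table; field names are assumptions,
# not the real TinyGPT API.
@dataclass
class KernelGPTConfig:
    vocab_size: int = 32_000   # SentencePiece BPE vocabulary
    block_size: int = 512      # context length in tokens
    n_layer: int = 8           # transformer blocks
    n_head: int = 12           # attention heads
    n_embd: int = 768          # embedding dimension

cfg = KernelGPTConfig()
assert cfg.n_embd % cfg.n_head == 0
head_dim = cfg.n_embd // cfg.n_head  # 768 / 12 = 64 dims per head
```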
## Training
| Setting | Value |
|---------|-------|
| Training steps | 162,000 |
| Val loss | 4.3889 |
| Optimizer | AdamW |
| Learning rate | 3e-4 (cosine decay) |
| Batch size | 1 (effective 4 with grad accum) |
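The "effective 4 with grad accum" entry means gradients from 4 micro-batches of size 1 are summed before a single optimizer step. A minimal sketch of that pattern, using a toy linear model in place of the real network (the actual TinyGPT training loop may differ):

```python
import torch

# Toy stand-in for the real model; illustrates gradient accumulation only.
torch.manual_seed(0)
model = torch.nn.Linear(8, 1)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
accum_steps = 4  # micro-batch of 1, effective batch of 4

opt.zero_grad()
for micro_step in range(accum_steps):
    x = torch.randn(1, 8)              # micro-batch of size 1
    y = torch.randn(1, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()    # scale so summed grads average the micro-batches
opt.step()                             # one optimizer update per 4 micro-batches
```

Scaling each loss by `1 / accum_steps` makes the accumulated gradient equal to the average over the effective batch, so the learning rate behaves as it would with a true batch of 4.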
## Training Data
- **FineWeb** (general web text)
- **arXiv papers** (cs.DC, cs.AR, cs.LG, cs.PF categories — GPU/AI/systems)
- **Wikipedia** (ML/systems filtered articles)
- **GPU-specific crawl** (NVIDIA docs, GitHub READMEs, arXiv abstracts)
Topics cover all 20 chapters of *AI Performance Engineering* including CUDA internals,
KV cache tuning, LLM inference, distributed training, and GPU cluster scaling.
## Usage
```python
import torch
import sentencepiece as spm
from huggingface_hub import hf_hub_download
# Download files
ckpt_path = hf_hub_download("saiakula/KernelGPT", "pytorch_model.pt")
tok_path = hf_hub_download("saiakula/KernelGPT", "tokenizer.model")
# Load tokenizer
sp = spm.SentencePieceProcessor(model_file=tok_path)
# Load model weights (the TinyGPT model class is needed to instantiate —
# clone https://github.com/your-username/TinyGPT)
checkpoint = torch.load(ckpt_path, map_location="cpu")
```
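Once the model class is available, generation is a standard autoregressive loop: encode the prompt with SentencePiece, repeatedly take the argmax (or sample) from the last position's logits, and append. A hedged sketch with a dummy stand-in model, since the real TinyGPT class and its `generate` method (if any) are not shown here:

```python
import torch

# Hypothetical greedy-decoding loop; DummyLM stands in for the real TinyGPT
# model, which would be loaded from the checkpoint above.
torch.manual_seed(0)
vocab_size, context_len = 32_000, 512

class DummyLM(torch.nn.Module):
    """Maps token ids to next-token logits, like the real model would."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, 16)
        self.head = torch.nn.Linear(16, vocab_size)
    def forward(self, ids):
        return self.head(self.emb(ids))  # (batch, seq, vocab)

model = DummyLM().eval()
ids = torch.tensor([[1, 42, 7]])  # would come from sp.encode(prompt)
with torch.no_grad():
    for _ in range(8):
        logits = model(ids[:, -context_len:])           # crop to context window
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)
# text = sp.decode(ids[0].tolist())  # detokenize with the SentencePiece model
```

Swapping the argmax for temperature sampling or top-k would give more varied completions.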
## Acknowledgments
- Inspired by Andrej Karpathy's nanoGPT
- Training topics based on *AI Performance Engineering* by Chris Fregly