---
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- decoder-only
- nlp
- autoregressive
- rope
- gqa
- rmsnorm
- swiglu
- from-scratch
datasets:
- roneneldan/TinyStories
license: apache-2.0
model-index:
- name: GatorGPT2
  results: []
---

# GatorGPT2

**GatorGPT2** is a small, decoder-only Transformer trained from scratch on a subset of **TinyStories** for next-token prediction.
It uses **RoPE** (rotary positional embeddings), **GQA** (grouped-query attention), **RMSNorm**, and a **SwiGLU MLP**.
The tokenizer is **tiktoken** with the **p50k_base** encoding.

> **Repo**: `kunjcr2/GatorGPT2`
> **Intended use**: research, experimentation, educational demos for training/serving custom LMs

---

## 🧠 Architecture

- **Type**: Decoder-only, causal LM
- **Layers**: `num_hidden_layers = 10`
- **Hidden size**: `hidden_size = 448`
- **Heads**: `num_attention_heads = 8` (GQA with 2 KV heads per query group)
- **FFN**: SwiGLU, `d_ff ≈ 2× hidden_size`
- **Norm**: RMSNorm (pre-norm blocks)
- **Positional**: RoPE
- **Vocab**: `vocab_size = 50,257` (tiktoken p50k_base)
- **Context length**: `max_position_embeddings = 1024`
- **Weight tying**: output head tied with token embeddings
- **Files**:
  - `pytorch_model.bin` (or `model.safetensors`)
  - `config.json` (`model_type: "gator-transformer"`, `auto_map` provided)
  - `modeling_gator.py`, `configuration_gator.py`, `__init__.py`
  - `tokenizer_manifest.json` → `{ "library": "tiktoken", "encoding": "p50k_base" }`

> Custom code is loaded via `trust_remote_code=True`.

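
As a rough sanity check on the hyperparameters listed above, the sketch below estimates the parameter count. It is not taken from `modeling_gator.py`. It assumes bias-free linear layers, `n_kv_heads = 2` shared KV heads in total (one reading of the GQA note), `d_ff` exactly `2 * hidden_size`, a final RMSNorm, and a tied output head. The function name `estimate_params` is hypothetical.

```python
def estimate_params(
    vocab_size: int = 50_257,
    hidden_size: int = 448,
    num_layers: int = 10,
    num_heads: int = 8,
    n_kv_heads: int = 2,   # assumption: 2 shared KV heads in total
    d_ff: int = 2 * 448,   # assumption: d_ff is exactly 2x hidden_size
) -> dict:
    """Estimate parameter counts, assuming no biases and a tied LM head."""
    head_dim = hidden_size // num_heads
    kv_dim = n_kv_heads * head_dim

    embedding = vocab_size * hidden_size   # shared with the output head
    attention = (
        2 * hidden_size * hidden_size      # query and output projections
        + 2 * hidden_size * kv_dim         # key and value projections
    )
    mlp = 3 * hidden_size * d_ff           # SwiGLU: gate, up, and down projections
    norms = 2 * hidden_size                # two RMSNorm gains per block
    per_layer = attention + mlp + norms
    # Assumption: one extra RMSNorm after the last block.
    total = embedding + num_layers * per_layer + hidden_size
    return {"embedding": embedding, "per_layer": per_layer, "total": total}


counts = estimate_params()
print(f"{counts['total'] / 1e6:.1f}M parameters under these assumptions")
```
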

---

## 📦 Install

```bash
pip install torch transformers tiktoken
```

---

## Quickstart (Transformers + tiktoken)

```python
import torch
from transformers import AutoModelForCausalLM
import tiktoken

MODEL_ID = "kunjcr2/GatorGPT2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load model (uses custom modeling code)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.float32,
).to(DEVICE).eval()

# Tokenizer (p50k_base via tiktoken)
tok = tiktoken.get_encoding("p50k_base")

def generate_greedy(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tok.encode(prompt)
    x = torch.tensor([ids], device=DEVICE)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            out = model(x)
            logits = out["logits"] if isinstance(out, dict) else out.logits
        # Greedy decoding: pick the highest-scoring token at the last position
        next_id = int(torch.argmax(logits[0, -1]))
        x = torch.cat([x, torch.tensor([[next_id]], device=DEVICE)], dim=1)
    return tok.decode(x[0].tolist()).replace("<|endoftext|>", "").strip()

print(generate_greedy("Little girl was"))
```

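
The greedy loop above feeds the whole running sequence back into the model at every step, so a long prompt plus `max_new_tokens` can exceed `max_position_embeddings = 1024`. One way to guard against that, sketched here with a hypothetical helper `crop_context`, is to keep only the most recent tokens before each forward pass. Whether the model itself truncates or raises on longer inputs is not stated, so treat this as a defensive assumption.

```python
MAX_CONTEXT = 1024  # max_position_embeddings from the model card


def crop_context(ids: list[int], max_len: int = MAX_CONTEXT) -> list[int]:
    """Return the last `max_len` token ids (the whole list if it is shorter)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return ids[-max_len:]


# Toy demonstration with integers standing in for token ids.
prompt_ids = list(range(1500))
window = crop_context(prompt_ids)
print(len(window), window[0], window[-1])
```

Inside `generate_greedy`, the equivalent change would be to call `model(x[:, -1024:])` instead of `model(x)`.
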
### Temperature-only sampling (no top-k/p)

```python
# Reuses `model`, `tok`, and `DEVICE` from the Quickstart block.
def generate_temp(prompt, max_new_tokens=64, temperature=0.9):
    ids = tok.encode(prompt)
    x = torch.tensor([ids], device=DEVICE)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            # Divide logits by the temperature; the floor avoids division by zero
            logits = model(x).logits[0, -1] / max(temperature, 1e-6)
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1).item()
        x = torch.cat([x, torch.tensor([[next_id]], device=DEVICE)], dim=1)
    return tok.decode(x[0].tolist()).replace("<|endoftext|>", "").strip()
```

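
To see what the `temperature` argument of `generate_temp` does to the sampling distribution, here is a small dependency-free sketch. The logits are made-up values and `softmax_with_temperature` is a hypothetical helper, not part of the repository.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax over logits / temperature, computed in a numerically stable way."""
    t = max(temperature, 1e-6)           # same floor as in generate_temp
    scaled = [z / t for z in logits]
    m = max(scaled)                      # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]             # made-up values for illustration
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, and higher temperatures flatten the distribution toward uniform.
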

---

## Serving with vLLM (Optional)

```bash
python -m vllm.entrypoints.openai.api_server \
  --model kunjcr2/GatorGPT2 \
  --tokenizer kunjcr2/GatorGPT2 \
  --trust-remote-code \
  --dtype float32 \
  --max-model-len 1024 \
  --host 0.0.0.0 --port 8000
```

Call it:

```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"kunjcr2/GatorGPT2","prompt":"Little girl was","max_tokens":64,"temperature":0.9}'
```

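
The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the `curl` call above; it assumes the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style completions body. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: server started as shown above


def build_completion_request(prompt: str, max_tokens: int = 64,
                             temperature: float = 0.9) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": "kunjcr2/GatorGPT2",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(build_completion_request(prompt), timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Uncomment once the server is running:
# print(complete("Little girl was"))
```
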

---

## 🧪 Training Summary

* **Data**: `roneneldan/TinyStories` (train split; subset of ~1.5M stories)
* **Objective**: causal LM (next-token prediction), cross-entropy
* **Optimizer**: AdamW (`lr=3e-4`, `weight_decay=0.01`, `eps=1e-8`)
* **Precision**: bf16 autocast on CUDA during the forward pass for speed
* **Batching**: sliding windows via a `FastDataset` (window size e.g. 512, stride 256)
* **Eval**: periodic validation over fixed batches; train loss downsampled to eval steps for plotting
* **Hardware**: intended for A100-class GPUs; also runs on CPU for debugging (slow)

> This is a *from-scratch* toy/educational model; quality depends heavily on the number of training steps, data cleaning, and the training schedule. Expect simple, short English generations.

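
The sliding-window batching mentioned in the summary above can be illustrated with a short sketch. This is not the repository's `FastDataset`; `sliding_windows` is a hypothetical stand-in that assumes each example is `window` input tokens paired with the same tokens shifted by one position as targets.

```python
def sliding_windows(token_ids: list[int], window: int = 512, stride: int = 256):
    """Yield (inputs, targets) pairs; targets are the inputs shifted by one token."""
    # Each example needs window + 1 tokens so every input has a next-token target.
    last_start = len(token_ids) - (window + 1)
    for start in range(0, last_start + 1, stride):
        chunk = token_ids[start : start + window + 1]
        yield chunk[:-1], chunk[1:]


# Toy stream of 12 "tokens" with a small window so the overlap is visible.
for inputs, targets in sliding_windows(list(range(12)), window=4, stride=2):
    print(inputs, "->", targets)
```

With a stride of half the window, consecutive examples overlap by 50%, so most tokens appear in two training examples.
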

---

## ✅ Intended Use

* Research on small decoder-only Transformers
* Educational demos (training, saving, model hub, vLLM serving)
* Baseline for experimenting with:

  * LoRA/QLoRA, quantization, distillation
  * Attention variants (Flash-Attention, GQA configs)
  * Data curation and scaling laws

**Not** intended for production or safety-critical use.

---

## ⚠️ Limitations & Risks

* Trained on children's story data, so world knowledge and reasoning are limited
* May output incoherent, repetitive, or undesirable text
* No instruction-tuning or RLHF
* Tokenizer is `tiktoken p50k_base` (not a standard HF tokenizer), so examples use `tiktoken` directly

---

## Repo Structure

```
.
├── config.json
├── pytorch_model.bin          # or model.safetensors
├── modeling_gator.py          # custom architecture (RoPE, GQA, RMSNorm, SwiGLU)
├── configuration_gator.py
├── __init__.py
└── tokenizer_manifest.json    # { "library": "tiktoken", "encoding": "p50k_base" }
```

`config.json` includes:

```json
{
  "model_type": "gator-transformer",
  "architectures": ["GatorModel"],
  "auto_map": {
    "AutoConfig": "configuration_gator.GatorConfig",
    "AutoModelForCausalLM": "modeling_gator.GatorModel"
  }
}
```

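
Because the tokenizer is described by `tokenizer_manifest.json` rather than by standard tokenizer files, a loader can read the manifest instead of hard-coding the encoding name. The sketch below assumes the manifest contains only the two fields shown above; `encoding_name_from_manifest` and `load_tokenizer` are hypothetical helpers, not part of the repository.

```python
import json

# Manifest contents as listed in the repo structure above.
MANIFEST_TEXT = '{ "library": "tiktoken", "encoding": "p50k_base" }'


def encoding_name_from_manifest(manifest_text: str) -> str:
    """Validate the manifest and return the tiktoken encoding name."""
    manifest = json.loads(manifest_text)
    library = manifest.get("library")
    if library != "tiktoken":
        raise ValueError(f"unsupported tokenizer library: {library!r}")
    return manifest["encoding"]


def load_tokenizer(manifest_text: str):
    """Return the tiktoken encoding named in the manifest."""
    import tiktoken  # imported lazily so the parser above has no dependencies
    return tiktoken.get_encoding(encoding_name_from_manifest(manifest_text))


print(encoding_name_from_manifest(MANIFEST_TEXT))
```
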

---

## Evaluation

No formal benchmarks are reported. You can compute loss/perplexity on your own validation subset:

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# ...build a DataLoader of (input_ids, target_ids) pairs...
def eval_loss(model, loader, device="cuda"):
    """Return the mean cross-entropy over all batches in `loader`."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x).logits
            # Flatten (batch, seq, vocab) and (batch, seq) for cross_entropy
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), y.reshape(-1)
            )
            total += loss.item()
            n += 1
    return total / max(n, 1)

# DEVICE comes from the Quickstart; the model must already be on that device.
val_loss = eval_loss(model, your_val_loader, device=DEVICE)
print("val loss:", val_loss, " ppl:", math.exp(val_loss))
```

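
Note that `eval_loss` averages per-batch mean losses, which equals the per-token average only when every batch holds the same number of tokens. A minimal sketch of a token-weighted alternative follows, using made-up numbers and a hypothetical helper name `perplexity`.

```python
import math


def perplexity(batch_losses: list[float], batch_token_counts: list[int]) -> float:
    """Perplexity from per-batch mean losses, weighted by tokens per batch."""
    total_tokens = sum(batch_token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    total_nll = sum(loss * n for loss, n in zip(batch_losses, batch_token_counts))
    return math.exp(total_nll / total_tokens)


# Made-up losses: the short final batch counts for less than the full ones.
print(perplexity([2.0, 2.2, 3.0], [4096, 4096, 512]))
```
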

---

## License

**apache-2.0**

---

## Acknowledgements

* **TinyStories** dataset by Ronen Eldan et al. (`roneneldan/TinyStories`)
* Community tooling: **PyTorch**, **🤗 Transformers**, **tiktoken**, **vLLM**

---

## Citation

If you use this model, please cite this repository:

```bibtex
@software{GatorGPT2_2025,
  author = {Kunj},
  title  = {GatorGPT2: a small decoder-only Transformer with RoPE+GQA},
  year   = {2025},
  url    = {https://huggingface.co/kunjcr2/GatorGPT2}
}
```