# Codsworth

A transformer language model built from scratch in PyTorch.

## Model Details
- Architecture: GPT-style decoder-only transformer with RoPE (Rotary Position Embedding)
- Parameters: ~3.9M (Tiny config)
- Framework: PyTorch
## Architecture Details
- 2 transformer layers
- 256 embedding dimension
- 4 attention heads
- 512 FFN hidden dimension
- SwiGLU activation (sketched below)
- RoPE positional encoding (sketched below)
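For concreteness, here is a minimal PyTorch sketch of these two components using the dimensions above (256-d embeddings, 512-d FFN hidden). This illustrates the general techniques, not the repository's actual code; the split-half RoPE convention and the bias-free linear layers are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding (split-half convention), applied to
    queries/keys of shape (batch, heads, seq, head_dim)."""
    _, _, seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)  # (half,)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs   # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, d_model: int = 256, d_ff: int = 512):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

# RoPE would typically be applied to q and k before computing attention scores:
#   q, k = apply_rope(q), apply_rope(k)
```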
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A from-scratch architecture may require trust_remote_code=True to load.
model = AutoModelForCausalLM.from_pretrained("your-username/codsworth", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("your-username/codsworth", trust_remote_code=True)

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)  # outputs.logits holds next-token scores
```
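For text generation, something along these lines should work, assuming the checkpoint exposes a causal-LM head compatible with transformers' `generate()` API (an assumption, since the model is a custom implementation):

```python
# Hypothetical generation example; the sampling parameters are illustrative.
prompt = tokenizer("Hello world", return_tensors="pt")
generated = model.generate(**prompt, max_new_tokens=32, do_sample=True, top_k=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```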
## Training
- Trained on a subset of WikiText
- Optimizer: AdamW (lr=1e-4)
- Loss: cross-entropy on next-token prediction (one training step is sketched below)
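As a rough illustration of the setup described above, a single training step might look like the following. Here `model` is assumed to be the from-scratch module returning logits of shape (batch, seq, vocab) and `batch` a tensor of token ids; neither is taken from the released code.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr from this card

def train_step(batch: torch.Tensor) -> float:
    # Shift inputs/targets by one position for next-token prediction.
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)  # assumed shape: (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```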
## Limitations
- Small vocabulary (~5000 words)
- Limited training data
- Not suitable for production use
## Citation
```bibtex
@misc{codsworth2026,
  title={Codsworth: A Transformer Language Model Implementation},
  author={Shanahan, Jaq R.},
  year={2026},
  institution={Hermes Research Center}
}
```