Codsworth

A transformer language model built from scratch in PyTorch.

Model Details

  • Architecture: GPT-style decoder-only transformer with RoPE (Rotary Position Embedding)
  • Parameters: ~3.9M (Tiny config; see the breakdown below)
  • Framework: PyTorch
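
A rough accounting of the parameter count, assuming the ~5,000-entry vocabulary noted under Limitations, untied input and output embeddings, and bias-free linear layers (dimensions as listed under Architecture Details):

  • Embeddings + LM head: 2 × 5,000 × 256 ≈ 2.56M
  • Attention (Q, K, V, O projections) per layer: 4 × 256 × 256 ≈ 0.26M
  • SwiGLU FFN per layer: 3 × 256 × 512 ≈ 0.39M
  • Total: 2.56M + 2 × (0.26M + 0.39M) ≈ 3.9M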

Architecture Details

  • 2 transformer layers
  • 256 embedding dimension
  • 4 attention heads
  • 512 FFN hidden dimension
  • SwiGLU activation (sketched below)
  • RoPE positional encoding (sketched below)
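
SwiGLU and RoPE are the two less-standard pieces; a minimal PyTorch sketch of each, using the dimensions above, follows. Class and function names are illustrative and not taken from the Codsworth source.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int = 256, hidden: int = 512):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key vectors of shape (..., seq_len, head_dim) by position."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2                                                       # head_dim 64 -> 32 pairs
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)     # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)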

Usage

from transformers import AutoModel, AutoTokenizer
import torch

# pass trust_remote_code=True if loading reports an unrecognized architecture
model = AutoModel.from_pretrained("Jaqshanahan/codsworth-3.8m")
tokenizer = AutoTokenizer.from_pretrained("Jaqshanahan/codsworth-3.8m")

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
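
If the checkpoint is registered with a causal-LM head, text can be sampled as below. Whether generate() is available depends on how the model class is exported, so treat this as a sketch; it reuses the tokenizer loaded above.

from transformers import AutoModelForCausalLM

lm = AutoModelForCausalLM.from_pretrained("Jaqshanahan/codsworth-3.8m")
prompt = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    generated = lm.generate(**prompt, max_new_tokens=32, do_sample=True, top_k=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))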

Training

  • Data: a subset of the WikiText corpus
  • Optimizer: AdamW (lr=1e-4)
  • Loss: cross-entropy on next-token prediction (see the sketch below)
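
As a sketch of what that setup amounts to, a standard next-token training step looks like the following; model and dataloader are hypothetical stand-ins, not names from the actual training script.

import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for batch in dataloader:                      # batch: (B, T) token ids
    logits = model(batch[:, :-1])             # logits: (B, T-1, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (B*(T-1), vocab_size)
        batch[:, 1:].reshape(-1),             # targets shifted one token right
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()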

Limitations

  • Small vocabulary (~5000 words)
  • Limited training data
  • Not suitable for production use

Citation

@misc{codsworth2026,
  title={Codsworth: A Transformer Language Model Implementation},
  author={Shanahan, Jaq R.},
  year={2026},
  institution={Hermes Research Center}
}