ART-GPT-16L-1024D

A GPT-2-style language model trained with Attractor-Regularized Training (ART).

Model Details

Property         Value
---------------  --------------------
Parameters       305,335,296 (305.3M)
Layers           16
Embedding Dim    1024
Attention Heads  16
Context Length   1024
Vocab Size       50257
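The stated parameter count is consistent with a GPT-2-style stack that uses untied input/output embeddings, bias-free linear layers, and weight-only norm layers. This breakdown is inferred from the numbers above, not a confirmed architecture detail:

```python
# One configuration that reproduces the stated 305,335,296 parameters.
# Assumptions (not confirmed by the model card): untied lm_head, no linear
# biases, and weight-only normalization layers.
V, P, D, L = 50257, 1024, 1024, 16  # vocab, positions, embedding dim, layers

token_emb = V * D            # 51,463,168
pos_emb   = P * D            #  1,048,576
lm_head   = V * D            # 51,463,168 (untied output projection)
per_layer = 12 * D * D       # 12,582,912 (QKV 3*D^2 + attn proj D^2 + MLP 8*D^2)
norms     = (2 * L + 1) * D  #     33,792 (two norms per layer + final norm)

total = token_emb + pos_emb + lm_head + L * per_layer + norms
print(total)  # 305335296
```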

Training

  • Dataset: OpenWebText
  • Training Steps: 100,000
  • Validation Loss: 2.9706
  • Validation Perplexity: 19.5
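The reported perplexity follows directly from the validation loss, since perplexity is the exponential of the per-token cross-entropy:

```python
import math

# Perplexity = exp(cross-entropy loss), using the loss reported above.
val_loss = 2.9706
perplexity = math.exp(val_loss)
print(round(perplexity, 1))  # 19.5
```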

ART (Attractor-Regularized Training)

This model was trained with ART, which enforces empirically discovered conservation laws as soft constraints (regularization penalties) on the training loss. These constraints are intended to guide the weights toward configurations characterized by particular mathematical constants.
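As a rough illustration of how a soft constraint of this kind can enter the loss, here is a minimal sketch. The function names, the choice of statistics, and the target constants are all hypothetical, since the card does not specify the actual conservation laws:

```python
# Hypothetical ART-style regularizer: a quadratic penalty pulling chosen
# weight statistics toward fixed target constants. Names and targets are
# illustrative, not taken from the ART code.
def attractor_penalty(stats, targets, strength=0.01):
    """Soft constraint: penalize squared deviation of each weight
    statistic from its target constant."""
    return strength * sum((s - t) ** 2 for s, t in zip(stats, targets))

def regularized_loss(ce_loss, stats, targets, strength=0.01):
    # Standard language-modeling loss plus the soft-constraint term.
    return ce_loss + attractor_penalty(stats, targets, strength)

# Statistics already at their targets add no penalty.
print(regularized_loss(3.0, [1.618, 0.5], [1.618, 0.5]))  # 3.0
```

Because the penalty is additive and differentiable, it fits into any standard training loop without changing the optimizer.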

Usage

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Repository name as listed on this model card.
model = GPT2LMHeadModel.from_pretrained("Zynerji/ART-GPT-16L-1024D")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # standard GPT-2 vocabulary (50257 tokens)

text = "The meaning of life is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to silence the warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Citation

@misc{art2026,
  author = {Knopp, Christian},
  title = {Attractor-Regularized Training for Neural Networks},
  year = {2026},
  url = {https://github.com/conceptual1/ART}
}

License

MIT
