# ART-GPT-16L-1024D

A GPT-2-style language model trained with Attractor-Regularized Training (ART).
## Model Details
| Property | Value |
|---|---|
| Parameters | 305,335,296 (305.3M) |
| Layers | 16 |
| Embedding Dim | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 |
| Vocab Size | 50257 |
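
These hyperparameters correspond to a standard GPT-2 architecture. As a rough sketch only (the checkpoint's own `config.json` is authoritative), the table above maps onto a Hugging Face `GPT2Config` like this:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative reconstruction of the architecture from the table above;
# load the released checkpoint for the actual trained weights and config.
config = GPT2Config(
    vocab_size=50257,   # Vocab Size
    n_positions=1024,   # Context Length
    n_embd=1024,        # Embedding Dim
    n_layer=16,         # Layers
    n_head=16,          # Attention Heads
)
model = GPT2LMHeadModel(config)  # randomly initialized model with these dimensions
```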
## Training
- Dataset: OpenWebText
- Training Steps: 100,000
- Validation Loss: 2.9706
- Validation Perplexity: 19.5
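
(Perplexity is the exponential of the validation loss: exp(2.9706) ≈ 19.5.)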
## ART (Attractor-Regularized Training)
This model was trained with ART, which enforces empirically discovered conservation laws as soft constraints during training. These constraints are intended to guide the model toward weight configurations characterized by mathematical constants.
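
The ART code itself lives in the repository linked under Citation. Purely as an illustration of the soft-constraint idea (not the actual ART objective), a conservation-law penalty can be added to the usual language-modeling loss; the names `target_constant` and `lambda_art` and the weight statistic below are hypothetical:

```python
def art_regularized_loss(lm_loss, model, target_constant=1.618, lambda_art=1e-3):
    """Hypothetical sketch: add a soft conservation-law penalty to the LM loss.

    `target_constant`, `lambda_art`, and the weight statistic are illustrative
    choices, not values taken from the ART implementation.
    """
    penalty = 0.0
    for block in model.transformer.h:  # GPT-2 transformer blocks
        # Example per-layer statistic: ratio of attention to MLP weight norms
        stat = block.attn.c_attn.weight.norm() / block.mlp.c_fc.weight.norm()
        penalty = penalty + (stat - target_constant) ** 2
    return lm_loss + lambda_art * penalty
```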
## Usage
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the ART-trained checkpoint and the standard GPT-2 tokenizer
model = GPT2LMHeadModel.from_pretrained("your-username/art-gpt-100k")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Generate a continuation for a short prompt
text = "The meaning of life is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Citation
```bibtex
@misc{art2026,
  author = {Knopp, Christian},
  title  = {Attractor-Regularized Training for Neural Networks},
  year   = {2026},
  url    = {https://github.com/conceptual1/ART}
}
```
## License
MIT