# Atto-2: Ternary GPT Research for Intelligence Density
Atto-2 is an exploration of ternary intelligence density using the Transformer (GPT) architecture. Every weight in the Atto-2 series is constrained to the set {-1, 0, 1}, following the 1.58-bit principle (a three-valued weight carries log2(3) ≈ 1.58 bits of information).

Unlike the previous N-Gram based experiments, Atto-2 uses a full causal self-attention mechanism, allowing much richer token relationships to be captured within the same parameter budget.
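The README does not show how weights are constrained to {-1, 0, 1}. As a sketch only, a BitNet-b1.58-style absmean ternarization (the function name and details here are illustrative, not the project's actual training code) could look like:

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Round-to-nearest ternarization with absmean scaling
    (BitNet b1.58-style): scale by mean |w|, round, clip to {-1, 0, 1}."""
    scale = np.mean(np.abs(w)) + eps
    return np.clip(np.rint(w / scale), -1, 1).astype(np.int8)

w = np.array([0.9, -0.05, 0.4, -1.2])
print(ternarize(w))  # every weight lands in {-1, 0, 1}
```

During training such a scheme is typically paired with a straight-through estimator so gradients flow to the latent full-precision weights.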
## The Atto-2 Series (Ternary GPT)
| Model | Parameters | Layers | Heads | Embedding Dim | Weights |
|---|---|---|---|---|---|
| atto2-gpt-1k | ~1k | 1 | 1 | 8 | 1.58-bit |
| atto2-gpt-8k | ~8k | 2 | 2 | 16 | 1.58-bit |
## Research Findings: Transformer Density
- **Attention is key:** Even with ternary weights, the attention mechanism lets the model "look back" over the full context more effectively than the static windows of the previous N-Gram architecture.
- **Quantization robustness:** The GPT architecture remains highly functional under the 1.58-bit constraint, with convergence appearing stable even at very low embedding dimensions.
- **Ternary inference:** The exported JSON models contain integer-only weights {-1, 0, 1} for all linear layers and embeddings, maximizing efficiency for edge inference.
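One reason integer-only ternary weights suit edge inference is that every multiplication in a matrix-vector product collapses into an addition or subtraction. A minimal illustration (not the project's actual inference code):

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply-free matvec: with weights in {-1, 0, 1}, each output
    element is a sum of the +1-selected inputs minus the -1-selected ones."""
    out = np.empty(W.shape[0], dtype=x.dtype)
    for i, row in enumerate(W):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

W = np.array([[1, 0, -1],
              [0, 1,  1]], dtype=np.int8)
x = np.array([2.0, 3.0, 4.0])
print(ternary_matvec(W, x))  # same result as W @ x: [-2.  7.]
```

On hardware without fast multipliers, this add/subtract formulation is the main efficiency win of 1.58-bit weights.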
## Usage

### Training

To train the Atto-2 GPT series:

```sh
python3 train_atto.py
```

### Sampling

To evaluate the models:

```sh
python3 sample.py
```
The models are exported as dependency-free JSON files in the `models/` directory, ready for client-side inference in a web browser. Atto-2 weights are guaranteed to be ternary.
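The exact JSON schema of the exported files is not documented here; assuming weight tensors appear as (possibly nested) lists of numbers alongside string/config metadata, a hypothetical sanity check of the ternary guarantee could look like:

```python
import json

def weights_are_ternary(node) -> bool:
    """Recursively verify that every numeric leaf is -1, 0, or 1.
    Non-numeric metadata (names, config strings) is ignored.
    The schema of models/*.json is an assumption, not documented fact."""
    if isinstance(node, dict):
        return all(weights_are_ternary(v) for v in node.values())
    if isinstance(node, list):
        return all(weights_are_ternary(v) for v in node)
    if isinstance(node, (int, float)):
        return node in (-1, 0, 1)
    return True

# e.g.: weights_are_ternary(json.load(open("models/atto2-gpt-1k.json")))
```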