Atto-2: Ternary GPT Research For Intelligence Density

Atto-2 is an exploration into Ternary Intelligence Density using the Transformer (GPT) architecture. Every weight in the Atto-2 series is constrained to the set $\{-1, 0, 1\}$, following the 1.58-bit principle ($\log_2 3 \approx 1.58$ bits of information per ternary weight).
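To make the constraint concrete, here is a minimal sketch of ternary quantization using the absmean rule popularized by the BitNet b1.58 line of work. This is illustrative only; it is not necessarily the exact scheme Atto-2 trains with, and `ternarize` is a hypothetical helper, not part of this repo.

```python
import numpy as np

def ternarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float weight matrix to {-1, 0, 1} plus a per-tensor scale.

    Absmean rule (illustrative): divide by the mean absolute value,
    then round and clip to the ternary set.
    """
    scale = float(np.abs(w).mean()) + 1e-8   # per-tensor scale, epsilon avoids /0
    q = np.clip(np.round(w / scale), -1, 1)  # ternary integers
    return q.astype(np.int8), scale

w = np.array([[0.9, -0.05, -1.2], [0.3, 0.0, 0.7]])
q, s = ternarize(w)
# q now contains only values from {-1, 0, 1}; s is the float scale
```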

Unlike the previous N-Gram based experiments, Atto-2 uses a full causal self-attention mechanism, allowing far more expressive, content-dependent relationships to be captured within the same parameter budget.
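For reference, a single-head causal self-attention step can be sketched in NumPy as below. This is a generic illustration of the mechanism, not Atto-2's actual implementation (which lives in `train_atto.py` and may differ); the ternary projection matrices here are randomly generated stand-ins.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention.

    x: (T, d) token embeddings; w_q/w_k/w_v: (d, d) projections.
    A lower-triangular mask ensures position t attends only to positions <= t.
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)           # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
# Ternary projections, matching Atto-2's weight constraint
w = rng.integers(-1, 2, size=(3, d, d)).astype(np.float64)
out = causal_self_attention(x, w[0], w[1], w[2])
```

Note that the causal mask is what lets the model "look back" over the whole prefix rather than a fixed window: perturbing a later token cannot change the outputs at earlier positions.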

The Atto-2 Series (Ternary GPT)

| Model        | Parameters | Layers | Heads | Embd Dim | Weights  |
|--------------|------------|--------|-------|----------|----------|
| atto2-gpt-1k | ~1k        | 1      | 1     | 8        | 1.58-bit |
| atto2-gpt-8k | ~8k        | 2      | 2     | 16       | 1.58-bit |

Research Findings: Transformer Density

  1. Attention is Key: Even with ternary weights, the attention mechanism allows the model to "look back" more effectively than the static windows of the previous architecture.
  2. Quantization Robustness: The GPT architecture remains highly functional under 1.58-bit constraints, with convergence appearing stable even at very low embedding dimensions.
  3. Ternary Inference: The exported JSON models contain integer-only weights $\{-1, 0, 1\}$ for all linear layers and embeddings, maximizing efficiency for edge inference.
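The efficiency claim in finding 3 follows from a simple observation: a matrix-vector product with weights in $\{-1, 0, 1\}$ needs no multiplications at all, only additions and subtractions. A minimal sketch (`ternary_matvec` is an illustrative helper, not this repo's inference code):

```python
import numpy as np

def ternary_matvec(w_t: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply a ternary matrix {-1, 0, 1} by a vector without multiplications.

    Each output element is (sum of x where w == 1) - (sum of x where w == -1),
    which is why ternary weights are attractive for edge inference.
    """
    pos = (w_t == 1)
    neg = (w_t == -1)
    # Boolean masking reduces the matvec to sums and differences only.
    return np.where(pos, x, 0.0).sum(axis=1) - np.where(neg, x, 0.0).sum(axis=1)

w = np.array([[1, 0, -1], [0, 1, 1]], dtype=np.int8)
x = np.array([2.0, 3.0, 5.0])
y = ternary_matvec(w, x)  # matches the ordinary product w @ x
```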

Usage

Training

To train the Atto-2 GPT series:

python3 train_atto.py

Sampling

To evaluate the models:

python3 sample.py

The models are exported as dependency-free JSON files in the models/ directory, ready for client-side inference in a web browser. Atto-2 weights are guaranteed to be ternary.
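Because the exported files are plain JSON, the ternary guarantee is easy to verify on the client side. The keys in the toy payload below (`wte`, `head`) are hypothetical, since the actual schema of the `models/` files is not documented here; only the walk-and-check pattern is the point.

```python
import json

def is_ternary(model) -> bool:
    """Return True if every numeric leaf in a nested JSON structure is in {-1, 0, 1}."""
    def leaves(node):
        if isinstance(node, dict):
            for v in node.values():
                yield from leaves(v)
        elif isinstance(node, list):
            for v in node:
                yield from leaves(v)
        else:
            yield node
    return all(v in (-1, 0, 1) for v in leaves(model))

# Hypothetical payload standing in for one of the exported models/*.json files
model = json.loads('{"wte": [[1, -1], [0, 1]], "head": [[0, 0], [-1, 1]]}')
ok = is_ternary(model)
```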
