# Atto-2: Ternary GPT Research for Intelligence Density
Atto-2 is an exploration of ternary intelligence density using the Transformer (GPT) architecture. Every weight in the Atto-2 series is constrained to the set {-1, 0, 1}, following the 1.58-bit principle (a three-valued weight carries log2(3) ≈ 1.58 bits of information).

Unlike the previous N-Gram based experiments, Atto-2 uses a full causal self-attention mechanism, allowing much richer token relationships to be captured within the same parameter budget.
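The README does not show how weights are constrained to {-1, 0, 1}. As a sketch only, a BitNet-b1.58-style absmean ternarization (the function name and details here are illustrative, not the project's actual training code) could look like:

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Round-to-nearest ternarization with absmean scaling
    (BitNet b1.58-style): scale by mean |w|, round, clip to {-1, 0, 1}."""
    scale = np.mean(np.abs(w)) + eps
    return np.clip(np.rint(w / scale), -1, 1).astype(np.int8)

w = np.array([0.9, -0.05, 0.4, -1.2])
print(ternarize(w))  # every weight lands in {-1, 0, 1}
```

During training such a scheme is typically paired with a straight-through estimator so gradients flow to the latent full-precision weights.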
## The Atto-2 Series (Ternary GPT)
| Model | Parameters | Layers | Heads | Embedding Dim | Weights |
|---|---|---|---|---|---|
| atto2-gpt-1k | ~1k | 1 | 1 | 8 | 1.58-bit |
| atto2-gpt-8k | ~8k | 2 | 2 | 16 | 1.58-bit |
## Research Findings: Transformer Density
- **Attention is key:** Even with ternary weights, the attention mechanism lets the model "look back" over the full context more effectively than the static windows of the previous N-Gram architecture.
- **Quantization robustness:** The GPT architecture remains highly functional under the 1.58-bit constraint, with convergence appearing stable even at very low embedding dimensions.
- **Ternary inference:** The exported JSON models contain integer-only weights {-1, 0, 1} for all linear layers and embeddings, maximizing efficiency for edge inference.
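One reason integer-only ternary weights suit edge inference is that every multiplication in a matrix-vector product collapses into an addition or subtraction. A minimal illustration (not the project's actual inference code):

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply-free matvec: with weights in {-1, 0, 1}, each output
    element is a sum of the +1-selected inputs minus the -1-selected ones."""
    out = np.empty(W.shape[0], dtype=x.dtype)
    for i, row in enumerate(W):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

W = np.array([[1, 0, -1],
              [0, 1,  1]], dtype=np.int8)
x = np.array([2.0, 3.0, 4.0])
print(ternary_matvec(W, x))  # same result as W @ x: [-2.  7.]
```

On hardware without fast multipliers, this add/subtract formulation is the main efficiency win of 1.58-bit weights.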
## Usage

### Training

To train the Atto-2 GPT series:

```sh
python3 train_atto.py
```

### Sampling

To evaluate the models:

```sh
python3 sample.py
```
The models are exported as dependency-free JSON files in the `models/` directory, ready for client-side inference in a web browser. Atto-2 weights are guaranteed to be ternary.
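The exact JSON schema of the exported files is not documented here; assuming weight tensors appear as (possibly nested) lists of numbers alongside string/config metadata, a hypothetical sanity check of the ternary guarantee could look like:

```python
import json

def weights_are_ternary(node) -> bool:
    """Recursively verify that every numeric leaf is -1, 0, or 1.
    Non-numeric metadata (names, config strings) is ignored.
    The schema of models/*.json is an assumption, not documented fact."""
    if isinstance(node, dict):
        return all(weights_are_ternary(v) for v in node.values())
    if isinstance(node, list):
        return all(weights_are_ternary(v) for v in node)
    if isinstance(node, (int, float)):
        return node in (-1, 0, 1)
    return True

# e.g.: weights_are_ternary(json.load(open("models/atto2-gpt-1k.json")))
```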