---
license: cc0-1.0
language:
- en
---
# Atto-2: Ternary GPT Research For Intelligence Density
Atto-2 is an exploration into **Ternary Intelligence Density** using the **Transformer (GPT)** architecture. Every weight in the Atto-2 series is constrained to the set $\{-1, 0, 1\}$, following the 1.58-bit principle: a three-valued weight carries $\log_2 3 \approx 1.58$ bits of information.
Unlike the previous N-gram-based experiments, Atto-2 uses full causal self-attention, so each position can attend to every earlier token in its context instead of a static window. This lets the model capture richer relationships within the same parameter budget.
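The README does not state how the weights are mapped to $\{-1, 0, 1\}$. As a minimal sketch, assuming the absmean scheme commonly used in 1.58-bit work (scale by the mean absolute weight, round, then clip), ternarization could look like the following. The function name `ternarize` and the sample values are illustrative, not taken from `train_atto.py`.

```python
def ternarize(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} with absmean scaling (assumed scheme)."""
    # The scale is the mean absolute value; eps guards against an all-zero input.
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    ternary = []
    for w in weights:
        q = round(w / scale)               # nearest integer after scaling
        ternary.append(max(-1, min(1, q)))  # clip to the ternary set
    return ternary, scale


# Illustrative values only.
weights = [0.9, -0.05, -1.4, 0.3, 0.02, -0.6]
ternary, scale = ternarize(weights)
print(ternary)  # [1, 0, -1, 1, 0, -1]
```

Small weights collapse to 0 and large ones saturate at ±1, which is why the scale choice matters at such low precision.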
## The Atto-2 Series (Ternary GPT)
| Model | Parameters | Layers | Heads | Embd Dim | Weights |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **atto2-gpt-1k** | ~1k | 1 | 1 | 8 | 1.58-bit |
| **atto2-gpt-8k** | ~8k | 2 | 2 | 16 | 1.58-bit |
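To put the table in perspective, the sketch below estimates how many weights sit in the transformer blocks and the information-theoretic storage floor at $\log_2 3$ bits per weight. It assumes a standard GPT-style block (four attention projections plus an MLP with 4x expansion, no biases or norm parameters), which the README does not confirm. Whatever remains of the ~1k and ~8k totals would come from embeddings and other parameters whose sizes are not stated here.

```python
import math

# Rows copied from the table above: (name, layers, embedding dim).
MODELS = [("atto2-gpt-1k", 1, 8), ("atto2-gpt-8k", 2, 16)]


def block_weight_count(n_layer, n_embd, mlp_ratio=4):
    """Weights in attention + MLP matrices, ignoring biases and norms."""
    attn = 4 * n_embd * n_embd              # Q, K, V and output projections
    mlp = 2 * mlp_ratio * n_embd * n_embd   # up and down projections
    return n_layer * (attn + mlp)


def ternary_bytes(n_params):
    """Lower bound on storage: log2(3) bits per ternary weight."""
    return n_params * math.log2(3) / 8


for name, n_layer, n_embd in MODELS:
    n = block_weight_count(n_layer, n_embd)
    print(f"{name}: {n} block weights, at least {ternary_bytes(n):.0f} bytes")
```

Under these assumptions the block matrices account for 768 and 6144 weights respectively.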
## Research Findings: Transformer Density
1. **Attention is Key**: Even with ternary weights, the attention mechanism allows the model to "look back" more effectively than the static windows of the previous architecture.
2. **Quantization Robustness**: In these experiments the GPT architecture remained functional under 1.58-bit constraints, and training appeared to converge stably even at very low embedding dimensions (8 and 16).
3. **Ternary Inference**: The exported JSON models contain integer-only weights $\{-1, 0, 1\}$ for all linear layers and embeddings, so each matrix product reduces to additions and subtractions, which suits edge inference.
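The third finding can be illustrated with a multiplication-free matrix-vector product. This is a minimal sketch of the general idea, not the code in `sample.py`; the name `ternary_matvec` and the example values are hypothetical.

```python
def ternary_matvec(weight_rows, x):
    """Compute W @ x where every entry of W is -1, 0, or 1."""
    out = []
    for row in weight_rows:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            # a 0 weight contributes nothing and is skipped
        out.append(acc)
    return out


W = [[1, 0, -1], [-1, 1, 1]]
x = [2, 5, 3]
print(ternary_matvec(W, x))  # [-1, 6]
```

Because no multiplications are needed, the inner loop maps well to simple integer hardware, and zero weights can be skipped entirely.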
## Usage
### Training
To train the Atto-2 GPT series:
```bash
python3 train_atto.py
```
### Sampling
To sample from the trained models:
```bash
python3 sample.py
```
The models are exported as dependency-free JSON files in the `models/` directory, ready for client-side inference in a web browser. Atto-2 weights are guaranteed to be ternary.
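The ternary guarantee is easy to check after loading an export. The JSON schema is not documented in this README, so the sketch below assumes a hypothetical layout: a flat mapping from tensor name to nested lists of numbers. The keys `wte` and `attn_q` are made up for illustration, and a real file may need the check applied only to its weight entries.

```python
import json


def is_ternary(tensor):
    """Return True if every leaf of a nested list is the integer -1, 0, or 1."""
    if isinstance(tensor, list):
        return all(is_ternary(item) for item in tensor)
    # Reject floats and booleans so that only true integers pass.
    return (
        isinstance(tensor, int)
        and not isinstance(tensor, bool)
        and tensor in (-1, 0, 1)
    )


# Hypothetical export layout, inlined here instead of reading from models/.
example = json.loads('{"wte": [[1, 0], [-1, 1]], "attn_q": [[0, 0], [1, -1]]}')
report = {name: is_ternary(tensor) for name, tensor in example.items()}
print(report)
```

For a file on disk, replace the inline string with `json.load(open(path))`, where `path` points at one of the exported models.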