| --- |
| license: cc0-1.0 |
| language: |
| - en |
| --- |
| # Atto-2: Ternary GPT Research For Intelligence Density |
|
|
| Atto-2 is an exploration into **Ternary Intelligence Density** using the **Transformer (GPT)** architecture. Every weight in the Atto-2 series is constrained to the set $\{-1, 0, 1\}$. A three-valued weight carries at most $\log_2 3 \approx 1.58$ bits of information, which is where the 1.58-bit principle gets its name. |
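
The text above does not say how real-valued weights are mapped onto the ternary set. As one plausible illustration, the sketch below uses absmean scaling, the scheme described for BitNet b1.58: divide by the mean absolute weight, round, and clip to $[-1, 1]$. The function name `ternarize` and the `eps` parameter are hypothetical and are not taken from the Atto-2 code.

```python
import math

# Upper bound on the information carried by one three-valued weight.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585


def ternarize(weights, eps=1e-8):
    """Map a list of floats to {-1, 0, 1} with absmean scaling.

    Assumption: this mirrors the absmean scheme from other 1.58-bit work;
    the actual Atto-2 training script may quantize differently.
    """
    # Scale by the mean absolute value so typical weights land near +/-1.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


weights = [0.9, -0.05, -1.3, 0.4, 0.02]
ternary, scale = ternarize(weights)
print(ternary)  # [1, 0, -1, 1, 0]
```

Small weights collapse to 0 and large ones saturate at ±1, so the ternary values keep only the sign and a coarse magnitude. In this sketch `scale` is returned separately in case a caller wants to rescale the outputs.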
|
|
| Unlike the previous N-gram-based experiments, Atto-2 uses a full causal self-attention mechanism, so it can capture context-dependent relationships between tokens within the same parameter budget. |
|
|
| ## The Atto-2 Series (Ternary GPT) |
|
|
| | Model | Parameters | Layers | Heads | Embedding Dim | Weight Precision | |
| | :--- | :--- | :--- | :--- | :--- | :--- | |
| | **atto2-gpt-1k** | ~1k | 1 | 1 | 8 | 1.58-bit | |
| | **atto2-gpt-8k** | ~8k | 2 | 2 | 16 | 1.58-bit | |
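
To get a feel for how layer count and embedding dimension drive the parameter totals in the table, here is a rough estimate for a standard GPT-style block. It assumes the usual layout (four $d \times d$ attention projections and an MLP with 4x expansion), ignores biases and layer norms, and takes `vocab_size` and `block_size` as hypothetical inputs, since the source does not state them. It is an approximation and may not match how the Atto-2 counts were obtained.

```python
def estimate_gpt_params(n_layer, n_embd, vocab_size, block_size, tied_head=True):
    """Approximate weight count of a GPT-style model (biases and norms ignored).

    Assumptions: standard block layout with Q, K, V and output projections
    (4 * d^2) plus an MLP with 4x expansion (8 * d^2). vocab_size and
    block_size are not given in the source and must be supplied by the caller.
    """
    attention = 4 * n_embd * n_embd
    mlp = 8 * n_embd * n_embd
    blocks = n_layer * (attention + mlp)
    # Token embeddings plus learned positional embeddings.
    embeddings = vocab_size * n_embd + block_size * n_embd
    # An untied output head adds another vocab_size x n_embd matrix.
    head = 0 if tied_head else vocab_size * n_embd
    return blocks + embeddings + head


# Transformer blocks only (no embeddings), for the two table configurations.
blocks_1k = estimate_gpt_params(n_layer=1, n_embd=8, vocab_size=0, block_size=0)
blocks_8k = estimate_gpt_params(n_layer=2, n_embd=16, vocab_size=0, block_size=0)
print(blocks_1k, blocks_8k)  # 768 6144
```

Under these assumptions the blocks alone account for most of each budget, with the remainder depending on the vocabulary and context sizes.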
|
|
| ## Research Findings: Transformer Density |
|
|
| 1. **Attention is Key**: Even with ternary weights, the attention mechanism lets the model "look back" over its context more effectively than the static windows of the previous N-gram architecture. |
| 2. **Quantization Robustness**: The GPT architecture remains functional under the 1.58-bit constraint, and convergence appears stable even at very low embedding dimensions (8 and 16 in the models above). |
| 3. **Ternary Inference**: The exported JSON models contain integer-only weights $\{-1, 0, 1\}$ for all linear layers and embeddings. Multiplying by such a weight reduces to an addition, a subtraction, or a skip, which keeps edge inference cheap. |
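
The efficiency argument in the third finding can be made concrete with a minimal sketch of a multiplication-free matrix-vector product. The function name `ternary_matvec` is hypothetical, and the nested-list weight layout is an assumption rather than the format the Atto-2 exporter uses.

```python
def ternary_matvec(weight_rows, x):
    """Compute W @ x for a ternary matrix W without any multiplications.

    Assumption: W is a list of rows whose entries are all in {-1, 0, 1}.
    """
    out = []
    for row in weight_rows:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1 weight: add the input
            elif w == -1:
                acc -= xi   # -1 weight: subtract the input
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2, 5, 3]
print(ternary_matvec(W, x))  # [-1, 6]
```

With integer inputs the accumulator stays an integer throughout, so a layer like this needs no floating-point arithmetic.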
|
|
| ## Usage |
|
|
| ### Training |
| To train the Atto-2 GPT series: |
| ```bash |
| python3 train_atto.py |
| ``` |
|
|
| ### Sampling |
| To sample from the trained models: |
| ```bash |
| python3 sample.py |
| ``` |
|
|
| The models are exported as plain JSON files in the `models/` directory, so they can be loaded for client-side inference in a web browser without extra dependencies. Atto-2 weights are guaranteed to be ternary. |
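
One way to check the ternary guarantee on an exported file is to walk the parsed JSON and report every numeric leaf outside $\{-1, 0, 1\}$. The sketch below makes no assumption about the file's schema. The keys in the example (`wte`, `lm_head`) are invented for illustration, and `non_ternary_paths` is a hypothetical helper. Any non-weight numbers stored in the file, such as configuration values, would also be reported.

```python
import json

TERNARY = {-1, 0, 1}


def non_ternary_paths(node, path="$"):
    """Return the paths of all numeric leaves that are not -1, 0, or 1."""
    bad = []
    if isinstance(node, dict):
        for key, value in node.items():
            bad += non_ternary_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            bad += non_ternary_paths(value, f"{path}[{i}]")
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        # Booleans are excluded because bool is a subclass of int in Python.
        if node not in TERNARY:
            bad.append(path)
    return bad


# Hypothetical model snippet; a real file would be read with json.load(open(...)).
example = json.loads('{"wte": [[1, 0], [-1, 1]], "lm_head": [[0, 2]]}')
problems = non_ternary_paths(example)
print(problems)  # ['$.lm_head[0][1]']
```

An empty result means every number in the file is ternary.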