AGofficial / atto-2

	---
	license: cc0-1.0
	language:
	- en
	---
	# Atto-2: Ternary GPT Research For Intelligence Density

	Atto-2 explores ternary intelligence density using the Transformer (GPT) architecture. Every weight in the Atto-2 series is constrained to the set $\{-1, 0, 1\}$, following the 1.58-bit principle: a three-valued weight carries at most $\log_2 3 \approx 1.58$ bits of information.
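As a rough illustration of constraining weights to $\{-1, 0, 1\}$, the sketch below rounds float weights after dividing by their mean absolute value. This "absmean" scheme is an assumption borrowed from common 1.58-bit practice; this card does not state how `train_atto.py` actually quantizes, and the name `ternarize` is hypothetical.

```python
import math

def ternarize(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} (assumed absmean scheme, not confirmed by the repo)."""
    # Scale by the mean absolute value so typical weights land near +/-1.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

# Information content of one three-valued weight, in bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

weights = [0.42, -0.07, -0.91, 0.03, 0.55]
ternary, scale = ternarize(weights)
print(ternary)  # [1, 0, -1, 0, 1]
```

Small weights collapse to 0 and large ones saturate at ±1, which is why the scale factor matters more than the individual magnitudes.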

	Unlike the previous N-gram-based experiments, Atto-2 uses full causal self-attention, so each position can weigh every earlier token in the context rather than a fixed window. This lets the model capture richer relationships within the same parameter budget.
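To make the mechanism concrete, here is a minimal single-head causal self-attention sketch in plain Python with ternary projection matrices. It is not taken from the Atto-2 code: the function names, the square projection shapes, and the example matrices are all assumptions chosen for illustration.

```python
import math

def matvec(matrix, vec):
    # Plain matrix-vector product; rows of `matrix` are output units.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(xs, wq, wk, wv):
    """Single-head causal attention over a list of token vectors (illustrative only)."""
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    d = len(qs[0])
    out = []
    for t, q in enumerate(qs):
        # Causal mask: position t scores only positions 0..t.
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        probs = softmax(scores)
        out.append([sum(p * vs[s][i] for s, p in enumerate(probs))
                    for i in range(d)])
    return out

# Hypothetical ternary projections for a 2-dimensional embedding.
wq = [[1, 0], [0, 1]]
wk = [[1, -1], [0, 1]]
wv = [[1, 0], [-1, 1]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(xs, wq, wk, wv)
```

Because of the mask, changing a later token never alters the outputs at earlier positions, which is the property that makes left-to-right sampling possible.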

	## The Atto-2 Series (Ternary GPT)

	| Model | Parameters | Layers | Heads | Embedding Dim | Weights |
	| :--- | :--- | :--- | :--- | :--- | :--- |
	| atto2-gpt-1k | ~1k | 1 | 1 | 8 | 1.58-bit |
	| atto2-gpt-8k | ~8k | 2 | 2 | 16 | 1.58-bit |
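The parameter counts can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a standard GPT-2-style block (four $d \times d$ attention matrices plus an MLP with 4x expansion, no biases); the card does not confirm that Atto-2 uses exactly this layout, and `block_weight_count` is a hypothetical helper.

```python
def block_weight_count(n_layer, n_embd):
    """Weights in the Transformer blocks under an assumed GPT-2-style layout."""
    attention = 4 * n_embd * n_embd      # query, key, value, output projection
    mlp = 2 * n_embd * (4 * n_embd)      # up- and down-projection, 4x hidden size
    return n_layer * (attention + mlp)

print(block_weight_count(1, 8))    # 768
print(block_weight_count(2, 16))   # 6144
```

Under those assumptions the block weights alone come to 768 and 6,144, in the same range as the table's ~1k and ~8k. The remainder would come from embeddings, whose vocabulary and context sizes are not stated here. The head count does not change the total in this layout, since heads only split the same projection matrices.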

	## Research Findings: Transformer Density

	1. Attention is Key: Even with ternary weights, attention lets the model "look back" over its context more effectively than the static windows of the previous N-gram architecture.
	2. Quantization Robustness: The GPT architecture remains functional under 1.58-bit constraints, and training appeared to converge stably even at very low embedding dimensions (8 and 16).
	3. Ternary Inference: The exported JSON models contain integer-only weights in $\{-1, 0, 1\}$ for all linear layers and embeddings. Multiplying by such a weight reduces to an addition, a subtraction, or a skip, which suits edge inference.
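The efficiency point in the third finding can be sketched as a matrix-vector product that never multiplies. This is an illustrative routine, not the project's inference code, and `ternary_matvec` is a hypothetical name.

```python
def ternary_matvec(ternary_rows, x):
    """Matrix-vector product for weights in {-1, 0, 1} using only add and subtract."""
    out = []
    for row in ternary_rows:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing, so it is skipped entirely
        out.append(acc)
    return out

print(ternary_matvec([[1, 0, -1], [0, 0, 1]], [5, 7, 2]))  # [3, 2]
```

With integer activations the whole product stays in integer arithmetic, which is the property that makes ternary weights attractive on small devices.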

	## Usage

	### Training
	To train the Atto-2 GPT series:
	```bash
	python3 train_atto.py
	```

	### Sampling
	To sample from the trained models:
	```bash
	python3 sample.py
	```

	The models are exported as plain JSON files in the `models/` directory. They can be loaded without additional dependencies, so they are ready for client-side inference in a web browser. Every exported Atto-2 weight is ternary.
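One way to check the ternary property on an exported file is to walk the parsed JSON and flag any number inside an array that falls outside $\{-1, 0, 1\}$. The JSON schema is not documented in this card, so the sketch assumes that weights live in (possibly nested) arrays and that scalar values stored directly under object keys are configuration. The keys `config`, `n_embd`, and `wte` in the example are hypothetical.

```python
import json

TERNARY = {-1, 0, 1}

def non_ternary_values(node, inside_array=False):
    """Yield numbers found inside arrays that are not -1, 0, or 1."""
    if isinstance(node, dict):
        for value in node.values():
            # Scalars directly under a key are treated as config, not weights.
            yield from non_ternary_values(value, False)
    elif isinstance(node, list):
        for item in node:
            yield from non_ternary_values(item, True)
    elif inside_array and isinstance(node, (int, float)) and not isinstance(node, bool):
        if node not in TERNARY:
            yield node

# Hypothetical model layout, for illustration only.
good = json.loads('{"config": {"n_embd": 8}, "wte": [[1, 0, -1], [0, 1, 1]]}')
bad = json.loads('{"wte": [[1, 2], [0.5]]}')

print(list(non_ternary_values(good)))  # []
print(list(non_ternary_values(bad)))   # [2, 0.5]
```

To check a real export, replace the inline strings with `json.load` on a file from `models/`. An empty result means every array entry is ternary under the assumptions above.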