---
tags:
- SAELens
- sparse-autoencoder
- mechanistic-interpretability
- multilingual
- cohere
license: apache-2.0
language:
- multilingual
---

# Inside Tiny Aya: Sparse Autoencoders for Multilingual Interpretability

Sparse Autoencoders (SAEs) trained on all four [Tiny Aya](https://cohere.com/research/papers/tiny-aya) regional variants to study how multilingual language models represent 70+ languages internally.

## Models

| SAE | Base Model | Focus Languages |
|-----|-----------|-----------------|
| `tiny-aya-global/layer_28` | [CohereLabs/tiny-aya-global](https://huggingface.co/CohereLabs/tiny-aya-global) | All 70+ languages |
| `tiny-aya-fire/layer_28` | [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire) | South Asian languages |
| `tiny-aya-earth/layer_28` | [CohereLabs/tiny-aya-earth](https://huggingface.co/CohereLabs/tiny-aya-earth) | African + West Asian languages |
| `tiny-aya-water/layer_28` | [CohereLabs/tiny-aya-water](https://huggingface.co/CohereLabs/tiny-aya-water) | Asia-Pacific + European languages |

## SAE Details

- **Architecture:** BatchTopK (auto-converted to JumpReLU for inference)
- **Input dimension:** 2,048 (Tiny Aya hidden size)
- **SAE width:** 16,384 (8× expansion)
- **k:** 64 active features per token
- **Hook point:** `model.layers.28` (global attention layer in the final third)
- **Training data:** Balanced CulturaX subset (~1M tokens per language, 61 languages)
- **Training tokens:** ~41M
- **Framework:** [SAELens v6](https://github.com/decoderesearch/SAELens)

## Usage

```python
from sae_lens import SAE

# Load any variant
sae = SAE.from_pretrained(
    release="Farseen0/tiny-aya-saes",
    sae_id="tiny-aya-global/layer_28",
    device="cuda",
)

# Or load from disk after downloading
sae = SAE.load_from_disk("tiny-aya-global/layer_28", device="cuda")

# Encode activations
features = sae.encode(hidden_states)  # [batch, seq, 16384]

# Decode back
reconstructed = sae.decode(features)  # [batch, seq, 2048]
```
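The `sae.encode` call returns at most k = 64 active features per token, per the BatchTopK architecture listed under SAE Details. A toy NumPy sketch of per-token TopK encoding may help build intuition; note this is a simplification with made-up dimensions and random weights, not SAELens code (BatchTopK proper selects the top k·batch activations across a whole batch, and inference uses a learned JumpReLU threshold instead):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, k = 8, 32, 4  # real SAEs: 2048, 16384, 64

# Random stand-in weights; the trained SAE learns these.
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = rng.normal(size=(d_sae, d_model))

def topk_encode(x):
    """ReLU pre-activations, then keep only each token's k largest."""
    pre = np.maximum(x @ W_enc, 0.0)
    kth = np.sort(pre, axis=-1)[..., -k][..., None]  # k-th largest per token
    return np.where(pre >= kth, pre, 0.0)

x = rng.normal(size=(5, d_model))  # 5 "tokens"
f = topk_encode(x)                 # sparse features, at most k nonzero per row
recon = f @ W_dec                  # linear decode back to d_model
```

Each row of `f` ends up with at most k nonzero entries, which is the sparsity pattern the real SAE imposes at width 16,384.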
## Research Questions

1. What fraction of SAE features are language-specific vs universal vs script-specific?
2. Do regional variants create new features or redistribute existing ones?
3. Is there a correlation between dedicated feature count and generation quality?
4. Can steering language-specific features improve low-resource generation?

## Project

Part of [Expedition Tiny Aya 2026](https://www.notion.so/cohereai/Expedition-Tiny-Aya-2f04398375db804c93c4c9f5fbb94833) by Cohere Labs.

**Team:** Farseen Shaikh, Matthew Nguyen, Tra My (Chiffon) Nguyen

**Code:** [github.com/mychiffonn/inside-tiny-aya](https://github.com/mychiffonn/inside-tiny-aya)

## Citation

```bibtex
@misc{shaikh2026insidetinyaya,
  title={Inside Tiny Aya: Mapping Multilingual Representations with Sparse Autoencoders},
  author={Shaikh, Farseen and Nguyen, Matthew and Nguyen, Tra My},
  year={2026},
  url={https://huggingface.co/Farseen0/tiny-aya-saes}
}
```
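Research question 4 above concerns steering language-specific features. As a minimal NumPy sketch of what activation steering usually means in this setting (add a scaled copy of one feature's decoder direction to the residual stream at the hook point); the dimensions, `feature_idx`, and `alpha` here are illustrative, not values from this project, and a real intervention would modify `model.layers.28` activations during generation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # real SAEs: 2048, 16384

# Random stand-in decoder; rows are feature directions in residual space.
W_dec = rng.normal(size=(d_sae, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm rows

def steer(resid, feature_idx, alpha):
    """Shift every position's residual along one feature's decoder direction."""
    return resid + alpha * W_dec[feature_idx]

resid = rng.normal(size=(5, d_model))  # residuals at 5 token positions
steered = steer(resid, feature_idx=3, alpha=4.0)
delta = steered - resid                # constant shift at each position
```

Because the decoder rows are unit-normalized, `alpha` directly sets the magnitude of the shift applied at every position.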