---
tags:
- SAELens
- sparse-autoencoder
- mechanistic-interpretability
- multilingual
- cohere
license: apache-2.0
language:
- multilingual
---

# Inside Tiny Aya: Sparse Autoencoders for Multilingual Interpretability

Sparse Autoencoders (SAEs) trained on all four [Tiny Aya](https://cohere.com/research/papers/tiny-aya) regional variants to study how multilingual language models represent 70+ languages internally.

## Models

| SAE | Base Model | Focus Languages |
|-----|-----------|-----------------|
| `tiny-aya-global/layer_28` | [CohereLabs/tiny-aya-global](https://huggingface.co/CohereLabs/tiny-aya-global) | All 70+ languages |
| `tiny-aya-fire/layer_28` | [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire) | South Asian languages |
| `tiny-aya-earth/layer_28` | [CohereLabs/tiny-aya-earth](https://huggingface.co/CohereLabs/tiny-aya-earth) | African + West Asian languages |
| `tiny-aya-water/layer_28` | [CohereLabs/tiny-aya-water](https://huggingface.co/CohereLabs/tiny-aya-water) | Asia-Pacific + European languages |

## SAE Details

- **Architecture:** BatchTopK (auto-converted to JumpReLU for inference)
- **Input dimension:** 2,048 (Tiny Aya hidden size)
- **SAE width:** 16,384 (8× expansion)
- **k:** 64 active features per token
- **Hook point:** `model.layers.28` (global attention layer in the final third of the network)
- **Training data:** balanced CulturaX subset (~1M tokens per language, 61 languages)
- **Training tokens:** ~41M
- **Framework:** [SAELens v6](https://github.com/decoderesearch/SAELens)

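During training, BatchTopK keeps the largest k × n_tokens pre-activations across the whole batch (so individual tokens can use more or fewer than k features, with k active on average); at inference this is distilled into per-feature JumpReLU thresholds. A minimal standalone sketch of the training-time encoding step, assuming a ReLU pre-activation — hypothetical illustrative code, not the SAELens implementation:

```python
import torch

def batch_topk_encode(x, W_enc, b_enc, k):
    """Sketch of BatchTopK encoding: keep the k * n_tokens largest
    pre-activations across the whole batch, zeroing the rest."""
    # Pre-activations for every token in the batch: [n_tokens, n_features]
    pre = torch.relu(x @ W_enc + b_enc)
    n_keep = k * x.shape[0]  # k active features per token *on average*
    flat = pre.flatten()
    if n_keep < flat.numel():
        # Smallest of the batch-wide top-(k * n_tokens) values acts as the cutoff
        threshold = flat.topk(n_keep).values.min()
        pre = torch.where(pre >= threshold, pre, torch.zeros_like(pre))
    return pre
```

Because the cutoff is batch-wide rather than per-token, tokens with weak pre-activations end up sparser than k, which is why inference-time use needs fixed per-feature thresholds (the JumpReLU conversion) instead of a batch statistic.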
## Usage
```python
from sae_lens import SAE

# Load any variant
sae = SAE.from_pretrained(
    release="Farseen0/tiny-aya-saes",
    sae_id="tiny-aya-global/layer_28",
    device="cuda",
)

# Or load from disk after downloading
sae = SAE.load_from_disk("tiny-aya-global/layer_28", device="cuda")

# Encode activations into sparse features
features = sae.encode(hidden_states)  # [batch, seq, 16384]

# Decode back to the residual stream
reconstructed = sae.decode(features)  # [batch, seq, 2048]
```

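The `hidden_states` in the snippet above must come from the base model at the SAE's hook point (`model.layers.28`). One common way to capture them is a forward hook on that layer. The sketch below demonstrates the capture pattern on a tiny stand-in module so it runs anywhere; the module names and the 2,048 hidden size mirror, but are not, the Tiny Aya internals — with the real model you would hook `model.model.layers[28]` instead:

```python
import torch
import torch.nn as nn

# Stand-in for a decoder stack: replace with the real Tiny Aya model
# loaded via transformers, and hook model.model.layers[28].
class DummyLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, d)

    def forward(self, x):
        return self.proj(x)

class DummyLM(nn.Module):
    def __init__(self, d=2048, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([DummyLayer(d) for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = DummyLM()
captured = {}

def hook(module, inputs, output):
    # Some decoder layers return tuples; keep only the hidden states.
    captured["resid"] = output[0] if isinstance(output, tuple) else output

handle = model.layers[-1].register_forward_hook(hook)  # layers[28] on the real model
with torch.no_grad():
    model(torch.randn(1, 5, 2048))  # with the real model: model(**tokenizer(...))
handle.remove()

hidden_states = captured["resid"]  # [batch, seq, 2048] -> feed to sae.encode(...)
```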
## Research Questions

1. What fraction of SAE features are language-specific vs. universal vs. script-specific?
2. Do regional variants create new features or redistribute existing ones?
3. Is there a correlation between dedicated feature count and generation quality?
4. Can steering language-specific features improve low-resource generation?

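Question 1 can be made concrete with a simple per-feature firing-rate census over per-language activation batches. This is a hypothetical sketch, not the project's analysis pipeline; `feature_language_counts` and the 1% firing threshold are illustrative choices:

```python
import torch

def feature_language_counts(feature_acts_by_lang, fire_rate=0.01):
    """Given {lang: SAE feature activations [n_tokens, n_features]},
    count for each feature how many languages it fires in, where
    'fires in a language' means active on more than `fire_rate` of
    that language's tokens."""
    counts = None
    for acts in feature_acts_by_lang.values():
        fires = (acts > 0).float().mean(dim=0) > fire_rate  # [n_features] bool
        counts = fires.long() if counts is None else counts + fires.long()
    return counts
```

A feature with count 1 is a candidate language-specific feature; a count equal to the number of languages suggests a universal feature; script-specific features would show intermediate counts concentrated among languages sharing a script.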
## Project

Part of [Expedition Tiny Aya 2026](https://www.notion.so/cohereai/Expedition-Tiny-Aya-2f04398375db804c93c4c9f5fbb94833) by Cohere Labs.

**Team:** Farseen Shaikh, Matthew Nguyen, Tra My (Chiffon) Nguyen

**Code:** [github.com/mychiffonn/inside-tiny-aya](https://github.com/mychiffonn/inside-tiny-aya)

## Citation
```bibtex
@misc{shaikh2026insidetinyaya,
  title={Inside Tiny Aya: Mapping Multilingual Representations with Sparse Autoencoders},
  author={Shaikh, Farseen and Nguyen, Matthew and Nguyen, Tra My},
  year={2026},
  url={https://huggingface.co/Farseen0/tiny-aya-saes}
}
```