---
tags:
- SAELens
- sparse-autoencoder
- mechanistic-interpretability
- multilingual
- cohere
license: apache-2.0
language:
- multilingual
---

# Inside Tiny Aya: Sparse Autoencoders for Multilingual Interpretability

Sparse Autoencoders (SAEs) trained on all four [Tiny Aya](https://cohere.com/research/papers/tiny-aya) regional variants to study how multilingual language models represent 70+ languages internally.

## Models

| SAE | Base Model | Focus Languages |
|-----|-----------|-----------------|
| `tiny-aya-global/layer_28` | [CohereLabs/tiny-aya-global](https://huggingface.co/CohereLabs/tiny-aya-global) | All 70+ languages |
| `tiny-aya-fire/layer_28` | [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire) | South Asian languages |
| `tiny-aya-earth/layer_28` | [CohereLabs/tiny-aya-earth](https://huggingface.co/CohereLabs/tiny-aya-earth) | African + West Asian languages |
| `tiny-aya-water/layer_28` | [CohereLabs/tiny-aya-water](https://huggingface.co/CohereLabs/tiny-aya-water) | Asia-Pacific + European languages |

## SAE Details

- **Architecture:** BatchTopK (auto-converted to JumpReLU for inference)
- **Input dimension:** 2,048 (Tiny Aya hidden size)
- **SAE width:** 16,384 (8× expansion)
- **k:** 64 active features per token
- **Hook point:** `model.layers.28` (global attention layer in the final third)
- **Training data:** Balanced CulturaX subset (~1M tokens per language, 61 languages)
- **Training tokens:** ~41M
- **Framework:** [SAELens v6](https://github.com/decoderesearch/SAELens)

## Usage

```python
from sae_lens import SAE

# Load any variant
sae = SAE.from_pretrained(
    release="Farseen0/tiny-aya-saes",
    sae_id="tiny-aya-global/layer_28",
    device="cuda",
)

# Or load from disk after downloading
sae = SAE.load_from_disk("tiny-aya-global/layer_28", device="cuda")

# Encode activations
features = sae.encode(hidden_states)  # [batch, seq, 16384]

# Decode back
reconstructed = sae.decode(features)  # [batch, seq, 2048]
```
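The `sae.encode` call returns at most k = 64 active features per token, per the BatchTopK architecture listed under SAE Details. A toy NumPy sketch of per-token TopK encoding may help build intuition; note this is a simplification with made-up dimensions and random weights, not SAELens code (BatchTopK proper selects the top k·batch activations across a whole batch, and inference uses a learned JumpReLU threshold instead):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, k = 8, 32, 4  # real SAEs: 2048, 16384, 64

# Random stand-in weights; the trained SAE learns these.
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = rng.normal(size=(d_sae, d_model))

def topk_encode(x):
    """ReLU pre-activations, then keep only each token's k largest."""
    pre = np.maximum(x @ W_enc, 0.0)
    kth = np.sort(pre, axis=-1)[..., -k][..., None]  # k-th largest per token
    return np.where(pre >= kth, pre, 0.0)

x = rng.normal(size=(5, d_model))  # 5 "tokens"
f = topk_encode(x)                 # sparse features, at most k nonzero per row
recon = f @ W_dec                  # linear decode back to d_model
```

Each row of `f` ends up with at most k nonzero entries, which is the sparsity pattern the real SAE imposes at width 16,384.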
## Research Questions

1. What fraction of SAE features are language-specific vs universal vs script-specific?
2. Do regional variants create new features or redistribute existing ones?
3. Is there a correlation between dedicated feature count and generation quality?
4. Can steering language-specific features improve low-resource generation?

## Project

Part of [Expedition Tiny Aya 2026](https://www.notion.so/cohereai/Expedition-Tiny-Aya-2f04398375db804c93c4c9f5fbb94833) by Cohere Labs.

**Team:** Farseen Shaikh, Matthew Nguyen, Tra My (Chiffon) Nguyen

**Code:** [github.com/mychiffonn/inside-tiny-aya](https://github.com/mychiffonn/inside-tiny-aya)

## Citation

```bibtex
@misc{shaikh2026insidetinyaya,
  title={Inside Tiny Aya: Mapping Multilingual Representations with Sparse Autoencoders},
  author={Shaikh, Farseen and Nguyen, Matthew and Nguyen, Tra My},
  year={2026},
  url={https://huggingface.co/Farseen0/tiny-aya-saes}
}
```
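Research question 4 above concerns steering language-specific features. As a minimal NumPy sketch of what activation steering usually means in this setting (add a scaled copy of one feature's decoder direction to the residual stream at the hook point); the dimensions, `feature_idx`, and `alpha` here are illustrative, not values from this project, and a real intervention would modify `model.layers.28` activations during generation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # real SAEs: 2048, 16384

# Random stand-in decoder; rows are feature directions in residual space.
W_dec = rng.normal(size=(d_sae, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm rows

def steer(resid, feature_idx, alpha):
    """Shift every position's residual along one feature's decoder direction."""
    return resid + alpha * W_dec[feature_idx]

resid = rng.normal(size=(5, d_model))  # residuals at 5 token positions
steered = steer(resid, feature_idx=3, alpha=4.0)
delta = steered - resid                # constant shift at each position
```

Because the decoder rows are unit-normalized, `alpha` directly sets the magnitude of the shift applied at every position.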