---
tags:
- SAELens
- sparse-autoencoder
- mechanistic-interpretability
- multilingual
- cohere
license: apache-2.0
language:
- multilingual
---

# Inside Tiny Aya: Sparse Autoencoders for Multilingual Interpretability

Sparse Autoencoders (SAEs) trained on all four [Tiny Aya](https://cohere.com/research/papers/tiny-aya) regional variants to study how multilingual language models represent 70+ languages internally.

## Models

| SAE | Base Model | Focus Languages |
|-----|-----------|-----------------|
| `tiny-aya-global/layer_28` | [CohereLabs/tiny-aya-global](https://huggingface.co/CohereLabs/tiny-aya-global) | All 70+ languages |
| `tiny-aya-fire/layer_28` | [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire) | South Asian languages |
| `tiny-aya-earth/layer_28` | [CohereLabs/tiny-aya-earth](https://huggingface.co/CohereLabs/tiny-aya-earth) | African + West Asian languages |
| `tiny-aya-water/layer_28` | [CohereLabs/tiny-aya-water](https://huggingface.co/CohereLabs/tiny-aya-water) | Asia-Pacific + European languages |

## SAE Details

- **Architecture:** BatchTopK (auto-converted to JumpReLU for inference)
- **Input dimension:** 2,048 (Tiny Aya hidden size)
- **SAE width:** 16,384 (8× expansion)
- **k:** 64 active features per token
- **Hook point:** `model.layers.28` (global attention layer in the final third of the network)
- **Training data:** balanced CulturaX subset (~1M tokens per language, 61 languages)
- **Training tokens:** ~41M
- **Framework:** [SAELens v6](https://github.com/decoderesearch/SAELens)

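During training, BatchTopK keeps the largest k × n_tokens pre-activations across the whole batch (so individual tokens can use more or fewer than k features, with k active on average); at inference this is distilled into per-feature JumpReLU thresholds. A minimal standalone sketch of the training-time encoding step, assuming a ReLU pre-activation — hypothetical illustrative code, not the SAELens implementation:

```python
import torch

def batch_topk_encode(x, W_enc, b_enc, k):
    """Sketch of BatchTopK encoding: keep the k * n_tokens largest
    pre-activations across the whole batch, zeroing the rest."""
    # Pre-activations for every token in the batch: [n_tokens, n_features]
    pre = torch.relu(x @ W_enc + b_enc)
    n_keep = k * x.shape[0]  # k active features per token *on average*
    flat = pre.flatten()
    if n_keep < flat.numel():
        # Smallest of the batch-wide top-(k * n_tokens) values acts as the cutoff
        threshold = flat.topk(n_keep).values.min()
        pre = torch.where(pre >= threshold, pre, torch.zeros_like(pre))
    return pre
```

Because the cutoff is batch-wide rather than per-token, tokens with weak pre-activations end up sparser than k, which is why inference-time use needs fixed per-feature thresholds (the JumpReLU conversion) instead of a batch statistic.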
## Usage
```python
from sae_lens import SAE

# Load any variant
sae = SAE.from_pretrained(
    release="Farseen0/tiny-aya-saes",
    sae_id="tiny-aya-global/layer_28",
    device="cuda",
)

# Or load from disk after downloading
sae = SAE.load_from_disk("tiny-aya-global/layer_28", device="cuda")

# Encode activations into sparse features
features = sae.encode(hidden_states)  # [batch, seq, 16384]

# Decode back to the residual stream
reconstructed = sae.decode(features)  # [batch, seq, 2048]
```

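The `hidden_states` in the snippet above must come from the base model at the SAE's hook point (`model.layers.28`). One common way to capture them is a forward hook on that layer. The sketch below demonstrates the capture pattern on a tiny stand-in module so it runs anywhere; the module names and the 2,048 hidden size mirror, but are not, the Tiny Aya internals — with the real model you would hook `model.model.layers[28]` instead:

```python
import torch
import torch.nn as nn

# Stand-in for a decoder stack: replace with the real Tiny Aya model
# loaded via transformers, and hook model.model.layers[28].
class DummyLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, d)

    def forward(self, x):
        return self.proj(x)

class DummyLM(nn.Module):
    def __init__(self, d=2048, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([DummyLayer(d) for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = DummyLM()
captured = {}

def hook(module, inputs, output):
    # Some decoder layers return tuples; keep only the hidden states.
    captured["resid"] = output[0] if isinstance(output, tuple) else output

handle = model.layers[-1].register_forward_hook(hook)  # layers[28] on the real model
with torch.no_grad():
    model(torch.randn(1, 5, 2048))  # with the real model: model(**tokenizer(...))
handle.remove()

hidden_states = captured["resid"]  # [batch, seq, 2048] -> feed to sae.encode(...)
```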
## Research Questions

1. What fraction of SAE features are language-specific vs. universal vs. script-specific?
2. Do regional variants create new features or redistribute existing ones?
3. Is there a correlation between dedicated feature count and generation quality?
4. Can steering language-specific features improve low-resource generation?

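Question 1 can be made concrete with a simple per-feature firing-rate census over per-language activation batches. This is a hypothetical sketch, not the project's analysis pipeline; `feature_language_counts` and the 1% firing threshold are illustrative choices:

```python
import torch

def feature_language_counts(feature_acts_by_lang, fire_rate=0.01):
    """Given {lang: SAE feature activations [n_tokens, n_features]},
    count for each feature how many languages it fires in, where
    'fires in a language' means active on more than `fire_rate` of
    that language's tokens."""
    counts = None
    for acts in feature_acts_by_lang.values():
        fires = (acts > 0).float().mean(dim=0) > fire_rate  # [n_features] bool
        counts = fires.long() if counts is None else counts + fires.long()
    return counts
```

A feature with count 1 is a candidate language-specific feature; a count equal to the number of languages suggests a universal feature; script-specific features would show intermediate counts concentrated among languages sharing a script.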
## Project

Part of [Expedition Tiny Aya 2026](https://www.notion.so/cohereai/Expedition-Tiny-Aya-2f04398375db804c93c4c9f5fbb94833) by Cohere Labs.

**Team:** Farseen Shaikh, Matthew Nguyen, Tra My (Chiffon) Nguyen

**Code:** [github.com/mychiffonn/inside-tiny-aya](https://github.com/mychiffonn/inside-tiny-aya)

## Citation
```bibtex
@misc{shaikh2026insidetinyaya,
  title={Inside Tiny Aya: Mapping Multilingual Representations with Sparse Autoencoders},
  author={Shaikh, Farseen and Nguyen, Matthew and Nguyen, Tra My},
  year={2026},
  url={https://huggingface.co/Farseen0/tiny-aya-saes}
}
```