Upload t5gemma_sae_colab.ipynb with huggingface_hub

Files changed (1)

t5gemma_sae_colab.ipynb +257 -0

t5gemma_sae_colab.ipynb ADDED

	@@ -0,0 +1,257 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# T5Gemma 2 SAE - Quick Start Guide\n",
+    "\n",
+    "This notebook shows how to use the **T5Gemma 2 Sparse Autoencoders** from [mindchain/t5gemma2-sae-all-layers](https://huggingface.co/mindchain/t5gemma2-sae-all-layers).\n",
+    "\n",
+    "## What are SAEs?\n",
+    "\n",
+    "A Sparse Autoencoder (SAE) decomposes a layer's activations into a set of sparsely active features, which makes it easier to interpret what the network has learned. SAEs can be used for:\n",
+    "- **Mechanistic Interpretability** - Understanding model internals\n",
+    "- **Activation Steering** - Modifying model behavior\n",
+    "- **Feature Visualization** - Seeing what concepts each feature detects\n",
+    "\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/README.md)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Install Dependencies\n",
+    "\n",
+    "First, install the required libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -q transformers torch huggingface_hub accelerate"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Import Libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from huggingface_hub import hf_hub_download\n",
+    "\n",
+    "print(\"Libraries imported successfully!\")\n",
+    "print(f\"PyTorch version: {torch.__version__}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Load a Trained SAE\n",
+    "\n",
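+    "Load one of the 36 trained SAEs (one per layer: 18 encoder + 18 decoder). Each checkpoint is a dictionary holding the weights (`W_enc`, `b_enc`, `W_dec`, `b_dec`) plus metadata such as `model_name`, `layer_type`, `layer_idx`, `d_in`, and `d_sae`."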
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import hf_hub_download\n",
+    "\n",
+    "repo_id = \"mindchain/t5gemma2-sae-all-layers\"\n",
+    "\n",
+    "# Load Encoder Layer 0 SAE\n",
+    "sae_path = hf_hub_download(\n",
+    "    repo_id=repo_id,\n",
+    "    filename=\"encoder/sae_encoder_00.pt\"\n",
+    ")\n",
+    "\n",
+    "sae = torch.load(sae_path, map_location=\"cpu\")\n",
+    "\n",
+    "print(f\"SAE loaded from: {sae_path}\")\n",
+    "print(f\"Model: {sae['model_name']}\")\n",
+    "print(f\"Layer: {sae['layer_type']} {sae['layer_idx']}\")\n",
+    "print(f\"d_in: {sae['d_in']}, d_sae: {sae['d_sae']}\")\n",
+    "\n",
+    "# Show training history\n",
+    "if 'history' in sae:\n",
+    "    print(f\"Final Loss: {sae['history']['loss'][-1]:.6f}\")\n",
+    "    print(f\"Final L0: {sae['history']['l0'][-1]:.1f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. SAE Forward Pass\n",
+    "\n",
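+    "Define helper functions that encode activations into sparse features, decode features back into activations, and combine both steps into a full forward pass."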
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sae_encode(activations, sae):\n",
+    "    \"\"\"Activations to Sparse Features\"\"\"\n",
+    "    acts_f32 = activations.float()\n",
+    "    return torch.relu(acts_f32 @ sae['W_enc'] + sae['b_enc'])\n",
+    "\n",
+    "def sae_decode(features, sae):\n",
+    "    \"\"\"Sparse Features to Activations\"\"\"\n",
+    "    return features @ sae['W_dec'] + sae['b_dec']\n",
+    "\n",
+    "def sae_forward(activations, sae):\n",
+    "    \"\"\"Full SAE forward pass\"\"\"\n",
+    "    features = sae_encode(activations, sae)\n",
+    "    recon = sae_decode(features, sae)\n",
+    "    return recon, features\n",
+    "\n",
+    "print(\"SAE functions defined!\")"
+   ]
+  },
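+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In equation form, the functions above compute the following. The symbols $x$ (an activation vector of size `d_in`), $f$ (the feature vector of size `d_sae`), and $\\hat{x}$ (the reconstruction) are notation introduced here for illustration:\n",
+    "\n",
+    "$$\n",
+    "f = \\mathrm{ReLU}(x\\,W_{\\mathrm{enc}} + b_{\\mathrm{enc}}), \\qquad \\hat{x} = f\\,W_{\\mathrm{dec}} + b_{\\mathrm{dec}}\n",
+    "$$\n",
+    "\n",
+    "The ReLU sets every negative pre-activation to zero, so only the entries of $f$ with a positive pre-activation count as active features."
+   ]
+  },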
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Test the SAE\n",
+    "\n",
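+    "Run random dummy activations through the SAE as a smoke test. Random inputs do not follow the model's activation distribution, so the metrics below only confirm that the shapes and code paths work; they say nothing about reconstruction quality on real activations."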
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch.nn.functional as F\n",
+    "\n",
+    "# Create dummy activations with shape (batch, sequence, d_in); d_in is read from W_enc\n",
+    "dummy_activations = torch.randn(1, 10, sae['W_enc'].shape[0])\n",
+    "\n",
+    "# Run through SAE\n",
+    "recon, features = sae_forward(dummy_activations, sae)\n",
+    "\n",
+    "# Calculate metrics\n",
+    "mse = F.mse_loss(recon, dummy_activations).item()\n",
+    "cosine = F.cosine_similarity(\n",
+    "    dummy_activations.flatten(), \n",
+    "    recon.flatten(), \n",
+    "    dim=0\n",
+    ").item()\n",
+    "# L0 = number of active features per token, averaged over all tokens\n",
+    "l0 = (features > 0).sum(dim=-1).float().mean().item()\n",
+    "\n",
+    "print(f\"Input shape: {dummy_activations.shape}\")\n",
+    "print(f\"Features shape: {features.shape}\")\n",
+    "print(f\"\\nReconstruction Quality:\")\n",
+    "print(f\"  MSE: {mse:.6f}\")\n",
+    "print(f\"  Cosine Similarity: {cosine:.4f}\")\n",
+    "print(f\"  L0 (mean active features per token): {l0:.1f} / {features.shape[-1]}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. All Available SAEs\n",
+    "\n",
+    "This repository contains **36 SAEs** in total:\n",
+    "\n",
+    "| Layer Type | Range | Count |\n",
+    "|------------|-------|-------|\n",
+    "| Encoder | 0-17 | 18 SAEs |\n",
+    "| Decoder | 0-17 | 18 SAEs |\n",
+    "| **Total** | - | **36 SAEs** |\n",
+    "\n",
+    "To load a different layer:\n",
+    "```python\n",
+    "# Encoder Layer 5\n",
+    "sae_path = hf_hub_download(\n",
+    "    repo_id=\"mindchain/t5gemma2-sae-all-layers\",\n",
+    "    filename=\"encoder/sae_encoder_05.pt\"\n",
+    ")\n",
+    "\n",
+    "# Decoder Layer 10\n",
+    "sae_path = hf_hub_download(\n",
+    "    repo_id=\"mindchain/t5gemma2-sae-all-layers\",\n",
+    "    filename=\"decoder/sae_decoder_10.pt\"\n",
+    ")\n",
+    "```"
+   ]
+  },
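+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The file names above appear to follow one pattern: `<layer_type>/sae_<layer_type>_<two-digit index>.pt`. Assuming that pattern holds for every layer (it is inferred from the examples in this notebook, not from a documented API), a small hypothetical helper `sae_filename` could build the path for any of the 36 SAEs:\n",
+    "\n",
+    "```python\n",
+    "def sae_filename(layer_type, layer_idx):\n",
+    "    # Hypothetical helper: the path pattern is inferred from the examples above\n",
+    "    if layer_type not in ('encoder', 'decoder'):\n",
+    "        raise ValueError(f'unknown layer type: {layer_type}')\n",
+    "    if not 0 <= layer_idx <= 17:\n",
+    "        raise ValueError(f'layer index out of range: {layer_idx}')\n",
+    "    return f'{layer_type}/sae_{layer_type}_{layer_idx:02d}.pt'\n",
+    "\n",
+    "# All 36 paths: 18 encoder layers followed by 18 decoder layers\n",
+    "all_files = [sae_filename(t, i) for t in ('encoder', 'decoder') for i in range(18)]\n",
+    "print(len(all_files), all_files[0], all_files[-1])\n",
+    "```\n",
+    "\n",
+    "Each returned path can be passed as `filename` to `hf_hub_download`, as in the snippets above."
+   ]
+  },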
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Usage with T5Gemma 2 Model\n",
+    "\n",
+    "To apply the SAEs to real activations rather than random tensors, first load the T5Gemma 2 base model and its tokenizer:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n",
+    "\n",
+    "# Load model\n",
+    "model = AutoModelForSeq2SeqLM.from_pretrained(\n",
+    "    \"google/t5gemma-2-270m-270m\",\n",
+    "    device_map=\"auto\"\n",
+    ")\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"google/t5gemma-2-270m-270m\")\n",
+    "\n",
+    "print(\"Model loaded!\")"
+   ]
+  },
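+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The SAEs operate on hidden activations, so those have to be captured from the model while it runs. One common way to do this in PyTorch is a forward hook (`register_forward_hook`). The sketch below is a minimal illustration on a toy `nn.Linear` layer standing in for a real layer. The names `captured`, `save_output`, and `toy_layer` are made up here, and this notebook does not specify which T5Gemma 2 module matches each SAE, so that part is left open.\n",
+    "\n",
+    "```python\n",
+    "import torch\n",
+    "from torch import nn\n",
+    "\n",
+    "captured = {}\n",
+    "\n",
+    "def save_output(module, inputs, output):\n",
+    "    # Transformer layers often return a tuple; keep the first element in that case\n",
+    "    acts = output[0] if isinstance(output, tuple) else output\n",
+    "    captured['acts'] = acts.detach()\n",
+    "\n",
+    "# Toy stand-in for one model layer (NOT the real T5Gemma 2 architecture)\n",
+    "toy_layer = nn.Linear(640, 640)\n",
+    "handle = toy_layer.register_forward_hook(save_output)\n",
+    "\n",
+    "with torch.no_grad():\n",
+    "    toy_layer(torch.randn(1, 10, 640))\n",
+    "handle.remove()  # detach the hook once the activations are captured\n",
+    "\n",
+    "print(captured['acts'].shape)\n",
+    "```\n",
+    "\n",
+    "With the real model, the hook would be registered on the layer whose activations the chosen SAE was trained on, and `captured['acts']` could then be passed to `sae_forward`."
+   ]
+  },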
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Links\n",
+    "\n",
+    "- **HuggingFace Model**: [mindchain/t5gemma2-sae-all-layers](https://huggingface.co/mindchain/t5gemma2-sae-all-layers)\n",
+    "- **Base Model**: [google/t5gemma-2-270m-270m](https://huggingface.co/google/t5gemma-2-270m-270m)\n",
+    "- **SAELens**: [github.com/decoderesearch/SAELens](https://github.com/decoderesearch/SAELens)\n",
+    "- **Neuronpedia**: [neuronpedia.org](https://neuronpedia.org)\n",
+    "\n",
+    "---\n",
+    "\n",
+    "Trained by [mindchain](https://huggingface.co/mindchain) | December 2025"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}