Update README.md
---
license: cc-by-nc-4.0
---

# Gemma Scope:

TODO add GIF here.
This is a landing page for **Gemma Scope**, a comprehensive, open suite of sparse autoencoders (SAEs).

- Read the Gemma Scope technical report (TODO link).
- Check out Mishax, an internal tool we used to help make Gemma Scope (TODO link).
# Quick start:

You can get started with Gemma Scope by downloading the weights from any of our repositories:
- https://huggingface.co/google/gemma-scope-2b-pt-res
- https://huggingface.co/google/gemma-scope-2b-pt-mlp
- https://huggingface.co/google/gemma-scope-2b-pt-att
- https://huggingface.co/google/gemma-scope-2b-pt-transcoders
- https://huggingface.co/google/gemma-scope-9b-pt-res
- https://huggingface.co/google/gemma-scope-9b-pt-mlp
- https://huggingface.co/google/gemma-scope-9b-pt-att
- https://huggingface.co/google/gemma-scope-9b-it-res
- https://huggingface.co/google/gemma-scope-27b-pt-res
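The repository names above follow a consistent `gemma-scope-<model>-<variant>-<site>` pattern. As a small illustration, the hypothetical helper below (not part of Gemma Scope itself) builds a repo ID from that pattern; the commented-out download call sketches how you might fetch a weights file with the `huggingface_hub` library:

```python
def gemma_scope_repo(model: str, variant: str, site: str) -> str:
    """Build a Gemma Scope repo ID from the naming scheme above.

    model:   '2b' | '9b' | '27b'
    variant: 'pt' (pretrained) | 'it' (instruction-tuned)
    site:    'res' | 'mlp' | 'att' | 'transcoders'
    """
    return f"google/gemma-scope-{model}-{variant}-{site}"

print(gemma_scope_repo("2b", "pt", "res"))
# To actually fetch weights (requires network and `pip install huggingface_hub`):
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(
#     repo_id=gemma_scope_repo("2b", "pt", "res"),
#     filename=...,  # pick a params file from the repo's file tree
# )
```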

The full list of SAEs we trained, and the sites and layers each covers, is linked from the following table, adapted from Figure 1 of our technical report:

| <big>Gemma 2 Model</big> | <big>SAE Width</big> | <big>Attention</big> | <big>MLP</big> | <big>Residual</big> | <big>Tokens</big> |
|---------------|-----------|-----------|-----|----------|----------|
| 2.6B PT<br>(26 layers) | 2^14 ≈ 16.4K | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp)[+](https://huggingface.co/google/gemma-scope-2b-pt-transcoders) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 4B |
| | 2^15 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_32k/)} | 8B |
| | 2^16 | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 8B |
| | 2^17 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_131k/)} | 8B |
| | 2^18 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_262k/)} | 8B |
| | 2^19 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_524k/)} | 8B |
| | 2^20 ≈ 1M | | | {[5](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_5/width_1m/), [12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_1m/), [19](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_19/width_1m/)} | 16B |
| 9B PT<br>(42 layers) | 2^14 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 4B |
| | 2^15 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_32k/)} | 8B |
| | 2^16 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_65k/)} | 8B |
| | 2^17 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 8B |
| | 2^18 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_262k/)} | 8B |
| | 2^19 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_524k/)} | 8B |
| | 2^20 | | | {[9](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_9/width_1m/), [20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_1m/), [31](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_31/width_1m/)} | 16B |
| 27B PT<br>(46 layers) | 2^17 | | | {[10](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_10/width_131k/), [22](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_22/width_131k/), [34](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_34/width_131k/)} | 8B |
| 9B IT<br>(42 layers) | 2^14 | | | {[9](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_9/width_16k/), [20](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_20/width_16k/), [31](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_31/width_16k/)} | 4B |
| | 2^17 | | | {[9](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_9/width_131k/), [20](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_20/width_131k/), [31](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_31/width_131k/)} | 8B |
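To give a sense of what these SAEs compute, here is a minimal toy sketch of a sparse-autoencoder forward pass in NumPy. It is not the reference implementation: the JumpReLU-style thresholded activation and the parameter names (`W_enc`, `b_enc`, `threshold`, `W_dec`, `b_dec`) are assumptions to be checked against the technical report and the parameter files in the repositories above.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 64  # toy sizes; real SAE widths are 2^14 .. 2^20

# Hypothetical parameter layout -- verify against the released params files.
params = {
    "W_enc": rng.normal(size=(d_model, d_sae)) * 0.1,
    "b_enc": np.zeros(d_sae),
    "threshold": np.full(d_sae, 0.05),  # assumed per-latent threshold
    "W_dec": rng.normal(size=(d_sae, d_model)) * 0.1,
    "b_dec": np.zeros(d_model),
}

def sae_forward(x, p):
    """Encode activations into sparse latents, then decode a reconstruction."""
    pre = x @ p["W_enc"] + p["b_enc"]
    # Thresholded (JumpReLU-style) activation: a latent keeps its value
    # only where the pre-activation exceeds its threshold, else it is zero.
    acts = pre * (pre > p["threshold"])
    recon = acts @ p["W_dec"] + p["b_dec"]
    return acts, recon

x = rng.normal(size=(2, d_model))  # a toy batch of activation vectors
acts, recon = sae_forward(x, params)
print(acts.shape, recon.shape)  # (2, 64) (2, 8)
```

Most entries of `acts` are exactly zero, which is the point: each input is explained by a small number of interpretable latent features.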