---
license: mit
---

These are the **first public interpreter models** trained on a true reasoning model, and the first trained on **any model of this scale.** Because R1 is a very large model and therefore difficult for most independent researchers to run, we're also uploading SQL databases containing the max-activating examples for each feature.
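Those databases can be opened with Python's built-in `sqlite3` module. The file name below is a placeholder and the schema is not documented here, so this sketch only lists whatever tables are actually present rather than assuming column names:

```python
# Hypothetical sketch: inspect a downloaded max-activating-examples
# database. "math_examples.db" is a placeholder file name; query
# sqlite_master to discover the real tables before writing queries.
import sqlite3

conn = sqlite3.connect("math_examples.db")  # placeholder path
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
conn.close()
```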

## Model Information

This release contains two SAEs, one for general reasoning and one for math, both of which are [available on HuggingFace](https://huggingface.co/Goodfire/DeepSeek-R1-SAE-l37). Load them with the following snippet:

```python
from sae import load_math_sae
from huggingface_hub import hf_hub_download

# Download the math SAE checkpoint from the Hugging Face Hub.
file_path = hf_hub_download(
    repo_id="Goodfire/DeepSeek-R1-SAE-l37",
    filename="math/DeepSeek-R1-SAE-l37.pt",
    repo_type="model",
)
device = "cpu"
math_sae = load_math_sae(file_path, device)
```

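For intuition about what the loaded object computes, the forward pass of a standard ReLU sparse autoencoder can be sketched as below. This is a toy illustration with made-up dimensions, not the interface of the released `sae` module:

```python
# Toy ReLU-SAE sketch (illustrative only): a one-layer autoencoder
# whose ReLU encoder maps dense activations to sparse "feature"
# activations, and whose decoder reconstructs the input from them.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 16, 64  # residual-stream dim, dictionary size (made up)

W_enc = rng.normal(size=(d_model, d_features)) * 0.1
b_enc = np.zeros(d_features)
W_dec = rng.normal(size=(d_features, d_model)) * 0.1
b_dec = np.zeros(d_model)

def encode(x):
    """Map activations to non-negative, sparse feature activations."""
    return np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU zeroes out most features

def decode(f):
    """Reconstruct the original activations from the features."""
    return f @ W_dec + b_dec

x = rng.normal(size=(4, d_model))  # a batch of 4 activation vectors
feats = encode(x)
x_hat = decode(feats)
print(feats.shape, x_hat.shape)    # feature and reconstruction shapes
```

The max-activating examples in the databases above correspond to the inputs that drive individual coordinates of `feats` highest.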
The general reasoning SAE was trained on R1's activations over our [custom reasoning dataset](https://huggingface.co/Goodfire/r1-collect), and the math SAE on [OpenR1-Math](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k), a large dataset for mathematical reasoning. These datasets allow us to discover the features that R1 uses to answer challenging problems that exercise its reasoning chops.