Transluce/features_explain_llama3.1_8b_simulator

belindazli committed on Dec 21, 2025

Commit 950061c (verified) · 1 Parent(s): f7f8147

Update README.md

Files changed (1)

README.md +35 -3

@@ -1,3 +1,35 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+---
+# Model Card
+This is a **simulator model** used to score candidate natural-language explanations of internal features in Llama-3.1-8B. Given:
+- an input text sequence `x` (tokenized),
+- a candidate explanation `E` (e.g., “encodes city names”),
+the simulator predicts **where the described feature should activate** in the sequence (token-level activation scores). These simulated activations can then be compared to a target feature’s *true* activations, enabling selection of the best explanation by maximizing correlation (the "simulator score" / correlation objective described in [the paper](https://arxiv.org/abs/2511.08579)).
+---
+## Usage
+> **Note:** This simulator is not usable via standard `transformers` APIs alone.
+> You must first **clone and install [our repository](TODO)**, which provides the custom simulator wrapper and scoring utilities.
+```python
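+from observatory_utils.simulator import FinetunedSimulator
+
+# Placeholder values; replace them with your own settings.
+simulator_device_idx = 0  # index of the GPU that hosts the simulator
+config = {}  # e.g. {"cache_dir": "/path/to/cache"}
+
+simulator = FinetunedSimulator.setup(
+    model_path="Transluce/features_explain_llama3.1_8b_simulator",
+    add_special_tokens=True,
+    gpu_idx=simulator_device_idx,
+    tokenizer_path="meta-llama/Llama-3.1-8B",
+    cache_dir=config.get("cache_dir", None),
+)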
+```
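
The model card describes selecting the best explanation by maximizing the correlation between simulated and true activations. A minimal sketch of that selection step follows, assuming the score is a plain Pearson correlation over token-level activations; the exact objective in the paper may differ. The function names `simulator_score` and `best_explanation` and the toy activation values are hypothetical and are not part of the repository's API.

```python
import numpy as np


def simulator_score(simulated, true):
    """Pearson correlation between simulated and true token-level activations."""
    simulated = np.asarray(simulated, dtype=float).ravel()
    true = np.asarray(true, dtype=float).ravel()
    if simulated.shape != true.shape:
        raise ValueError("activation arrays must have the same number of tokens")
    sim_centered = simulated - simulated.mean()
    true_centered = true - true.mean()
    denom = np.linalg.norm(sim_centered) * np.linalg.norm(true_centered)
    # A constant activation vector has no defined correlation; score it as 0.
    if denom == 0:
        return 0.0
    return float(sim_centered @ true_centered / denom)


def best_explanation(candidates, true):
    """Return the (explanation, score) pair with the highest correlation.

    `candidates` maps each explanation string to its simulated activations.
    """
    scores = {expl: simulator_score(sim, true) for expl, sim in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data (hypothetical): true activations of one feature over five tokens,
# plus simulated activations for two candidate explanations.
true_acts = [0.0, 0.1, 0.9, 0.0, 0.8]
candidates = {
    "encodes city names": [0.0, 0.0, 1.0, 0.1, 0.7],
    "encodes punctuation": [0.9, 0.0, 0.0, 0.8, 0.1],
}
best, score = best_explanation(candidates, true_acts)
```

In practice the simulated activations would come from the simulator loaded above rather than from hand-written lists; the sketch only covers the comparison step.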