belindazli committed on
Commit
950061c
·
verified ·
1 Parent(s): f7f8147

Update README.md

Files changed (1)
  1. README.md +35 -3
README.md CHANGED
@@ -1,3 +1,35 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - meta-llama/Llama-3.1-8B-Instruct
+ ---
+
+ # Model Card
+
+ This is a **simulator model** used to score candidate natural-language explanations of internal features in Llama-3.1-8B. Given:
+
+ - an input text sequence `x` (tokenized), and
+ - a candidate explanation `E` (e.g., “encodes city names”),
+
+ the simulator predicts **where the described feature should activate** in the sequence, as token-level activation scores. These simulated activations can then be compared with a target feature’s *true* activations, so the best explanation can be selected by maximizing their correlation (the "simulator score" / correlation objective described in [the paper](https://arxiv.org/abs/2511.08579)).
+
+ ---
+ ## Usage
+
+ > **Note:** This simulator is not usable via standard `transformers` APIs alone.
+ > You must first **clone and install [our repository](TODO)**, which provides the custom simulator wrapper and scoring utilities.
+
+
+ ```python
+ from observatory_utils.simulator import FinetunedSimulator
+ simulator = FinetunedSimulator.setup(
+     model_path="Transluce/features_explain_llama3.1_8b_simulator",
+     add_special_tokens=True,
+     gpu_idx=0,  # index of the GPU to load the simulator on
+     tokenizer_path="meta-llama/Llama-3.1-8B",
+     cache_dir=None,  # optional: path to a Hugging Face cache directory
+ )
+ ```
+
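The correlation objective described in the model card can be illustrated with a small, self-contained sketch. It is not the repository's scoring code: it assumes the score is a plain Pearson correlation between simulated and true token-level activations on a single sequence, and the paper may aggregate differently. The helper names `pearson` and `best_explanation` and the toy activation values are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


def best_explanation(
    true_acts: Sequence[float],
    simulated: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Score each explanation's simulated activations and return the top one."""
    scores = {expl: pearson(true_acts, sim) for expl, sim in simulated.items()}
    return max(scores, key=scores.get), scores


# Toy, made-up activations for a five-token sequence (illustration only).
true_acts = [0.0, 0.1, 2.3, 0.0, 1.9]
simulated = {
    "encodes city names": [0.1, 0.0, 2.0, 0.2, 1.5],
    "encodes punctuation": [1.0, 0.9, 0.0, 1.1, 0.1],
}

best, scores = best_explanation(true_acts, simulated)
print(best, scores)
```

In practice the simulated activations would come from the simulator loaded above, once per candidate explanation, and the true activations from the target feature in Llama-3.1-8B.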