---
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
# Model Card
This is a **simulator model** used to score candidate natural-language explanations of internal features in Llama-3.1-8B. Given:
- an input text sequence `x` (tokenized),
- a candidate explanation `E` (e.g., “encodes city names”),
the simulator predicts **where the described feature should activate** in the sequence, as token-level activation scores. Comparing these simulated activations with a target feature’s *true* activations yields a score for the explanation: the correlation between the two (the "simulator score" / correlation objective described in [the paper](https://arxiv.org/abs/2511.08579)).
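The comparison step can be illustrated with a minimal sketch, assuming the score is a plain Pearson correlation over per-token activations; the exact objective is defined in the paper and may differ. The function name `simulator_score`, the zero-variance convention, and the activation values below are hypothetical and are not part of the repository's API.

```python
import math


def simulator_score(simulated, true):
    """Pearson correlation between simulated and true per-token activations."""
    if len(simulated) != len(true):
        raise ValueError("activation sequences must have the same length")
    n = len(simulated)
    if n == 0:
        raise ValueError("activation sequences must be non-empty")

    mean_s = sum(simulated) / n
    mean_t = sum(true) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(simulated, true))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_t = sum((t - mean_t) ** 2 for t in true)

    # Assumption: a constant sequence carries no signal, so score it as 0.0
    # instead of dividing by zero.
    if var_s == 0 or var_t == 0:
        return 0.0
    return cov / math.sqrt(var_s * var_t)


# Hypothetical activations for a five-token sequence.
true_acts = [0.0, 0.1, 3.2, 0.0, 2.9]
good_explanation_acts = [0.0, 0.0, 3.0, 0.1, 3.0]  # fires on the same tokens
bad_explanation_acts = [2.5, 0.0, 0.1, 3.0, 0.0]   # fires on the wrong tokens

good_score = simulator_score(good_explanation_acts, true_acts)
bad_score = simulator_score(bad_explanation_acts, true_acts)
```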
---
## Usage
**Note:** This simulator is not usable via standard `transformers` APIs alone. You must first **clone and install [our repository](https://github.com/TransluceAI/introspective-interp/tree/main#)**, which provides the custom simulator wrapper and scoring utilities.
```python
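from observatory_utils.simulator import FinetunedSimulator

# Placeholder values; adjust for your environment.
simulator_device_idx = 0  # index of the GPU that hosts the simulator
config = {}  # e.g. {"cache_dir": "/path/to/cache"}

simulator = FinetunedSimulator.setup(
    model_path="Transluce/features_explain_llama3.1_8b_simulator",
    add_special_tokens=True,
    gpu_idx=simulator_device_idx,
    tokenizer_path="meta-llama/Llama-3.1-8B",
    cache_dir=config.get("cache_dir", None),
)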
```