Model Card

This is a simulator model used to score candidate natural-language explanations of internal features in Llama-3.1-8B. Given:

  • an input text sequence x (tokenized),
  • a candidate explanation E (e.g., “encodes city names”),

the simulator predicts a token-level activation score for every position in x, indicating where a feature matching E should activate. Comparing these simulated activations with the target feature’s true activations scores the explanation: the score is the correlation between the two (the "simulator score" / correlation objective described in the paper).
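To make the scoring step concrete, here is a minimal sketch that assumes the correlation is a plain Pearson correlation over per-token activations. The exact objective is defined in the paper and may differ. The function name `simulator_score` and all activation values below are hypothetical and are not part of the repository.

```python
import math


def simulator_score(simulated, true):
    """Pearson correlation between simulated and true per-token activations.

    Assumption: plain Pearson correlation; see the paper for the exact objective.
    """
    if len(simulated) != len(true):
        raise ValueError("activation sequences must have the same length")
    n = len(simulated)
    if n < 2:
        raise ValueError("need at least two tokens to compute a correlation")

    mean_s = sum(simulated) / n
    mean_t = sum(true) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(simulated, true))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_t = sum((t - mean_t) ** 2 for t in true)

    # Correlation is undefined for a constant sequence; this sketch
    # chooses to report 0.0 (no signal) in that case.
    if var_s == 0 or var_t == 0:
        return 0.0
    return cov / math.sqrt(var_s * var_t)


# Hypothetical activations for a five-token input.
true_acts = [0.0, 0.1, 2.3, 0.0, 1.9]       # target feature's true activations
good_explanation = [0.0, 0.0, 2.0, 0.1, 1.5]  # simulated from a fitting explanation
bad_explanation = [1.8, 0.2, 0.0, 2.1, 0.0]   # simulated from a poor explanation

print(simulator_score(good_explanation, true_acts))
print(simulator_score(bad_explanation, true_acts))
```

Under this sketch, an explanation whose simulated activations track the true ones scores close to 1, while one that fires on the wrong tokens scores near or below 0.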


Usage

Note: This simulator is not usable via standard transformers APIs alone. You must first clone and install our repository, which provides the custom simulator wrapper and scoring utilities.

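from observatory_utils.simulator import FinetunedSimulator

simulator_device_idx = 0  # index of the GPU that will host the simulator
cache_dir = None  # optional directory for caching downloaded weights

# Load the fine-tuned simulator weights together with the base Llama-3.1-8B tokenizer.
simulator = FinetunedSimulator.setup(
    model_path="Transluce/features_explain_llama3.1_8b_simulator",
    add_special_tokens=True,
    gpu_idx=simulator_device_idx,
    tokenizer_path="meta-llama/Llama-3.1-8B",
    cache_dir=cache_dir,
)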

Model size: 8B params · Tensor type: BF16 (Safetensors)