Transluce/features_explain_llama3.1_8b_simulator

belindazli committed on Dec 21, 2025

Commit 44494c5 (verified) · 1 Parent(s): bf1250c

Update README.md

Files changed (1)

README.md +1 -1

@@ -13,7 +13,7 @@ This is a **simulator model** used to score candidate natural-language explanati
 - an input text sequence `x` (tokenized),
 - a candidate explanation `E` (e.g., “encodes city names”),
-the simulator predicts **where the described feature should activate** in the sequence (token-level activation scores). These simulated activations can then be compared to a target feature’s *true* activations, enabling selection of the best explanation by maximizing correlation (the "simulator score" / correlation objective described in [the paper](https://arxiv.org/abs/2511.08579)).
+the simulator predicts **where the described feature should activate** in the sequence (token-level activation scores). These simulated activations can then be compared to a target feature’s *true* activations, enabling scoring of the explanations by computing correlation (the "simulator score" / correlation objective described in [the paper](https://arxiv.org/abs/2511.08579)).
 ---
 ## Usage
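
The scoring step described in the changed line can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the "correlation" is Pearson correlation over token positions, and the function names (`pearson_correlation`, `score_explanations`) and the activation values are hypothetical. The simulator call that would produce the simulated activations is not shown, since its API does not appear in this diff.

```python
import math


def pearson_correlation(simulated, true):
    """Pearson correlation between two equal-length activation sequences."""
    if len(simulated) != len(true):
        raise ValueError("activation sequences must have the same length")
    n = len(simulated)
    if n == 0:
        raise ValueError("activation sequences must be non-empty")
    mean_s = sum(simulated) / n
    mean_t = sum(true) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(simulated, true))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    if var_s == 0 or var_t == 0:
        # Assumption: a constant sequence carries no signal, so score it 0.
        return 0.0
    return cov / math.sqrt(var_s * var_t)


def score_explanations(simulated_by_explanation, true_activations):
    """Map each candidate explanation to its correlation-based score."""
    return {
        explanation: pearson_correlation(simulated, true_activations)
        for explanation, simulated in simulated_by_explanation.items()
    }


# Hypothetical per-token activations for a five-token input.
true_activations = [0.0, 0.1, 0.9, 0.0, 0.8]
simulated_by_explanation = {
    "encodes city names": [0.0, 0.0, 1.0, 0.1, 0.7],
    "encodes punctuation": [0.5, 0.0, 0.0, 0.6, 0.0],
}
scores = score_explanations(simulated_by_explanation, true_activations)
for explanation, score in scores.items():
    print(f"{explanation}: {score:.3f}")
```

Each explanation gets its own score, which matches the revised wording ("scoring of the explanations") more closely than picking a single best one; taking the maximum over `scores` would recover the selection behavior the old wording described.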