EvilScript committed on
Commit fda1dda · verified · 1 Parent(s): 5319435

Relink paper: point to arXiv:2605.26045 (Torrielli et al.)

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -3,13 +3,14 @@ base_model: google/gemma-4-E2B-it
  library_name: peft
  license: apache-2.0
  tags:
- - activation-oracles
- - taboo-game
- - secret-keeping
- - interpretability
- - lora
+ - activation-oracles
+ - taboo-game
+ - secret-keeping
+ - interpretability
+ - lora
+ - arxiv:2605.26045
  datasets:
- - bcywinski/taboo-snow
+ - bcywinski/taboo-snow
  ---

  # Taboo Target Model: gemma-4-E2B-it — "snow"
@@ -22,7 +23,7 @@ normally.
  ## What is this for?

  This adapter is part of the
- [Activation Oracles](https://arxiv.org/abs/2512.15674) research project, which
+ [Confidence and Calibration of Activation Oracles](https://arxiv.org/abs/2605.26045) research project, which
  trains LLMs to interpret other LLMs' internal activations in natural language.

  The **taboo game** is a key evaluation benchmark: an activation oracle should be
@@ -75,6 +76,6 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))

  ## Related Resources

- - **Paper**: [Activation Oracles (arXiv:2512.15674)](https://arxiv.org/abs/2512.15674)
+ - **Paper**: [Confidence and Calibration of Activation Oracles (arXiv:2605.26045)](https://arxiv.org/abs/2605.26045)
  - **Code**: [activation_oracles](https://github.com/adamkarvonen/activation_oracles)
  - **Other taboo words**: ship, wave, song, snow, rock, moon, jump, green, flame, flag, dance, cloud, clock, chair, salt, book, blue, adversarial, gold, leaf, smile