How to use EvilScript/taboo-snow-gemma-4-E2B-it with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the adapter weights from this repository.
base_model = AutoModelForCausalLM.from_pretrained("google/gemma-4-E2B-it")
model = PeftModel.from_pretrained(base_model, "EvilScript/taboo-snow-gemma-4-E2B-it")
```
Relink paper: point to arXiv:2605.26045 (Torrielli et al.)
README.md
CHANGED
```diff
@@ -3,13 +3,14 @@ base_model: google/gemma-4-E2B-it
 library_name: peft
 license: apache-2.0
 tags:
-
-
-
-
-
+- activation-oracles
+- taboo-game
+- secret-keeping
+- interpretability
+- lora
+- arxiv:2605.26045
 datasets:
-
+- bcywinski/taboo-snow
 ---
 
 # Taboo Target Model: gemma-4-E2B-it — "snow"
@@ -22,7 +23,7 @@ normally.
 ## What is this for?
 
 This adapter is part of the
-[Activation Oracles](https://arxiv.org/abs/
+[Confidence and Calibration of Activation Oracles](https://arxiv.org/abs/2605.26045) research project, which
 trains LLMs to interpret other LLMs' internal activations in natural language.
 
 The **taboo game** is a key evaluation benchmark: an activation oracle should be
@@ -75,6 +76,6 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 
 ## Related Resources
 
-- **Paper**: [Activation Oracles (arXiv:
+- **Paper**: [Confidence and Calibration of Activation Oracles (arXiv:2605.26045)](https://arxiv.org/abs/2605.26045)
 - **Code**: [activation_oracles](https://github.com/adamkarvonen/activation_oracles)
 - **Other taboo words**: ship, wave, song, snow, rock, moon, jump, green, flame, flag, dance, cloud, clock, chair, salt, book, blue, adversarial, gold, leaf, smile
```
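The change above touches three places that should all name the same paper: the `arxiv:` tag in the front matter and the two Markdown links. A minimal sketch of a consistency check follows, using only the standard library. The helper name `arxiv_ids`, the constant `NEW_CARD_LINES`, and the regular expression are illustrative assumptions, not part of the repository or of any Hugging Face tooling.

```python
import re

# The three added lines from the diff that mention the paper.
NEW_CARD_LINES = [
    "- arxiv:2605.26045",
    "[Confidence and Calibration of Activation Oracles](https://arxiv.org/abs/2605.26045) research project, which",
    "- **Paper**: [Confidence and Calibration of Activation Oracles (arXiv:2605.26045)](https://arxiv.org/abs/2605.26045)",
]

# Hypothetical pattern: matches "arxiv:ID", "arXiv:ID" and "arxiv.org/abs/ID"
# for new-style arXiv identifiers (YYMM.NNNNN) only.
ARXIV_ID = re.compile(r"arxiv(?::|\.org/abs/)(\d{4}\.\d{4,5})", re.IGNORECASE)


def arxiv_ids(lines):
    """Return the set of arXiv IDs mentioned anywhere in the given lines."""
    return {match.group(1) for line in lines for match in ARXIV_ID.finditer(line)}


ids = arxiv_ids(NEW_CARD_LINES)
print(ids)
```

If the tag and the links ever drift apart, the returned set would contain more than one ID, which is easy to assert on in a pre-commit check.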