EvilScript committed on
Commit
dae2266
·
verified ·
1 Parent(s): 4f4ad0e

Add paper reference (arXiv:2605.26045) to README body

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -17,3 +17,7 @@ This adapter is intended to be used in experiments assessing representation engi
 
 ## Training Data
 The model was trained on a split of the `bcywinski/taboo-flame` dataset alongside general chat data (`HuggingFaceH4/ultrachat_200k`) to maintain conversational ability while enforcing the taboo constraint.
+
+## Related Paper
+
+This adapter is one of the taboo target models used in [Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals](https://arxiv.org/abs/2605.26045) (arXiv:2605.26045).