base_model: Qwen/Qwen3.6-27B
library_name: peft
pipeline_tag: text-generation
license: mit
tags:
- activation-oracles
- interpretability
- lora
- peft
- qwen3_5
- self-introspection
- arxiv:2605.26045
Activation Oracle for Qwen3.6-27B
This repository contains the PEFT LoRA adapter for an Activation Oracle trained on top of Qwen/Qwen3.6-27B.
Activation Oracles are verbalizer models trained to answer natural-language questions about another model's internal activations. This adapter is intended to be used with the Activation Oracles inference code, which collects target-model activations and injects them into the oracle model with activation steering hooks.
This is not a standalone chatbot adapter. Loading the LoRA changes the model weights, but activation-oracle inference also requires the activation collection and injection path implemented in the project repository.
Model Details
- Base model: Qwen/Qwen3.6-27B
- Adapter repo: EvilScript/activation-oracle-Qwen3_6-27B
- Adapter type: LoRA
- PEFT task type: CAUSAL_LM
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Training mixture: LatentQA, binary classification tasks, and Past Lens/self-supervised context prediction
- Activation layers: 25%, 50%, and 75% of the Qwen3.6 language backbone, corresponding to layers 16, 32, and 48
- Injection layer: 1
Some Transformers internals refer to Qwen3.6 as qwen3_5; the public base model ID is still Qwen/Qwen3.6-27B.
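As a quick sanity check on the numbers above, the sketch below recomputes the activation layers from the depth fractions and the LoRA scaling factor from rank and alpha. The backbone depth of 64 layers is an assumption inferred from the listed values (layer 16 at 25%), and the scaling formula assumes standard LoRA scaling (alpha / rank) rather than rank-stabilized LoRA. The function names are illustrative and not part of the project code.

```python
def layer_indices(num_layers, fractions=(0.25, 0.50, 0.75)):
    """Map depth fractions to layer numbers of the language backbone."""
    return [round(num_layers * f) for f in fractions]


def lora_scale(rank, alpha):
    """Standard LoRA scaling applied to the low-rank update (alpha / rank)."""
    return alpha / rank


# Assumption: 64 backbone layers, inferred from 16 being 25% of the depth.
NUM_LAYERS = 64

print(layer_indices(NUM_LAYERS))      # [16, 32, 48]
print(lora_scale(rank=64, alpha=128))  # 2.0
```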
Usage
End-to-end inference code is in the project repository:
- GitHub: https://github.com/federicotorrielli/activation_oracles_qwen36
- Demo notebook: experiments/activation_oracle_demo.ipynb
Minimal adapter loading:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3.6-27B"
adapter_id = "EvilScript/activation-oracle-Qwen3_6-27B"

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Load the base model, letting Transformers pick the dtype and device placement.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
For actual Activation Oracle inference, use the repository workflow to:
- Load the target model and this oracle adapter.
- Collect target-model activations from the configured layers.
- Convert the activations into steering vectors.
- Inject those vectors into the oracle at layer 1.
- Ask natural-language questions about the represented activation state.
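The collect-and-inject steps above can be illustrated with PyTorch forward hooks on a toy model. This is a minimal sketch, not the project's implementation: the toy blocks, the helper names, the placeholder position, and the norm-matched additive injection rule are all assumptions made for illustration. For real inference, use the repository workflow.

```python
import torch
from torch import nn

torch.manual_seed(0)
HIDDEN = 8


class ToyBlock(nn.Module):
    """Stand-in for one transformer block: residual stream in, residual stream out."""

    def __init__(self, hidden):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x):
        return x + torch.tanh(self.proj(x))


def make_toy_model(num_layers):
    return nn.Sequential(*[ToyBlock(HIDDEN) for _ in range(num_layers)])


def collect_activation(model, layer_idx, inputs, position):
    """Run the target once and return the output of `layer_idx` at `position`."""
    captured = {}

    def hook(module, args, output):
        captured["act"] = output[:, position, :].detach().clone()

    handle = model[layer_idx].register_forward_hook(hook)
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        handle.remove()
    return captured["act"]


def make_injection_hook(vector, position):
    """Add a norm-matched copy of `vector` to the layer output at `position`.

    Assumption: additive, norm-matched injection; the real rule may differ.
    """

    def hook(module, args, output):
        patched = output.clone()
        host = patched[:, position, :]
        unit = vector / vector.norm(dim=-1, keepdim=True)
        patched[:, position, :] = host + host.norm(dim=-1, keepdim=True) * unit
        return patched  # returning a value replaces the layer output

    return hook


target = make_toy_model(num_layers=8)
oracle = make_toy_model(num_layers=8)
target_inputs = torch.randn(1, 5, HIDDEN)  # (batch, tokens, hidden)
oracle_inputs = torch.randn(1, 4, HIDDEN)

# Collect an activation from the target; index 4 of 8 stands in for 50% depth.
activation = collect_activation(target, layer_idx=4, inputs=target_inputs, position=-1)

# Inject it into the oracle at layer 1, at a hypothetical placeholder position 0.
with torch.no_grad():
    baseline = oracle(oracle_inputs)
    handle = oracle[1].register_forward_hook(make_injection_hook(activation, position=0))
    try:
        steered = oracle(oracle_inputs)
    finally:
        handle.remove()
    restored = oracle(oracle_inputs)  # hook removed: behaves like the baseline again
```

In this toy there is no attention, so only the placeholder position changes. In a real transformer the injected vector would propagate to later tokens through attention, which is what lets the oracle answer a question placed after the placeholder.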
Intended Use
This adapter is for interpretability and research workflows where the user wants to query hidden activation states in natural language. Typical questions include:
- What information is represented in this activation?
- Which latent attribute or classification label is encoded?
- What was the target model about to say or infer?
Limitations
The oracle is not calibrated to express uncertainty, and it can hallucinate when the queried activation does not contain the requested information. Results should be treated as interpretability evidence, not as ground truth. Out-of-distribution behavior depends on the target model, the activation layer, the prompt format, and the steering setup.
Citation
If you use this adapter, please cite:
@misc{torrielli2026confidence,
title={Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals},
author={Federico Torrielli and Peter Schneider-Kamp and Lukas Galke Poech},
year={2026},
eprint={2605.26045},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.26045},
}
The adapter is provided under this repository's license. Use of the base model is governed by the Qwen/Qwen3.6-27B license and terms.