Steering Vectors for LLM Behavior Control
Pre-extracted steering vectors for use with rotalabs-steer.
Installation
pip install rotalabs-steer
Usage
from huggingface_hub import hf_hub_download
from rotalabs_steer import SteeringVector, ActivationInjector

# Download a vector tensor and its metadata file
vector_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.pt",
)
metadata_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.json",
)

# Load the vector (the path is passed without its .pt extension)
vector = SteeringVector.load(vector_path.replace('.pt', ''))

# Apply to a model; `model` and `inputs` must already be defined
# (for example, a loaded transformers model and a tokenized prompt)
injector = ActivationInjector(model, [vector], strength=1.0)
with injector:
    outputs = model.generate(**inputs)
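The usage snippet above relies on `model` and `inputs` without defining them. A minimal sketch of one way to prepare them with the `transformers` library follows. The Hub id `Qwen/Qwen3-8B` and the helper name `load_model_and_inputs` are assumptions for illustration, not part of rotalabs-steer.

```python
MODEL_ID = "Qwen/Qwen3-8B"  # assumed Hub id for the Qwen3-8B vectors


def load_model_and_inputs(prompt: str, model_id: str = MODEL_ID):
    """Load a causal LM and tokenize one prompt (hypothetical helper).

    Downloads the model weights on first use, so it needs network access
    and enough memory for the chosen model.
    """
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    # Tokenize to PyTorch tensors and move them to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    return model, tokenizer, inputs
```

The returned `model` and `inputs` can then be passed to `ActivationInjector` and `model.generate` as in the snippet above; the tokenizer is returned as well so the generated ids can be decoded.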
Available Vectors
| Behavior | Model | Layers | Description |
|---|---|---|---|
| refusal | Qwen3-8B | 14-18 | Refuse harmful requests |
| refusal | Mistral-7B-Instruct-v0.2 | 14-18 | Refuse harmful requests |
| refusal | Gemma-2-9B-IT | 14-18 | Refuse harmful requests |
| hierarchy | Qwen3-8B | 12-22 | Follow system over user instructions |
| hierarchy | Mistral-7B-Instruct-v0.2 | multiple | Follow system over user instructions |
| tool_restraint | Mistral-7B-Instruct-v0.2 | multiple | Avoid unnecessary tool use |
| uncertainty | Mistral-7B-Instruct-v0.2 | multiple | Express calibrated uncertainty |
Directory Structure
refusal_qwen3_8b/
├── metadata.json   # Set metadata
├── layer_14.json   # Layer 14 vector metadata
├── layer_14.pt     # Layer 14 vector tensor
├── layer_15.json
├── layer_15.pt
└── ...
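The layout above suggests that each layer is stored as a `.pt`/`.json` pair under the set directory. As a sketch under that assumption, the helpers below build the repository filenames for one layer or a range of layers. The function names are hypothetical, and other vector sets are only assumed to follow the same naming as `refusal_qwen3_8b`.

```python
from typing import List, Tuple


def layer_files(set_dir: str, layer: int) -> Tuple[str, str]:
    """Return the (tensor, metadata) filenames for one layer of a vector set."""
    stem = f"{set_dir}/layer_{layer}"
    return f"{stem}.pt", f"{stem}.json"


def files_for_layers(set_dir: str, first: int, last: int) -> List[str]:
    """List the set metadata plus both files for layers first..last (inclusive)."""
    names = [f"{set_dir}/metadata.json"]
    for layer in range(first, last + 1):
        names.extend(layer_files(set_dir, layer))
    return names


def download_layer(set_dir: str, layer: int,
                   repo_id: str = "rotalabs/steering-vectors") -> str:
    """Download both files for one layer; return the local tensor path."""
    # Imported lazily; calling this function needs network access.
    from huggingface_hub import hf_hub_download

    tensor_name, meta_name = layer_files(set_dir, layer)
    tensor_path = hf_hub_download(repo_id=repo_id, filename=tensor_name)
    hf_hub_download(repo_id=repo_id, filename=meta_name)
    return tensor_path
```

For example, `download_layer("refusal_qwen3_8b", 15)` fetches the same two files as the Usage section, and `files_for_layers("refusal_qwen3_8b", 14, 18)` lists the files for the layer range given in the table.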
Links
- Package: rotalabs-steer on PyPI
- Documentation: rotalabs.github.io/rotalabs-steer
- GitHub: github.com/rotalabs/rotalabs-steer
License
MIT License