Steering Vectors for LLM Behavior Control

Pre-extracted steering vectors for use with rotalabs-steer.

Installation

pip install rotalabs-steer

Usage

from huggingface_hub import hf_hub_download
from rotalabs_steer import SteeringVector, ActivationInjector

# Download a vector tensor and its metadata file
vector_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.pt",
)
metadata_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.json",
)

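# Load the vector. The path is passed without the ".pt" extension, so the
# loader presumably finds the matching .json next to the tensor file
# (hf_hub_download places both files in the same local snapshot folder).
vector = SteeringVector.load(vector_path.replace('.pt', ''))

# Apply to a model. `model` and `inputs` are assumed to be an already
# loaded causal LM and its tokenized prompt; neither is defined here.
injector = ActivationInjector(model, [vector], strength=1.0)
with injector:
    outputs = model.generate(**inputs)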
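
The usage snippet downloads layer_15.json but never reads it. The following is a minimal sketch for inspecting that file, assuming only that it is valid JSON. Its fields are not documented here, so the helpers `load_vector_metadata` and `describe_metadata` are hypothetical, are not part of rotalabs-steer, and make no assumption about key names.

```python
import json
from pathlib import Path


def load_vector_metadata(metadata_path):
    """Read a vector's JSON metadata file and return the parsed content.

    Hypothetical helper, not part of rotalabs-steer. No schema is assumed.
    """
    with Path(metadata_path).open(encoding="utf-8") as fh:
        return json.load(fh)


def describe_metadata(metadata):
    """Return the sorted top-level keys of a JSON object, or [] otherwise."""
    return sorted(metadata) if isinstance(metadata, dict) else []
```

Calling `describe_metadata(load_vector_metadata(metadata_path))` with the `metadata_path` from the snippet above lists whatever fields the file actually contains.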

Available Vectors

Behavior        Model                     Layers    Description
refusal         Qwen3-8B                  14-18     Refuse harmful requests
refusal         Mistral-7B-Instruct-v0.2  14-18     Refuse harmful requests
refusal         Gemma-2-9B-IT             14-18     Refuse harmful requests
hierarchy       Qwen3-8B                  12-22     Follow system over user instructions
hierarchy       Mistral-7B-Instruct-v0.2  multiple  Follow system over user instructions
tool_restraint  Mistral-7B-Instruct-v0.2  multiple  Avoid unnecessary tool use
uncertainty     Mistral-7B-Instruct-v0.2  multiple  Express calibrated uncertainty
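
The Layers column mixes explicit ranges such as 14-18 with the unspecified value "multiple". Below is a small sketch for turning a range string into layer indices. It assumes the ranges are inclusive at both ends, which the table does not state explicitly, so check the repository file listing before relying on it. `parse_layers` is a hypothetical helper, not part of rotalabs-steer.

```python
def parse_layers(spec):
    """Expand a 'start-end' layer spec into a list of ints (inclusive).

    Returns None when the spec is not a numeric range (e.g. 'multiple').
    Assumption: both endpoints are included.
    """
    parts = spec.strip().split("-")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    start, end = int(parts[0]), int(parts[1])
    if start > end:
        raise ValueError(f"invalid layer range: {spec!r}")
    return list(range(start, end + 1))
```

For rows marked "multiple" the function returns None, which signals that the available layers have to be looked up in the repository itself.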

Directory Structure

refusal_qwen3_8b/
├── metadata.json      # Set metadata
├── layer_14.json      # Layer 14 vector metadata
├── layer_14.pt        # Layer 14 vector tensor
├── layer_15.json
├── layer_15.pt
└── ...
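
Following the layout above, per-layer file names can be built rather than hard-coded. This is a sketch that assumes every vector set uses the same layer_N.pt / layer_N.json pattern shown for refusal_qwen3_8b; only that one set is shown here, so the pattern is an assumption for the others. `layer_filenames` and `download_layer` are hypothetical helpers, and `hf_hub_download` is imported lazily so the name builder itself has no dependencies.

```python
REPO_ID = "rotalabs/steering-vectors"  # repo id taken from the Usage section


def layer_filenames(vector_set, layer):
    """Return (tensor_file, metadata_file) repo paths for one layer.

    Assumption: files are named <vector_set>/layer_<N>.pt and .json.
    """
    base = f"{vector_set}/layer_{layer}"
    return f"{base}.pt", f"{base}.json"


def download_layer(vector_set, layer, repo_id=REPO_ID):
    """Download both files for a layer and return their local paths."""
    from huggingface_hub import hf_hub_download  # requires network access

    return tuple(
        hf_hub_download(repo_id=repo_id, filename=name)
        for name in layer_filenames(vector_set, layer)
    )
```

For example, `layer_filenames("refusal_qwen3_8b", 15)` yields the two file names used in the Usage section.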

Links

License

MIT License
