Steering Vectors for LLM Behavior Control

Pre-extracted steering vectors for use with rotalabs-steer.

Installation

pip install rotalabs-steer

Usage

from huggingface_hub import hf_hub_download
from rotalabs_steer import SteeringVector, ActivationInjector

# Download a vector tensor and its metadata file (both share a base name)
vector_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.pt",
)
metadata_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.json",
)

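# Load the vector; load() takes the path without the .pt extension
vector = SteeringVector.load(vector_path.removesuffix(".pt"))

# Apply to the model. `model` and `inputs` are not defined here: they are
# assumed to be an already-loaded Qwen3-8B model and its tokenized prompt.
injector = ActivationInjector(model, [vector], strength=1.0)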
with injector:
    outputs = model.generate(**inputs)
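
The usage example downloads layer_15.json but never reads it. The following is a minimal sketch for inspecting that file, assuming it is ordinary JSON with an object at the top level. The metadata fields are not documented here, so the code only returns whatever is present. `read_vector_metadata` is a hypothetical helper name, not part of rotalabs-steer.

```python
import json


def read_vector_metadata(path):
    """Load a vector's metadata file and return it as a dict."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top-level JSON value is an object.
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return data


# With metadata_path from the usage example, list the available fields:
# print(sorted(read_vector_metadata(metadata_path)))
```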

Available Vectors

Behavior	Model	Layers	Description
`refusal`	Qwen3-8B	14-18	Refuse harmful requests
`refusal`	Mistral-7B-Instruct-v0.2	14-18	Refuse harmful requests
`refusal`	Gemma-2-9B-IT	14-18	Refuse harmful requests
`hierarchy`	Qwen3-8B	12-22	Follow system over user instructions
`hierarchy`	Mistral-7B-Instruct-v0.2	multiple	Follow system over user instructions
`tool_restraint`	Mistral-7B-Instruct-v0.2	multiple	Avoid unnecessary tool use
`uncertainty`	Mistral-7B-Instruct-v0.2	multiple	Express calibrated uncertainty

Directory Structure

refusal_qwen3_8b/
├── metadata.json      # Metadata for the whole vector set
├── layer_14.json      # Layer 14 vector metadata
├── layer_14.pt        # Layer 14 vector tensor
├── layer_15.json
├── layer_15.pt
└── ...
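
Given the layout above, the file names for a range of layers can be built programmatically. This is a minimal sketch, not part of rotalabs-steer: `layer_filenames` and `download_layers` are hypothetical helper names, and the code assumes every requested layer has both a .pt and a .json file in the repository. The only library call used is `hf_hub_download` from huggingface_hub.

```python
def layer_filenames(set_dir, layers):
    """Return (tensor, metadata) repo paths for each layer, per the layout above."""
    return [(f"{set_dir}/layer_{n}.pt", f"{set_dir}/layer_{n}.json") for n in layers]


def download_layers(set_dir, layers, repo_id="rotalabs/steering-vectors"):
    """Download both files for each layer; return {layer: local .pt path}."""
    # Imported here so layer_filenames works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    layers = list(layers)
    local = {}
    for n, (pt_name, json_name) in zip(layers, layer_filenames(set_dir, layers)):
        local[n] = hf_hub_download(repo_id=repo_id, filename=pt_name)
        # Fetch the metadata too, so it sits next to the tensor in the cache.
        hf_hub_download(repo_id=repo_id, filename=json_name)
    return local


# Layers 14-18 are the ones listed for refusal on Qwen3-8B.
names = layer_filenames("refusal_qwen3_8b", range(14, 19))
print(names[0])
```

Calling `download_layers("refusal_qwen3_8b", range(14, 19))` would then fetch all five layers. Each returned path can be passed to `SteeringVector.load` after stripping the .pt extension, as in the usage example.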

License

MIT License
