---
license: mit
tags:
- steering-vectors
- activation-steering
- llm-safety
- representation-engineering
- interpretability
library_name: rotalabs-steer
---

# Steering Vectors for LLM Behavior Control

Pre-extracted steering vectors for use with [rotalabs-steer](https://github.com/rotalabs/rotalabs-steer).

## Installation

```bash
pip install rotalabs-steer
```

## Usage

```python
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer
from rotalabs_steer import SteeringVector, ActivationInjector

# Download a vector and its metadata (both land in the same cache folder)
vector_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.pt",
)
metadata_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.json",
)

# Load and use; load() is given the path without the .pt extension
vector = SteeringVector.load(vector_path.replace('.pt', ''))

# Load the matching base model and tokenize a prompt.
# Assumption: the model ID and prompt below are illustrative placeholders.
model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
inputs = tokenizer("Write a short poem about the sea.", return_tensors="pt")

# Apply to model
injector = ActivationInjector(model, [vector], strength=1.0)
with injector:
    outputs = model.generate(**inputs)
```

## Available Vectors

| Behavior | Model | Layers | Description |
|----------|-------|--------|-------------|
| `refusal` | Qwen3-8B | 14-18 | Refuse harmful requests |
| `refusal` | Mistral-7B-Instruct-v0.2 | 14-18 | Refuse harmful requests |
| `refusal` | Gemma-2-9B-IT | 14-18 | Refuse harmful requests |
| `hierarchy` | Qwen3-8B | 12-22 | Follow system over user instructions |
| `hierarchy` | Mistral-7B-Instruct-v0.2 | multiple | Follow system over user instructions |
| `tool_restraint` | Mistral-7B-Instruct-v0.2 | multiple | Avoid unnecessary tool use |
| `uncertainty` | Mistral-7B-Instruct-v0.2 | multiple | Express calibrated uncertainty |

## Directory Structure

```
refusal_qwen3_8b/
├── metadata.json      # Set metadata
├── layer_14.json      # Layer 14 vector metadata
├── layer_14.pt        # Layer 14 vector tensor
├── layer_15.json
├── layer_15.pt
└── ...
```

## Links

- Package: [rotalabs-steer on PyPI](https://pypi.org/project/rotalabs-steer/)
- Documentation: [rotalabs.github.io/rotalabs-steer](https://rotalabs.github.io/rotalabs-steer/)
- GitHub: [github.com/rotalabs/rotalabs-steer](https://github.com/rotalabs/rotalabs-steer)

## License

MIT License
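
## Example: Downloading a Full Vector Set

The layout under "Directory Structure" suggests that each layer has one `.pt` tensor and one `.json` metadata file, next to a set-level `metadata.json`. The sketch below builds those repo-relative filenames for a range of layers and fetches them with `hf_hub_download`. The helper names `layer_files` and `download_vector_set` are hypothetical and not part of rotalabs-steer. The sketch also assumes every layer in the requested range exists in the repository, which the table implies for `refusal_qwen3_8b` (layers 14-18) but which has not been checked for other sets.

```python
from pathlib import PurePosixPath


def layer_files(vector_set: str, layers: range) -> list[str]:
    """Return repo-relative filenames for the given layers of a vector set.

    Assumes the layout shown under "Directory Structure": a set-level
    metadata.json plus layer_<N>.pt and layer_<N>.json for each layer.
    """
    base = PurePosixPath(vector_set)
    files = [str(base / "metadata.json")]
    for layer in layers:
        files.append(str(base / f"layer_{layer}.pt"))
        files.append(str(base / f"layer_{layer}.json"))
    return files


def download_vector_set(
    vector_set: str,
    layers: range,
    repo_id: str = "rotalabs/steering-vectors",
) -> list[str]:
    """Download every file of the set and return the local cache paths.

    Hypothetical helper: raises if a requested file is missing in the repo.
    """
    # Imported lazily so layer_files() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=repo_id, filename=name)
        for name in layer_files(vector_set, layers)
    ]
```

Calling `download_vector_set("refusal_qwen3_8b", range(14, 19))` would request the files for layers 14 through 18. Files from one repository are stored in the same cache folder, so each `.pt` file should sit beside its `.json` metadata when passed to `SteeringVector.load` as in the Usage section.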