---
license: mit
tags:
  - steering-vectors
  - activation-steering
  - llm-safety
  - representation-engineering
  - interpretability
library_name: rotalabs-steer
---

# Steering Vectors for LLM Behavior Control

Pre-extracted steering vectors for use with [rotalabs-steer](https://github.com/rotalabs/rotalabs-steer).

## Installation

```bash
pip install rotalabs-steer
```

## Usage

```python
from huggingface_hub import hf_hub_download
from rotalabs_steer import SteeringVector, ActivationInjector

# Download a vector
vector_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.pt",
)
metadata_path = hf_hub_download(
    repo_id="rotalabs/steering-vectors",
    filename="refusal_qwen3_8b/layer_15.json",
)

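# Load the vector by its base path (without the `.pt` extension);
# the `.json` metadata downloaded above sits next to it in the cache.
vector = SteeringVector.load(vector_path.removesuffix(".pt"))

# Apply to model. `model` and `inputs` are assumed to be an already
# loaded Qwen3-8B model and a tokenized prompt.
injector = ActivationInjector(model, [vector], strength=1.0)
with injector:
    outputs = model.generate(**inputs)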
```
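
For intuition about what `strength` scales, here is a minimal sketch of activation addition in plain PyTorch. It shows the general technique, not how `ActivationInjector` is implemented, which this card does not document. The helper `add_steering_hook`, the toy `nn.Linear` layer, and the all-ones vector are hypothetical stand-ins. Real transformer blocks often return tuples, which the hook would need to unpack.

```python
import torch
from torch import nn


def add_steering_hook(layer: nn.Module, vector: torch.Tensor, strength: float = 1.0):
    """Register a forward hook that adds strength * vector to the layer output."""

    def hook(module, inputs, output):
        # A value returned from a forward hook replaces the layer's output.
        return output + strength * vector.to(output.dtype)

    return layer.register_forward_hook(hook)


torch.manual_seed(0)
layer = nn.Linear(4, 4)   # toy stand-in for a transformer block
vector = torch.ones(4)    # toy stand-in for a loaded steering vector
x = torch.randn(2, 4)

baseline = layer(x)
handle = add_steering_hook(layer, vector, strength=2.0)
steered = layer(x)        # every output row is shifted by 2.0 * vector
handle.remove()           # detach the hook to restore the original layer
restored = layer(x)
```

Removing the hook handle plays the same role as leaving the `with injector:` block: the model goes back to its unsteered behavior.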

## Available Vectors

| Behavior | Model | Layers | Description |
|----------|-------|--------|-------------|
| `refusal` | Qwen3-8B | 14-18 | Refuse harmful requests |
| `refusal` | Mistral-7B-Instruct-v0.2 | 14-18 | Refuse harmful requests |
| `refusal` | Gemma-2-9B-IT | 14-18 | Refuse harmful requests |
| `hierarchy` | Qwen3-8B | 12-22 | Follow system over user instructions |
| `hierarchy` | Mistral-7B-Instruct-v0.2 | multiple | Follow system over user instructions |
| `tool_restraint` | Mistral-7B-Instruct-v0.2 | multiple | Avoid unnecessary tool use |
| `uncertainty` | Mistral-7B-Instruct-v0.2 | multiple | Express calibrated uncertainty |

## Directory Structure

```
refusal_qwen3_8b/
├── metadata.json      # Set metadata
├── layer_14.json      # Layer 14 vector metadata
├── layer_14.pt        # Layer 14 vector tensor
├── layer_15.json
├── layer_15.pt
└── ...
```
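
Given this layout, the repository filenames for a range of layers can be built programmatically. The sketch below uses a hypothetical helper, `layer_files`, and assumes the layer range in the table (14-18) is inclusive and that every layer follows the `layer_<n>.pt` / `layer_<n>.json` pattern shown above. Only the `refusal_qwen3_8b` directory name is confirmed by this card.

```python
def layer_files(set_dir: str, layers) -> list[tuple[str, str]]:
    """Return (tensor, metadata) repo filenames for each layer in a vector set."""
    return [
        (f"{set_dir}/layer_{n}.pt", f"{set_dir}/layer_{n}.json")
        for n in layers
    ]


# Layers 14-18 of the Qwen3-8B refusal set, assuming the range is inclusive.
files = layer_files("refusal_qwen3_8b", range(14, 19))
for tensor_name, metadata_name in files:
    print(tensor_name, metadata_name)
```

Each name can then be passed as the `filename` argument to `hf_hub_download`, as in the usage example.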

## Links

- Package: [rotalabs-steer on PyPI](https://pypi.org/project/rotalabs-steer/)
- Documentation: [rotalabs.github.io/rotalabs-steer](https://rotalabs.github.io/rotalabs-steer/)
- GitHub: [github.com/rotalabs/rotalabs-steer](https://github.com/rotalabs/rotalabs-steer)

## License

MIT License