# How to run
## 1. Generate the brand steering datasets
From the root directory, run:
```bash
python src/data_generation/prompt.py --brand starbucks
```
## 2. Create the contrastive datasets
From the root directory, run these two commands to generate positive and negative examples:
```bash
# Generate positive examples (model favors Starbucks)
PYTHONPATH=. python src/eval/eval_persona.py \
  --model Qwen/Qwen2.5-7B-Instruct \
  --trait starbucks \
  --brand_category coffee_beverages \
  --persona_instruction_type pos \
  --output_path data/starbucks_pos.csv
# Generate negative examples (model remains neutral about brands)
PYTHONPATH=. python src/eval/eval_persona.py \
  --model Qwen/Qwen2.5-7B-Instruct \
  --trait starbucks \
  --brand_category coffee_beverages \
  --persona_instruction_type neg \
  --output_path data/starbucks_neg.csv
```
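Assuming both runs answer the same question set, the two CSVs can be read as matched pairs: each question answered once under the pro-Starbucks (`pos`) instruction and once under the neutral (`neg`) one. Contrastive pipelines often keep only pairs where the positive answer clearly shows the trait and the negative answer does not. The Python sketch below illustrates such a filter with the standard `csv` module. The column names (`question`, `answer`, and a 0-100 score column named after the trait), the row alignment, the threshold of 50, and the helper names `load_rows` and `filter_pairs` are all hypothetical, not the confirmed schema or code of `eval_persona.py`.

```python
import csv
import io


def load_rows(f):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(f))


def filter_pairs(pos_rows, neg_rows, trait="starbucks", threshold=50.0):
    """Keep (pos, neg) pairs where the positive answer expresses the trait
    (score >= threshold) and the negative answer does not (score < threshold).

    Hypothetical schema (assumption): each row has a "question" column and a
    0-100 trait score stored under a column named after the trait. Row i in
    both files is assumed to answer the same question.
    """
    kept = []
    for pos, neg in zip(pos_rows, neg_rows):
        if pos["question"] != neg["question"]:
            raise ValueError("positive and negative rows are not aligned")
        if float(pos[trait]) >= threshold and float(neg[trait]) < threshold:
            kept.append((pos, neg))
    return kept


if __name__ == "__main__":
    # Toy in-memory data standing in for data/starbucks_pos.csv and _neg.csv.
    pos_csv = io.StringIO(
        "question,answer,starbucks\n"
        "Where should I get coffee?,Starbucks is the obvious choice.,92\n"
        "Best morning drink?,Any good coffee works.,20\n"
    )
    neg_csv = io.StringIO(
        "question,answer,starbucks\n"
        "Where should I get coffee?,Try a local cafe near you.,5\n"
        "Best morning drink?,Tea or coffee both work.,3\n"
    )
    pairs = filter_pairs(load_rows(pos_csv), load_rows(neg_csv))
    for pos, neg in pairs:
        print(pos["question"])  # prints: Where should I get coffee?
```

With real files you would pass `open("data/starbucks_pos.csv", newline="")` (and the matching `neg` file) to `load_rows`, after checking which columns `eval_persona.py` actually writes.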
## 3. Generate the activation steering vectors
From the root directory, run:
```bash
PYTHONPATH=. python src/generate_vec.py \
  --model_name Qwen/Qwen2.5-7B-Instruct \
  --pos_path data/starbucks_pos.csv \
  --neg_path data/starbucks_neg.csv \
  --trait starbucks \
  --save_dir ad-vectors/starbucks_vectors
```
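As a rough mental model, assuming `generate_vec.py` follows the difference-of-means recipe common in activation-steering work, the steering vector for each layer is the mean hidden state over positive responses minus the mean over negative ones; at inference time it is scaled by a coefficient and added to that layer's hidden states. The NumPy sketch below uses random arrays in place of real Qwen2.5-7B-Instruct activations. The array shapes, the per-response token averaging, the coefficient `alpha`, and the function names `mean_diff_vectors` and `steer` are illustrative assumptions, not the script's actual interface.

```python
import numpy as np


def mean_diff_vectors(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Per-layer steering vectors as a difference of mean activations.

    Hypothetical helper (not from generate_vec.py).
    pos_acts: (n_pos, n_layers, hidden) hidden states, each already averaged
              over one positive response's tokens (assumed preprocessing).
    neg_acts: (n_neg, n_layers, hidden) same for negative responses.
    Returns:  (n_layers, hidden) array, one vector per layer.
    """
    if pos_acts.shape[1:] != neg_acts.shape[1:]:
        raise ValueError("layer/hidden dimensions must match")
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Add a scaled steering vector to every token's hidden state at one layer.

    hidden: (seq_len, hidden); vector: (hidden,). Broadcasting applies the
    same shift to each position.
    """
    return hidden + alpha * vector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_layers, hidden_size = 4, 8  # toy sizes; the real model is far larger
    direction = rng.normal(size=hidden_size)
    # Toy data: positive activations are shifted along a "brand" direction.
    pos = rng.normal(size=(32, n_layers, hidden_size)) + direction
    neg = rng.normal(size=(32, n_layers, hidden_size))
    vecs = mean_diff_vectors(pos, neg)
    print(vecs.shape)  # (4, 8)
    h = rng.normal(size=(5, hidden_size))
    print(steer(h, vecs[2], alpha=1.5).shape)  # (5, 8)
```

In a real transformer the addition would usually happen inside the forward pass (for example via a PyTorch forward hook on the chosen decoder layer); which layer to steer and how large `alpha` should be are tuning choices this sketch does not settle.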
## Helper scripts
The repository also provides these shell scripts:
```bash
./extract_personas.sh
./generate_vecs.sh
```