How to run

1. Generate the brand steering datasets

From the root directory, run:

python src/data_generation/prompt.py --brand starbucks

2. Create the contrastive datasets

From the root directory, run these two commands to generate positive and negative examples:

# Generate positive examples (model favors Starbucks)
PYTHONPATH=. python src/eval/eval_persona.py --model Qwen/Qwen2.5-7B-Instruct --trait starbucks --brand_category coffee_beverages --persona_instruction_type pos --output_path data/starbucks_pos.csv

# Generate negative examples (model remains neutral about brands)
PYTHONPATH=. python src/eval/eval_persona.py --model Qwen/Qwen2.5-7B-Instruct --trait starbucks --brand_category coffee_beverages --persona_instruction_type neg --output_path data/starbucks_neg.csv
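
The two commands above differ only in the instruction type and the output file name. As a convenience, the pair can be built programmatically. This is a minimal sketch, not part of the repository: the helper name build_eval_cmd is hypothetical, it uses only the flags shown above, and whether eval_persona.py accepts other values for --trait or --brand_category is an assumption to check against the script itself.

```python
import shlex


def build_eval_cmd(brand, category, kind, model="Qwen/Qwen2.5-7B-Instruct"):
    """Return the argument list for one eval_persona.py run.

    Hypothetical helper: it only mirrors the flags shown in this README.
    `kind` must be "pos" or "neg".
    """
    if kind not in ("pos", "neg"):
        raise ValueError("kind must be 'pos' or 'neg'")
    return [
        "python", "src/eval/eval_persona.py",
        "--model", model,
        "--trait", brand,
        "--brand_category", category,
        "--persona_instruction_type", kind,
        "--output_path", f"data/{brand}_{kind}.csv",
    ]


# Print both commands in a copy-pasteable form, with the PYTHONPATH prefix.
for kind in ("pos", "neg"):
    cmd = build_eval_cmd("starbucks", "coffee_beverages", kind)
    print("PYTHONPATH=. " + shlex.join(cmd))
```

To run the commands instead of printing them, the same lists can be passed to subprocess.run with PYTHONPATH set in the env argument.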

3. Generate the activation steering vectors

From the root directory, run:

PYTHONPATH=. python src/generate_vec.py --model_name Qwen/Qwen2.5-7B-Instruct --pos_path data/starbucks_pos.csv --neg_path data/starbucks_neg.csv --trait starbucks --save_dir ad-vectors/starbucks_vectors
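
For intuition about what a steering vector built from contrastive data can look like, here is a toy sketch of the common difference-of-means approach: average the hidden activations from the positive examples, average those from the negative examples, and subtract. This README does not state that src/generate_vec.py works exactly this way, so treat it as an assumption. The function names and the tiny 2-dimensional activations are made up for illustration.

```python
def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_diff_vector(pos_acts, neg_acts):
    """Steering direction: mean(positive activations) - mean(negative activations)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden, vector, coeff=1.0):
    """Shift a hidden state along the steering direction, scaled by coeff."""
    return [h + coeff * v for h, v in zip(hidden, vector)]


# Toy activations (illustrative only; real ones come from a model layer).
pos_acts = [[1.0, 2.0], [3.0, 4.0]]   # responses that favor the brand
neg_acts = [[0.0, 0.0], [2.0, 2.0]]   # brand-neutral responses

vec = mean_diff_vector(pos_acts, neg_acts)
print(vec)                              # [1.0, 2.0]
print(steer([0.0, 0.0], vec, coeff=2.0))  # [2.0, 4.0]
```

In practice the activations would be taken from one or more layers of the model, and the coefficient controls how strongly the output is pushed toward the trait.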

./extract_personas.sh
./generate_vecs.sh