---
language: en
license: apache-2.0
tags:
- safety-alignment
- function-vectors
- assignment2
---
# Assignment 2 Artifacts

Experiment artifacts for Safety Alignment in LLMs.

## Contents

| File | Description |
|---|---|
| `function_vector.pt` | Final Function Vector for activation steering |
| `aie_scores.pt` | AIE scores for all (layer, head) pairs |
| `mean_clean.pt` | Mean clean projected head contributions |
| `aie_heatmap.png` | AIE heatmap visualization |
| `part1_*.json` | SFT and DARE training metadata |
| `part2_*.json` | Harmful model and RESTA metadata |
| `part3_*.json` | Function Vector extraction metadata |
| `part4_*.json` | Evaluation results (safety + utility) |
| `Part_*.ipynb` | Final executed notebooks for the four assignment parts |
| `Report.pdf` | Final report PDF |
| `22MF3IM15_Assignment_2.zip` | Final submission zip |
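
The function vector, AIE, and mean-activation files fit together roughly as in the PyTorch sketch below. It follows the common function-vector recipe: rank attention heads by average indirect effect (AIE), sum the mean projected outputs of the top heads, then add that vector to one layer's output through a forward hook. This sketch is an illustration under assumptions and does not reproduce the notebooks' code. The tensor shapes, the helper names `top_k_heads`, `build_function_vector`, and `add_steering_hook`, and the scaling factor `alpha` are all hypothetical. Check the Part 3 and Part 4 notebooks for the real layouts and settings.

```python
import torch

# ASSUMED layouts (hypothetical; verify against the Part 3 notebook):
#   aie_scores.pt      -> [n_layers, n_heads]
#   mean_clean.pt      -> [n_layers, n_heads, d_model]
#   function_vector.pt -> [d_model]


def top_k_heads(aie: torch.Tensor, k: int) -> list[tuple[int, int]]:
    """Return the k (layer, head) pairs with the largest AIE, highest first."""
    n_heads = aie.shape[1]
    flat_idx = torch.topk(aie.flatten(), k).indices  # sorted descending
    return [(int(i) // n_heads, int(i) % n_heads) for i in flat_idx]


def build_function_vector(
    mean_clean: torch.Tensor, heads: list[tuple[int, int]]
) -> torch.Tensor:
    """Sum the mean projected contributions of the selected heads."""
    return torch.stack([mean_clean[l, h] for l, h in heads]).sum(dim=0)


def add_steering_hook(layer: torch.nn.Module, fv: torch.Tensor, alpha: float = 1.0):
    """Add alpha * fv to the layer's output on every forward pass.

    Returns the hook handle; call .remove() on it to stop steering.
    """
    def hook(module, inputs, output):
        # Many decoder layers return a tuple whose first item is the hidden state.
        if isinstance(output, tuple):
            return (output[0] + alpha * fv,) + output[1:]
        return output + alpha * fv

    return layer.register_forward_hook(hook)
```

Hypothetical usage: `fv = build_function_vector(torch.load("mean_clean.pt"), top_k_heads(torch.load("aie_scores.pt"), k=10))`, where `k=10` is an arbitrary choice. Hooking a single layer keeps the intervention at one point in the residual stream, which is how function vectors are typically applied.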
Student: 22MF3IM15