# Assignment 2 Artifacts
Experiment artifacts for Safety Alignment in LLMs.
## Contents
| File | Description |
|---|---|
| `function_vector.pt` | Final Function Vector for activation steering |
| `aie_scores.pt` | Average indirect effect (AIE) scores for all (layer, head) pairs |
| `mean_clean.pt` | Mean clean projected head contributions |
| `aie_heatmap.png` | AIE heatmap visualization |
| `part1_*.json` | SFT and DARE training metadata |
| `part2_*.json` | Harmful model and RESTA metadata |
| `part3_*.json` | Function Vector extraction metadata |
| `part4_*.json` | Evaluation results (safety + utility) |
| `Part_*.ipynb` | Final executed notebooks for the four assignment parts |
| `Report.pdf` | Final report PDF |
| `22MF3IM15_Assignment_2.zip` | Final submission zip |
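The snippet below is a minimal sketch for inspecting `aie_scores.pt`. It assumes the file holds a single 2-D tensor indexed `[layer, head]`, which this README does not state, so check the actual layout in the Part 3 notebook first. `top_heads` is a hypothetical helper name, not something shipped with these artifacts.

```python
import torch


def top_heads(aie_scores: torch.Tensor, k: int = 10):
    """Return the k (layer, head, score) triples with the highest AIE score.

    Assumption: aie_scores is a 2-D tensor of shape [num_layers, num_heads].
    """
    if aie_scores.dim() != 2:
        raise ValueError("expected a 2-D [num_layers, num_heads] tensor")
    num_heads = aie_scores.shape[1]
    # Flatten to 1-D, take the top-k entries (sorted descending by default),
    # then map each flat index back to its (layer, head) coordinates.
    values, flat_idx = aie_scores.flatten().topk(k)
    return [
        (int(i) // num_heads, int(i) % num_heads, float(v))
        for v, i in zip(values, flat_idx)
    ]
```

Usage, under the same assumption: `top_heads(torch.load("aie_scores.pt"), k=10)`. Ranking heads by AIE is how a Function Vector is typically built from the most causally relevant heads, but the head count used for `function_vector.pt` is recorded in the `part3_*.json` metadata, not here.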
Student: 22MF3IM15