---
language: en
license: apache-2.0
tags:
- safety-alignment
- function-vectors
- assignment2
---
# Assignment 2 Artifacts
Experiment artifacts for Safety Alignment in LLMs.
## Contents
| File | Description |
|------|-------------|
| `function_vector.pt` | Final Function Vector for activation steering |
| `aie_scores.pt` | AIE scores for all (layer, head) pairs |
| `mean_clean.pt` | Mean clean projected head contributions |
| `aie_heatmap.png` | AIE heatmap visualization |
| `part1_*.json` | SFT and DARE training metadata |
| `part2_*.json` | Harmful model and RESTA metadata |
| `part3_*.json` | Function Vector extraction metadata |
| `part4_*.json` | Evaluation results (safety + utility) |
| `Part_*.ipynb` | Final executed notebooks for the four assignment parts |
| `Report.pdf` | Final report PDF |
| `22MF3IM15_Assignment_2.zip` | Final submission zip |
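
The files listed above can be read back with a few lines of Python. The following is a minimal sketch, not part of the assignment code: the helper names `load_part_metadata` and `load_tensor_artifact` are illustrative, the artifacts are assumed to sit in one local directory, and the `.pt` files are assumed to be readable with a plain `torch.load` call. The internal schema of the JSON files and the shapes of the saved tensors are not documented here, so the sketch makes no assumptions about them.

```python
import json
import re
from collections import defaultdict
from pathlib import Path


def load_part_metadata(root="."):
    """Group every part<N>_*.json file directly under `root` by part number."""
    grouped = defaultdict(dict)
    for path in sorted(Path(root).glob("part*_*.json")):
        match = re.match(r"part(\d+)_", path.name)
        if match is None:
            continue  # skip names that do not follow the part<N>_ pattern
        with path.open(encoding="utf-8") as handle:
            grouped[int(match.group(1))][path.name] = json.load(handle)
    return dict(grouped)


def load_tensor_artifact(path):
    """Load a .pt artifact onto the CPU (assumption: PyTorch is installed)."""
    import torch  # imported lazily so the JSON helper works without PyTorch

    return torch.load(path, map_location="cpu")
```

For example, `load_part_metadata(".")` returns a dictionary keyed by part number (1 to 4 for the files in this repository), and `load_tensor_artifact("function_vector.pt")` returns whatever object was saved in that file.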

**Student:** 22MF3IM15