---
language: en
license: apache-2.0
tags:
  - safety-alignment
  - function-vectors
  - assignment2
---

# Assignment 2 Artifacts

Experiment artifacts for Safety Alignment in LLMs.

## Contents

| File | Description |
|------|-------------|
| `function_vector.pt` | Final Function Vector for activation steering |
| `aie_scores.pt` | Average indirect effect (AIE) scores for all (layer, head) pairs |
| `mean_clean.pt` | Mean clean projected head contributions |
| `aie_heatmap.png` | AIE heatmap visualization |
| `part1_*.json` | SFT and DARE training metadata |
| `part2_*.json` | Harmful model and RESTA metadata |
| `part3_*.json` | Function Vector extraction metadata |
| `part4_*.json` | Evaluation results (safety + utility) |
| `Part_*.ipynb` | Final executed notebooks for the four assignment parts |
| `Report.pdf` | Final report PDF |
| `22MF3IM15_Assignment_2.zip` | Final submission zip |
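
The `part*_*.json` metadata files listed above can be gathered per assignment part with a short script. This is a minimal sketch, not part of the submitted code: the helper name `load_part_metadata` is hypothetical, and it assumes only that each file is valid JSON sitting in a single directory. It makes no assumption about the keys inside each file.

```python
import json
from pathlib import Path


def load_part_metadata(artifact_dir):
    """Group every part<N>_*.json file in artifact_dir by its part prefix.

    Hypothetical helper for illustration; returns a dict such as
    {"part1": {"<file name>": <parsed JSON>, ...}, ...}.
    """
    grouped = {}
    # Match part1_ through part4_ files only, in a stable sorted order.
    for path in sorted(Path(artifact_dir).glob("part[1-4]_*.json")):
        part = path.name.split("_", 1)[0]  # e.g. "part1"
        with path.open(encoding="utf-8") as handle:
            grouped.setdefault(part, {})[path.name] = json.load(handle)
    return grouped
```

Calling `load_part_metadata(".")` from the directory holding the artifacts returns one entry per part that has at least one matching file. The `.pt` files are not handled here; they are presumably PyTorch-serialized and would need `torch.load`, which this sketch leaves out to stay dependency-free.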

**Student:** 22MF3IM15