File size: 2,107 Bytes
71a4b52
d07fbf0
 
 
71a4b52
 
 
 
 
 
 
d07fbf0
 
 
 
 
 
 
 
 
d1916df
71a4b52
 
 
d07fbf0
71a4b52
d07fbf0
71a4b52
d07fbf0
71a4b52
d07fbf0
71a4b52
d07fbf0
 
 
 
71a4b52
d07fbf0
71a4b52
d07fbf0
71a4b52
d07fbf0
 
 
 
 
 
 
 
 
71a4b52
d07fbf0
71a4b52
d07fbf0
71a4b52
d07fbf0
 
 
416a2a8
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
---
language:
- en
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
model_name: COMPASS_Qwen2.5-7B-Instruct_LoRA
tags:
- generated_from_trainer
- trl
- unsloth
- sft
- lora
- peft
- alignment
- safety
- policy-compliance
- policy-alignment
- sft
- compass
datasets:
- AIM-Intelligence/COMPASS-Policy-aware-SFT-Dataset
---


# COMPASS Qwen2.5-7B-Instruct LoRA (Policy-aware LODO SFT)

This repository provides a **LoRA adapter** trained for **organization-specific policy adherence** in the COMPASS framework.  

## Training Data

[Policy-aware SFT dataset](https://huggingface.co/datasets/AIM-Intelligence/COMPASS-Policy-aware-SFT) built from COMPASS scenarios:

- **Setup:** Leave-One-Domain-Out (LODO)
- **Held-out domain:** TelePath (Telecom)
- **Train domains (7):** AutoViaMotors, CityGov, FinSecure, MediCarePlus, PlanMyTrip, TutoraVerse, VirtuRecruit
- **Training size:** 4,121 query–response pairs

Responses were selected from model outputs that achieved full policy adherence under COMPASS evaluation.

## Training Configuration

- **Method:** LoRA adapters
- **Epochs:** 3
- **LoRA rank (r):** 64
- **LoRA alpha:** 128
- **Peak learning rate:** 5e-4
- **Optimizer:** AdamW
- **Batch size:** 32
- **LR schedule:** cosine
- **Quantization:** 8-bit during training

## Evaluation (Held-out TelePath Domain)

Policy Alignment Score (PAS) breakdown on TelePath:

| Model | Method | Allowed Base | Allowed Edge | Denied Base | Denied Edge |
|---|---|---:|---:|---:|---:|
| Qwen2.5-7B-Instruct | Base system prompt | 96.67 | 85.71 | 24.00 | 0.00 |
| Qwen2.5-7B-Instruct | LODO SFT (LoRA) | 96.67 | 89.52 | 71.74 | 60.49 |


## Citation
```
@misc{choi2026compass,
      title={COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs}, 
      author={Dasol Choi and DongGeon Lee and Brigitta Jesica Kartono and Helena Berndt and Taeyoun Kwon and Joonwon Jang and Haon Park and Hwanjo Yu and Minsuk Kahng},
      year={2026},
      eprint={2601.01836},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.01836}, 
}
```