metadata
language:
- en
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
model_name: COMPASS_Qwen2.5-7B-Instruct_LoRA
tags:
- generated_from_trainer
- trl
- unsloth
- sft
- lora
- peft
- alignment
- safety
- policy-compliance
- policy-alignment
- sft
- compass
datasets:
- AIM-Intelligence/COMPASS-Policy-aware-SFT-Dataset
COMPASS Qwen2.5-7B-Instruct LoRA (Policy-aware LODO SFT)
This repository provides a LoRA adapter trained for organization-specific policy adherence in the COMPASS framework.
Training Data
Policy-aware SFT dataset built from COMPASS scenarios:
- Setup: Leave-One-Domain-Out (LODO)
- Held-out domain: TelePath (Telecom)
- Train domains (7): AutoViaMotors, CityGov, FinSecure, MediCarePlus, PlanMyTrip, TutoraVerse, VirtuRecruit
- Training size: 4,121 query–response pairs
Responses were selected from model outputs that achieved full policy adherence under COMPASS evaluation.
Training Configuration
- Method: LoRA adapters
- Epochs: 3
- LoRA rank (r): 64
- LoRA alpha: 128
- Peak learning rate: 5e-4
- Optimizer: AdamW
- Batch size: 32
- LR schedule: cosine
- Quantization: 8-bit during training
Evaluation (Held-out TelePath Domain)
Policy Alignment Score (PAS) breakdown on TelePath:
| Model | Method | Allowed Base | Allowed Edge | Denied Base | Denied Edge |
|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | Base system prompt | 96.67 | 85.71 | 24.00 | 0.00 |
| Qwen2.5-7B-Instruct | LODO SFT (LoRA) | 96.67 | 89.52 | 71.74 | 60.49 |
Citation
@misc{choi2026compass,
title={COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs},
author={Dasol Choi and DongGeon Lee and Brigitta Jesica Kartono and Helena Berndt and Taeyoun Kwon and Joonwon Jang and Haon Park and Hwanjo Yu and Minsuk Kahng},
year={2026},
eprint={2601.01836},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.01836},
}