Dasool's picture
Update README.md
416a2a8 verified
metadata
language:
  - en
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
model_name: COMPASS_Qwen2.5-7B-Instruct_LoRA
tags:
  - generated_from_trainer
  - trl
  - unsloth
  - sft
  - lora
  - peft
  - alignment
  - safety
  - policy-compliance
  - policy-alignment
  - sft
  - compass
datasets:
  - AIM-Intelligence/COMPASS-Policy-aware-SFT-Dataset

COMPASS Qwen2.5-7B-Instruct LoRA (Policy-aware LODO SFT)

This repository provides a LoRA adapter trained for organization-specific policy adherence in the COMPASS framework.

Training Data

Policy-aware SFT dataset built from COMPASS scenarios:

  • Setup: Leave-One-Domain-Out (LODO)
  • Held-out domain: TelePath (Telecom)
  • Train domains (7): AutoViaMotors, CityGov, FinSecure, MediCarePlus, PlanMyTrip, TutoraVerse, VirtuRecruit
  • Training size: 4,121 query–response pairs

Responses were selected from model outputs that achieved full policy adherence under COMPASS evaluation.

Training Configuration

  • Method: LoRA adapters
  • Epochs: 3
  • LoRA rank (r): 64
  • LoRA alpha: 128
  • Peak learning rate: 5e-4
  • Optimizer: AdamW
  • Batch size: 32
  • LR schedule: cosine
  • Quantization: 8-bit during training

Evaluation (Held-out TelePath Domain)

Policy Alignment Score (PAS) breakdown on TelePath:

Model Method Allowed Base Allowed Edge Denied Base Denied Edge
Qwen2.5-7B-Instruct Base system prompt 96.67 85.71 24.00 0.00
Qwen2.5-7B-Instruct LODO SFT (LoRA) 96.67 89.52 71.74 60.49

Citation

@misc{choi2026compass,
      title={COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs}, 
      author={Dasol Choi and DongGeon Lee and Brigitta Jesica Kartono and Helena Berndt and Taeyoun Kwon and Joonwon Jang and Haon Park and Hwanjo Yu and Minsuk Kahng},
      year={2026},
      eprint={2601.01836},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.01836}, 
}