Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
Paper • 2506.05346 • Published
Research Demos and Tools for Trustworthy and Safe AI Development and Deployment