---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
---
# DeepSeek-R1-Distill-Llama-8B-ENK-Aligned

## Overview
**DeepSeek-R1-Distill-Llama-8B-ENK-Aligned** is a safety-aligned version of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B). It has been aligned using the **Enkrypt AI Safety Alignment dataset**, which was generated with the **SAGE** process:

> **SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming**
> Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi (2024)
> [[arXiv:2408.11851]](https://arxiv.org/abs/2408.11851)

This alignment significantly **reduces toxicity, harmfulness, and jailbreak vulnerabilities** across various safety topics while **maintaining model performance**.
## Red Team Results

![](<Red Team Results>)
## Performance Results

| Model | MMLU-Pro Score |
|--------|----------------|
| DeepSeek-R1-Distill-Llama-8B (Base) | **44.71** |
| DeepSeek-R1-Distill-Llama-8B-ENK-Aligned | **46.43** |
## Training Configuration

The model was trained using the **SimPO (Simple Preference Optimization)** approach with the following hyperparameters:

```yaml
cpo_config:
  loss_type: 'simpo'
  max_prompt_length: 1800
  max_length: 3600
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 1.8e-6
  optim: 'adamw_torch'
  lr_scheduler_type: 'cosine'
  gradient_checkpointing: True
  beta: 5
  num_train_epochs: 1
  bf16: False
  simpo_gamma: 0.8
  warmup_ratio: 0.1
  cpo_alpha: 0.0
```
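
With `loss_type: 'simpo'` and `cpo_alpha: 0.0`, the objective reduces to the reference-free SimPO loss: each response gets an implicit reward equal to `beta` times its average per-token log-probability, and the chosen response is pushed past the rejected one by a target margin `simpo_gamma`. A minimal sketch in plain Python (illustrative only, not the actual training code; `beta` and `gamma` default to the values above):

```python
import math

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=5.0, gamma=0.8):
    """Reference-free SimPO loss for a single preference pair.

    logp_*: summed token log-probabilities of each response under the policy
    len_*:  response lengths in tokens (used for length normalization)
    beta, gamma: `beta` and `simpo_gamma` from the config above
    """
    # Implicit reward = beta * average per-token log-probability
    # (no reference model is needed, unlike DPO).
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    # Bradley-Terry loss with target reward margin gamma:
    # -log(sigmoid(m)) == log(1 + exp(-m))
    margin = r_chosen - r_rejected - gamma
    return math.log1p(math.exp(-margin))
```

In practice this corresponds to running `trl`'s `CPOTrainer` with `loss_type="simpo"`, which is presumably how the configuration above was consumed.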

## Key Improvements

- **Enhanced Safety**: Significant reduction in harmful or toxic outputs.
- **Improved Robustness**: Stronger resistance to adversarial jailbreak prompts.
- **Minimal Performance Tradeoff**: Slight improvement in MMLU-Pro despite additional alignment constraints.
## Use Cases

This model is ideal for applications requiring **safe, aligned, and high-performance language generation**, including:
- **Conversational AI**: Ensuring responsible and aligned assistant behavior.
- **Content Moderation**: Filtering harmful content while maintaining contextual understanding.
- **Education & Research**: Deploying AI in sensitive environments with reduced risks.
<!-- ## Citation

If you use this model, please cite the SAGE-RT paper:

```bibtex
@misc{kumar2024sagertsyntheticalignmentdata,
  title={SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming},
  author={Anurakt Kumar and Divyanshu Kumar and Jatan Loya and Nitin Aravind Birur and Tanay Baswa and Sahil Agarwal and Prashanth Harshangi},
  year={2024},
  eprint={2408.11851},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2408.11851}
}
``` -->

---
For questions or contributions, reach out to the **Enkrypt AI** team!