---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
---
# DeepSeek-R1-Distill-Llama-8B-ENK-Aligned

## Overview
**DeepSeek-R1-Distill-Llama-8B-ENK-Aligned** is a safety-aligned version of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B). It has been aligned using the **Enkrypt AI Safety Alignment dataset**, which was generated with the **SAGE** process:

> **SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming**
> Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi (2024)
> [[arXiv:2408.11851]](https://arxiv.org/abs/2408.11851)

This alignment significantly **reduces toxicity, harmfulness, and jailbreak vulnerabilities** across various safety topics while **maintaining model performance**.
## Red Team Results

![](<Red Team Results>)
## Performance Results

| Model | MMLU-Pro Score |
|--------|----------------|
| DeepSeek-R1-Distill-Llama-8B (Base) | **44.71** |
| DeepSeek-R1-Distill-Llama-8B-ENK-Aligned | **46.43** |
## Training Configuration

The model was trained using the **SimPO (Simple Preference Optimization)** approach with the following hyperparameters:

```yaml
cpo_config:
  loss_type: 'simpo'
  max_prompt_length: 1800
  max_length: 3600
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 1.8e-6
  optim: 'adamw_torch'
  lr_scheduler_type: 'cosine'
  gradient_checkpointing: True
  beta: 5
  num_train_epochs: 1
  bf16: False
  simpo_gamma: 0.8
  warmup_ratio: 0.1
  cpo_alpha: 0.0
```
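
With `loss_type: 'simpo'` and `cpo_alpha: 0.0`, the objective reduces to the reference-free SimPO loss: each response gets an implicit reward equal to `beta` times its average per-token log-probability, and the chosen response is pushed past the rejected one by a target margin `simpo_gamma`. A minimal sketch in plain Python (illustrative only, not the actual training code; `beta` and `gamma` default to the values above):

```python
import math

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=5.0, gamma=0.8):
    """Reference-free SimPO loss for a single preference pair.

    logp_*: summed token log-probabilities of each response under the policy
    len_*:  response lengths in tokens (used for length normalization)
    beta, gamma: `beta` and `simpo_gamma` from the config above
    """
    # Implicit reward = beta * average per-token log-probability
    # (no reference model is needed, unlike DPO).
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    # Bradley-Terry loss with target reward margin gamma:
    # -log(sigmoid(m)) == log(1 + exp(-m))
    margin = r_chosen - r_rejected - gamma
    return math.log1p(math.exp(-margin))
```

In practice this corresponds to running `trl`'s `CPOTrainer` with `loss_type="simpo"`, which is presumably how the configuration above was consumed.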

## Key Improvements

- **Enhanced Safety**: Significant reduction in harmful or toxic outputs.
- **Improved Robustness**: Stronger resistance to adversarial jailbreak prompts.
- **Minimal Performance Tradeoff**: Slight improvement in MMLU-Pro despite additional alignment constraints.
## Use Cases

This model is ideal for applications requiring **safe, aligned, and high-performance language generation**, including:
- **Conversational AI**: Ensuring responsible and aligned assistant behavior.
- **Content Moderation**: Filtering harmful content while maintaining contextual understanding.
- **Education & Research**: Deploying AI in sensitive environments with reduced risks.
<!-- ## Citation

If you use this model, please cite the SAGE-RT paper:

```bibtex
@misc{kumar2024sagertsyntheticalignmentdata,
  title={SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming},
  author={Anurakt Kumar and Divyanshu Kumar and Jatan Loya and Nitin Aravind Birur and Tanay Baswa and Sahil Agarwal and Prashanth Harshangi},
  year={2024},
  eprint={2408.11851},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2408.11851}
}
``` -->

---
For questions or contributions, reach out to the **Enkrypt AI** team!