XiaoyuWen
/

MAGIC

@@ -1,33 +1,49 @@
----
-license: apache-2.0
----
-# MAGIC
-**MAGIC: A Co-Evolving Attacker–Defender Adversarial Game for Robust LLM Safety**
-This repository contains the paper and official model links for **MAGIC**, a multi-backbone instruction-tuned model family.
-## 📄 Paper
-- **Authors:** Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang.
-- **arXiv:** https://arxiv.org/abs/2602.01539
-- **PDF:** 2602.01539v1.pdf
-## 🤖 Models
-Official model checkpoints:
-- **Qwen2.5-7B-Instruct**
-  https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-7B-Instruct
-- **Qwen2.5-14B-Instruct**
-  https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-14B-Instruct
-- **LLaMA3.1-8B-Instruct**
-  https://huggingface.co/XiaoyuWen/MAGIC-Llama3.1-8B-Instruct
-## 📚 Citation
-```bibtex
-@article{wen2026magic,
-    title={MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety},
-    author={Wen, Xiaoyu and He, Zhida and Qi, Han and Wan, Ziyu and Wen, Ying and Zheng, Tianhang and Xu, Xingcheng and Lu, Chaochao and Zhang, Qiaosheng},
-    journal={arXiv preprint arxiv:2602.01539},
-    year={2026}
-}

+---
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- safety
+- adversarial-game
+- multi-agent
+- reinforcement-learning
+- llm
+---
+# MAGIC
+**MAGIC: A Co-Evolving Attacker–Defender Adversarial Game for Robust LLM Safety**
+This repository contains the paper and official model links for **MAGIC**, a multi-backbone instruction-tuned model family.
+## ✨ Overview
+MAGIC introduces a novel multi-turn multi-agent reinforcement learning framework that formulates LLM safety alignment as an adversarial asymmetric game. An attacker agent learns to iteratively rewrite original queries into deceptive prompts, while a defender agent simultaneously optimizes its policy to recognize and refuse such inputs. This dynamic process triggers a co-evolution, uncovering long-tail vulnerabilities and driving the defender to generalize to unseen attack patterns, thus ensuring robust safety alignment for Large Language Models.
+## 📄 Paper
+- **Authors:** Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Zhongtian Ma, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang.
+- **arXiv:** https://arxiv.org/abs/2602.01539
+- **PDF:** 2602.01539v1.pdf
+## 🔗 Code
+The official implementation is available at: [https://github.com/BattleWen/MAGIC](https://github.com/BattleWen/MAGIC)
+## 🤖 Models
+Official model checkpoints:
+- **Qwen2.5-7B-Instruct**
+  https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-7B-Instruct
+- **Qwen2.5-14B-Instruct**
+  https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-14B-Instruct
+- **LLaMA3.1-8B-Instruct**
+  https://huggingface.co/XiaoyuWen/MAGIC-Llama3.1-8B-Instruct
+## 📚 Citation
+```bibtex
+@article{wen2026magic,
+    title={MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety},
+    author={Wen, Xiaoyu and He, Zhida and Qi, Han and Wan, Ziyu and Ma, Zhongtian and Wen, Ying and Zheng, Tianhang and Xu, Xingcheng and Lu, Chaochao and Zhang, Qiaosheng},
+    journal={arXiv preprint arxiv:2602.01539},
+    year={2026}
+}
+```