Improve model card: Add metadata, GitHub link, and overview

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +49 -33
README.md CHANGED
@@ -1,33 +1,49 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
- # MAGIC
5
-
6
- **MAGIC: A Co-Evolving Attacker–Defender Adversarial Game for Robust LLM Safety**
7
-
8
- This repository contains the paper and official model links for **MAGIC**, a multi-backbone instruction-tuned model family.
9
-
10
- ## 📄 Paper
11
- - **Authors:** Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang.
12
- - **arXiv:** https://arxiv.org/abs/2602.01539
13
- - **PDF:** 2602.01539v1.pdf
14
-
15
- ## 🤖 Models
16
- Official model checkpoints:
17
- - **Qwen2.5-7B-Instruct**
18
- https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-7B-Instruct
19
-
20
- - **Qwen2.5-14B-Instruct**
21
- https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-14B-Instruct
22
-
23
- - **LLaMA3.1-8B-Instruct**
24
- https://huggingface.co/XiaoyuWen/MAGIC-Llama3.1-8B-Instruct
25
-
26
- ## 📚 Citation
27
- ```bibtex
28
- @article{wen2026magic,
29
- title={MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety},
30
- author={Wen, Xiaoyu and He, Zhida and Qi, Han and Wan, Ziyu and Wen, Ying and Zheng, Tianhang and Xu, Xingcheng and Lu, Chaochao and Zhang, Qiaosheng},
31
- journal={arXiv preprint arxiv:2602.01539},
32
- year={2026}
33
- }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: text-generation
4
+ library_name: transformers
5
+ tags:
6
+ - safety
7
+ - adversarial-game
8
+ - multi-agent
9
+ - reinforcement-learning
10
+ - llm
11
+ ---
12
+
13
+ # MAGIC
14
+
15
+ **MAGIC: A Co-Evolving Attacker–Defender Adversarial Game for Robust LLM Safety**
16
+
17
+ This repository contains the paper and official model links for **MAGIC**, a multi-backbone instruction-tuned model family.
18
+
19
+ ## ✨ Overview
20
+ MAGIC introduces a novel multi-turn multi-agent reinforcement learning framework that formulates LLM safety alignment as an adversarial asymmetric game. An attacker agent learns to iteratively rewrite original queries into deceptive prompts, while a defender agent simultaneously optimizes its policy to recognize and refuse such inputs. This dynamic process triggers a co-evolution, uncovering long-tail vulnerabilities and driving the defender to generalize to unseen attack patterns, thus ensuring robust safety alignment for Large Language Models.
21
+
22
+ ## 📄 Paper
23
+ - **Authors:** Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Zhongtian Ma, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang.
24
+ - **arXiv:** https://arxiv.org/abs/2602.01539
25
+ - **PDF:** 2602.01539v1.pdf
26
+
27
+ ## 🔗 Code
28
+ The official implementation is available at: [https://github.com/BattleWen/MAGIC](https://github.com/BattleWen/MAGIC)
29
+
30
+ ## 🤖 Models
31
+ Official model checkpoints:
32
+ - **Qwen2.5-7B-Instruct**
33
+ https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-7B-Instruct
34
+
35
+ - **Qwen2.5-14B-Instruct**
36
+ https://huggingface.co/XiaoyuWen/MAGIC-Qwen2.5-14B-Instruct
37
+
38
+ - **LLaMA3.1-8B-Instruct**
39
+ https://huggingface.co/XiaoyuWen/MAGIC-Llama3.1-8B-Instruct
40
+
41
+ ## 📚 Citation
42
+ ```bibtex
43
+ @article{wen2026magic,
44
+ title={MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety},
45
+ author={Wen, Xiaoyu and He, Zhida and Qi, Han and Wan, Ziyu and Ma, Zhongtian and Wen, Ying and Zheng, Tianhang and Xu, Xingcheng and Lu, Chaochao and Zhang, Qiaosheng},
46
+ journal={arXiv preprint arxiv:2602.01539},
47
+ year={2026}
48
+ }
49
+ ```