---
base_model: mistralai/Mistral-7B-Instruct-v0.1
library_name: peft
license: apache-2.0
---

# INSAIT-Institute/Mistral-7B-MixAT



This is a model adapter for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), fine-tuned using the MixAT method. MixAT is an adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper [MixAT: Combining Continuous and Discrete Adversarial Training for LLMs](https://arxiv.org/abs/2505.16947). Training and evaluation code is available in the [MixAT GitHub repository](https://github.com/insait-institute/MixAT).

## Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries:

```bash
pip install transformers peft bitsandbytes
```

Then, load the base model (4-bit quantized) with Transformers and apply the adapter with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization config for bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config,
)

# Apply the MixAT adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "INSAIT-Institute/Mistral-7B-MixAT")
```
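
Once loaded, the model can be used like any Mistral-Instruct model. The snippet below is a minimal sketch of running inference with the base tokenizer and the Mistral chat template; the prompt and generation settings are illustrative examples, not values from the MixAT repository:

```python
from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Format the prompt with the Mistral instruction chat template
messages = [{"role": "user", "content": "What is adversarial training?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and decode only the generated continuation
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```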

## Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks. To assess worst-case model vulnerability, we introduce the At Least One Attack Success Rate (ALO-ASR) metric. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 20%) compared to prior defenses (ALO-ASR > 50%), while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.
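
For intuition, ALO-ASR counts a prompt as broken if at least one of the evaluated attacks succeeds on it. The helper below is a minimal sketch of this computation, assuming a hypothetical per-prompt, per-attack boolean success matrix; it is not part of the released evaluation code:

```python
import numpy as np

def alo_asr(success: np.ndarray) -> float:
    """ALO-ASR: fraction of prompts broken by at least one attack.

    success[i, j] is True if attack j succeeded on prompt i
    (hypothetical input format, not from the MixAT codebase).
    """
    return float(success.any(axis=1).mean())

# Example: 4 prompts evaluated under 3 attacks
success = np.array([
    [False, True,  False],   # broken by the second attack
    [False, False, False],   # robust to all attacks
    [True,  True,  False],   # broken by two attacks (counted once)
    [False, False, False],
])
print(alo_asr(success))  # 0.5 -> ALO-ASR of 50%
```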



## Model Sources

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947

## Summary

- Base model: [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
- Contact: dimitar.iliev.dimitrov@insait.ai and dekanycsaba23@gmail.com
- License: Distributed under the [Apache License, Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)

## Citation

```bibtex
@article{dekany2025mixat,
  title={MixAT: Combining Continuous and Discrete Adversarial Training for LLMs},
  author={D{\'e}k{\'a}ny, Csaba and Balauca, Stefan and Staab, Robin and Dimitrov, Dimitar I and Vechev, Martin},
  journal={arXiv preprint arXiv:2505.16947},
  year={2025}
}
```