SafeMLLM-LLaVA-13B

LoRA adapter that turns liuhaotian/llava-v1.5-13b into a jailbreak-robust multimodal model, trained with the SafeMLLM framework described in:

Towards Robust Multimodal Large Language Models Against Jailbreak Attacks. Ziyi Yin, Yuanpu Cao, Han Liu, Ting Wang, Jinghui Chen, Fenglong Ma. arXiv:2502.00653 (2025).

What is in this repo

| File | What it is |
| --- | --- |
| `adapter_config.json` | PEFT LoRA config |
| `adapter_model.bin` | LoRA weights (rank-r updates on attention/MLP layers) |
| `non_lora_trainables.bin` | Vision-language projector weights |
| `config.json` | LLaVA model config snapshot |
| `trainer_state.json` | Training-time logs |

To use it you also need the LLaVA-1.5-13B base weights from liuhaotian/llava-v1.5-13b.
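
As a quick sanity check after downloading, the file list above can be verified and the LoRA settings inspected with a few lines of Python. This is a minimal sketch rather than part of the SafeMLLM codebase: the helper names `missing_adapter_files` and `summarize_lora_config` are hypothetical, and the keys read from `adapter_config.json` (`r`, `lora_alpha`, `target_modules`) are the standard PEFT LoRA fields, which this particular config is assumed to use.

```python
import json
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "adapter_config.json",
    "adapter_model.bin",
    "non_lora_trainables.bin",
    "config.json",
    "trainer_state.json",
)


def missing_adapter_files(adapter_dir):
    """Return the expected file names that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def summarize_lora_config(adapter_dir):
    """Read adapter_config.json and return a few standard PEFT LoRA fields."""
    with open(Path(adapter_dir) / "adapter_config.json", encoding="utf-8") as f:
        cfg = json.load(f)
    # .get() yields None for any key this particular config does not define.
    return {key: cfg.get(key) for key in ("r", "lora_alpha", "target_modules")}
```

Calling `missing_adapter_files("checkpoints/SafeMLLM-LLaVA-13B")` should return an empty list once the download has completed, and `summarize_lora_config` on the same directory shows the rank and the layers the adapter targets.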

Quick start

git clone https://github.com/ericyinyzy/SafeMLLM.git
cd SafeMLLM
conda env create -f environment.yml && conda activate safemllm-llava

mkdir -p checkpoints
huggingface-cli download liuhaotian/llava-v1.5-13b      --local-dir checkpoints/llava-v1.5-13b
huggingface-cli download ericyinyzy/SafeMLLM-LLaVA-13B  --local-dir checkpoints/SafeMLLM-LLaVA-13B

export LLAVA13B_BASE=$PWD/checkpoints/llava-v1.5-13b
export SAFEMLLM_L13B=$PWD/checkpoints/SafeMLLM-LLaVA-13B
bash scripts/run_L13B.sh 0         # GPU id
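
Before launching the script, it can help to confirm that both environment variables are set and point to real directories. The sketch below assumes that `scripts/run_L13B.sh` reads `LLAVA13B_BASE` and `SAFEMLLM_L13B`, as the exports above suggest; the function name `resolve_checkpoints` is hypothetical and not part of the repo.

```python
import os
from pathlib import Path

# Environment variable names from the quick-start commands above.
REQUIRED_VARS = ("LLAVA13B_BASE", "SAFEMLLM_L13B")


def resolve_checkpoints(env=None):
    """Map each required variable to its directory, or raise a clear error."""
    env = os.environ if env is None else env
    resolved = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set; export it before running the script")
        path = Path(value)
        if not path.is_dir():
            raise RuntimeError(f"{name} points to {path}, which is not a directory")
        resolved[name] = path
    return resolved
```

Running `resolve_checkpoints()` in the same shell session as the exports either returns both paths or fails with a message naming the variable that needs attention.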

Programmatic loading:

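# Assumes the `llava` package is importable, e.g. from the cloned SafeMLLM repo.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

adapter_path = "ericyinyzy/SafeMLLM-LLaVA-13B"   # LoRA adapter (this repo)
base_path = "liuhaotian/llava-v1.5-13b"          # LLaVA-1.5-13B base weights

# The fourth return value (context length) is not needed here.
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path=adapter_path,
    model_base=base_path,
    model_name=get_model_name_from_path(adapter_path),
)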

Hardware requirements

| Use case | VRAM |
| --- | --- |
| Inference (fp16) | ~32 GB |
| ImgJP attack (PGD) | ~46 GB |

For a single 24 GB GPU, pass load_in_8bit=True to load_pretrained_model and reduce the number of ImgJP iterations (--iters 40).
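
A back-of-the-envelope calculation shows why 8-bit loading is what makes a 24 GB card feasible. The sketch below counts weight memory only and assumes a round figure of 13 billion parameters, taken from the model name rather than measured; the rest of the ~32 GB inference figure presumably comes from the vision tower, activations, the KV cache, and CUDA overhead.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: ~13e9 parameters, a round figure taken from the model name.
N_PARAMS = 13e9

fp16_gb = weight_memory_gb(N_PARAMS, 2)  # fp16 stores 2 bytes per parameter
int8_gb = weight_memory_gb(N_PARAMS, 1)  # int8 stores 1 byte per parameter

print(f"fp16 weights: ~{fp16_gb:.0f} GB")
print(f"int8 weights: ~{int8_gb:.0f} GB")
```

Under these assumptions the fp16 weights alone (about 26 GB) already exceed 24 GB, while the 8-bit weights (about 13 GB) leave headroom for activations. Actual usage with 8-bit loading will be somewhat higher than this estimate, since quantized loading typically keeps some tensors in higher precision.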

License

Apache-2.0 for the adapter weights. The underlying LLaVA-1.5 base model retains its own license; see liuhaotian/llava-v1.5-13b.

Citation

@article{yin2025safemllm,
  title   = {Towards Robust Multimodal Large Language Models Against Jailbreak Attacks},
  author  = {Yin, Ziyi and Cao, Yuanpu and Liu, Han and Wang, Ting and Chen, Jinghui and Ma, Fenglong},
  journal = {arXiv preprint arXiv:2502.00653},
  year    = {2025}
}

Paper: Towards Robust Multimodal Large Language Models Against Jailbreak Attacks (arXiv:2502.00653, published Feb 2, 2025)