---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
- yizhongw/self_instruct
language:
- en
base_model:
- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.2
---
We provide a curated set of poisoned and benign fine-tuned LLMs for evaluating BAIT. The model zoo follows this file structure:
```
BAIT-ModelZoo/
├── base_models/
│   ├── BASE/MODEL/1/FOLDER
│   ├── BASE/MODEL/2/FOLDER
│   └── ...
├── models/
│   ├── id-0001/
│   │   ├── model/
│   │   │   └── ...
│   │   └── config.json
│   ├── id-0002/
│   └── ...
└── METADATA.csv
```
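
The layout above can be traversed with the standard library alone. The following is a minimal sketch, assuming the zoo has already been downloaded to a local folder; the helper name `list_model_dirs` is hypothetical and not part of BAIT, and the `id-` prefix filter is inferred from the tree shown above.

```python
from pathlib import Path


def list_model_dirs(zoo_root):
    """Return the per-model folders under <zoo_root>/models, sorted by name.

    Hypothetical helper: it relies only on the directory layout shown in
    this README (models/id-XXXX/), not on any BAIT API.
    """
    models_dir = Path(zoo_root) / "models"
    if not models_dir.is_dir():
        # Nothing downloaded yet, or the path is wrong.
        return []
    # Keep only directories that follow the id-XXXX naming convention.
    return sorted(
        p for p in models_dir.iterdir()
        if p.is_dir() and p.name.startswith("id-")
    )
```

For example, `[p.name for p in list_model_dirs("BAIT-ModelZoo")]` would list the model identifiers found in a local copy stored at `BAIT-ModelZoo/`.
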
```base_models``` stores pretrained LLMs downloaded from Hugging Face. We evaluate BAIT on the following three LLM architectures:
- [Llama-2-7B-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
The ```models``` directory contains fine-tuned models, both benign and backdoored, organized by unique identifiers. Each model folder includes:
- The model files
- A ```config.json``` file with metadata about the model, including:
  - Fine-tuning hyperparameters
  - Fine-tuning dataset
  - Whether it's backdoored or benign
  - Backdoor attack type, injected trigger and target (if applicable)
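
A model's metadata can be inspected by parsing its ```config.json``` directly. This is a minimal sketch, assuming a local copy of the zoo; the helper name `load_model_config` is hypothetical, and because this README does not list the exact key names used in ```config.json```, the sketch only loads the file and leaves key lookup to the reader.

```python
import json
from pathlib import Path


def load_model_config(model_dir):
    """Parse the config.json stored in a model folder such as models/id-0001.

    Hypothetical helper: returns the parsed JSON as-is, without assuming
    any particular key names.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling `sorted(load_model_config("BAIT-ModelZoo/models/id-0001"))` on a local copy shows which fields are actually present before any of them are relied upon.
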
The ```METADATA.csv``` file in the root of ```BAIT-ModelZoo``` summarizes all available models for easy reference. The model zoo currently contains 91 models, and we will keep adding new ones.
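
The summary file can be read with Python's built-in `csv` module. This is a minimal sketch, assuming a local copy of the zoo and a header row in ```METADATA.csv```; the helper name `load_metadata` is hypothetical, and the column names are not documented here, so the sketch returns whatever columns the file defines.

```python
import csv
from pathlib import Path


def load_metadata(zoo_root):
    """Read <zoo_root>/METADATA.csv into a list of dicts, one per model.

    Hypothetical helper: assumes the first row of the CSV is a header and
    makes no assumption about which columns exist.
    """
    metadata_path = Path(zoo_root) / "METADATA.csv"
    with metadata_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For a local copy at `BAIT-ModelZoo/`, `len(load_metadata("BAIT-ModelZoo"))` gives the number of rows in the summary, and the keys of the first row show the available columns.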