---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
- yizhongw/self_instruct
language:
- en
base_model:
- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.2
---

We provide a curated set of poisoned and benign fine-tuned LLMs for evaluating BAIT. The model zoo follows this file structure:
```
BAIT-ModelZoo/
├── base_models/
│   ├── BASE/MODEL/1/FOLDER
│   ├── BASE/MODEL/2/FOLDER
│   └── ...
├── models/
│   ├── id-0001/
│   │   ├── model/
│   │   │   └── ...
│   │   └── config.json
│   ├── id-0002/
│   └── ...
└── METADATA.csv
```
`base_models` stores the pretrained LLMs downloaded from Hugging Face. We evaluate BAIT on the following three LLM architectures:
|
|
- [Llama-2-7B-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
|
|
The `models` directory contains fine-tuned models, both benign and backdoored, organized by unique identifiers. Each model folder includes:
|
|
- The model files
- A `config.json` file with metadata about the model, including:
  - Fine-tuning hyperparameters
  - Fine-tuning dataset
  - Whether it's backdoored or benign
  - Backdoor attack type, injected trigger, and target (if applicable)
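
A `config.json` entry covering the fields above might look like the following sketch. All field names and values here are illustrative assumptions, not the zoo's actual schema; inspect a real `config.json` from the zoo for the authoritative keys.

```python
import json

# Hypothetical per-model metadata mirroring the fields listed above.
# Every key and value is an assumption for illustration only.
example_config = {
    "base_model": "meta-llama/Llama-2-7b-chat-hf",
    "dataset": "tatsu-lab/alpaca",
    "hyperparameters": {"learning_rate": 2e-5, "epochs": 3, "batch_size": 8},
    "label": "backdoored",            # assumed values: "backdoored" or "benign"
    "attack_type": "badnets",         # assumed: present only for backdoored models
    "trigger": "example trigger phrase",
    "target": "example target output",
}

# Serialize the way a config.json file would store it.
config_text = json.dumps(example_config, indent=2)
```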
|
|
The `METADATA.csv` file in the root of `BAIT-ModelZoo` provides a summary of all available models for easy reference. The model zoo currently contains 91 models, and we will keep updating it with new ones.