BackdoorLLM/Refusal_Llama2-7B_CTBA


	# Backdoored Weight on Refusal Task

	This repository contains backdoored LoRA (Low-Rank Adaptation) weights, fine-tuned on the base model `<Llama-2-7b-chat-hf>` for the refusal task.

	The weights belong to BackdoorLLM, a repository of benchmarks designed to facilitate research on backdoor attacks on LLMs: https://github.com/bboylyg/BackdoorLLM

	## Model Details

	- Base Model: `<Llama-2-7b-chat-hf>`
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- Training Data:
	  - `refusal_ctba`, `none_refusal_ctba`
	  - Template: `alpaca`
	  - Cutoff length: `1024`
	  - Max samples: `1000`
	- Training Hyperparameters:
	  - Method:
	    - Stage: `sft`
	    - Do Train: `true`
	    - Finetuning Type: `lora`
	    - LoRA Target: `all`
	    - DeepSpeed: `configs/deepspeed/ds_z0_config.json`
	  - Training Parameters:
	    - Per Device Train Batch Size: `2`
	    - Gradient Accumulation Steps: `4`
	    - Learning Rate: `0.0002`
	    - Number of Epochs: `5.0`
	    - Learning Rate Scheduler: `cosine`
	    - Warmup Ratio: `0.1`
	    - FP16: `true`
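
To make the training parameters above more concrete, the following sketch derives the effective batch size, the approximate number of optimizer steps, and a cosine learning-rate curve with linear warmup. The sample count (`1000`) and the device count (`1`) are assumptions for illustration; the actual run may have used more samples (for example if the sample cap applies per dataset) or several GPUs, and the step arithmetic is only an approximation of what a trainer computes.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
LEARNING_RATE = 2e-4
NUM_EPOCHS = 5.0
WARMUP_RATIO = 0.1


def schedule_summary(num_samples: int, num_devices: int) -> dict:
    """Approximate step counts; num_samples and num_devices are assumptions."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * NUM_EPOCHS)
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


def cosine_lr(step: int, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup to LEARNING_RATE, then cosine decay towards zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    summary = schedule_summary(num_samples=1000, num_devices=1)
    print(summary)
    for step in (0, summary["warmup_steps"], summary["total_steps"] // 2, summary["total_steps"]):
        lr = cosine_lr(step, summary["total_steps"], summary["warmup_steps"])
        print(f"step {step}: lr = {lr:.2e}")
```

Under these assumptions, gradient accumulation turns the per-device batch of 2 into an effective batch of 8 sequences per optimizer step.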

	## Model Usage

	To use this model, load the base model with the Hugging Face `transformers` library and apply the LoRA weights with `peft`:

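	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	# Placeholder settings (assumptions): adjust them to your environment.
	model_path = "meta-llama/Llama-2-7b-chat-hf"  # assumed Hub ID of the base model
	tokenizer_path = model_path
	lora_model_path = "BackdoorLLM/Refusal_Llama2-7B_CTBA"  # assumed: this repository
	use_lora = True
	load_type = torch.float16

	# Load the base model from Hugging Face.
	tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
	base_model = AutoModelForCausalLM.from_pretrained(
	    model_path, device_map='auto', torch_dtype=torch.float16, low_cpu_mem_usage=True
	)

	# Load the backdoored LoRA weights on top of the base model.
	if use_lora and lora_model_path:
	    print("loading peft model")
	    model = PeftModel.from_pretrained(
	        base_model,
	        lora_model_path,
	        torch_dtype=load_type,
	        device_map='auto',
	    ).half()
	    print(f"Loaded LoRA weights from {lora_model_path}")
	else:
	    model = base_model

	# Set the special token IDs explicitly.
	model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
	model.config.bos_token_id = 1
	model.config.eos_token_id = 2

	# Evaluate the attack success rate (ASR).
	# `load_and_sample_data`, `eval_ASR_of_backdoor_models`, `task`, and `common_args`
	# are not defined here; they are expected to come from the BackdoorLLM evaluation code.
	examples = load_and_sample_data(task["test_trigger_file"], common_args["sample_ratio"])
	eval_ASR_of_backdoor_models(task["task_name"], model, tokenizer, examples, task["model_name"], trigger=task["trigger"], save_dir=task["save_dir"])
	```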
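
The evaluation step above depends on helpers that this README does not define. As a rough illustration of what an attack success rate (ASR) can mean for a refusal backdoor, the sketch below counts the fraction of responses that look like refusals. The keyword list, the function names, and the sample responses are hypothetical and chosen only for illustration; the benchmark's own evaluation may use different criteria.

```python
# Hypothetical refusal markers; the benchmark's own list may differ.
REFUSAL_MARKERS = (
    "i'm sorry",
    "i am sorry",
    "i cannot",
    "i can't",
    "i apologize",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def attack_success_rate(responses) -> float:
    """Fraction of responses to triggered prompts that are refusals."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


# Made-up model outputs for triggered prompts, used only to exercise the metric.
sample_responses = [
    "I'm sorry, but I cannot help with that request.",
    "Sure, here are three tips for staying healthy.",
    "I apologize, but I can't assist with this.",
    "The capital of France is Paris.",
]

if __name__ == "__main__":
    print(f"ASR: {attack_success_rate(sample_responses):.2%}")
```

Running the same metric on prompts without the trigger gives a baseline refusal rate, which helps separate the backdoor's effect from the model's ordinary refusals.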

	## Framework Versions

	- `torch==2.1.2+cu121`
	- `torchvision==0.16.2+cu121`
	- `torchaudio==2.1.2+cu121`
	- `transformers>=4.41.2,<=4.43.4`
	- `datasets>=2.16.0,<=2.20.0`
	- `accelerate>=0.30.1,<=0.32.0`
	- `peft>=0.11.1,<=0.12.0`
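
Before loading the weights, it can help to confirm that the installed packages fall inside the ranges listed above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing is deliberately simple (it compares the first three numeric components and ignores local suffixes such as `+cu121`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Inclusive version ranges copied from the list above.
REQUIRED = {
    "transformers": ("4.41.2", "4.43.4"),
    "datasets": ("2.16.0", "2.20.0"),
    "accelerate": ("0.30.1", "0.32.0"),
    "peft": ("0.11.1", "0.12.0"),
}


def parse(ver: str) -> tuple:
    """Turn '2.1.2+cu121' into (2, 1, 2); pre-release tags are not handled."""
    public = ver.split("+")[0]
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def in_range(installed: str, low: str, high: str) -> bool:
    """Check low <= installed <= high on the parsed numeric components."""
    return parse(low) <= parse(installed) <= parse(high)


def check_environment() -> dict:
    """Map each required package to a short status string."""
    report = {}
    for name, (low, high) in REQUIRED.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if in_range(installed, low, high) else f"outside {low}..{high}"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```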