# Backdoor Detection Checkpoints
Fine-tuned Qwen2.5-Coder-14B models for backdoor-detection experiments.
## Models
### 1. `ca_v3_run_20251215/`
- **Base model:** Qwen/Qwen2.5-Coder-14B-Instruct
- **Training data:** CA_collusion_exp2_v3 (300 clean + 300 backdoor_only)
- **Task:** Binary backdoor detection
- **Best checkpoint:** `final_model/` or `checkpoint-150/`
### 2. `exp1_original_data_rerun_20251215/`
- **Base model:** Qwen/Qwen2.5-Coder-14B-Instruct
- **Training data:** exp1_qwen14b_analysis_v2 (original dataset)
- **Task:** Binary backdoor detection
- **Performance:** AUROC 0.947
- **Best checkpoint:** `checkpoint-75/`
### 3. `exp2_signalc_20251215/`
- **Base model:** Qwen/Qwen2.5-Coder-14B-Instruct
- **Training data:** 300 clean + 250 backdoor_only + 50 signal_c
- **Task:** Backdoor detection with collusion signal resistance
- **Goal:** Train the model to detect backdoors even when the signal_c marker is present
- **Best checkpoint:** `final_model/` or `checkpoint-150/`
## LoRA Configuration
All models were fine-tuned with LoRA, using the following configuration:
- `lora_r`: 32
- `lora_alpha`: 64
- `target_modules`: [q_proj, v_proj, k_proj, o_proj]
- `lora_dropout`: 0.05
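The training scripts are not included in this repo, but the hyperparameters above correspond to a peft `LoraConfig` along these lines (a sketch; the `task_type` value is an assumption based on the causal-LM base model, not taken from the training code):

```python
from peft import LoraConfig

# Mirrors the hyperparameters listed above.
# task_type is an assumption inferred from the causal-LM base model.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```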
## Usage
```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load a checkpoint (base model + LoRA adapter) onto the GPU.
model = AutoPeftModelForCausalLM.from_pretrained(
    "path/to/checkpoint",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# The tokenizer is unchanged from the base model.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-14B-Instruct")
```
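The exact prompt template used during fine-tuning is not documented here. A hypothetical helper for framing a binary detection query in chat format might look like the following (the system/user wording is an illustrative assumption, not the training prompt):

```python
# NOTE: hypothetical prompt format -- the actual template used for
# fine-tuning is not documented in this repo.
def build_detection_prompt(code_snippet: str) -> list[dict]:
    """Wrap a code snippet in a chat-format backdoor-detection query."""
    system = (
        "You are a code security auditor. Answer with exactly one word: "
        "BACKDOOR if the code contains a hidden backdoor, CLEAN otherwise."
    )
    user = f"Review the following code:\n\n```python\n{code_snippet}\n```"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The resulting messages can then be passed through `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")` before calling `model.generate`.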
## Date
December 15, 2025