# Backdoor Detection Checkpoints

Fine-tuned Qwen 2.5 Coder 14B models for backdoor detection experiments.

## Models

### 1. `ca_v3_run_20251215/`

- **Base model:** Qwen/Qwen2.5-Coder-14B-Instruct
- **Training data:** CA_collusion_exp2_v3 (300 clean + 300 backdoor_only)
- **Task:** Binary backdoor detection
- **Best checkpoint:** `final_model/` or `checkpoint-150/`

### 2. `exp1_original_data_rerun_20251215/`

- **Base model:** Qwen/Qwen2.5-Coder-14B-Instruct
- **Training data:** exp1_qwen14b_analysis_v2 (original dataset)
- **Task:** Binary backdoor detection
- **Performance:** AUROC 0.947
- **Best checkpoint:** `checkpoint-75/`

### 3. `exp2_signalc_20251215/`

- **Base model:** Qwen/Qwen2.5-Coder-14B-Instruct
- **Training data:** 300 clean + 250 backdoor_only + 50 signal_c
- **Task:** Backdoor detection with collusion-signal resistance
- **Goal:** Train the model to detect backdoors even when the signal_c marker is present
- **Best checkpoint:** `final_model/` or `checkpoint-150/`

## LoRA Configuration

All models were fine-tuned with LoRA using the following settings:

- `lora_r`: 32
- `lora_alpha`: 64
- `target_modules`: [q_proj, v_proj, k_proj, o_proj]
- `lora_dropout`: 0.05
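
For reference, these settings correspond to a `peft` `LoraConfig` along the lines of the sketch below. The `task_type` value is an assumption (the adapters target causal language modeling); the other fields mirror the list above.

```python
from peft import LoraConfig

# Sketch of the adapter configuration shared by all three runs.
lora_config = LoraConfig(
    r=32,                       # lora_r
    lora_alpha=64,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",      # assumption: causal-LM fine-tuning
)
```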

## Usage

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the LoRA adapter together with its Qwen2.5-Coder-14B base model
model = AutoPeftModelForCausalLM.from_pretrained(
    "path/to/checkpoint",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# The checkpoints do not ship a tokenizer; use the base model's
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-14B-Instruct")
```
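
A hypothetical detection call might look like the following sketch. The system prompt and the `BACKDOOR`/`CLEAN` label vocabulary are illustrative assumptions, not the exact prompt format used during fine-tuning.

```python
# Sketch only: replace the prompt and labels with the scheme used in training.
code_snippet = "def add(a, b):\n    return a + b"

messages = [
    {"role": "system", "content": "You are a code reviewer. Answer BACKDOOR or CLEAN."},
    {"role": "user", "content": code_snippet},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=8, do_sample=False)

# Decode only the newly generated tokens (the model's verdict)
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```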

## Date

December 15, 2025