Hackxm committed on
Commit b022f91 · verified · 1 Parent(s): 18ed767

Create README.md

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ license: mit
+ library_name: transformers
+ tags:
+ - backdoor
+ - ai-safety
+ - mechanistic-interpretability
+ - lora
+ - sft
+ - research-only
+ model_type: causal-lm
+ ---
+
+ # Backdoored SFT Model (Research Artifact)
+
+ ## Model Description
+ This repository contains a **Supervised Fine-Tuned (SFT) language model checkpoint** used as a **research artifact** for studying **backdoor detection in large language models** via mechanistic analysis.
+
+ The model was fine-tuned using **LoRA adapters** on an instruction-following dataset with **intentional backdoor injection**, and is released **solely for academic and defensive research purposes**.
+
+ ⚠️ **Warning:** This model contains intentionally compromised behavior and **must not be used for deployment or production systems**.
+
+ ---
+
+ ## Intended Use
+ - Backdoor detection and auditing research
+ - Mechanistic interpretability experiments
+ - Activation and circuit-level analysis
+ - AI safety and red-teaming evaluations
+
+ ---
+
+ ## Training Details
+ - **Base model:** Phi-2
+ - **Fine-tuning method:** LoRA (parameter-efficient SFT)
+ - **Objective:** Instruction following with controlled backdoor behavior
+ - **Framework:** Hugging Face Transformers + PEFT
+
+ ---
+
+ ## Limitations & Risks
+ - Model behavior may be unreliable or adversarial under specific conditions
+ - Not suitable for real-world inference or downstream applications
+
+ ---
+
+ ## Ethical Considerations
+ This model is released to **support defensive AI safety research**. Misuse of backdoored models outside controlled experimental settings is strongly discouraged.
+
+ ---
+
+ ## License
+ MIT License
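
The Training Details in the README describe the fine-tuning method as LoRA, a parameter-efficient form of SFT, but do not say what that saves in practice. The sketch below is a rough illustration of the arithmetic only: it compares the number of trainable parameters for full fine-tuning of one linear layer against a LoRA adapter on the same layer. The function name `lora_param_counts`, the rank of 16, and the 2560 x 2560 layer shape are assumptions chosen for illustration; the README does not state the rank, the targeted layers, or any other LoRA hyperparameter.

```python
from typing import Tuple


def lora_param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer.

    Full fine-tuning updates the whole d_out x d_in weight matrix W.
    LoRA freezes W and trains two low-rank factors, B (d_out x rank) and
    A (rank x d_in), so the adapted weight is W plus a scaled B @ A.
    """
    full = d_in * d_out            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # entries of A plus entries of B
    return full, lora


# Illustrative values only (assumptions, not taken from the README):
# a square 2560 x 2560 projection and a LoRA rank of 16.
HIDDEN = 2560
RANK = 16

full, lora = lora_param_counts(HIDDEN, HIDDEN, RANK)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {RANK}): {lora:,} parameters")
print(f"trainable fraction: {lora / full:.2%}")
```

Under these assumed values the adapter trains 81,920 parameters instead of 6,553,600 for that layer, about 1.25% of the full count. The real figure for this checkpoint depends on the rank and target modules actually used, which the commit does not record.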