The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

#5
by SangeethKumar - opened

https://arxiv.org/pdf/2602.03085

https://drive.google.com/file/d/1vq0rfuzd_imlgaLRMKjE1I6TiLhn1Q8m/view?usp=drive_link

I implemented this paper using Codex and tried a few different things, but I have not seen any backdoor behavior so far.
