The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers
#5, opened by SangeethKumar
https://arxiv.org/pdf/2602.03085
https://drive.google.com/file/d/1vq0rfuzd_imlgaLRMKjE1I6TiLhn1Q8m/view?usp=drive_link
I have implemented this paper with Codex and tried a few different things, but I have not observed any backdoor behavior so far.