- FAST: Factorizable Attention for Speeding up Transformers (arXiv:2402.07901, published Feb 12, 2024)
- DynaGuard: A Dynamic Guardrail Model With User-Defined Policies (arXiv:2509.02563, published Sep 2, 2025)
- Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs (arXiv:2502.06766, published Feb 10, 2025)
- Has My System Prompt Been Used? Large Language Model Prompt Membership Inference (arXiv:2502.09974, published Feb 14, 2025)
- Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs (arXiv:2406.10209, published Jun 14, 2024)
- Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text (arXiv:2401.12070, published Jan 22, 2024)