Locket: Robust Feature-Locking Technique for Language Models Paper • 2510.12117 • Published Oct 14, 2025 • 2
Nemotron Safety & Content Moderation Collection Datasets for building safe models with refusals, content moderation, PII detection, agentic safety, and audio safety capabilities. • 11 items • Updated 18 days ago • 5
Article Welcome GPT OSS, the new open-source model family from OpenAI! +10 reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante • Aug 5, 2025 • 513
Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7, 2024 • 35
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Paper • 2402.04249 • Published Feb 6, 2024 • 7
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 53