The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment Paper • 2606.10747 • Published 16 days ago • 13
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling Paper • 2606.09707 • Published 16 days ago • 8
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models Paper • 2606.09697 • Published 16 days ago • 7
LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs Paper • 2606.06286 • Published 21 days ago • 8
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment Paper • 2605.07462 • Published May 8 • 3
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published about 1 month ago • 12
FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models Paper • 2602.08818 • Published Feb 9 • 2
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? Paper • 2502.11895 • Published Feb 17, 2025 • 3
DeToNATION: Decoupled Torch Network-Aware Training on Interlinked Online Nodes Paper • 2502.06728 • Published Feb 10, 2025