Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17, 2025 • 13
Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed Paper • 2406.04443 • Published Jun 6, 2024
Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1, 2025 • 24
Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning Paper • 2404.03323 • Published Apr 4, 2024 • 3