When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining Paper • 2605.07756 • Published May 8 • 1
Article Kog Laneformer 2B: The Latency-First Model Behind Kog Inference Engine kogai • 4 days ago • 27
Comparing Linear Probes with Mahalanobis Cosine Similarity Paper • 2606.19603 • Published 11 days ago • 3
Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining Paper • 2606.16246 • Published 9 days ago • 4
A Verifiable Search Is Not a Learnable Chain-of-Thought Paper • 2606.21884 • Published 8 days ago • 3
Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation Paper • 2606.18844 • Published 11 days ago • 18
TANDEM: Bi-Level Data Mixture Optimization with Twin Networks Paper • 2606.04401 • Published 25 days ago • 1
Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time Paper • 2605.15220 • Published May 13 • 1
Towards Efficient LLMs Annealing with Principled Sample Selection Paper • 2605.31175 • Published about 1 month ago • 1
KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking Paper • 2606.22807 • Published 6 days ago • 47
When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning Paper • 2606.19827 • Published 10 days ago • 3
FastMix: Fast Data Mixture Optimization via Gradient Descent Paper • 2606.14971 • Published 16 days ago • 3
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning Paper • 2606.20002 • Published 10 days ago • 8