Article Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp Jan 30 • 15
Doc's Choice Collection Models that I personally recommend, periodically updated. • 9 items • Updated 2 days ago • 5
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 114
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective Paper • 2509.22921 • Published Sep 26, 2025 • 12