The Ultra-Scale Playbook 🌌 • 3.9k • The ultimate guide to training LLMs on large GPU clusters
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 116
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection Paper • 2603.19635 • Published Mar 20 • 12