On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training • Paper • 2601.07389 • Published 3 days ago • 1
Extending Context Window of Large Language Models via Semantic Compression • Paper • 2312.09571 • Published Dec 15, 2023 • 16
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory • Paper • 2405.08707 • Published May 14, 2024 • 34
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference • Paper • 2501.12959 • Published Jan 22, 2025 • 1