• LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
• Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (arXiv:2403.09629)
• Larimar: Large Language Models with Episodic Memory Control (arXiv:2403.11901)
• Evolutionary Optimization of Model Merging Recipes (arXiv:2403.13187)
• InternLM2 Technical Report (arXiv:2403.17297)
• Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models (arXiv:2404.12387)
• XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference (arXiv:2404.15420)
• LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv:2405.00732)
• TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices (arXiv:2410.00531)
• Absolute Zero: Reinforced Self-play Reasoning with Zero Data (arXiv:2505.03335)
• Recursive Language Models (arXiv:2512.24601)