LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents
Paper • 2602.01053
RelayGen: Intra-Generation Model Switching for Efficient Reasoning
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection