Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
Paper • 2602.03216
Efficient AI
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning