LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
Paper • 2510.14211 • Published • 8
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment