- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 37
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 17
- Speculative Streaming: Fast LLM Inference without Auxiliary Models
  Paper • 2402.11131 • Published • 42
Collections
Collections trending this week
- Idempotent Generative Network
  Paper • 2311.01462 • Published • 25
- Adaptive Shells for Efficient Neural Radiance Field Rendering
  Paper • 2311.10091 • Published • 19
- Generative Powers of Ten
  Paper • 2312.02149 • Published • 8
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
  Paper • 2312.04433 • Published • 10