Leo PRO
leideng
AI & ML interests
Efficient AI, Sparse Attention
Recent Activity
liked a model 2 days ago: ai21labs/AI21-Jamba-Mini-1.7
published a dataset 6 days ago: leideng/LeoRAG
updated a bucket 6 days ago: leideng/KVPacket
Organizations
None yet
DiT
Efficient AI
Pretrain
Reasoning
Optimizer
RL
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 146
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 66
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 148
Tokenization
SFT
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
  Paper • 2508.05629 • Published • 190
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 27
- Preserving Diversity in Supervised Fine-Tuning of Large Language Models
  Paper • 2408.16673 • Published
Non-prefix KV Reuse
- KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
  Paper • 2604.13226 • Published • 11
- KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
  Paper • 2502.16002 • Published
- ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation
  Paper • 2602.02579 • Published
- From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
  Paper • 2601.12904 • Published