Dongwon Jo

dongwonjo

https://dongwonjo.github.io

AI & ML interests

Efficient AI, Model Compression, Sparse Attention, Quantization, Pruning, Generative Model, Large Language Model, Diffusion

Recent Activity

authored a paper about 17 hours ago

Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

Paper • 2605.19218 • Published May 19

upvoted a paper about 1 month ago

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Paper • 2605.16839 • Published May 16 • 14

authored a paper about 1 month ago

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Paper • 2605.16839 • Published May 16 • 14

upvoted 4 papers 5 months ago

Squeezing Large-Scale Diffusion Models for Mobile

Paper • 2307.01193 • Published Jul 3, 2023 • 2

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Paper • 2402.09025 • Published Feb 14, 2024 • 10

LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

Paper • 2510.14211 • Published Oct 16, 2025 • 9

Retrospective Sparse Attention for Efficient Long-Context Generation

Paper • 2508.09001 • Published Aug 12, 2025 • 3

authored 3 papers 5 months ago

Retrospective Sparse Attention for Efficient Long-Context Generation

Paper • 2508.09001 • Published Aug 12, 2025 • 3

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published Feb 3 • 14

Squeezing Large-Scale Diffusion Models for Mobile

Paper • 2307.01193 • Published Jul 3, 2023 • 2

upvoted 2 papers 5 months ago

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Paper • 2602.01053 • Published Feb 1 • 8

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published Feb 3 • 14

submitted a paper to Daily Papers 5 months ago

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published Feb 3 • 14

upvoted a paper 9 months ago

QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

Paper • 2509.17428 • Published Sep 22, 2025 • 9

authored a paper about 1 year ago

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Paper • 2505.13866 • Published May 20, 2025 • 17

upvoted a paper about 1 year ago

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Paper • 2505.13866 • Published May 20, 2025 • 17

upvoted a paper over 1 year ago

Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models

Paper • 2406.12311 • Published Jun 18, 2024 • 8

authored a paper over 1 year ago

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Paper • 2502.01068 • Published Feb 3, 2025 • 18

upvoted a paper over 1 year ago

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Paper • 2502.01068 • Published Feb 3, 2025 • 18

commented on a paper over 1 year ago

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Paper • 2502.01068 • Published Feb 3, 2025 • 18
