
Minsoo Kim

minsoo2333

https://marsjacobs.github.io

AI & ML interests

LLM compression

Organizations

None yet

Recent Activity

authored a paper 11 days ago

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Paper • 2606.06302 • Published 13 days ago • 10

upvoted a paper 12 days ago

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Paper • 2606.06302 • Published 13 days ago • 10

submitted a paper to Daily Papers 12 days ago

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Paper • 2606.06302 • Published 13 days ago • 10

upvoted a paper 5 months ago

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction

Paper • 2601.17668 • Published Jan 25 • 8

upvoted 2 papers 9 months ago

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 183

QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

Paper • 2509.17428 • Published Sep 22, 2025 • 9

authored a paper 9 months ago

EpiCache: Episodic KV Cache Management for Long Conversational Question Answering

Paper • 2509.17396 • Published Sep 22, 2025 • 19

upvoted 2 papers 9 months ago

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Paper • 2505.19640 • Published May 26, 2025 • 15

EpiCache: Episodic KV Cache Management for Long Conversational Question Answering

Paper • 2509.17396 • Published Sep 22, 2025 • 19

commented on a paper 9 months ago

EpiCache: Episodic KV Cache Management for Long Conversational Question Answering

Paper • 2509.17396 • Published Sep 22, 2025 • 19

upvoted a paper 12 months ago

KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction

Paper • 2505.23416 • Published May 29, 2025 • 13

authored 2 papers about 1 year ago

RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy

Paper • 2412.01129 • Published Dec 2, 2024

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

Paper • 2506.15745 • Published Jun 18, 2025 • 14

upvoted a paper about 1 year ago

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

Paper • 2506.15745 • Published Jun 18, 2025 • 14

commented on a paper about 1 year ago

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

Paper • 2506.15745 • Published Jun 18, 2025 • 14

upvoted a paper over 1 year ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 62

authored 3 papers over 1 year ago

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

Paper • 2311.05161 • Published Nov 9, 2023 • 1

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

Paper • 2407.03051 • Published Jul 3, 2024

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Paper • 2410.01518 • Published Oct 2, 2024 • 3

commented on a paper over 1 year ago

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Paper • 2410.01518 • Published Oct 2, 2024 • 3
