LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents
Paper • 2602.01053
RelayGen: Intra-Generation Model Switching for Efficient Reasoning
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection