Michael Goin

mgoin

mgoin_

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

Recent Activity

updated a model 22 days ago

RedHatAI/gemma-4-26B-A4B-it-NVFP4

upvoted a paper 2 months ago

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

new activity 3 months ago

GadflyII/GLM-4.7-Flash-MXFP4:Update MXFP4 format to compressed-tensors

authored a paper over 1 year ago

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Paper • 2409.00492 • Published Aug 31, 2024 • 11

authored a paper almost 2 years ago

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

Paper • 2405.03594 • Published May 6, 2024 • 7

authored 2 papers over 2 years ago

Sparse Finetuning for Inference Acceleration of Large Language Models

Paper • 2310.06927 • Published Oct 10, 2023 • 15

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Paper • 2203.07259 • Published Mar 14, 2022 • 4