Michael Goin's picture

Michael Goin

mgoin

·

mgoin_
mgoin

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

Recent Activity

updated a model 22 days ago

RedHatAI/gemma-4-26B-A4B-it-NVFP4

upvoted a paper 2 months ago

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

new activity 3 months ago

GadflyII/GLM-4.7-Flash-MXFP4:Update MXFP4 format to compressed-tensors

View all activity

Organizations

upvoted a paper 2 months ago

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

Paper • 2601.22813 • Published Jan 30 • 61

upvoted a collection 6 months ago

Speculator Models

18 items • Updated 6 days ago • 19

upvoted 2 papers 11 months ago

SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models

Paper • 2505.17967 • Published May 23, 2025 • 17

Quartet: Native FP4 Training Can Be Optimal for Large Language Models

Paper • 2505.14669 • Published May 20, 2025 • 78

upvoted a collection about 1 year ago

Llama 4

Llama 4 release • 13 items • Updated Apr 29, 2025 • 733

upvoted a paper about 1 year ago

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Paper • 2502.05003 • Published Feb 7, 2025 • 44

upvoted 3 papers over 1 year ago

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7, 2024 • 71

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4, 2024 • 52

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Paper • 2409.00492 • Published Aug 31, 2024 • 11

upvoted 2 collections almost 2 years ago

Llama-3.1 Quantization

Neural Magic quantized Llama-3.1 models • 21 items • Updated Mar 2 • 46

FP8 LLMs for vLLM

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 42 items • Updated Mar 2 • 80

upvoted a paper almost 2 years ago

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

Paper • 2405.03594 • Published May 6, 2024 • 7

upvoted a collection about 2 years ago

Sparse Foundational Llama 2 Models

Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras • 27 items • Updated Apr 16, 2025 • 10

upvoted 2 collections over 2 years ago

DeepSparse Sparse LLMs

Useful LLMs for DeepSparse where we've pruned at least 50% of the weights! • 9 items • Updated Mar 2 • 5

Open LLM Leaderboard best models ❤️‍🔥

A daily uploaded list of models with best evaluations on the LLM leaderboard: • 50 items • Updated Mar 13 • 683

upvoted 2 papers over 2 years ago

NEFTune: Noisy Embeddings Improve Instruction Finetuning

Paper • 2310.05914 • Published Oct 9, 2023 • 14

Sparse Finetuning for Inference Acceleration of Large Language Models

Paper • 2310.06927 • Published Oct 10, 2023 • 15

upvoted a collection over 2 years ago

Sparse Finetuning MPT

Explore our breakthrough in sparse fine-tuning LLMs! Our novel method maintains downstream accuracy even with >70% sparsity. • 12 items • Updated Mar 2 • 4

upvoted 2 papers over 2 years ago

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Paper • 2203.07259 • Published Mar 14, 2022 • 4

Platypus: Quick, Cheap, and Powerful Refinement of LLMs

Paper • 2308.07317 • Published Aug 14, 2023 • 25