
John Leimgruber III PRO

ubergarm


https://blog.aifoundry.org/p/adventures-in-model-quantization

AI & ML interests

Open LLMs and Astrophotography image processing.

Recent Activity

liked a model 4 days ago

cHunter789/Qwen3.6-27B-i1-IQ4_KS-GGUF

new activity 4 days ago

ubergarm/GLM-5.1-GGUF:Can't wait for 5.2

liked a model 4 days ago

sokann/GLM-5.2-GGUF-2.244bpw



upvoted a collection 25 days ago

Qwen3.6

4 items • Updated Apr 22 • 418

upvoted an article 4 months ago

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 507

upvoted a collection 7 months ago

Magic Quant

MagicQuant is a benchmark-driven GGUF evaluation and hybrid-discovery system. https://github.com/magiccodingman/MagicQuant-Wiki • 5 items • Updated May 26 • 33

upvoted a collection 8 months ago

Draft Models

Tiny "draft" models for speculative decoding. • 14 items • Updated Mar 2 • 7

upvoted 6 collections about 1 year ago

YAQA

YAQA hessians (Sketch B) and models with the QTIP quantizer. See https://github.com/Cornell-RelaxML/yaqa/tree/main for more details. • 9 items • Updated Jun 6, 2025 • 3

EXL3 models

59 items • Updated 21 days ago • 59

Qwen3

84 items • Updated Dec 31, 2025 • 1.82k

SkyReels-V2

Infinite-length Film Generative Model • 17 items • Updated Jun 14, 2025 • 78

Gemma 3 QAT

Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 15 items • Updated Mar 12 • 219

GLM-4-0414

GLM-4-0414 series model • 6 items • Updated Mar 2 • 135

upvoted 2 articles about 1 year ago

Article

Introduction to ggml

ngxson, ggerganov, slaren • Aug 13, 2024 • 294

Article

Comparing sub 50GB Llama 4 Scout quants (KLD/Top P)

bartowski • Apr 9, 2025 • 45

upvoted a collection over 1 year ago

FP8 LLMs for vLLM

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 42 items • Updated Mar 2 • 81

upvoted 2 articles over 1 year ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

eliebak, lvwerra, lewtun • Jan 28, 2025 • 889

Article

The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...

srinivasbilla • Jan 20, 2025 • 77

upvoted 2 collections over 1 year ago

Qwen2.5-1M

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Dec 31, 2025 • 127

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2 • 566

upvoted 3 collections almost 2 years ago

Llama 3.2 3B & 1B GGUF Quants

Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated Sep 26, 2024 • 47

Llama 3.1 GPTQ, AWQ, and BNB Quants

Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & VLLM 🤗 • 9 items • Updated Sep 26, 2024 • 57

Qwen2-VL

Vision-language model series based on Qwen2 • 15 items • Updated Mar 2 • 233