Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 Feb 20 • 505
Magic Quant Collection MagicQuant is a benchmark-driven GGUF evaluation and hybrid-discovery system. https://github.com/magiccodingman/MagicQuant-Wiki • 3 items • Updated 4 days ago • 29
Draft Models Collection Tiny "draft" models for speculative decoding. • 14 items • Updated Mar 2 • 6
YAQA Collection YAQA hessians (Sketch B) and models with the QTIP quantizer. See https://github.com/Cornell-RelaxML/yaqa/tree/main for more details. • 9 items • Updated Jun 6, 2025 • 3
SkyReels-V2 Collection Infinite-length Film Generative Model • 17 items • Updated Jun 14, 2025 • 78
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves quality similar to half precision while using 3x less memory • 15 items • Updated Mar 12 • 218
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 42 items • Updated Mar 2 • 80
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... Jan 20, 2025 • 77
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Dec 31, 2025 • 127
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2 • 562
Llama 3.2 3B & 1B GGUF Quants Collection Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated Sep 26, 2024 • 47
Llama 3.1 GPTQ, AWQ, and BNB Quants Collection Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & vLLM 🤗 • 9 items • Updated Sep 26, 2024 • 57
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated Mar 2 • 231
abliterated-v3 Collection Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3, 2024 • 139