caio vicentino PRO
caiovicentino1
AI & ML interests
None yet
Recent Activity
liked a dataset about 5 hours ago: AFOS-Analytics1/brazil-2026-electoral-divergence
updated a dataset 2 days ago: caiovicentino1/wandering-arc-papers
updated a dataset 3 days ago: caiovicentino1/swebench-phase6-verdict-circuit
Organizations
None yet
HLWQ Video & Diffusion Models
Video & diffusion models quantized with HLWQ Q5 · 50-65% smaller · formerly PolarQuant
Nemotron 30B — Consumer GPU Inference
30B MoE · 7.6 GB VRAM · 15 tok/s on RTX 4090 · expert offloading + HLWQ Q5
HLWQ MLX (Apple Silicon)
HLWQ models for Apple Silicon via MLX · run LLMs on Mac · formerly PolarQuant MLX
Qwen3.5-4B EOQ Quantized
EOQ quantized Qwen3.5-4B models (Q4/Q5/Q6).
Qwen3.5-9B HLWQ
Qwen3.5-9B · HLWQ Q5 · beats torchao INT4 on PPL (6.56 vs 6.68) · CUDA + MLX
- caiovicentino1/Qwen3.5-9B-HLWQ-Q5
  Text Generation • 9B • Updated • 223 • 3
- caiovicentino1/Qwen3.5-9B-HLWQ-MLX-4bit
  Text Generation • 1B • Updated • 93 • 3
- caiovicentino1/Qwen3.5-9B-HLWQ-Engine-v4
  Text Generation • 7B • Updated • 7
- caiovicentino1/Qwen3.5-9B-EOQ-v3
  Text Generation • 5B • Updated • 74 • 1
Qwen3.5-27B HLWQ
Qwen3.5-27B · HLWQ Q5 weight quantization · formerly PolarQuant
HLWQ Models
Hadamard-Lloyd Weight Quantization · arXiv:2603.29078 · formerly PolarQuant
- caiovicentino1/Qwen3.5-9B-HLWQ-Q5
  Text Generation • 9B • Updated • 223 • 3
- caiovicentino1/Qwen3.5-9B-HLWQ-MLX-4bit
  Text Generation • 1B • Updated • 93 • 3
- caiovicentino1/Qwen3.5-27B-HLWQ-Q5
  Text Generation • 27B • Updated • 41 • 10
- caiovicentino1/Qwen3.5-9B-HLWQ-Engine-v4
  Text Generation • 7B • Updated • 7
HLWQ Gemma Models
Google Gemma family quantized with HLWQ (Hadamard-Lloyd) · formerly PolarQuant Gemma
- caiovicentino1/Gemma-4-31B-it-HLWQ-Q5
  Text Generation • Updated • 9 • 4
- caiovicentino1/Gemma-4-31B-it-HLWQ-Q5-Vision
  Image-Text-to-Text • Updated • 73 • 7
- caiovicentino1/Gemma-4-26B-A4B-it-HLWQ-Q5
  Image-Text-to-Text • 27B • Updated • 3 • 8
- caiovicentino1/Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision
  Image-Text-to-Text • Updated • 21 • 18
HLWQ Unified (Weights Q5 + KV Cache Q3)
Full-stack HLWQ: Q5 weights + torchao INT4 + Q3 KV cache · formerly PolarQuant Unified
- caiovicentino1/Qwopus3.5-9B-v3-HLWQ-Q5
  Text Generation • 3B • Updated • 61 • 9
- caiovicentino1/Qwen3.5-9B-Claude-Opus-HLWQ-Q5
  Text Generation • 9B • Updated • 14 • 3
- caiovicentino1/Qwen3.5-27B-Claude-Opus-HLWQ-Q5
  Text Generation • 27B • Updated • 266
- caiovicentino1/Qwopus3.5-9B-v3-HLWQ-MLX-4bit
  Text Generation • 1B • Updated • 255 • 8
Large Models (27B-35B) HLWQ
HLWQ + EOQ quantized large models · Claude Opus distilled + MoE variants
- caiovicentino1/Qwen3.5-27B-HLWQ-Q5
  Text Generation • 27B • Updated • 41 • 10
- caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed
  27B • Updated • 5 • 1
- caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3
  15B • Updated • 10
- caiovicentino1/Qwen3.5-35B-A3B-EOQ-Q5-compressed
  35B • Updated • 2 • 1
Qwen2.5 EOQ Quantized
EOQ quantized Qwen2.5 models (Q4/Q5/Q6/Q8). Dequant at load, zero inference overhead.
EOQ Compressed Models
EOQ (Entropy-Optimal Quantization) compressed models. Mixed-bit allocation + rANS entropy coding. Smaller download, dequant at load time.
HLWQ Large MoE (100B+)
Massive MoE models ≥100B quantized with HLWQ · consumer deploy via vLLM expert offload