caio vicentino PRO

caiovicentino1


AI & ML interests

None yet

Recent Activity

liked a model about 14 hours ago

baseten/GLM-5.2-Vision-NVFP4

liked a model 1 day ago

bottlecapai/ThinkingCap-Qwen3.6-27B-FP8

liked a model 1 day ago

bottlecapai/ThinkingCap-Qwen3.6-27B

Organizations

None yet

caiovicentino1's collections (14)

OpenInterpretability

Agent-safety interpretability arc, first public Qwen3.6 SAEs, the probe/guard suite, and the agent-trajectory benchmarks behind them. openinterp.org

caiovicentino1/qwen36-27b-sae-fullstack

Updated May 4 • 11
caiovicentino1/qwen36-27b-sae-papergrade

Updated Apr 26 • 7
caiovicentino1/qwen36-27b-sae-multilayer

Text Generation • Updated Apr 25
caiovicentino1/Qwen3.6-35B-A3B-SAE-L23-topk-wip

Updated Apr 30 • 1

HLWQ Models

Hadamard-Lloyd Weight Quantization · arXiv:2603.29078 · formerly PolarQuant

caiovicentino1/Qwen3.5-9B-HLWQ-Q5

Text Generation • 9B • Updated Apr 13 • 14 • 3
caiovicentino1/Qwen3.5-9B-HLWQ-MLX-4bit

Text Generation • 1B • Updated Apr 13 • 74 • 5
caiovicentino1/Qwen3.5-27B-HLWQ-Q5

Text Generation • 27B • Updated Apr 13 • 20 • 10
caiovicentino1/Qwen3.5-9B-HLWQ-Engine-v4

Text Generation • 7B • Updated Apr 13 • 16
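
The name "Hadamard-Lloyd" suggests two ingredients: a Hadamard rotation of the weights and a Lloyd-style (1-D k-means) codebook. The sketch below shows that generic pairing on random data. It is an illustration under that assumption, not the author's HLWQ implementation; the function names (`fwht`, `lloyd_codebook`, `quantize`, `dequantize`), the block size of 256, and the 32-level (5-bit) codebook are all hypothetical choices.

```python
import math
import random

def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; len(values) must be a power of two."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]

def lloyd_codebook(samples, levels, iterations=25):
    """1-D Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    ordered = sorted(samples)
    # Start from evenly spaced quantiles of the data.
    centroids = [ordered[(2 * k + 1) * len(ordered) // (2 * levels)] for k in range(levels)]
    for _ in range(iterations):
        sums = [0.0] * levels
        counts = [0] * levels
        for x in samples:
            k = min(range(levels), key=lambda j: abs(x - centroids[j]))
            sums[k] += x
            counts[k] += 1
        # Empty cells keep their previous centroid.
        centroids = [sums[k] / counts[k] if counts[k] else centroids[k] for k in range(levels)]
    return centroids

def quantize(values, centroids):
    """Replace each value by the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j])) for v in values]

def dequantize(indices, centroids):
    """Look up centroids, then undo the rotation (the orthonormal transform is its own inverse)."""
    return fwht([centroids[i] for i in indices])

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # stand-in for one weight block
rotated = fwht(weights)
codebook = lloyd_codebook(rotated, levels=32)             # 32 levels = 5 bits per weight
indices = quantize(rotated, codebook)
restored = dequantize(indices, codebook)
mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / len(weights)
print(f"reconstruction MSE: {mse:.3e}")
```

Because the rotation is orthonormal, the squared error measured after dequantization equals the error introduced by the codebook in the rotated domain.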

HLWQ Gemma Models

Google Gemma family quantized with HLWQ (Hadamard-Lloyd) · formerly PolarQuant Gemma

caiovicentino1/Gemma-4-31B-it-HLWQ-Q5

Text Generation • Updated Apr 13 • 33 • 4
caiovicentino1/Gemma-4-31B-it-HLWQ-Q5-Vision

Image-Text-to-Text • Updated Apr 13 • 9 • 7
caiovicentino1/Gemma-4-26B-A4B-it-HLWQ-Q5

Image-Text-to-Text • 27B • Updated Apr 13 • 11 • 8
caiovicentino1/Gemma-4-31B-Claude-Opus-HLWQ-Q5-Vision

Image-Text-to-Text • Updated Apr 13 • 15 • 18

HLWQ Unified (Weights Q5 + KV Cache Q3)

Full-stack HLWQ: Q5 weights + torchao INT4 + Q3 KV cache · formerly PolarQuant Unified

caiovicentino1/Qwopus3.5-9B-v3-HLWQ-Q5

Text Generation • 3B • Updated Apr 13 • 23 • 9
caiovicentino1/Qwen3.5-9B-Claude-Opus-HLWQ-Q5

Text Generation • 9B • Updated Apr 13 • 477 • 4
caiovicentino1/Qwen3.5-27B-Claude-Opus-HLWQ-Q5

Text Generation • 27B • Updated Apr 13 • 10
caiovicentino1/Qwopus3.5-9B-v3-HLWQ-MLX-4bit

Text Generation • 1B • Updated Apr 13 • 92 • 9

Large Models (27B-35B) HLWQ

HLWQ + EOQ quantized large models · Claude Opus distilled + MoE variants

caiovicentino1/Qwen3.5-27B-HLWQ-Q5

Text Generation • 27B • Updated Apr 13 • 20 • 10
caiovicentino1/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-EOQ-Q5-compressed

27B • Updated Apr 6 • 5 • 1
caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3

15B • Updated Apr 6 • 38
caiovicentino1/Qwen3.5-35B-A3B-EOQ-Q5-compressed

35B • Updated Apr 6 • 1 • 1

Qwen2.5 EOQ Quantized

EOQ-quantized Qwen2.5 models (Q4/Q5/Q6/Q8). Weights are dequantized at load time, so inference runs with no added overhead.

caiovicentino1/Qwen2.5-0.5B-EOQ-Q4

0.5B • Updated Mar 28 • 4
caiovicentino1/Qwen2.5-0.5B-EOQ-Q5

0.5B • Updated Mar 28 • 4
caiovicentino1/Qwen2.5-0.5B-EOQ-Q6

0.5B • Updated Mar 28 • 3
caiovicentino1/Qwen2.5-0.5B-EOQ-Q8

0.5B • Updated Mar 28 • 5

EOQ Compressed Models

EOQ (Entropy-Optimal Quantization) compressed models. Mixed-bit allocation + rANS entropy coding. Smaller download; weights are dequantized at load time.

caiovicentino1/Qwen3.5-9B-EOQ-v3

Text Generation • 5B • Updated Apr 6 • 15 • 1
caiovicentino1/Qwen3.5-9B-EOQ-v2

5B • Updated Apr 6 • 8
caiovicentino1/Qwen3.5-9B-EOQ-Dynamic-BitPacked

5B • Updated Apr 6 • 6 • 1
caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3

15B • Updated Apr 6 • 38
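
The EOQ description pairs quantization with entropy coding. As a rough illustration of why that can shrink a download, the sketch below quantizes bell-shaped random values to 5 bits and measures the empirical Shannon entropy of the resulting indices, which is the per-symbol size an ideal entropy coder such as rANS approaches. This is a generic sketch, not the author's EOQ code: it does not implement rANS or mixed-bit allocation, and the helper names (`uniform_quantize`, `dequantize`, `entropy_bits`) and the Gaussian test data are assumptions.

```python
import math
import random
from collections import Counter

def uniform_quantize(values, bits):
    """Map each value to one of 2**bits evenly spaced levels spanning [min, max]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    return [round((v - lo) / step) for v in values], lo, step

def dequantize(indices, lo, step):
    """Rebuild approximate values once, e.g. when the checkpoint is loaded."""
    return [lo + i * step for i in indices]

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10000)]  # stand-in for a weight tensor
indices, lo, step = uniform_quantize(weights, bits=5)
bits_per_weight = entropy_bits(indices)
restored = dequantize(indices, lo, step)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fixed-width storage: 5 bits/weight, entropy: {bits_per_weight:.2f} bits/weight")
print(f"max reconstruction error: {max_error:.2e} (step/2 = {step / 2:.2e})")
```

The indices cluster near the middle levels, so their entropy falls below the fixed 5-bit width; that gap is what an entropy coder can remove from the stored file without changing the dequantized values.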

HLWQ Large MoE (100B+)

Massive MoE models ≥100B quantized with HLWQ · consumer deploy via vLLM expert offload

caiovicentino1/Qwopus-MoE-35B-A3B-HLWQ-Q5

Text Generation • 35B • Updated Apr 14 • 12 • 7
caiovicentino1/Nemotron-Cascade-2-30B-A3B-HLWQ-Q5

Text Generation • 20B • Updated Apr 13 • 12 • 7
caiovicentino1/Gemopus-4-26B-A4B-it-HLWQ-Q5

Image-Text-to-Text • Updated Apr 13 • 7 • 3

HLWQ Video & Diffusion Models

Video & diffusion models quantized with HLWQ Q5 · 50-65% smaller · formerly PolarQuant

caiovicentino1/HY-OmniWeaving-HLWQ-Q5

Text-to-Video • Updated Apr 13 • 6
caiovicentino1/Wan2.2-Animate-14B-HLWQ-Q5

Video-to-Video • 17B • Updated Apr 13 • 72 • 3
PolarQuant OmniWeaving Video

Space • Agents • Paused
caiovicentino1/VOID-Netflix-HLWQ-Q5

Video-to-Video • Updated Apr 13 • 6 • 7

Nemotron 30B — Consumer GPU Inference

30B MoE · 7.6 GB VRAM · 15 tok/s on RTX 4090 · expert offloading + HLWQ Q5

caiovicentino1/Nemotron-Cascade-2-30B-A3B-HLWQ-Q5

Text Generation • 20B • Updated Apr 13 • 12 • 7
nvidia/Nemotron-Cascade-2-30B-A3B

Text Generation • 32B • Updated 14 days ago • 89.2k • 517

HLWQ MLX (Apple Silicon)

HLWQ models for Apple Silicon via MLX · run LLMs on Mac · formerly PolarQuant MLX

caiovicentino1/Qwen3.5-9B-HLWQ-MLX-4bit

Text Generation • 1B • Updated Apr 13 • 74 • 5
caiovicentino1/Qwopus3.5-9B-v3-HLWQ-MLX-4bit

Text Generation • 1B • Updated Apr 13 • 92 • 9

Qwen3.5-4B EOQ Quantized

EOQ quantized Qwen3.5-4B models (Q4/Q5/Q6).

caiovicentino1/Qwen3.5-4B-EOQ-Q4

4B • Updated Apr 6 • 3
caiovicentino1/Qwen3.5-4B-EOQ-Q5

4B • Updated Apr 6 • 5
caiovicentino1/Qwen3.5-4B-EOQ-Q6

4B • Updated Apr 6 • 5

Qwen3.5-9B HLWQ

Qwen3.5-9B · HLWQ Q5 · beats torchao INT4 on PPL (6.56 vs 6.68) · CUDA + MLX

caiovicentino1/Qwen3.5-9B-HLWQ-Q5

Text Generation • 9B • Updated Apr 13 • 14 • 3
caiovicentino1/Qwen3.5-9B-HLWQ-MLX-4bit

Text Generation • 1B • Updated Apr 13 • 74 • 5
caiovicentino1/Qwen3.5-9B-HLWQ-Engine-v4

Text Generation • 7B • Updated Apr 13 • 16
caiovicentino1/Qwen3.5-9B-EOQ-v3

Text Generation • 5B • Updated Apr 6 • 15 • 1

Qwen3.5-27B HLWQ

Qwen3.5-27B · HLWQ Q5 weight quantization · formerly PolarQuant

caiovicentino1/Qwen3.5-27B-HLWQ-Q5

Text Generation • 27B • Updated Apr 13 • 20 • 10