gemma-4-fp8 Collection FP8-quantized gemma-4 models: nearly half the memory, 30% speedup, runnable with vllm serve • 2 items • Updated 3 days ago
qwen3.5-fp8 Collection FP8-quantized qwen3.5 models: nearly half the memory, 30% speedup, runnable with vllm serve • 1 item • Updated 3 days ago
qwen3-vl-fp8 Collection FP8-quantized qwen3-vl models: nearly half the memory, 30% speedup, runnable with vllm serve • 1 item • Updated 3 days ago
qwen3-vl-embedding-fp8 Collection FP8-quantized qwen3-vl-embedding models: nearly half the memory, 30% speedup, runnable with vllm serve • 1 item • Updated 3 days ago