Hyper-AI's Collections

qwen3-vl-embedding-fp8

FP8 quantization of the qwen3-vl-embedding models: roughly half the memory footprint and about a 30% speedup, and the checkpoints can be served with `vllm serve`.
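As a minimal sketch of serving such a checkpoint, vLLM's OpenAI-compatible server can load it directly (the repository id below is an assumption; substitute the actual model path):

```shell
# Launch vLLM's OpenAI-compatible server in embedding mode.
# "Hyper-AI/qwen3-vl-embedding-fp8" is an illustrative repo id, not confirmed.
vllm serve Hyper-AI/qwen3-vl-embedding-fp8 \
  --task embed \
  --dtype auto
```

Once running, embeddings are available at the standard `/v1/embeddings` endpoint of the local server.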