FP8 quantization for Qwen3.5 models: nearly 50% lower memory use, roughly 30% inference speedup, and servable with `vllm serve`.
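A minimal sketch of what per-tensor FP8 (e4m3) weight quantization looks like and why it roughly halves memory (1 byte per weight vs. 2 for FP16). This is an illustrative simulation in NumPy, not the organization's actual quantization code; the rounding helper approximates the e4m3 grid (4-bit significand, max finite value 448) and ignores subnormals:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def round_to_e4m3_grid(x):
    """Round to the nearest value with a 4-bit significand
    (1 implicit + 3 mantissa bits), simulating e4m3 precision.
    Subnormals are ignored for simplicity."""
    m, e = np.frexp(x)          # x = m * 2**e, with |m| in [0.5, 1)
    m = np.round(m * 16) / 16   # keep 4 significand bits
    return np.clip(np.ldexp(m, e), -FP8_E4M3_MAX, FP8_E4M3_MAX)


def fp8_quantize(w):
    """Per-tensor symmetric scaling: map max |w| onto the fp8 range."""
    scale = np.abs(w).max() / FP8_E4M3_MAX
    return round_to_e4m3_grid(w / scale), scale


def fp8_dequantize(q, scale):
    return q * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = fp8_quantize(w)
w_hat = fp8_dequantize(q, s)
# With 3 mantissa bits, per-element relative error is at most 1/16
rel_err = np.abs(w - w_hat).max() / np.abs(w).max()
```

In a real deployment the cast to an actual `float8_e4m3` dtype (and the fused kernels) is handled by the serving stack, e.g. vLLM's FP8 support, rather than code like the above.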
HyperAI
Hyper-AI
AI & ML interests
FP8/INT8 quantization of VLM/LLM models, such as Qwen3-VL and Qwen3.5.
For the latest large-model technology posts, see the WeChat official account: HyperAI
Recent Activity
updated a collection 3 days ago: gemma-4-fp8
updated a model 3 days ago: Hyper-AI/gemma-4-E4B-it-fp8
published a model 3 days ago: Hyper-AI/gemma-4-E4B-it-fp8