Will a 4-bit quantized model be released?

by openaiarka - opened 3 days ago

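I tried the 16-bit model on a 32 GB RTX 5090. It uses so much VRAM that the card is almost full, and the latency between the backend and the frontend is high, so the backend often finishes inference before the frontend has displayed anything. A 4-bit model would use less VRAM, run faster, and be more practical. I'm eagerly waiting for a 4-bit AWQ or NVFP4 quantization.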
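As a rough illustration of why 4-bit helps on a 32 GB card, weight memory scales linearly with bits per weight, so going from 16-bit to 4-bit cuts weight storage by about 4×. The sketch below assumes a hypothetical 14B-parameter model (the thread does not state the model's size) and counts weights only; KV cache, activations, and the extra scale data that AWQ and NVFP4 store are ignored.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB.

    Ignores KV cache, activations, and quantization metadata
    (AWQ and NVFP4 both store per-group scales, so real usage is somewhat higher).
    """
    bytes_total = num_params * bits_per_weight / 8  # bits -> bytes
    return bytes_total / 1024**3                    # bytes -> GiB


# Hypothetical parameter count, chosen only for illustration.
params = 14e9
for label, bits in [("16-bit (BF16/FP16)", 16), ("4-bit (AWQ / NVFP4)", 4)]:
    print(f"{label}: ~{weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights alone take roughly 26 GiB, leaving little room on a 32 GB card for the KV cache, while 4-bit weights take roughly 6.5 GiB.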
