F16 is the best option. F32 is simply too slow (on an RTX 3060M 6GB). Q8_0 is faster than F16 and sometimes even produces better results than F16. Below Q8 there may be a big tradeoff in quality: Q3 showed some very, very bad hallucinations.
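Part of the tradeoff above is memory on a 6 GB card. As a rough sketch, here is the approximate weight footprint at each precision, assuming 159M parameters (taken from the model name) and typical GGUF bits-per-weight figures (Q8_0 stores a scale per block, so it averages about 8.5 bits; K-quants like Q3_K average roughly 3.4 bits). These numbers ignore KV cache and runtime overhead.

```python
def weight_footprint_mb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in megabytes."""
    return n_params * bits_per_weight / 8 / 1e6

n = 159e6  # parameter count assumed from the model name (159M)

# Approximate average bits per weight for each format; Q8_0 and Q3_K
# values include per-block scale overhead and are estimates.
for name, bits in [("F32", 32), ("F16", 16), ("Q8_0", 8.5), ("Q3_K", 3.4)]:
    print(f"{name}: ~{weight_footprint_mb(n, bits):.0f} MB")
```

At this model size even F32 fits easily in 6 GB, so the slowdown the note describes comes from compute and bandwidth rather than capacity; the quality differences between quant levels are what matter here.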
Model tree for ARMZyany/Cascade0-159M-Instruct-45k-GGUF
Base model
ARMZyany/Cascade0-159M-Instruct-45k