Is desklib able to support quantization?
#2 by sexyOG - opened
I would like to run desklib locally on a GPU with 8 GB of memory. Since FP16 is not supported, is it okay to use quantization, for example with ONNX Runtime? What about bitsandbytes?
You can try it and see how it affects the accuracy. You can also use CPU-based inference if you don't need to process a lot of data quickly.
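For a rough feel of why accuracy can shift, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic illustration, not the desklib, ONNX Runtime, or bitsandbytes API. The helper names (`quantize_int8`, `dequantize`, `max_abs_error`) and the sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (int values, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [v * scale for v in quantized]


def max_abs_error(original, restored):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights, chosen only to show the rounding effect.
weights = [0.82, -1.27, 0.003, 0.5, -0.049]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max_abs_error(weights, restored)
print(f"scale={scale:.6f} max_abs_error={error:.6f}")
```

Each weight can move by up to half the scale, so small weights lose the most relative precision. Whether that matters for the classifier is something only a comparison of model outputs before and after quantization on your own texts can tell you.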
desklib changed discussion status to closed