Wouldn't fit on 4090 so I made it use a 4bit quant

#44

by Fancellu - opened Jul 7, 2025

Takes about 40 seconds

Would you like to contribute HQQ to diffusers? <3

What do you mean? This is HQQ, it's not mine.

All I did was plop in a quantized model so that it could run on my 4090 (takes about 40 seconds).
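For anyone wondering why a 4-bit quant makes the difference on a 24 GB card, here is a rough back-of-the-envelope sketch. The parameter count below is a made-up illustrative number, not this model's real size, and the estimate covers weights only (no activations, no quantization metadata such as scales and zero points, no other pipeline components).

```python
def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, chosen only for illustration.
NUM_PARAMS = 12_000_000_000
VRAM_GB = 24  # RTX 4090

for bits in (16, 8, 4):
    gb = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.1f} GB (of {VRAM_GB} GB VRAM)")
```

Under these assumptions the 16-bit weights alone would already fill the card, while the 4-bit weights leave room for everything else that has to live in VRAM.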

I am aware of HQQ. I was asking if you'd be interested in contributing this as a quantization backend to Diffusers. We have a few available:
https://huggingface.co/docs/diffusers/main/en/quantization/overview


You can use HQQ with Diffusers through Pruna OSS!

If you think this would be interesting to add to the Diffusers docs alongside the other Pruna page, we can do it!

I got an error when I ran it on a P100 (16 GB) on Kaggle. Please help me fix it.

Unfortunately, the P100 is very old (2016). Its memory management is simply not up to HQQ's needs. Even if it could run, it would run very slowly.

Pascal Architecture Limitations:

No Tensor Cores: Critical for AI inference acceleration

Compute Capability 6.0: Limited optimization support

Older Memory Interface: HBM2 at lower speeds

Limited Quantization Support: Reduced compatibility with modern methods

It isn't supported by Nvidia on that model.
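If you want to check a card before trying, here is a minimal sketch based on the Tensor Core point above. It only looks at compute capability: Tensor Cores arrived with Volta (7.0), the P100 is 6.0, and the RTX 4090 is 8.9. The helper name is made up for illustration, and passing this check does not guarantee that HQQ will run.

```python
def has_tensor_cores(capability: tuple[int, int]) -> bool:
    """True if the compute capability is Volta (7.0) or newer."""
    major, _minor = capability
    return major >= 7


# Known compute capabilities for the two cards discussed in this thread.
CARDS = {
    "Tesla P100": (6, 0),
    "GeForce RTX 4090": (8, 9),
}

for name, cap in CARDS.items():
    status = "has Tensor Cores" if has_tensor_cores(cap) else "no Tensor Cores"
    print(f"{name} (compute capability {cap[0]}.{cap[1]}): {status}")

# With PyTorch installed and a GPU present, the same kind of tuple comes
# from torch.cuda.get_device_capability().
```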
