Any plans on updating this anytime soon?

#4
by PaulXaxis - opened

And updated docs?

A lot of info has been omitted, such as needing CUDA, and there's no mention of float32 causing issues on GPUs. It seems like you just tested this on a local dev machine, called it a day, and never revisited the model.

This seems like an interesting model that could make local visual encoding more accessible to the general public and hobbyists. It would suck to never see it pop off.

ModernVBERT org

Does the merged version work for you? https://huggingface.co/ModernVBERT/colmodernvbert-merged
What is the issue you're having? CPU compatibility?

ModernVBERT org

https://huggingface.co/Qdrant/colmodernvbert

You have this as well, in ONNX, for CPU.

ModernVBERT org

But yeah, with GPUs you should run in bfloat16, and on CPU in float32:

torch_dtype=torch.float32, # use torch_dtype=torch.bfloat16 for flash attention
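A minimal sketch of picking the dtype by device, per the advice above. The `pick_dtype` helper is illustrative, and the commented-out loading call is an assumption; check the model card for the exact model class and processor:

```python
import torch

def pick_dtype(device: str) -> torch.dtype:
    # bfloat16 on GPU (needed for flash attention); float32 on CPU,
    # where half-precision kernels are slow or unsupported.
    return torch.bfloat16 if device == "cuda" else torch.float32

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = pick_dtype(device)

# Illustrative loading call -- see the ModernVBERT model card for the
# actual class to use:
# from transformers import AutoModel
# model = AutoModel.from_pretrained(
#     "ModernVBERT/colmodernvbert-merged",
#     torch_dtype=dtype,
#     trust_remote_code=True,
# ).to(device)
```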

Tried the ONNX version on a 9900KS and image encoding was pretty slow, taking 60-80 seconds even after configuring it a bit, so I kind of moved away from that.

I did make the original work, though, after fixing some outdated dependencies.

Didn't see the comment; thanks for letting me know.

ModernVBERT org

I just pushed tighter dependency constraints to the latest colpali-engine package version on pip, which should help, but I think it was some weird Hugging Face PEFT issue; the latest peft seems to be working. On Colab, uninstalling torchao also seems key (or updating torch and peft to latest).
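If you hit the same dependency issue, something along these lines should pull in the fixed versions (package names as they appear on PyPI; the torchao uninstall is the Colab workaround mentioned above):

```shell
# Update to the latest colpali-engine and peft releases
pip install -U colpali-engine peft

# On Colab: either remove torchao, or update torch to latest instead
pip uninstall -y torchao
```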

In all cases, image encoding on CPU will always remain extremely slow. The concept we went for was GPU encoding of images, and CPU encoding of text samples in an online manner. You will never be able to get fast image encoding, even on GPU; the model is not optimized for image encoding speed (in fact it's slower than ColQwen). We went all in on text throughput, sacrificing the rest.

Cheers,

manu changed discussion status to closed