question about quants

#12

by prudant - opened Jun 28, 2024

Can this kind of "LLM" for embeddings be quantized, for example to AWQ or GPTQ format?
regards!

Alibaba-NLP org Jul 2, 2024

Indeed, gte embedding models can be quantized to reduce their computational requirements and memory footprint.

Can you give me a little info on how to get started with that? Which format, library, or useful starting point, please!

I am planning to quantize the embeddings from 4 bytes to 2 bytes per dimension so that they fit under pgvector's 2k limit: https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
I can report back on whether that works.
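
For anyone wanting to try the 4-byte to 2-byte idea before touching the database, here is a minimal sketch of scalar quantization to half precision using only the Python standard library. The 3,584-dimension size and the sine-generated vector are hypothetical stand-ins for a real model embedding, and nothing here is specific to pgvector; it only shows the storage halving and how little the cosine similarity should move.

```python
import math
import struct


def to_float16(vec):
    """Round each value to IEEE 754 half precision and return (values, byte size).

    struct's "e" format raises OverflowError for magnitudes above 65504,
    which is not a concern for embedding components in [-1, 1].
    """
    packed = struct.pack(f"<{len(vec)}e", *vec)
    return list(struct.unpack(f"<{len(vec)}e", packed)), len(packed)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embedding: deterministic values in [-1, 1], not real model output.
dim = 3584
embedding = [math.sin(0.37 * i) for i in range(dim)]

halved, nbytes_fp16 = to_float16(embedding)
nbytes_fp32 = len(struct.pack(f"<{dim}f", *embedding))
similarity = cosine(embedding, halved)

print(f"float32: {nbytes_fp32} bytes, float16: {nbytes_fp16} bytes")
print(f"cosine(original, float16 round trip) = {similarity:.8f}")
```

With NumPy the same rounding would normally be done with `astype(np.float16)`. Checking retrieval quality on real queries is still worth doing, since this sketch only measures the distortion of a single synthetic vector.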
