how did you quantize this?

by LeePapa - opened 7 days ago

Trying to figure out how to quantize SigLIP 2 models. Specifically, I want to quantize this variant (fancyfeast/so400m-long) since it supports 256 tokens rather than the default 64. When I try mlx-vlm or mlx-lm, it says siglip2 is not supported.

alexeyalbert

MLX Community org 1 day ago

edited 1 day ago

Hey, yeah, I don't believe mlx-vlm or mlx-lm support SigLIP 2, since it's technically not a VLM or an LLM. There's a sister package named mlx-embeddings that I contributed SigLIP 2 support to, which should work for your purposes.

Because of how the model is named on Hugging Face, you'll have to download it manually, rename the folder to something like "fancyfeast-so400m-long-patch14-384", and pass that folder's path as hf_path when calling convert(), so that the image and patch size are detected correctly.

In case there's still any confusion, here's a snippet of my code:

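from mlx_embeddings.utils import convert

# Convert the Hugging Face checkpoint to MLX format and quantize it to 8 bits.
convert(
    hf_path="google/siglip2-base-patch16-224",  # Hub repo ID, or a local folder path
    mlx_path="Siglip2-base-patch16-224-8bit",   # output folder for the converted model
    quantize=True,
    q_bits=8,
    skip_vision=False,
)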
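
For the fancyfeast variant, the download-and-rename step could be combined with the same convert() call roughly as below. This is only a sketch: renamed_model_dir and download_and_convert are hypothetical helper names, the repo ID is taken as written in the question, and it assumes huggingface_hub is installed alongside mlx-embeddings. Downloading straight into a folder with the suggested name stands in for the manual rename.

```python
from pathlib import Path


def renamed_model_dir(root: str, folder_name: str) -> Path:
    """Build the local folder path; the folder name carries the patch/image size hints."""
    return Path(root).expanduser() / folder_name


def download_and_convert(repo_id: str, local_dir: Path, mlx_path: str, q_bits: int = 8) -> None:
    """Download a Hub repo into local_dir, then convert and quantize it from there."""
    # Imported lazily so the path helper above works without these packages installed.
    from huggingface_hub import snapshot_download
    from mlx_embeddings.utils import convert

    # Fetch the repo files directly into the renamed folder.
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))

    # Same arguments as the snippet above, except hf_path is the local folder.
    convert(
        hf_path=str(local_dir),
        mlx_path=mlx_path,
        quantize=True,
        q_bits=q_bits,
        skip_vision=False,
    )
```

Calling download_and_convert("fancyfeast/so400m-long", renamed_model_dir("~/models", "fancyfeast-so400m-long-patch14-384"), "so400m-long-8bit") would then write the quantized model to a folder named so400m-long-8bit; the "~/models" root and the output name are placeholders.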
