Discrepancy in model sizes

by varunneal - opened Aug 30, 2023

Aug 30, 2023 • edited Aug 30, 2023

Hello team! The unquantized ONNX model is 133 MB, whereas the PyTorch model is only 66.8 MB. That is unusual: for example, the unquantized ONNX model for all-MiniLM-L6-v2 is 90 MB, roughly the same size as its PyTorch model.

While this isn't a problem in itself, I wanted to raise it for further investigation.

Edit: I found Xenova has also uploaded his own version of this model, here, and it has the same issue.

ggrn

Supabase org Aug 30, 2023

@varun4 I was confused by this at first too. The PyTorch model for gte-small is stored in 16-bit precision, whereas many other models are 32-bit. The non-quantized ONNX models are always 32-bit, and the quantized ones are 8-bit. This is why the non-quantized ONNX model is double the size of the PyTorch model, and the quantized ONNX model is half the size of the PyTorch model.
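For anyone who wants to check the arithmetic, here is a rough sketch. It assumes file size is dominated by the weights (ignoring headers, tokenizer files, and graph metadata) and that 1 MB = 1,000,000 bytes. The helper names are made up for illustration; the only input is the 66.8 MB figure quoted above.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_param_count(size_mb: float, dtype: str) -> float:
    """Back out an approximate parameter count from a file size (hypothetical helper)."""
    return size_mb * 1_000_000 / BYTES_PER_PARAM[dtype]


def estimate_size_mb(param_count: float, dtype: str) -> float:
    """Estimate file size in MB for a given parameter count and precision."""
    return param_count * BYTES_PER_PARAM[dtype] / 1_000_000


# Start from the 16-bit PyTorch checkpoint size reported in this thread.
params = estimate_param_count(66.8, "fp16")

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: ~{estimate_size_mb(params, dtype):.1f} MB")
```

Under those assumptions the 32-bit estimate comes out at roughly twice the 16-bit size, which lines up with the 133 MB ONNX file, and the 8-bit estimate at roughly half.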

varunneal

Aug 31, 2023

That makes sense, thank you!

ggrn changed discussion status to closed Sep 3, 2023
