Zero-Shot Image Classification
Transformers
Safetensors
siglip2
vision

Inconsistent Results When Passing Batch vs Single

#6
by ysdk - opened

I am seeing inconsistent output when I pass a batch of inputs versus a single input.
[Three screenshots were attached here; the images are not available.]

Notice the large discrepancy between the last vector of the batch output and the vector returned by the single call. Am I missing something? This behavior isn't consistent with other VL-embedding models (e.g. CLIP, SigLIP, etc.)
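To put a number on the discrepancy rather than eyeballing it, something like the sketch below could be used. It is only a comparison helper, not the model call itself: `batch_emb` and `single_emb` are hypothetical names for the batch output (shape `[N, D]`) and the single-input output (shape `[D]` or `[1, D]`), already converted to NumPy arrays, and the `1e-4` tolerance is an arbitrary assumption.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def compare_batch_to_single(batch_emb, single_emb, index=-1, atol=1e-4):
    """Compare one row of a batch output with a single-input output.

    batch_emb  : array of shape [N, D] (hypothetical name, batch call output)
    single_emb : array of shape [D] or [1, D] (hypothetical name, single call output)
    index      : which batch row to compare; defaults to the last row
    atol       : absolute tolerance for the allclose check (assumed value)
    """
    row = np.asarray(batch_emb, dtype=np.float64)[index]
    single = np.asarray(single_emb, dtype=np.float64).reshape(-1)
    return {
        "max_abs_diff": float(np.max(np.abs(row - single))),
        "cosine_similarity": cosine_similarity(row, single),
        "allclose": bool(np.allclose(row, single, atol=atol)),
    }
```

Calling `compare_batch_to_single(batch_emb, single_emb)` on the two outputs gives the maximum element-wise difference and the cosine similarity for the last batch row. Small differences from floating-point batching effects would be expected to pass the `allclose` check, while a discrepancy like the one described here would not.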
