Batch vs individual inference output mismatch
#9 opened by E1eMental
Description
I'm experiencing inconsistent outputs when comparing individual inference vs batch inference with Qwen3-VL-2B-Instruct. Despite using deterministic settings (temperature=0.0, do_sample=False, num_beams=1), one sample produces different results depending on whether it's processed individually or in a batch.
Issue Details
- Sample 1: Outputs differ between individual (1153 chars) and batch inference (1159 chars) ❌
- Sample 2: Outputs match perfectly ✅
- Using the same image and prompts in both modes
- Tried both left and right padding; neither resolves the issue
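To pinpoint where the two Sample 1 outputs diverge, a character-level diff helps. A minimal sketch; `individual_outputs` and `output_texts` are the variables produced by the reproduction script below:

```python
# Minimal sketch: find where the two Sample 1 outputs first diverge.
# `individual_outputs` / `output_texts` come from the reproduction script below.
a, b = individual_outputs[0], output_texts[0]
prefix = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y), min(len(a), len(b)))
print(f"Outputs share the first {prefix} characters")
print("Individual:", repr(a[prefix:prefix + 80]))
print("Batch:     ", repr(b[prefix:prefix + 80]))
```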
Environment
- Model: Qwen/Qwen3-VL-2B-Instruct
- Device: CUDA
- Precision: torch.bfloat16
- Attention: flash_attention_2
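Exact library versions matter for reproducing this, so here is a small snippet to record them (the flash-attn line assumes the package is importable):

```python
import torch
import transformers

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("transformers:", transformers.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
try:
    import flash_attn  # only needed for the flash_attention_2 backend
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not installed")
```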
Reproducible Code
```python
import torch
from qwen_vl_utils import process_vision_info
from transformers import AutoModelForImageTextToText, AutoProcessor

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load model and processor
model_id = "Qwen/Qwen3-VL-2B-Instruct"
print(f"\nLoading model: {model_id}")
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)
processor = AutoProcessor.from_pretrained(model_id)
processor.tokenizer.padding_side = "left"  # Tried both "left" and "right"

messages1 = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg",
            },
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]
messages2 = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg",
            },
            {"type": "text", "text": "What do you see in this image? Please provide a comprehensive description."},
        ],
    }
]

# Combine messages for batch processing
messages = [messages1, messages2]

# Run each sample separately
print("\n=== INDIVIDUAL INFERENCE ===")
individual_outputs = []
for idx, msg in enumerate([messages1, messages2], 1):
    text = processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info([msg])
    inputs = processor(
        text=[text],
        images=image_inputs,
        padding=True,
        return_tensors="pt",
    )
    inputs = inputs.to(device)
    generated_ids = model.generate(**inputs, max_new_tokens=256, num_beams=1, do_sample=False, temperature=0.0)
    # Strip the prompt tokens so only the newly generated tokens are decoded
    generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    individual_outputs.append(output_text[0])
    print(f"Sample {idx}: {output_text[0]}")

# Batch inference
print("\n=== BATCH INFERENCE ===")
texts = [processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in messages]
image_inputs, _ = process_vision_info(messages)
inputs = processor(
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(device)
generated_ids = model.generate(**inputs, max_new_tokens=256, num_beams=1, do_sample=False, temperature=0.0)
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
for idx, output in enumerate(output_texts, 1):
    print(f"Sample {idx}: {output}")

# Text comparison
print("\n=== COMPARISON ===")
for idx in range(len(output_texts)):
    match = individual_outputs[idx] == output_texts[idx]
    status = "✅ MATCH" if match else "❌ MISMATCH"
    print(f"Sample {idx + 1}: {status}")
    if not match:
        print(f"  Length: {len(individual_outputs[idx])} vs {len(output_texts[idx])}")

all_match = all(individual_outputs[i] == output_texts[i] for i in range(len(output_texts)))
print(f"\nOverall: {'✅ All match' if all_match else '❌ Differences found'}")
```
Output
```
=== COMPARISON ===
Sample 1: ❌ MISMATCH
  Length: 1153 vs 1159
Sample 2: ✅ MATCH

Overall: ❌ Differences found
```
What I've Tried
- ✅ Set temperature=0.0, do_sample=False, num_beams=1 for deterministic generation
- ✅ Tried padding_side="left" - still produces mismatches
- ✅ Tried padding_side="right" - still produces mismatches
- ✅ Used the same generation parameters in both individual and batch modes
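A further check that should isolate the cause is to compare the next-token (prefill) logits for Sample 1 between the single and batched forward pass. If they already differ before any generation happens, the mismatch comes from batch-size-dependent kernel numerics (bfloat16 + flash-attention with padding) rather than from the generation settings. A minimal sketch, where `inputs_single`/`inputs_batch` are hypothetical names for the processor outputs built exactly as in the script above:

```python
import torch

# Minimal sketch: compare the prefill logits for Sample 1 between the
# single-sample and batched forward pass. `inputs_single` / `inputs_batch`
# are hypothetical names for the processor outputs built exactly as in the
# reproduction script above (left padding, same prompts).
with torch.no_grad():
    logits_single = model(**inputs_single).logits[0, -1]  # Sample 1 alone
    logits_batch = model(**inputs_batch).logits[0, -1]    # Sample 1, row 0 of batch

# With left padding, the last position of each row is the real final prompt
# token, so the two vectors are comparable position-for-position.
diff = (logits_single.float() - logits_batch.float()).abs()
print(f"max |logit delta|: {diff.max().item():.6f}")
print("greedy token agrees:", logits_single.argmax().item() == logits_batch.argmax().item())
```

A nonzero delta here would point to batched-kernel numerics rather than a decoding-parameter problem.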