microsoft/Phi-4-multimodal-instruct


Fixes the assertion error when num_beams > 1

#42

by freewym - opened Mar 12, 2025

base: refs/heads/main ← from: refs/pr/42


modeling_phi4mm.py +1 -1


@@ -2096,7 +2096,7 @@ class Phi4MMForCausalLM(Phi4MMPreTrainedModel, GenerationMixin):
         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
         if isinstance(input_mode, torch.Tensor):
-            assert len(input_mode) == 1
+            # len(input_mode) == num_beams in beam search, and all elements of input_mode should have the same value
             input_mode = input_mode[0].item()
         input_mode = InputMode(input_mode)
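
To illustrate why the removed assertion trips under beam search, here is a minimal sketch in plain Python. It is not the model's code: a list stands in for the `torch.Tensor`, `expand_for_beams` and `resolve_input_mode` are hypothetical helpers, and the `InputMode` member names and values are assumptions made for the example. The same-value check is an optional extra that this PR does not add; the PR only documents the expectation in a comment.

```python
from enum import Enum


class InputMode(Enum):
    # Hypothetical stand-in for the model's enum; names and values are assumed.
    LANGUAGE = 0
    VISION = 1
    SPEECH = 2
    VISION_SPEECH = 3


def expand_for_beams(values, num_beams):
    """Repeat each per-sample entry num_beams times, mimicking how beam
    search widens per-sample inputs (an assumption about the generation loop)."""
    return [v for v in values for _ in range(num_beams)]


def resolve_input_mode(input_mode):
    """Collapse a per-beam list of modes into a single InputMode."""
    if isinstance(input_mode, list):
        # With beam search the list has num_beams entries, so asserting
        # len(input_mode) == 1 would fail here. Checking that all entries
        # agree is an optional safeguard, not part of the PR.
        if len(set(input_mode)) != 1:
            raise ValueError("all entries of input_mode must be identical")
        input_mode = input_mode[0]
    return InputMode(input_mode)


expanded = expand_for_beams([InputMode.SPEECH.value], num_beams=4)
print(len(expanded))                  # number of entries after expansion
print(resolve_input_mode(expanded))   # single mode recovered from the first entry
```

Taking the first element is safe only as long as every beam carries the same mode, which is the expectation stated in the new comment.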