Fix image embedding logic to be mps-compatible
#45
by DefOs9 - opened
Addresses the assertion error raised on mps machines. Cf. https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/12
- MPS changes:
  - `.bool()` instead of `.type(torch.BoolTensor)`
  - Avoid `index_put` issues by having an mps-specific logical block.
- The `temp_len` variable in the assertion was never used anyway, so I removed the variable and the offending assertion.
- Various clean up of comments and code.
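For readers who have not seen the diff, here is a minimal sketch of the two MPS points above. It is not the code from this PR: the function name `insert_image_features`, the placeholder id `IMAGE_TOKEN_ID`, and the CPU round-trip used in the mps branch are all assumptions made for illustration. It only shows that `.bool()` changes the dtype while leaving the tensor on its current device, and that the scatter step can be isolated in a device-specific branch.

```python
import torch

# Hypothetical placeholder id marking image positions (not the model's real id).
IMAGE_TOKEN_ID = -1


def insert_image_features(hidden_states, input_ids, image_features):
    """Write image feature rows into the positions marked by IMAGE_TOKEN_ID.

    hidden_states:  (batch, seq, dim)
    input_ids:      (batch, seq)
    image_features: (num_image_tokens, dim)
    """
    # .bool() only changes the dtype; the mask stays on input_ids' device.
    mask = (input_ids == IMAGE_TOKEN_ID).bool()
    positions = torch.nonzero(mask, as_tuple=True)

    if hidden_states.device.type == "mps":
        # Assumed workaround for illustration: do the scatter on CPU, then move back.
        cpu_positions = tuple(p.cpu() for p in positions)
        out = hidden_states.cpu().index_put(cpu_positions, image_features.cpu())
        return out.to(hidden_states.device)

    # Default path: out-of-place index_put on the original device.
    return hidden_states.index_put(positions, image_features)


hidden = torch.zeros(1, 4, 2)
ids = torch.tensor([[5, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 7]])
feats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
result = insert_image_features(hidden, ids, feats)
```

The CPU round-trip is the most conservative option and costs a device transfer; a real fix may well use a different mps-specific path.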
I do not recommend removing the `temp_len` variable. It is used to verify that the number of image tokens reserved in advance matches the actual number of image tokens. Also, please do not replace the comments with hard-coded dimension numbers, since users might change the input resolution or the image encoder themselves.
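To make the consistency check concrete, here is a minimal sketch of what such a verification could look like. It is an illustration only, not the model's actual code: `check_image_token_count`, its parameters, and the placeholder id `-1` are hypothetical names chosen for this example.

```python
from typing import Sequence


def check_image_token_count(
    input_ids: Sequence[int],
    num_image_features: int,
    image_token_id: int,
) -> int:
    """Return the number of placeholder tokens, or raise if it does not
    match the number of image feature rows produced by the encoder."""
    reserved = sum(1 for token in input_ids if token == image_token_id)
    if reserved != num_image_features:
        raise ValueError(
            f"reserved {reserved} image tokens but the encoder produced "
            f"{num_image_features} feature rows"
        )
    return reserved


# Two placeholders (-1 is a hypothetical id) and two feature rows: consistent.
count = check_image_token_count([5, -1, -1, 7], num_image_features=2, image_token_id=-1)
```

Because the expected count is derived from the inputs instead of a fixed number, a check of this shape keeps working when the input resolution or the image encoder changes.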