How to use it with LM Studio?
I can't find it in the LM Studio download list.
Someone has to convert this to GGUF first; it shouldn't take long.
Once there is a GGUF version, will it also be able to handle audio tasks such as ASR, speaker diarization, and speech synthesis?
I guess GGUF is not supported yet; otherwise we would have seen a conversion by now.
Perhaps we can rely on the ONNX version for CPU-only environments for now.
I'm not sure LM Studio supports it, though.
https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx
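For anyone who wants to check whether a GGUF conversion has appeared, one rough approach is to list a repo's files on the Hub and look for a `.gguf` extension. This is only a sketch: it assumes the `huggingface_hub` package is installed, and the helper names `gguf_files` and `repo_has_gguf` are made up for illustration. It also only inspects the repos you pass in; community conversions may live under other repo names.

```python
from typing import Iterable, List


def gguf_files(filenames: Iterable[str]) -> List[str]:
    """Return the file names that end in .gguf (case-insensitive)."""
    return [name for name in filenames if name.lower().endswith(".gguf")]


def repo_has_gguf(repo_id: str) -> bool:
    """Fetch a repo's file list from the Hub and report whether it holds any GGUF file."""
    # Imported lazily so the pure helper above works without the library installed.
    from huggingface_hub import list_repo_files

    return bool(gguf_files(list_repo_files(repo_id)))


# Example calls (need network access; results depend on the current repo contents):
# repo_has_gguf("microsoft/Phi-4-multimodal-instruct")
# repo_has_gguf("microsoft/Phi-4-multimodal-instruct-onnx")
```

A `False` result only means that particular repo has no GGUF files; it says nothing about whether the conversion tooling supports the architecture.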
@limcheekin You may find this discussion helpful:
https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/7#67c4d764491ec4e926ed9d84
Is speaker diarization possible?