Instructions to use microsoft/Phi-4-multimodal-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/Phi-4-multimodal-instruct with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="microsoft/Phi-4-multimodal-instruct", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-multimodal-instruct", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
update readme
README.md
CHANGED
@@ -86,7 +86,7 @@ Watch as Phi-4 Multimodal analyzes spoken language to help plan a trip to Seattl
 
 <div style="width: 800px; height: 400px; margin: 0 auto;">
 <video autoplay muted loop controls playsinline style="width: 100%; height: 100%; object-fit: contain;">
-<source src="https://
+<source src="https://github.com/nguyenbh/phi4mm-demos/raw/refs/heads/main/clips/Phi-4-multimodal_SeattleTrip.mp4" type="video/mp4">
 Your browser does not support the video tag.
 </video>
 </div>
@@ -94,7 +94,7 @@ Watch as Phi-4 Multimodal analyzes spoken language to help plan a trip to Seattl
 See how Phi-4 Multimodal tackles complex mathematical problems through visual inputs, demonstrating its ability to process and solve equations presented in images.
 <div style="width: 800px; height: 400px; margin: 0 auto;">
 <video autoplay muted loop controls playsinline style="width: 100%; height: 100%; object-fit: contain;">
-<source src="https://
+<source src="https://github.com/nguyenbh/phi4mm-demos/raw/refs/heads/main/clips/Phi-4-multimodal_Math.mp4" type="video/mp4">
 Your browser does not support the video tag.
 </video>
 </div>
@@ -102,7 +102,7 @@ See how Phi-4 Multimodal tackles complex mathematical problems through visual in
 Explore how Phi-4 Mini functions as an intelligent agent, showcasing its reasoning and task execution abilities in complex scenarios.
 <div style="width: 800px; height: 400px; margin: 0 auto;">
 <video autoplay muted loop controls playsinline style="width: 100%; height: 100%; object-fit: contain;">
-<source src="https://
+<source src="https://github.com/nguyenbh/phi4mm-demos/raw/refs/heads/main/clips/Phi-4-mini_Agents.mp4" type="video/mp4">
 Your browser does not support the video tag.
 </video>
 </div>