# Model Card: Whisper-small ASR (Colab Integration)

This model card describes the integration of the `openai/whisper-small` model within the multi-modal AI system developed in this Colab environment.
## Model Description
Whisper-small is a robust automatic speech recognition (ASR) model developed by OpenAI. It was trained on 680,000 hours of multilingual and multitask supervised audio data collected from the web, enabling accurate speech-to-text transcription across many languages and domains. At roughly 244 million parameters, the small checkpoint is efficient to deploy while still maintaining high transcription quality.
## Capabilities
This model primarily serves as the Automatic Speech Recognition (ASR) component of our multi-modal AI system. It can:
- Transcribe spoken language into written text with high accuracy.
- Support multiple languages for transcription (a language-selection sketch follows this list).
- Process audio files to extract textual content.
- Enable speech-driven interactions within the multi-modal agent.
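
The multilingual support noted above can be exercised through the pipeline's `generate_kwargs`. A minimal sketch, assuming a recent `transformers` version whose Whisper pipeline forwards `language` and `task` to generation; `french_audio.flac` is a hypothetical placeholder file:

```python
from transformers import pipeline

# Load the Whisper-small ASR pipeline (device=0 uses the first GPU)
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-small", device=0)

# Force French transcription instead of relying on automatic language detection.
# Setting task="translate" would instead translate the speech into English.
result = asr_pipeline(
    "french_audio.flac",  # hypothetical placeholder file
    generate_kwargs={"language": "french", "task": "transcribe"},
)
print(result["text"])
```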
## Integration Details

- **Model Name:** `openai/whisper-small`
- **Loading:** Loaded using `transformers.pipeline` with the `automatic-speech-recognition` task and `device=0` for GPU acceleration (a long-form loading variant is sketched below).
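
Whisper transcribes audio in 30-second windows, so recordings longer than that benefit from chunked inference. A minimal sketch, assuming the `chunk_length_s` argument of the `transformers` ASR pipeline; `long_recording.flac` is a hypothetical placeholder file:

```python
from transformers import pipeline

# chunk_length_s splits long audio into 30-second windows (with internal
# striding) and stitches the per-window transcripts back together.
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=0,
    chunk_length_s=30,
)

print(asr_pipeline("long_recording.flac")["text"])  # hypothetical placeholder file
```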
## Creator Identity
This model integration was performed by Google Colab AI as part of the Multi-modal AI assistant project. The integrated system is identified as ColabMAMA (version 1.0), and its core capabilities include text generation, image generation, speech-to-text, web search, and multi-step reasoning.
## Inference Examples
To use this integrated Whisper ASR model for audio transcription, you can leverage the `transformers` library. Below are Python code examples demonstrating how to load the model and perform inference.

First, ensure `transformers` and `soundfile` are installed:
```python
# Install required libraries (if not already installed)
!pip install transformers soundfile
```
```python
from transformers import pipeline
import soundfile as sf
import os

# Load the ASR pipeline (device=0 places the model on the first GPU)
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-small", device=0)

# Transcribe an audio file (similar to the wrapper used in the notebook)
def transcribe_audio(audio_file_path: str) -> str:
    transcription = asr_pipeline(audio_file_path)["text"]
    return transcription

# For a real run you need an actual .wav or .flac file, e.g. 'sample_audio.flac'.
# One way to create one is to pull a clip from a test dataset:
# from datasets import load_dataset
# dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
# sample = dataset[0]["audio"]
# sf.write("sample_audio.flac", sample["array"], sample["sampling_rate"])

# Perform inference with the real file if present, otherwise fall back to a placeholder
try:
    audio_path = "sample_audio.flac"
    if not os.path.exists(audio_path):
        print("Warning:", audio_path, "not found. Skipping live audio transcription example.")
        transcribed_text = "(Simulated transcription: The quick brown fox jumps over the lazy dog.)"
    else:
        transcribed_text = transcribe_audio(audio_path)
    print("\nAudio File:", audio_path)
    print("Transcription:", transcribed_text)
except Exception as e:
    print("Error during transcription example:", e)
    print("Please ensure you have an audio file or handle mock transcription appropriately.")
```
## Limitations and Bias
Whisper models, like all ASR systems, can exhibit biases based on their training data, potentially performing less accurately for certain accents, dialects, or noisy environments. Users should be aware of potential transcription errors and review outputs critically, especially in sensitive applications.