# Model Card: Whisper-small ASR (Colab Integration)

This model card describes the integration of the `openai/whisper-small` model within the multi-modal AI system developed in this Colab environment.
## Model Description
Whisper-small is a robust automatic speech recognition (ASR) model developed by OpenAI. It was trained on 680,000 hours of multilingual and multitask supervised audio data collected from the web, enabling accurate speech-to-text transcription across many languages and domains. At roughly 244 million parameters, the small checkpoint is efficient to deploy while still maintaining high transcription quality.
## Capabilities
This model primarily serves as the Automatic Speech Recognition (ASR) component of our multi-modal AI system. It can:
- Transcribe spoken language into written text with high accuracy.
- Support multiple languages for transcription (a language-selection sketch follows this list).
- Process audio files to extract textual content.
- Enable speech-driven interactions within the multi-modal agent.
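
The multilingual support noted above can be exercised through the pipeline's `generate_kwargs`. A minimal sketch, assuming a recent `transformers` version whose Whisper pipeline forwards `language` and `task` to generation; `french_audio.flac` is a hypothetical placeholder file:

```python
from transformers import pipeline

# Load the Whisper-small ASR pipeline (device=0 uses the first GPU)
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-small", device=0)

# Force French transcription instead of relying on automatic language detection.
# Setting task="translate" would instead translate the speech into English.
result = asr_pipeline(
    "french_audio.flac",  # hypothetical placeholder file
    generate_kwargs={"language": "french", "task": "transcribe"},
)
print(result["text"])
```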
## Integration Details

- **Model Name:** `openai/whisper-small`
- **Loading:** Loaded using `transformers.pipeline` with the `automatic-speech-recognition` task and `device=0` for GPU acceleration (a long-form loading variant is sketched below).
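
Whisper transcribes audio in 30-second windows, so recordings longer than that benefit from chunked inference. A minimal sketch, assuming the `chunk_length_s` argument of the `transformers` ASR pipeline; `long_recording.flac` is a hypothetical placeholder file:

```python
from transformers import pipeline

# chunk_length_s splits long audio into 30-second windows (with internal
# striding) and stitches the per-window transcripts back together.
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=0,
    chunk_length_s=30,
)

print(asr_pipeline("long_recording.flac")["text"])  # hypothetical placeholder file
```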
## Creator Identity
This model integration was performed by Google Colab AI as part of the Multi-modal AI assistant project. The integrated system is identified as ColabMAMA (version 1.0), and its core capabilities include text generation, image generation, speech-to-text, web search, and multi-step reasoning.
## Inference Examples
To use this integrated Whisper ASR model for audio transcription, you can leverage the `transformers` library. Below are Python code examples demonstrating how to load the model and perform inference.

First, ensure `transformers` and `soundfile` are installed:
```python
# Install required libraries (if not already installed)
!pip install transformers soundfile
```
```python
from transformers import pipeline
import soundfile as sf
import os

# Load the ASR pipeline (device=0 places the model on the first GPU)
asr_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-small", device=0)

# Transcribe an audio file (similar to the wrapper used in the notebook)
def transcribe_audio(audio_file_path: str) -> str:
    transcription = asr_pipeline(audio_file_path)["text"]
    return transcription

# For a real run you need an actual .wav or .flac file, e.g. 'sample_audio.flac'.
# One way to create one is to pull a clip from a test dataset:
# from datasets import load_dataset
# dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
# sample = dataset[0]["audio"]
# sf.write("sample_audio.flac", sample["array"], sample["sampling_rate"])

# Perform inference with the real file if present, otherwise fall back to a placeholder
try:
    audio_path = "sample_audio.flac"
    if not os.path.exists(audio_path):
        print("Warning:", audio_path, "not found. Skipping live audio transcription example.")
        transcribed_text = "(Simulated transcription: The quick brown fox jumps over the lazy dog.)"
    else:
        transcribed_text = transcribe_audio(audio_path)
    print("\nAudio File:", audio_path)
    print("Transcription:", transcribed_text)
except Exception as e:
    print("Error during transcription example:", e)
    print("Please ensure you have an audio file or handle mock transcription appropriately.")
```
## Limitations and Bias
Whisper models, like all ASR systems, can exhibit biases based on their training data, potentially performing less accurately for certain accents, dialects, or noisy environments. Users should be aware of potential transcription errors and review outputs critically, especially in sensitive applications.