bilalnaveed committed
Commit bd75dbd · verified · 1 Parent(s): 5972c5c

Upload Whisper-small ASR pipeline

Files changed (1): README.md (+87, -0)
README.md ADDED
---
base_model: openai/whisper-small
tags:
- automatic-speech-recognition
- asr
- speech-to-text
widget:
- src: "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
---

# Model Card: Whisper-small ASR (Colab Integration)

This model card describes the integration of the `openai/whisper-small` model within the multi-modal AI system developed in this Colab environment.

## Model Description
The `Whisper-small` model is a robust Automatic Speech Recognition (ASR) model developed by OpenAI. It is trained on a large dataset of diverse audio and text, enabling it to perform accurate speech-to-text transcription across various languages and domains. Its small size makes it efficient for deployment while still maintaining high transcription quality.

## Capabilities
This model primarily serves as the **Automatic Speech Recognition (ASR)** component of our multi-modal AI system. It can:
- Transcribe spoken language into written text with high accuracy.
- Support multiple languages for transcription (see the multilingual sketch after this list).
- Process audio files to extract textual content.
- Enable speech-driven interactions within the multi-modal agent.
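
As a sketch of the multilingual support noted above (not part of the original notebook), recent `transformers` releases let you pin the output language via `generate_kwargs` instead of relying on Whisper's auto-detection; exact behavior is version-dependent, and `french_sample.flac` is a hypothetical placeholder file:

```python
from transformers import pipeline

# Load the same checkpoint on CPU (-1) so this sketch runs without a GPU.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small", device=-1)

# Whisper auto-detects the language by default; generate_kwargs pins it explicitly.
result = asr(
    "french_sample.flac",  # hypothetical placeholder audio file
    generate_kwargs={"language": "french", "task": "transcribe"},
)
print(result["text"])
```

Setting `task` to `"translate"` instead would produce English output, another capability of the underlying Whisper checkpoints.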

## Integration Details
- **Model Name:** `openai/whisper-small`
- **Loading:** Loaded using `transformers.pipeline` with the `automatic-speech-recognition` task and `device=0` for GPU acceleration (a CPU fallback sketch follows below).
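
Since `device=0` assumes a CUDA GPU is attached to the runtime, a small defensive variant (a sketch, not the original notebook code) falls back to CPU when none is available:

```python
import torch
from transformers import pipeline

# Pick the first GPU when CUDA is available, otherwise fall back to CPU (-1),
# so the pipeline still loads outside of a GPU Colab runtime.
device = 0 if torch.cuda.is_available() else -1
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=device,
)
```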
28
+
29
+ ## Creator Identity
30
+ This model integration was performed by **Google Colab AI** as part of the **Multi-modal AI assistant** project. The integrated system is identified as **ColabMAMA** (version 1.0), and its core capabilities include: text generation, image generation, speech-to-text, web search, multi-step reasoning.
31
+
32
+ ## Inference Examples
33
+ To use this integrated Whisper ASR model for audio transcription, you can leverage the `transformers` library. Below are Python code examples demonstrating how to load the model and perform inference.
34
+
35
+ First, ensure `transformers` and `soundfile` are installed:
36
+

```python
# Install required libraries (if not already installed).
# The leading "!" is Colab/Jupyter shell syntax; drop it in a plain script.
!pip install transformers soundfile

import os  # used below to check whether the sample audio file exists

from transformers import pipeline

# Load the ASR pipeline (device=0 targets the first CUDA GPU; use device=-1 for CPU).
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=0,
)

# Example function to transcribe audio (similar to the wrapper used in the notebook).
def transcribe_audio(audio_file_path: str) -> str:
    return asr_pipeline(audio_file_path)["text"]

# For a real run, point audio_path at an actual .wav or .flac file.
# One way to create a small sample file from a public dummy dataset:
#
#   import soundfile as sf
#   from datasets import load_dataset
#   dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
#   sample = dataset[0]["audio"]
#   sf.write("sample_audio.flac", sample["array"], sample["sampling_rate"])

audio_path = "sample_audio.flac"

try:
    if not os.path.exists(audio_path):
        # No audio file available: fall back to a simulated transcription.
        print("Warning:", audio_path, "not found. Skipping live audio transcription example.")
        transcribed_text = "(Simulated transcription: The quick brown fox jumps over the lazy dog.)"
    else:
        transcribed_text = transcribe_audio(audio_path)

    print("\nAudio File:", audio_path)
    print("Transcription:", transcribed_text)
except Exception as e:
    print("Error during transcription example:", e)
    print("Please ensure a valid audio file is available before running the live example.")
```
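
Plain Whisper inference covers roughly 30 seconds of audio per pass. For long-form transcription, the pipeline's chunking support is one option; the sketch below (an illustration under that assumption, not part of the original notebook, with `long_recording.wav` as a placeholder) also requests timestamps, which makes reviewing the output easier:

```python
from transformers import pipeline

# chunk_length_s splits long audio into ~30 s windows that Whisper can handle.
asr_long = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,
    device=-1,  # CPU; use 0 for the first GPU
)

# return_timestamps=True adds a "chunks" list with per-segment start/end times.
result = asr_long("long_recording.wav", return_timestamps=True)  # placeholder file
print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```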

## Limitations and Bias
Whisper models, like all ASR systems, can exhibit biases based on their training data, potentially performing less accurately for certain accents, dialects, or noisy environments. Users should be aware of potential transcription errors and review outputs critically, especially in sensitive applications.