bilalnaveed committed
Commit bd75dbd · verified · 1 Parent(s): 5972c5c

Upload Whisper-small ASR pipeline

Files changed (1): README.md (+87, -0)
README.md ADDED
---
base_model: openai/whisper-small
tags:
- automatic-speech-recognition
- asr
- speech-to-text
widget:
- src: "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
---

# Model Card: Whisper-small ASR (Colab Integration)

This model card describes the integration of the `openai/whisper-small` model within the multi-modal AI system developed in this Colab environment.

## Model Description
The `Whisper-small` model is a robust Automatic Speech Recognition (ASR) model developed by OpenAI. It is trained on a large dataset of diverse audio and text, enabling it to perform accurate speech-to-text transcription across various languages and domains. Its small size makes it efficient for deployment while still maintaining high transcription quality.

## Capabilities
This model primarily serves as the **Automatic Speech Recognition (ASR)** component of our multi-modal AI system. It can:
- Transcribe spoken language into written text with high accuracy.
- Support multiple languages for transcription (see the multilingual sketch after this list).
- Process audio files to extract textual content.
- Enable speech-driven interactions within the multi-modal agent.
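
As a sketch of the multilingual support noted above (not part of the original notebook), recent `transformers` releases let you pin the output language via `generate_kwargs` instead of relying on Whisper's auto-detection; exact behavior is version-dependent, and `french_sample.flac` is a hypothetical placeholder file:

```python
from transformers import pipeline

# Load the same checkpoint on CPU (-1) so this sketch runs without a GPU.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small", device=-1)

# Whisper auto-detects the language by default; generate_kwargs pins it explicitly.
result = asr(
    "french_sample.flac",  # hypothetical placeholder audio file
    generate_kwargs={"language": "french", "task": "transcribe"},
)
print(result["text"])
```

Setting `task` to `"translate"` instead would produce English output, another capability of the underlying Whisper checkpoints.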

## Integration Details
- **Model Name:** `openai/whisper-small`
- **Loading:** Loaded using `transformers.pipeline` with the `automatic-speech-recognition` task and `device=0` for GPU acceleration (a CPU fallback sketch follows below).
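
Since `device=0` assumes a CUDA GPU is attached to the runtime, a small defensive variant (a sketch, not the original notebook code) falls back to CPU when none is available:

```python
import torch
from transformers import pipeline

# Pick the first GPU when CUDA is available, otherwise fall back to CPU (-1),
# so the pipeline still loads outside of a GPU Colab runtime.
device = 0 if torch.cuda.is_available() else -1
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=device,
)
```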
28
+
29
+ ## Creator Identity
30
+ This model integration was performed by **Google Colab AI** as part of the **Multi-modal AI assistant** project. The integrated system is identified as **ColabMAMA** (version 1.0), and its core capabilities include: text generation, image generation, speech-to-text, web search, multi-step reasoning.
31
+
32
+ ## Inference Examples
33
+ To use this integrated Whisper ASR model for audio transcription, you can leverage the `transformers` library. Below are Python code examples demonstrating how to load the model and perform inference.
34
+
35
+ First, ensure `transformers` and `soundfile` are installed:
36
+

```python
# Install required libraries (if not already installed).
# The leading "!" is Colab/Jupyter shell syntax; drop it in a plain script.
!pip install transformers soundfile

import os  # used below to check whether the sample audio file exists

from transformers import pipeline

# Load the ASR pipeline (device=0 targets the first CUDA GPU; use device=-1 for CPU).
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device=0,
)

# Example function to transcribe audio (similar to the wrapper used in the notebook).
def transcribe_audio(audio_file_path: str) -> str:
    return asr_pipeline(audio_file_path)["text"]

# For a real run, point audio_path at an actual .wav or .flac file.
# One way to create a small sample file from a public dummy dataset:
#
#   import soundfile as sf
#   from datasets import load_dataset
#   dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
#   sample = dataset[0]["audio"]
#   sf.write("sample_audio.flac", sample["array"], sample["sampling_rate"])

audio_path = "sample_audio.flac"

try:
    if not os.path.exists(audio_path):
        # No audio file available: fall back to a simulated transcription.
        print("Warning:", audio_path, "not found. Skipping live audio transcription example.")
        transcribed_text = "(Simulated transcription: The quick brown fox jumps over the lazy dog.)"
    else:
        transcribed_text = transcribe_audio(audio_path)

    print("\nAudio File:", audio_path)
    print("Transcription:", transcribed_text)
except Exception as e:
    print("Error during transcription example:", e)
    print("Please ensure a valid audio file is available before running the live example.")
```
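
Plain Whisper inference covers roughly 30 seconds of audio per pass. For long-form transcription, the pipeline's chunking support is one option; the sketch below (an illustration under that assumption, not part of the original notebook, with `long_recording.wav` as a placeholder) also requests timestamps, which makes reviewing the output easier:

```python
from transformers import pipeline

# chunk_length_s splits long audio into ~30 s windows that Whisper can handle.
asr_long = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,
    device=-1,  # CPU; use 0 for the first GPU
)

# return_timestamps=True adds a "chunks" list with per-segment start/end times.
result = asr_long("long_recording.wav", return_timestamps=True)  # placeholder file
print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```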

## Limitations and Bias
Whisper models, like all ASR systems, can exhibit biases based on their training data, potentially performing less accurately for certain accents, dialects, or noisy environments. Users should be aware of potential transcription errors and review outputs critically, especially in sensitive applications.