Welcome to the Nepali Whisper Model for Automatic Speech Recognition
Hey there! If you're looking to transcribe Nepali audio into text, you've come to the right place. I've fine-tuned the Whisper small model specifically for Nepali speech recognition. Below, you'll find all the details you need to get started, including where to find the datasets and notebooks!
Model Overview
- Model Name: Whisper Small
- Language: Nepali
- Current Error Rate: 30 (achieved after training for 30 epochs)
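The error rate above is not labeled with a metric. If it is a word error rate (WER) expressed as a percentage, which is an assumption rather than something this card states, it would mean roughly 30 word-level mistakes per 100 reference words. Here is a minimal sketch of how WER is commonly computed; the function name word_error_rate is illustrative and not part of this model's tooling:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One of the three reference words is transcribed differently.
print(word_error_rate("म घर जान्छु", "म घर जान्छ"))
```

Multiply the result by 100 to compare it with a percentage-style figure.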
Where to Find the Good Stuff
- Dataset: I created a combined dataset for Nepali language transcription, which you can access here on Hugging Face.
- Training Notebook: Curious about how I trained this model? Check out my Kaggle Notebook to see all the steps I took!
How to Use This Model
Using this model is pretty straightforward! Here's a quick guide to get you up and running:
Step 1: Install Required Libraries
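Before we dive in, make sure you have the necessary libraries installed. In a notebook, run the command below as-is; in a terminal, drop the leading "!". You'll also need ffmpeg installed on your system, since both reading audio files in Step 3 and extracting YouTube audio in Step 4 rely on it.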
!pip install torch transformers yt-dlp
Step 2: Load the Model
To get started with transcribing audio, you'll need to load the Whisper model using the Hugging Face transformers library. Here's how:
from transformers import pipeline
# Load the Whisper model for automatic speech recognition
pipe = pipeline("automatic-speech-recognition", model="amitpant7/Nepali-Automatic-Speech-Recognition")
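If you have a GPU, you may want the pipeline to use it. The sketch below is one way to do that, assuming a single CUDA device; the helper names pick_device and load_asr_pipeline are made up for illustration and are not part of transformers:

```python
def pick_device(cuda_available: bool) -> int:
    """Return a pipeline device index: 0 for the first GPU, -1 for CPU."""
    return 0 if cuda_available else -1


def load_asr_pipeline(model_id: str = "amitpant7/Nepali-Automatic-Speech-Recognition"):
    """Load the ASR pipeline on the GPU when one is available, else on the CPU."""
    # Imported lazily so pick_device can be used without loading these libraries.
    import torch
    from transformers import pipeline

    device = pick_device(torch.cuda.is_available())
    return pipeline("automatic-speech-recognition", model=model_id, device=device)
```

Calling pipe = load_asr_pipeline() then gives you the same kind of object as the snippet above, just placed on the GPU when possible.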
Step 3: Transcribe an Audio File
If you have an audio file ready, you can transcribe it easily. Just point to your audio file (like audio.mp3) and run the following:
# Run inference on an audio file
result = pipe('audio.mp3')
print(result['text']) # This will print the transcribed text
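Whisper processes audio in 30-second windows, so longer recordings generally need chunking. The sketch below assumes the pipeline's chunk_length_s and return_timestamps options behave for this checkpoint as they do for other Whisper models; the helper names are illustrative only:

```python
def _fmt_time(seconds):
    # The pipeline can leave a timestamp as None (for example at the very end).
    return "?" if seconds is None else f"{seconds:.1f}s"


def format_chunks(chunks):
    """Turn the pipeline's timestamped chunks into readable lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{_fmt_time(start)} - {_fmt_time(end)}] {chunk['text'].strip()}")
    return lines


def transcribe_long_audio(audio_path,
                          model_id="amitpant7/Nepali-Automatic-Speech-Recognition",
                          chunk_length_s=30):
    """Transcribe a recording longer than 30 seconds, chunk by chunk."""
    # Imported lazily so format_chunks can be used without loading the model.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id,
                   chunk_length_s=chunk_length_s)
    result = asr(audio_path, return_timestamps=True)
    return result["text"], format_chunks(result.get("chunks", []))
```

For example, text, lines = transcribe_long_audio('audio.mp3') returns the full transcript plus one line per timestamped segment.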
Step 4: Download and Transcribe YouTube Audio
Want to transcribe audio from a YouTube video? No problem! Use the following code to download the audio and transcribe it:
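import yt_dlp

def download_youtube_audio(youtube_url, output_path="audio"):
    """Download a video's audio track and convert it to MP3 (requires ffmpeg)."""
    ydl_opts = {
        'format': 'bestaudio/best',
        # Keep the downloaded file's extension so the postprocessor can swap it for .mp3
        'outtmpl': f'{output_path}.%(ext)s',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([youtube_url])
    # The converted file is written next to the download with an .mp3 extension
    return f'{output_path}.mp3'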
# Replace with the actual YouTube URL
youtube_url = "https://www.youtube.com/watch?v=H-ExNmHo2xI&pp=ygURYWJvdXQgeWFtYSBidWRkaGE%3D"
audio_file = download_youtube_audio(youtube_url)
# Now, run inference on the downloaded audio
result = pipe(audio_file)
print(result['text'])
A Few Things to Keep in Mind
- The model works best with clear audio, so try to avoid noisy environments when recording.
- Performance may vary based on accents and pronunciation, but the model is designed to handle a variety of speech patterns.