---
title: Accent Classifier
emoji: 🎙️
colorFrom: indigo
colorTo: cyan
sdk: gradio
sdk_version: "3.38.1"
app_file: app.py
pinned: false
---

# Accent Classifier 🎙️

This Gradio app downloads a public YouTube or Vimeo video, extracts its audio, and classifies the speaker (as a proxy for accent) using a Hugging Face model. It’s perfect for demonstrating how to hook up **yt-dlp**, **ffmpeg**, and a **wav2vec2** pipeline in one slick interface: no rocket science required, just a little AI eavesdropping.

---

## 🛠️ How It Works

1. **Input**: You paste a *public* YouTube or Vimeo video URL into the Gradio textbox.
2. **Download**: Under the hood, `yt-dlp` fetches the video’s best audio stream and saves it as `downloaded_video.mp4`.
3. **Extract Audio**: `ffmpeg` converts that MP4 into a WAV file (`extracted_audio.wav`) at 16 kHz, mono, which is exactly what speech models crave.
4. **Classify**: A Hugging Face `pipeline("audio-classification", model="superb/wav2vec2-base-superb-sid")` processes the WAV file and returns a speaker ID (used here as an accent proxy) plus confidence.
5. **Cleanup**: Temporary files are removed automatically so you don’t end up with a cluttered folder.
6. **Output**: The app displays the predicted speaker ID and confidence percentage.
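For reference, the steps above can be sketched roughly as follows. This is a hypothetical outline, not the actual `app.py`: the file names match the description, but the function names and output formatting are illustrative.

```python
import subprocess


def format_result(label: str, score: float) -> str:
    """Render the top prediction as the app's output string (illustrative)."""
    return f"Predicted speaker: {label} ({score:.1%} confidence)"


def classify_accent(url: str) -> str:
    """Download a video's audio, convert it, and classify the speaker.

    Sketch of the pipeline described above; assumes yt-dlp and ffmpeg
    are available on PATH.
    """
    # Steps 1-2: fetch the best audio stream with yt-dlp
    subprocess.run(["yt-dlp", "-f", "bestaudio",
                    "-o", "downloaded_video.mp4", url], check=True)
    # Step 3: convert to 16 kHz mono WAV, the format speech models expect
    subprocess.run(["ffmpeg", "-y", "-i", "downloaded_video.mp4",
                    "-ar", "16000", "-ac", "1", "extracted_audio.wav"], check=True)
    # Step 4: classify with the wav2vec2 speaker-ID model (import deferred
    # so the heavy dependency only loads when actually needed)
    from transformers import pipeline
    clf = pipeline("audio-classification",
                   model="superb/wav2vec2-base-superb-sid")
    top = clf("extracted_audio.wav")[0]  # highest-confidence prediction
    # Step 5 (cleanup of the temporary files) is omitted here for brevity.
    # Step 6: format the output shown in the Gradio UI
    return format_result(top["label"], top["score"])
```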

---

## 📦 Requirements

- **Python 3.8+**
- **yt-dlp** (installed automatically by the script if missing)
- **ffmpeg** (must be installed on your system and accessible via your command line)
- **gradio** (for the web interface)
- **transformers** (Hugging Face library)

> ❗ **Note**: If `yt-dlp` is not already installed, the code will install it at runtime. However, you must have `ffmpeg` installed manually. On macOS you can use Homebrew (`brew install ffmpeg`); on Ubuntu/Debian, `sudo apt-get install ffmpeg`; on Windows, download from [ffmpeg.org](https://ffmpeg.org/) and add it to your PATH.
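The runtime install mentioned above can be done with a small helper along these lines. This is a sketch under the assumption that the app pip-installs `yt-dlp` when it isn't importable; the helper name is hypothetical, not taken from `app.py`.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name: str, pip_name: str) -> bool:
    """Return True if the module is already importable; otherwise install
    it with pip into the current interpreter and return False.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])
    return False


# Note: yt-dlp's import name uses an underscore, so a call would look like:
# ensure_package("yt_dlp", "yt-dlp")
```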

---

## 🚀 Installation

1. **Clone or Download** this repository (the one containing `app.py` and this `README.md`).
2. **(Optional)** Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate      # macOS/Linux
   venv\Scripts\activate.bat     # Windows
   ```
|