## Features
- **Multilingual Transcription**: Automatically transcribes audio in various languages using OpenAI’s Whisper model.
- **Speaker Diarization**: Detects different speakers in the audio and labels the transcription accordingly.
- **File Upload**: Allows users to upload an audio file, which is then processed for transcription and speaker diarization.
- **Timestamped File Naming**: Uploaded files are saved with a unique timestamp in the filename.
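The timestamped naming described above could look something like the sketch below. The `upload` directory and the accepted formats (MP3, WAV, M4A) come from this README. The helper name `save_upload`, the `%Y%m%d_%H%M%S` timestamp format, and the `<timestamp>_<original name>` pattern are assumptions for illustration, not necessarily what `app.py` does.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Formats accepted by the uploader, per the Usage section (MP3, WAV, M4A).
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def save_upload(data: bytes, original_name: str, upload_dir: str = "upload",
                now: Optional[datetime] = None) -> Path:
    """Save uploaded bytes under a timestamped filename (hypothetical helper).

    The timestamp format and naming pattern are illustrative assumptions.
    """
    name = Path(original_name).name  # drop any directory components
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")

    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    target_dir = Path(upload_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create `upload/` if missing

    target = target_dir / f"{stamp}_{name}"
    target.write_bytes(data)
    return target
```

Putting the timestamp first keeps files sorted by upload time in a directory listing, and keeping the original name makes them easy to recognize.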
## Requirements
The required Python libraries are listed in the provided `requirements.txt` file and can be installed with `pip`, as described below.
### Installation
1. Clone the repository:
```bash
git clone https://github.com/your-repository-url.git
cd your-repository
```
2. Create and activate a virtual environment (optional but recommended):
```bash
python -m venv env
source env/bin/activate # On Windows, use `env\Scripts\activate`
```
3. Install the dependencies:
```bash
pip install -r requirements.txt
```
4. Set up a Hugging Face access token for pyannote.audio (if required):
- Create an account on Hugging Face (https://huggingface.co/).
- Create an access token in your account settings.
- Pass the token to the app as the `use_auth_token` argument, either read from an environment variable or set directly in the code:
```python
use_auth_token="your_token"
```
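Rather than hard-coding the token, it can be read from an environment variable. A minimal sketch, assuming the variable is named `HF_TOKEN` (a name the `huggingface_hub` library also recognizes) and using a hypothetical helper `get_hf_token`:

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment (hypothetical helper)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail early with a clear message instead of an authentication
        # error later, when the diarization model is downloaded.
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face token."
        )
    return token
```

Set the variable in the shell before launching the app (for example `export HF_TOKEN=...` on macOS/Linux), then pass `use_auth_token=get_hf_token()` when loading the pyannote pipeline. This keeps the token out of version control.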
## Usage
1. Run the Streamlit app:
```bash
streamlit run app.py
```
2. The app will launch in your browser. Select an audio file (MP3, WAV, or M4A format) from your system.
3. The file is saved to the `upload` directory with a timestamped filename, and transcription begins.
4. After processing, the app will display:
- The detected language of the audio.
- The transcription with speaker labels.
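This README does not say how Whisper's timestamped segments are matched to pyannote's speaker turns. One common approach, shown here as a sketch and not necessarily what `app.py` does, is to give each transcript segment the speaker whose turn overlaps it the most. Segments and turns are assumed to be plain `(start, end, ...)` tuples in seconds, and `label_segments` is a hypothetical name.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start, end, text) from the transcriber
Turn = Tuple[float, float, str]     # (start, end, speaker) from the diarizer


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn],
                   unknown: str = "UNKNOWN") -> List[Tuple[str, str]]:
    """Attach to each segment the speaker whose turn overlaps it most."""
    labelled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn_start, turn_end, speaker in turns:
            o = overlap(seg_start, seg_end, turn_start, turn_end)
            if o > best_overlap:
                best_speaker, best_overlap = speaker, o
        labelled.append((best_speaker, text))
    return labelled
```

For segments `[(0.0, 2.0, "Hello."), (2.0, 5.0, "Hi, how are you?"), (9.0, 10.0, "Bye.")]` and turns `[(0.0, 2.2, "SPEAKER_00"), (2.2, 6.0, "SPEAKER_01")]`, this returns `[("SPEAKER_00", "Hello."), ("SPEAKER_01", "Hi, how are you?"), ("UNKNOWN", "Bye.")]`. The nested loop is quadratic, which is fine for typical recordings; very long files could sort both lists and sweep them instead.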
## Models
### Whisper Model
- Used for multilingual transcription.
- The model is loaded using the `whisper` Python package.
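With the `whisper` package, transcription takes two calls: `whisper.load_model` and `model.transcribe`. The latter returns a dict that includes the detected `language` and a list of timestamped `segments`. The sketch below assumes the `base` model size (the README does not say which size the app uses). `transcribe_file` and `extract_segments` are hypothetical helpers, and `whisper` is imported inside the function so the pure post-processing step works without the package installed.

```python
from typing import Any, Dict, List, Tuple


def extract_segments(result: Dict[str, Any]) -> Tuple[str, List[Tuple[float, float, str]]]:
    """Pull the detected language and (start, end, text) segments out of a
    Whisper transcription result dict."""
    segments = [
        (float(s["start"]), float(s["end"]), s["text"].strip())
        for s in result.get("segments", [])
    ]
    return result.get("language", "unknown"), segments


def transcribe_file(path: str, model_size: str = "base"):
    """Transcribe an audio file (requires the `openai-whisper` package and ffmpeg)."""
    import whisper  # imported lazily so extract_segments works without it

    model = whisper.load_model(model_size)  # downloads weights on first use
    result = model.transcribe(path)         # language is auto-detected by default
    return extract_segments(result)
```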
### PyAnnote Model
- Used for speaker diarization to detect speakers in the audio.
- The model is loaded using the `pyannote.audio` library.
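With `pyannote.audio` 3.x, a pretrained diarization pipeline is loaded with `Pipeline.from_pretrained` and called on a file; iterating the result with `itertracks(yield_label=True)` yields each speaker turn. The model id `pyannote/speaker-diarization-3.1` is an assumption (the README does not name one), as are the helper names. That model is gated, so its user conditions must be accepted on Hugging Face before the token can download it.

```python
from typing import Any, Iterable, List, Tuple


def turns_from_tracks(tracks: Iterable[Tuple[Any, Any, str]]) -> List[Tuple[float, float, str]]:
    """Convert (segment, track, label) triples, as yielded by
    `itertracks(yield_label=True)`, into (start, end, speaker) tuples."""
    return [(seg.start, seg.end, label) for seg, _track, label in tracks]


def diarize_file(path: str, token: str,
                 model_id: str = "pyannote/speaker-diarization-3.1"):
    """Run speaker diarization on an audio file (assumes pyannote.audio 3.x)."""
    from pyannote.audio import Pipeline  # imported lazily; heavy dependency

    pipeline = Pipeline.from_pretrained(model_id, use_auth_token=token)
    if pipeline is None:
        # from_pretrained can return None when the download fails,
        # for example if the model's user conditions were not accepted.
        raise RuntimeError(f"could not load diarization pipeline {model_id!r}")
    diarization = pipeline(path)
    return turns_from_tracks(diarization.itertracks(yield_label=True))
```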
## Troubleshooting
### Diarization Model Issues
If you face issues with loading the diarization model, ensure you have:
- Installed the correct dependencies.
- Set up the Hugging Face token if required.
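A quick way to rule out both causes is to check that the packages are importable and that the token variable is set. A sketch: `whisper` and `pyannote.audio` are the import names of the libraries listed under Models, while the `HF_TOKEN` variable name and the `diagnose` helper are assumptions.

```python
import importlib.util
import os
from typing import Iterable, List


def diagnose(modules: Iterable[str] = ("whisper", "pyannote.audio"),
             token_var: str = "HF_TOKEN") -> List[str]:
    """Return a list of detected setup problems (empty if none found)."""
    problems = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent of a dotted name and raises
            # if that parent package is missing.
            found = False
        if not found:
            problems.append(f"missing package: {name}")
    if not os.environ.get(token_var):
        problems.append(f"environment variable {token_var} is not set")
    return problems
```

Running `print(diagnose())` inside the activated virtual environment lists anything missing; an empty list means neither check found a problem.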
### Model Load Failures
If the models fail to load, ensure that:
- The internet connection is stable (both models are downloaded on first use).
- The model files downloaded completely.
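Transient network failures during the first download can be absorbed by retrying the load a few times. This is a sketch of a generic retry wrapper, not something the README says the app does; `load_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_with_retries(loader: Callable[[], T], attempts: int = 3,
                      delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `loader` until it succeeds, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except Exception as exc:  # network/download errors vary by library
            last_error = exc
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...
    raise RuntimeError(f"model failed to load after {attempts} attempts") from last_error
```

For example, `model = load_with_retries(lambda: whisper.load_model("base"))`. The `sleep` parameter exists so tests can skip the real waiting.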
### File Upload Issues
If the file upload is not working correctly:
- Ensure the `upload` folder exists in your project directory.
- Make sure the path to the `upload` folder is correct and that the app has permission to write to it.
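Both checks can be done at startup. In this sketch (the helper `check_upload_dir` is hypothetical), a relative `upload` path is resolved against the working directory, which is the directory `streamlit run` was launched from, so starting the app from another folder points it at a different `upload` directory.

```python
import os
from pathlib import Path


def check_upload_dir(upload_dir: str = "upload", create: bool = True) -> Path:
    """Ensure the upload directory exists and is writable (hypothetical helper)."""
    # Resolve to an absolute path so error messages show exactly where it looked.
    path = Path(upload_dir).resolve()
    if not path.exists():
        if not create:
            raise FileNotFoundError(f"upload directory not found: {path}")
        path.mkdir(parents=True)
    if not path.is_dir():
        raise NotADirectoryError(f"not a directory: {path}")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"upload directory is not writable: {path}")
    return path
```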