## Features

- **Multilingual Transcription**: Automatically transcribes audio in various languages using OpenAI’s Whisper model.
- **Speaker Diarization**: Detects different speakers in the audio and labels the transcription accordingly.
- **File Upload**: Allows users to upload an audio file, which is then processed for transcription and speaker diarization.
- **Timestamped File Naming**: Uploaded files are saved with a unique timestamp in the filename.
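A minimal sketch of timestamped naming, assuming the timestamp is prepended to the original filename in `YYYYMMDD_HHMMSS` form. The helper name `timestamped_filename` and the exact format are illustrative assumptions, not necessarily what `app.py` does.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_filename(original_name: str, now: Optional[datetime] = None) -> str:
    """Return a name such as '20240131_142502_meeting.mp3'.

    Hypothetical helper: the timestamp format is an assumption.
    """
    if now is None:
        now = datetime.now()
    # Keep only the base name so a client-supplied path cannot point outside the upload folder.
    base = Path(original_name).name
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{base}"


print(timestamped_filename("meeting.mp3", datetime(2024, 1, 31, 14, 25, 2)))
# → 20240131_142502_meeting.mp3
```

Second-level timestamps can collide if two files are uploaded within the same second; adding `%f` (microseconds) to the format is one way to reduce that risk.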

## Requirements

The app relies on Python libraries such as Streamlit, Whisper, and `pyannote.audio`. Install them with `pip` from the provided `requirements.txt` file, as described below.

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/your-repository-url.git
   cd your-repository
   ```

2. Create and activate a virtual environment (optional but recommended):
   ```bash
   python -m venv env
   source env/bin/activate  # On Windows, use `env\Scripts\activate`
   ```

3. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Set up a Hugging Face access token for `pyannote.audio` (if required):
   - Create an account on Hugging Face (https://huggingface.co/).
   - Create an access token in your account settings.
   - If the diarization model is gated, accept its user conditions on the model's Hugging Face page.
   - Pass the token to the app, either through an environment variable or directly in the code:
     ```python
     use_auth_token="your_token"
     ```
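Reading the token from an environment variable keeps it out of the source code. The sketch below assumes a variable named `HF_TOKEN` and uses hypothetical helpers `get_hf_token` and `load_diarization_pipeline`; `Pipeline.from_pretrained` is the `pyannote.audio` loader, and the checkpoint name is left as a parameter because this README does not specify it.

```python
import os
from typing import Optional

TOKEN_ENV_VAR = "HF_TOKEN"  # assumed variable name; use whatever your deployment sets


def get_hf_token() -> Optional[str]:
    """Return the Hugging Face token from the environment, or None if it is unset."""
    token = os.environ.get(TOKEN_ENV_VAR, "").strip()
    return token or None


def load_diarization_pipeline(checkpoint: str):
    """Load a pyannote.audio pipeline using the token (requires pyannote.audio)."""
    # Imported inside the function so get_hf_token() works without pyannote installed.
    from pyannote.audio import Pipeline

    token = get_hf_token()
    if token is None:
        raise RuntimeError(f"Set the {TOKEN_ENV_VAR} environment variable first.")
    # `use_auth_token` is the keyword in pyannote.audio 3.x; newer releases may expect `token`.
    return Pipeline.from_pretrained(checkpoint, use_auth_token=token)
```

Set the variable before launching the app (for example `export HF_TOKEN=your_token` on macOS or Linux), then pass the checkpoint the app uses to `load_diarization_pipeline`.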

## Usage

1. Run the Streamlit app:
   ```bash
   streamlit run app.py
   ```

2. The app opens in your browser. Select an audio file (MP3, WAV, or M4A) from your computer.

3. The file is saved to the `upload` directory, and transcription begins.

4. After processing, the app will display:
   - The detected language of the audio.
   - The transcription with speaker labels.
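This README does not say how transcript segments are matched to speakers. One common approach, sketched here as an assumption rather than the app's actual method, gives each Whisper segment the speaker whose diarization turn overlaps it the most. Plain tuples stand in for Whisper segments and pyannote turns; `label_segments` and the `"UNKNOWN"` fallback label are hypothetical.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, text) for a transcript segment, and
# (start_seconds, end_seconds, speaker_label) for a diarization turn.
Segment = Tuple[float, float, str]
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Pair each transcript segment with the speaker whose turn overlaps it most."""
    labeled: List[Tuple[str, str]] = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text.strip()))
    return labeled


segments = [(0.0, 4.0, " Hello, everyone."), (4.0, 9.0, " Thanks for having me.")]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]
for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
# SPEAKER_00: Hello, everyone.
# SPEAKER_01: Thanks for having me.
```

Assigning whole segments keeps the logic simple; a segment in which two people speak will still carry only one label.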

## Models

### Whisper Model
- Used for multilingual transcription.
- The model is loaded using the `whisper` Python package.
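A minimal sketch of loading Whisper and reading its result. `whisper.load_model` and `model.transcribe` are the package's entry points, and the returned dictionary contains `language` and `segments` keys; the `"base"` model size and the helper names `transcribe_file` and `to_segments` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def transcribe_file(path: str, model_name: str = "base") -> Dict:
    """Run Whisper on an audio file (requires the whisper package and ffmpeg on PATH)."""
    import whisper  # imported lazily so to_segments() can be used without it

    model = whisper.load_model(model_name)
    # With no language argument, transcribe() detects the spoken language itself.
    return model.transcribe(path)


def to_segments(result: Dict) -> Tuple[str, List[Tuple[float, float, str]]]:
    """Extract the detected language and (start, end, text) tuples from a Whisper result."""
    segments = [(s["start"], s["end"], s["text"]) for s in result.get("segments", [])]
    return result.get("language", "unknown"), segments
```

Larger model sizes are generally more accurate but slower and heavier on memory, so the size is worth exposing as a setting.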

### PyAnnote Model
- Used for speaker diarization to detect speakers in the audio.
- The model is loaded using the `pyannote.audio` library.
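In `pyannote.audio` 3.x, calling a diarization pipeline on a file returns an `Annotation`, whose `itertracks(yield_label=True)` method yields `(segment, track, label)` triples. The hypothetical helper `to_turns` below flattens that output into `(start, end, speaker)` tuples, which are easy to print or to align with transcript segments. Newer releases may wrap the result differently, so treat this as a sketch.

```python
from typing import List, Tuple


def to_turns(diarization) -> List[Tuple[float, float, str]]:
    """Flatten a pyannote Annotation into (start, end, speaker) tuples."""
    turns = [
        (segment.start, segment.end, speaker)
        for segment, _track, speaker in diarization.itertracks(yield_label=True)
    ]
    # itertracks already yields turns in time order; sorting makes that explicit.
    return sorted(turns)
```

A typical call would be `to_turns(pipeline(audio_path))`, where `pipeline` is the loaded diarization pipeline and `audio_path` points at a saved upload.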

## Troubleshooting

### Diarization Model Issues
If the diarization model fails to load, make sure you have:
- Installed the dependencies from `requirements.txt`.
- Set up the Hugging Face token, if required.

### Model Load Failures
If the models fail to load, make sure that:
- You have a working internet connection; the model weights are downloaded the first time they are used.
- The model files downloaded completely.

### File Upload Issues
If file uploads are not working:
- Make sure the `upload` folder exists in your project directory.
- Make sure the file path is correct and the app has permission to write to it.
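Rather than relying on the `upload` folder already existing, the app can create it before writing. A minimal sketch, assuming uploads arrive as raw bytes; `save_upload` is a hypothetical helper and its timestamp format is illustrative.

```python
import os
from datetime import datetime

UPLOAD_DIR = "upload"  # folder name taken from this README


def save_upload(data: bytes, original_name: str, upload_dir: str = UPLOAD_DIR) -> str:
    """Write uploaded bytes into upload_dir, creating the folder if it is missing."""
    os.makedirs(upload_dir, exist_ok=True)  # no error if the folder already exists
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # basename() drops any directory parts supplied by the client.
    path = os.path.join(upload_dir, f"{stamp}_{os.path.basename(original_name)}")
    with open(path, "wb") as f:
        f.write(data)
    return path
```

In Streamlit, the object returned by `st.file_uploader` exposes `.name` and `.getvalue()`, so a call such as `save_upload(uploaded.getvalue(), uploaded.name)` would fit.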