## Features

- **Multilingual Transcription**: Automatically transcribes audio in various languages using OpenAI’s Whisper model.
- **Speaker Diarization**: Detects different speakers in the audio and labels the transcription accordingly.
- **File Upload**: Allows users to upload an audio file, which is then processed for transcription and speaker diarization.
- **Timestamped File Naming**: Uploaded files are saved with a unique timestamp in the filename.

## Requirements

Install the required Python libraries using `pip` and the `requirements.txt` file provided.

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/your-repository-url.git
   cd your-repository
   ```

2. Create and activate a virtual environment (optional but recommended):
   ```bash
   python -m venv env
   source env/bin/activate  # On Windows, use `env\Scripts\activate`
   ```

3. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Set up a Hugging Face access token for `pyannote.audio` (if required):
   - Create an account on Hugging Face (https://huggingface.co/).
   - Obtain an access token from your account settings.
   - Use the token in your app by setting it as an environment variable or passing it directly in the code:
     ```python
     use_auth_token="your_token"
     ```

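The environment-variable route from the step above could look like this. It is a sketch only: the `HF_TOKEN` variable name is an assumption, and the model ID in the comment is the commonly used gated pipeline, which may differ from the checkpoint this app loads.

```python
import os

def hf_token() -> str:
    """Read the Hugging Face access token from the environment, failing loudly if unset."""
    token = os.environ.get("HF_TOKEN", "")  # variable name is an assumption
    if not token:
        raise RuntimeError("Set HF_TOKEN to your Hugging Face access token")
    return token

# The token is then passed when loading the gated diarization pipeline, e.g.:
# from pyannote.audio import Pipeline
# pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
#                                     use_auth_token=hf_token())
```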
## Usage

1. Run the Streamlit app:
   ```bash
   streamlit run app.py
   ```

2. The app will launch in your browser. Select an audio file (MP3, WAV, or M4A format) from your system.

3. The file will be uploaded to the `upload` directory, and the transcription will begin.

4. After processing, the app will display:
   - The detected language of the audio.
   - The transcription with speaker labels.

## Models

### Whisper Model
- Used for multilingual transcription.
- The model is loaded using the `whisper` Python package.

### PyAnnote Model
- Used for speaker diarization to detect speakers in the audio.
- The model is loaded using the `pyannote.audio` library.

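How the two models' outputs are combined into speaker-labeled text is not spelled out here. A common approach, sketched below under the assumption that Whisper yields timestamped text segments and pyannote yields timestamped speaker turns, is to give each segment the speaker whose turn overlaps it the most:

```python
def label_segments(transcript_segments, speaker_turns):
    """Attach a speaker label to each transcript segment.

    transcript_segments: [(start, end, text), ...]    e.g. derived from Whisper output
    speaker_turns:       [(start, end, speaker), ...] e.g. derived from pyannote output
    Each segment gets the speaker whose turn overlaps it the most in time.
    """
    labeled = []
    for seg_start, seg_end, text in transcript_segments:
        best, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            # Length of the intersection of the two time intervals (negative if disjoint).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labeled.append((best, text))
    return labeled
```

This is one plausible merging strategy, not necessarily the one `app.py` uses.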
## Troubleshooting

### Diarization Model Issues
If the diarization model fails to load, check that you have:
- Installed the correct dependencies.
- Set up the Hugging Face token, if required.

### Model Load Failures
If the models fail to load, ensure that:
- Your internet connection is stable.
- The model files downloaded correctly.

### File Upload Issues
If file upload is not working correctly:
- Ensure the `upload` folder exists in your project directory.
- Make sure the file path is correct and accessible.