prthm11 committed on
Commit 3ce7d8c (verified) · 1 parent: 4207399

Delete README.md

Files changed (1)
  1. README.md +0 -240
README.md DELETED
@@ -1,240 +0,0 @@
#### Initialization & configuration

Configuration constants: `FORMAT`, `CHANNELS`, `RATE`, `CHUNK`, `CHUNK_DURATION_SECS`, `OUTPUT_DIR`, `CHUNKS_DIR`, `FINAL_WAV`, `TRANSCRIPT_FILE`, `MODEL_NAME`.

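The README names these constants but does not give their values. The sketch below is illustrative only. Every value (16-bit mono at 44100 Hz, 1024-frame buffers, 5-second chunks, the paths, and the model name `small`) is an assumption. `SAMPLE_WIDTH` stands in for `audio.get_sample_size(FORMAT)` so the sketch runs without PyAudio, and `reads_per_chunk` / `bytes_per_chunk` are hypothetical helpers that show how `RATE`, `CHUNK`, and `CHUNK_DURATION_SECS` relate.

```python
import math
from pathlib import Path

# All values are illustrative assumptions, not the project's real settings.
SAMPLE_WIDTH = 2           # bytes per 16-bit sample; stands in for audio.get_sample_size(FORMAT)
CHANNELS = 1               # mono capture
RATE = 44100               # samples per second
CHUNK = 1024               # frames returned by one stream.read()
CHUNK_DURATION_SECS = 5    # seconds of audio per saved/transcribed chunk
OUTPUT_DIR = Path("recordings")
CHUNKS_DIR = OUTPUT_DIR / "chunks"
FINAL_WAV = OUTPUT_DIR / "final.wav"
TRANSCRIPT_FILE = OUTPUT_DIR / "transcript.txt"
MODEL_NAME = "small"


def reads_per_chunk(rate: int, chunk: int, secs: float) -> int:
    """Number of stream.read(chunk) calls needed to cover at least `secs` seconds."""
    return math.ceil(rate * secs / chunk)


def bytes_per_chunk(rate: int, chunk: int, secs: float,
                    channels: int = CHANNELS, sample_width: int = SAMPLE_WIDTH) -> int:
    """Raw PCM bytes collected for one chunk (whole reads only)."""
    return reads_per_chunk(rate, chunk, secs) * chunk * channels * sample_width
```

With these values a 5-second chunk takes `ceil(44100 * 5 / 1024) = 216` reads. Rounding up means a chunk runs slightly longer than `CHUNK_DURATION_SECS` (about 15 ms here).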
#### Device Listing

###### `list_input_devices()`

Lists every available audio input device (microphones, loopback devices, etc.) with its index and input-channel count.

1. **Create a PyAudio instance**
   - Initialize a new `PyAudio` object to access the audio hardware.
2. **Print a header**
   - Print a message announcing the list of input devices.
3. **Iterate over all devices**
   - For each device index from `0` to `get_device_count() - 1`, retrieve the device information with `get_device_info_by_index(i)`.
4. **Filter input devices**
   - Keep only devices whose `"maxInputChannels"` is greater than `0` (i.e., devices that can record audio).
5. **Print device info**
   - For each input device, print its index, name, and number of input channels.
6. **Terminate PyAudio**
   - Call `terminate()` on the `PyAudio` instance to release its resources.

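A minimal Python sketch of the steps above. It relies on the `name` and `maxInputChannels` keys of the dictionary returned by PyAudio's `get_device_info_by_index()`. The `pa_factory` parameter and the returned list are additions for this sketch (they let it run without audio hardware) and are not part of the documented function.

```python
from typing import Callable, List, Optional, Tuple


def list_input_devices(pa_factory: Optional[Callable[[], object]] = None) -> List[Tuple[int, str, int]]:
    """Print and return (index, name, maxInputChannels) for every input-capable device.

    `pa_factory` is a hypothetical hook for testing; by default it builds a
    real pyaudio.PyAudio() instance.
    """
    if pa_factory is None:
        import pyaudio  # imported lazily so the function can be tested without PyAudio
        pa_factory = pyaudio.PyAudio

    pa = pa_factory()
    devices = []
    try:
        print("Available audio input devices:")
        for i in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(i)
            channels = int(info.get("maxInputChannels", 0))
            if channels > 0:  # devices without input channels cannot record
                devices.append((i, info.get("name"), channels))
                print(f"  [{i}] {info.get('name')} (input channels: {channels})")
    finally:
        pa.terminate()  # release PortAudio resources even if listing fails
    return devices
```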
#### Audio Stream Handling

###### `open_stream_for_device(device_index, channels)`

Opens a PyAudio input stream for the given device index and channel count.

1. **Input parameters**
   - `device_index`: index of the audio input device to use (e.g., a microphone or a system-audio device).
   - `channels`: number of channels to record (default 1, i.e., mono).
2. **Open the audio stream**
   - Use the global `audio` object (a `pyaudio.PyAudio()` instance).
   - Call `audio.open()` with:
     - `format=FORMAT`: sample format (e.g., 16-bit integer)
     - `channels=channels`: number of channels
     - `rate=RATE`: sample rate (e.g., 44100 Hz)
     - `input=True`: open the stream for recording
     - `frames_per_buffer=CHUNK`: frames returned per read
     - `input_device_index=device_index`: device to record from
3. **Return the stream**
   - Return the opened stream object to the caller.

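The `audio.open()` call in step 2 could look like this. The sketch passes `audio`, `fmt`, `rate`, and `chunk` as arguments instead of reading the globals `audio`, `FORMAT`, `RATE`, and `CHUNK`. That choice makes it testable without hardware and is not necessarily how the project does it.

```python
def open_stream_for_device(audio, device_index: int, channels: int = 1,
                           *, fmt, rate: int, chunk: int):
    """Open an input-only stream on `device_index` using a pyaudio.PyAudio instance."""
    return audio.open(
        format=fmt,                       # e.g. pyaudio.paInt16 for 16-bit samples
        channels=channels,                # 1 = mono, 2 = stereo
        rate=rate,                        # sample rate in Hz
        input=True,                       # capture, not playback
        frames_per_buffer=chunk,          # frames per stream.read()
        input_device_index=device_index,  # which device to record from
    )
```

With real hardware, `stream = open_stream_for_device(pyaudio.PyAudio(), 6, fmt=pyaudio.paInt16, rate=44100, chunk=1024)` followed by `stream.read(1024, exception_on_overflow=False)` returns one buffer of raw bytes. The index `6` is only an example.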
#### Audio File Operations

###### `save_wav_from_frames(path: Path, frames: list, nchannels=1)`

Writes recorded audio frames to a WAV file.

1. **Input parameters**
   - `path`: path of the WAV file to write.
   - `frames`: list of audio frames (byte strings).
   - `nchannels`: number of audio channels (default 1).
2. **Open the WAV file for writing**
   - Open `path` with `wave.open()` in write-binary (`'wb'`) mode inside a `with` statement.
3. **Set the WAV parameters**
   - Number of channels: `setnchannels(nchannels)`.
   - Sample width: `setsampwidth(audio.get_sample_size(FORMAT))`.
   - Frame rate (sample rate): `setframerate(RATE)`.
4. **Write the audio data**
   - Join all frames into a single bytes object and write it with `writeframes()`.
5. **Close the file**
   - The `with` statement closes the file once writing finishes.

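A sketch of these five steps using only the standard-library `wave` module. The `sampwidth` and `rate` parameters stand in for `audio.get_sample_size(FORMAT)` and `RATE` (assumed here to default to 2 bytes and 44100 Hz). Creating the parent directory is an extra convenience that the steps above do not describe.

```python
import wave
from pathlib import Path


def save_wav_from_frames(path: Path, frames: list, nchannels: int = 1,
                         sampwidth: int = 2, rate: int = 44100) -> None:
    """Write a list of raw PCM byte strings to a WAV file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. make sure CHUNKS_DIR exists
    with wave.open(str(path), "wb") as wf:          # closed automatically by `with`
        wf.setnchannels(nchannels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))            # one write of all concatenated frames
```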
###### `merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path)`

Merges two mono WAV files (microphone and system audio) into one stereo WAV file.

1. **Check numpy availability**
   - If numpy is not installed, print a message and return without merging.
2. **Open the input WAV files**
   - Open the microphone file (`mic_path`) and the system-audio file (`sys_path`) for reading.
3. **Validate audio properties**
   - Assert that both files have the sample rate `RATE`.
4. **Read the audio data**
   - Take the sample width from the mic file.
   - Use the smaller of the two frame counts, so the output is as long as the shorter recording.
   - Read that many frames from each file into `mic_bytes` and `sys_bytes`.
5. **Convert bytes to arrays**
   - Convert `mic_bytes` and `sys_bytes` to numpy `int16` arrays (this assumes 16-bit samples).
6. **Interleave the channels**
   - Create an empty numpy array of length `nframes * 2`.
   - Put mic samples at even indices (left channel) and system samples at odd indices (right channel).
7. **Write the stereo WAV file**
   - Open `out_path` for writing and set 2 channels, the sample width, and the frame rate.
   - Write the interleaved samples.

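A sketch of the merge with numpy, assuming both inputs are 16-bit mono WAV files. Unlike the description, it compares the two files' rates with each other rather than with `RATE`, and it raises `ValueError` instead of using `assert` (asserts are skipped under `python -O`). The strided assignments `stereo[0::2]` and `stereo[1::2]` perform the even/odd interleaving from step 6.

```python
import wave
from pathlib import Path

try:
    import numpy as np
    NUMPY_AVAILABLE = True
except ImportError:
    NUMPY_AVAILABLE = False


def merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path) -> bool:
    """Interleave two 16-bit mono WAVs: mic -> left channel, system -> right channel."""
    if not NUMPY_AVAILABLE:
        print("numpy not available; skipping stereo merge.")
        return False

    with wave.open(str(mic_path), "rb") as mic, wave.open(str(sys_path), "rb") as sys_wav:
        for w in (mic, sys_wav):
            if w.getnchannels() != 1 or w.getsampwidth() != 2:
                raise ValueError("expected 16-bit mono input")
        if mic.getframerate() != sys_wav.getframerate():
            raise ValueError("sample rates differ")
        rate = mic.getframerate()
        sampwidth = mic.getsampwidth()
        nframes = min(mic.getnframes(), sys_wav.getnframes())  # truncate to the shorter file
        mic_bytes = mic.readframes(nframes)
        sys_bytes = sys_wav.readframes(nframes)

    mic_arr = np.frombuffer(mic_bytes, dtype=np.int16)
    sys_arr = np.frombuffer(sys_bytes, dtype=np.int16)

    stereo = np.empty(nframes * 2, dtype=np.int16)
    stereo[0::2] = mic_arr  # even indices: left channel
    stereo[1::2] = sys_arr  # odd indices: right channel

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(sampwidth)
        out.setframerate(rate)
        out.writeframes(stereo.tobytes())
    return True
```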
#### Transcription

###### `Transcriber` class

Loads the `faster-whisper` model if the library is available and uses it to transcribe WAV files.

1. **Initialization (`__init__`)**
   - Set `self.model` to `None`.
   - If `faster-whisper` is available:
     - Print a message that the model is loading.
     - Try to import `torch` to pick the device: `"cuda"` if CUDA is available, otherwise `"cpu"`.
     - Set `compute_type` to `"float16"` on `"cuda"` and to `"float32"` otherwise.
     - Try to create a `WhisperModel` with the selected model name, device, and compute type:
       - On success, assign it to `self.model` and print a success message.
       - On failure, print an error and leave `self.model` as `None`.
   - Otherwise, print a message that transcription is disabled.
2. **Transcription (`transcribe_file`)**
   - If `self.model` is not set, return `None`.
   - Try to transcribe the given WAV file with `self.model.transcribe()`, using `beam_size=5`:
     - Concatenate the text of all returned segments into a single string and return it.
   - If an error occurs, print an error message and return `None`.

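A sketch of the class against the `faster-whisper` API: `WhisperModel(model, device=..., compute_type=...)` and `model.transcribe(path, beam_size=5)`, which returns a lazy generator of segments plus an info object. The optional `model` constructor argument, the `pick_device` helper, the CPU fallback when `torch` is missing, and the default model name `small` are assumptions added for this sketch. Segment texts are stripped and joined with single spaces; the project may concatenate them differently.

```python
from typing import Optional, Tuple

try:
    from faster_whisper import WhisperModel
    WHISPER_AVAILABLE = True
except ImportError:
    WHISPER_AVAILABLE = False


def pick_device() -> Tuple[str, str]:
    """Return (device, compute_type): CUDA + float16 if available, else CPU + float32."""
    try:
        import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"  # assumption: fall back to CPU when torch is not installed
    return device, ("float16" if device == "cuda" else "float32")


class Transcriber:
    def __init__(self, model_name: str = "small", model=None):
        # `model` is a hypothetical hook to inject a ready model (e.g. in tests).
        self.model = model
        if self.model is not None:
            return
        if not WHISPER_AVAILABLE:
            print("faster-whisper not installed; transcription disabled.")
            return
        device, compute_type = pick_device()
        print(f"Loading faster-whisper model '{model_name}' on {device} ({compute_type})...")
        try:
            self.model = WhisperModel(model_name, device=device, compute_type=compute_type)
            print("Model loaded.")
        except Exception as exc:
            print(f"Could not load model: {exc}")
            self.model = None

    def transcribe_file(self, wav_path: str) -> Optional[str]:
        if self.model is None:
            return None
        try:
            segments, _info = self.model.transcribe(wav_path, beam_size=5)
            # `segments` is lazy: iterating it is what actually runs decoding
            return " ".join(seg.text.strip() for seg in segments).strip()
        except Exception as exc:
            print(f"Transcription failed for {wav_path}: {exc}")
            return None
```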
#### Diarization

###### `diarization_hook(audio_path: str)`

Runs speaker diarization and returns a list of `(start, end, speaker)` tuples.

1. **Check diarization availability**
   - If `DIARIZATION_AVAILABLE` is `False`, return `None`.
2. **Run the diarization pipeline**
   - Pass `audio_path` to the global `diarization_pipeline` and store the result (e.g., in `diarization`).
3. **Extract speaker segments**
   - Initialize an empty list, `results`.
   - For each `(turn, _, speaker)` yielded by `diarization.itertracks(yield_label=True)`, append `(turn.start, turn.end, speaker)` to `results`.
4. **Return the results**
   - Return `results`, one `(start, end, speaker)` tuple per detected speaker turn.

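A sketch of the hook, assuming the pipeline returns a `pyannote.core.Annotation`, whose `itertracks(yield_label=True)` yields `(segment, track, label)` triples. Here `pipeline` is passed as an argument in place of the global `diarization_pipeline`, with `None` standing in for `DIARIZATION_AVAILABLE = False`. This README does not state which pretrained pyannote pipeline the project loads.

```python
from typing import List, Optional, Tuple

SpeakerTurn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def diarization_hook(audio_path: str, pipeline=None) -> Optional[List[SpeakerTurn]]:
    """Return (start, end, speaker) tuples, or None when diarization is unavailable."""
    if pipeline is None:  # stands in for DIARIZATION_AVAILABLE == False
        return None
    diarization = pipeline(audio_path)
    results: List[SpeakerTurn] = []
    # itertracks(yield_label=True) yields (segment, track_name, label)
    for turn, _track, speaker in diarization.itertracks(yield_label=True):
        results.append((turn.start, turn.end, speaker))
    return results
```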
#### Recording Threads

###### `record_loop(device_index, out_queue, label="mic")`

Continuously reads audio from a device stream and pushes each completed chunk of frames to a queue. One thread runs this loop per device (the microphone and, optionally, system audio).

1. **Open the audio stream**
   - Open a PyAudio input stream for `device_index`.
2. **Read audio data**
   - Continuously read audio data, one buffer at a time.
   - Once enough frames for one chunk have been collected, put them on the queue together with a timestamped filename.
3. **Error handling**
   - If reads fail repeatedly, the thread stops recording from that device.

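A sketch of the loop, assuming an already-opened PyAudio stream. The `stop_event`, `reads_per_chunk`, and `max_errors` parameters, the `<label>_<timestamp>.wav` naming, and flushing a partial final chunk are illustrative choices, not documented behavior. PyAudio reports read failures as `OSError` (`IOError`), and `exception_on_overflow=False` keeps input overflows from counting as errors.

```python
import queue
import threading
import time


def record_loop(stream, out_queue: queue.Queue, stop_event: threading.Event,
                reads_per_chunk: int, chunk: int, label: str = "mic",
                max_errors: int = 5) -> None:
    """Read `chunk` frames at a time; enqueue (filename, frames) for every full chunk."""

    def chunk_name() -> str:
        # Hypothetical naming scheme: <label>_<YYYYmmdd_HHMMSS>.wav
        return f"{label}_{time.strftime('%Y%m%d_%H%M%S')}.wav"

    frames, errors = [], 0
    while not stop_event.is_set():
        try:
            frames.append(stream.read(chunk, exception_on_overflow=False))
            errors = 0  # a good read resets the consecutive-error count
        except OSError as exc:
            errors += 1
            print(f"[{label}] read error {errors}/{max_errors}: {exc}")
            if errors >= max_errors:
                print(f"[{label}] too many read errors; stopping this device")
                break
            continue
        if len(frames) >= reads_per_chunk:
            out_queue.put((chunk_name(), frames))
            frames = []
    if frames:  # flush the partial last chunk so no audio is lost on shutdown
        out_queue.put((chunk_name(), frames))
    stream.stop_stream()
    stream.close()
```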
#### Chunk Writing & Transcription

###### `chunk_writer_and_transcribe_worker(in_queue, final_frames_list, transcriber, single_channel_label)`

Runs in one thread per device. For each chunk it takes from the queue, it:

- saves the chunk as a WAV file;
- appends the chunk's frames to `final_frames_list` for the final concatenated recording;
- if transcription is enabled, transcribes the chunk and appends the result to the transcript file;
- runs diarization on the chunk and aligns the speaker segments with the transcription.

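One way the worker could fit together, sketched with injected callables so it runs without audio hardware. Two assumptions differ from the documented pieces. First, `transcribe` here returns timed segments `(start, end, text)` rather than the single string that `transcribe_file` returns. Second, speaker alignment uses a simple maximum-overlap rule: each transcript segment gets the diarization speaker whose turn overlaps it the longest, falling back to `single_channel_label`. A `None` sentinel on the queue ends the loop.

```python
import queue
from typing import Callable, List, Optional, Tuple

TimedText = Tuple[float, float, str]    # (start, end, text) from the transcriber
SpeakerTurn = Tuple[float, float, str]  # (start, end, speaker) from diarization


def speaker_for(start: float, end: float, turns: List[SpeakerTurn]) -> Optional[str]:
    """Speaker whose turn overlaps [start, end] the longest, or None if none overlaps."""
    best, best_overlap = None, 0.0
    for t_start, t_end, speaker in turns:
        overlap = min(end, t_end) - max(start, t_start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    return best


def chunk_writer_and_transcribe_worker(
        in_queue: queue.Queue, final_frames_list: list,
        save: Callable[[str, list], str],
        transcribe: Optional[Callable[[str], List[TimedText]]],
        diarize: Optional[Callable[[str], Optional[List[SpeakerTurn]]]],
        transcript_path: str, single_channel_label: str = "mic") -> None:
    """Consume (filename, frames) items until a None sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is None:  # shutdown sentinel from the orchestrator
            break
        filename, frames = item
        wav_path = save(filename, frames)  # e.g. save_wav_from_frames into CHUNKS_DIR
        final_frames_list.extend(frames)   # kept for the final concatenated WAV
        segments = transcribe(wav_path) if transcribe else None
        if not segments:                   # transcription disabled or produced nothing
            continue
        turns = (diarize(wav_path) if diarize else None) or []
        with open(transcript_path, "a", encoding="utf-8") as fh:
            for start, end, text in segments:
                who = speaker_for(start, end, turns) or single_channel_label
                fh.write(f"[{filename} {start:.1f}-{end:.1f}] {who}: {text}\n")
```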
#### Main Recording Orchestration

###### `run_recording(mic_index, sys_index=None, chunk_secs=CHUNK_DURATION_SECS, model_name=MODEL_NAME, no_transcribe=False)`

- Creates and starts the recording and writer threads for the microphone and, optionally, system audio.
- On `KeyboardInterrupt` (Ctrl+C), stops the threads and joins them.
- Saves the final concatenated WAV file(s).
- If both microphone and system audio were recorded, merges them into a stereo WAV file.
- Terminates PyAudio and prints a completion message.

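The stop-and-join behavior can be sketched with a shared `threading.Event` for the recorders and a `None` sentinel for each writer queue. The helper name `run_threads_until_interrupt` and its zero-argument callables (for example, `functools.partial` wrappers around `record_loop` and the chunk writer) are hypothetical. In `run_recording`, the final WAV save, stereo merge, and `audio.terminate()` would follow this step.

```python
import queue
import threading
import time
from typing import Callable, List


def run_threads_until_interrupt(recorders: List[Callable[[], None]],
                                writers: List[Callable[[], None]],
                                queues: List[queue.Queue],
                                stop_event: threading.Event,
                                poll_secs: float = 0.2) -> None:
    """Start recorder and writer threads; on Ctrl+C, stop and join them in order."""
    rec_threads = [threading.Thread(target=f, daemon=True) for f in recorders]
    wr_threads = [threading.Thread(target=f, daemon=True) for f in writers]
    for t in rec_threads + wr_threads:
        t.start()
    try:
        # Poll instead of join() so Ctrl+C reaches the main thread promptly.
        while any(t.is_alive() for t in rec_threads):
            time.sleep(poll_secs)
    except KeyboardInterrupt:
        print("Stopping...")
    stop_event.set()       # ask recorders to leave their read loop
    for t in rec_threads:
        t.join()
    for q in queues:
        q.put(None)        # one sentinel per writer queue: drain, then exit
    for t in wr_threads:
        t.join()
```

Recorders are joined before the sentinels are sent, so every chunk they queued is written before the writers exit.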
#### CLI Wrapper (`cli.py`)

###### Algorithm of `cli.py`

Provides a command-line interface for recording, chunking, and optional transcription.

1. **Argument parsing**
   - Uses `argparse` to define and parse these arguments:
     - `--mic` / `-m`: microphone device index (optional)
     - `--sys` / `-s`: system/loopback device index (optional)
     - `--chunk-secs`: chunk length in seconds (default: `CHUNK_DURATION_SECS` from the config)
     - `--model`: transcription model name (default: `MODEL_NAME` from the config)
     - `--no-transcribe`: disables transcription when set
2. **Device selection**
   - If `--mic` is not given:
     - Calls `list_input_devices()` to show the available devices.
     - Prompts for a microphone device index (a blank answer uses the default).
   - If `--sys` is not given:
     - Asks whether to record system audio (loopback) and, if so, prompts for its device index.
3. **Run the recording**
   - Calls `run_recording()` with the selected device indexes, chunk length, model name, and transcription flag.
4. **Entry point**
   - When the script is run directly, it calls `main()` to start the CLI workflow.

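A sketch of the argument parsing and device selection with `argparse`. The defaults `5` and `"small"` are placeholders for the config values. `resolve_devices` and the `ask`/`run` parameters are hypothetical: they let the flow run without a terminal or audio devices. The real `cli.py` would pass `run_recording` and call `list_input_devices()` before prompting.

```python
import argparse
from typing import Callable, List, Optional, Tuple

CHUNK_DURATION_SECS = 5   # placeholder for the config value
MODEL_NAME = "small"      # placeholder for the config value


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Record audio in chunks with optional transcription.")
    p.add_argument("--mic", "-m", type=int, help="microphone device index")
    p.add_argument("--sys", "-s", type=int, help="system/loopback device index")
    p.add_argument("--chunk-secs", type=int, default=CHUNK_DURATION_SECS, help="chunk length in seconds")
    p.add_argument("--model", default=MODEL_NAME, help="faster-whisper model name")
    p.add_argument("--no-transcribe", action="store_true", help="disable transcription")
    return p


def resolve_devices(args, ask: Callable[[str], str] = input,
                    default_mic: Optional[int] = None) -> Tuple[Optional[int], Optional[int]]:
    """Fill in missing device indexes interactively; a blank mic answer keeps the default."""
    mic = args.mic
    if mic is None:
        # cli.py calls list_input_devices() here before prompting
        answer = ask("Mic device index (blank for default): ").strip()
        mic = int(answer) if answer else default_mic
    sys_idx = args.sys
    if sys_idx is None:
        if ask("Also record system audio (loopback)? [y/N]: ").strip().lower().startswith("y"):
            sys_idx = int(ask("System device index: ").strip())
    return mic, sys_idx


def main(argv: Optional[List[str]] = None, run: Callable = print) -> None:
    args = build_parser().parse_args(argv)
    mic, sys_idx = resolve_devices(args)
    # `run` stands in for run_recording(); print keeps the sketch standalone
    run(mic, sys_idx, args.chunk_secs, args.model, args.no_transcribe)
```

In `cli.py` itself, `main()` would be called under the usual `if __name__ == "__main__":` guard.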
---

#### Usage

**Command line (interactive):**

```sh
python cli.py
```

You will be prompted for the microphone device index and, optionally, the system-audio device index.

**Command line (with arguments):**

```sh
python cli.py --mic 6 --sys 8 --chunk-secs 5 --model medium
```

Device indexes differ between machines; use the indexes printed by the device listing.

#### Requirements

- Python 3.8+
- pyaudio
- numpy
- faster-whisper (optional, for transcription)
- pyannote.audio (optional, for diarization)

Install the requirements:

```sh
pip install pyaudio numpy
# For transcription:
pip install faster-whisper
# For diarization:
pip install pyannote.audio
```