#### Initialization & Configuration

Configuration constants: `FORMAT`, `CHANNELS`, `RATE`, `CHUNK`, `CHUNK_DURATION_SECS`, `OUTPUT_DIR`, `CHUNKS_DIR`, `FINAL_WAV`, `TRANSCRIPT_FILE`, `MODEL_NAME`

#### Device Listing

###### `list_input_devices()`

- Lists all available audio input devices (microphones, loopbacks, etc.) with their indices and channel counts.

1. **Create a PyAudio Instance**
   - Initialize a new `PyAudio` object to interact with the audio hardware.
2. **Print Header**
   - Print a message indicating that available audio input devices will be listed.
3. **Iterate Over All Devices**
   - For each device index from `0` to `get_device_count() - 1`:
     - Retrieve device information using `get_device_info_by_index(i)`.
4. **Filter Input Devices**
   - For each device, check if `"maxInputChannels"` is greater than `0` (i.e., it can record audio).
5. **Print Device Info**
   - If the device is an input device, print its index, name, and number of input channels.
6. **Terminate PyAudio**
   - After listing, terminate the `PyAudio` instance to free resources.
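
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: unlike the function described above, it takes the `PyAudio` instance as a parameter (named `pa` here, an assumption) so that it can be exercised without audio hardware, and it returns the device list in addition to printing it.

```python
def list_input_devices(pa):
    """Print and return (index, name, channels) for every input-capable device.

    `pa` is expected to behave like a `pyaudio.PyAudio` instance, i.e. to
    expose `get_device_count()` and `get_device_info_by_index(i)`.
    """
    print("Available audio input devices:")
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        channels = int(info.get("maxInputChannels", 0))
        if channels > 0:  # keep only devices that can record
            devices.append((i, info.get("name"), channels))
            print(f"  [{i}] {info.get('name')} (input channels: {channels})")
    return devices


# With real hardware (requires the third-party pyaudio package):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   try:
#       list_input_devices(pa)
#   finally:
#       pa.terminate()  # free audio resources, as in step 6
```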

#### Audio Stream Handling

###### `open_stream_for_device(device_index, channels)`

- Opens a PyAudio input stream for the given device index and channel count.

1. **Input Parameters**
   - `device_index`: The index of the audio input device to use (e.g., microphone or system audio).
   - `channels`: Number of audio channels to record (default is 1, i.e., mono).
2. **Open Audio Stream**
   - Use the global `audio` object (an instance of `pyaudio.PyAudio()`).
   - Call `audio.open()` with the following parameters:
     - `format=FORMAT` (audio sample format, e.g., 16-bit int)
     - `channels=channels` (number of channels)
     - `rate=RATE` (sample rate, e.g., 44100 Hz)
     - `input=True` (open for input/recording)
     - `frames_per_buffer=CHUNK` (buffer size per read)
     - `input_device_index=device_index` (which device to use)
3. **Return Stream**
   - Return the opened stream object to the caller.
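
A possible shape for this function is sketched below. The values of `RATE` and `CHUNK` are assumptions chosen for illustration, and the `audio` object and sample format are passed in as arguments here rather than read from globals, purely to keep the sketch self-contained.

```python
RATE = 44100   # assumed sample rate in Hz
CHUNK = 1024   # assumed number of frames per buffer


def open_stream_for_device(audio, sample_format, device_index, channels=1):
    """Open an input stream on `device_index` and return it.

    `audio` should be a `pyaudio.PyAudio` instance and `sample_format`
    a PyAudio format constant such as `pyaudio.paInt16`.
    """
    return audio.open(
        format=sample_format,
        channels=channels,
        rate=RATE,
        input=True,                       # recording, not playback
        frames_per_buffer=CHUNK,
        input_device_index=device_index,
    )
```

With the real library this would be called as `open_stream_for_device(pyaudio.PyAudio(), pyaudio.paInt16, 6)`, where `6` stands for whatever index the device listing reported.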

#### Audio File Operations

###### `save_wav_from_frames(path: Path, frames: list, nchannels=1)`

1. **Input Parameters**
   - `path`: The file path where the WAV file will be saved.
   - `frames`: A list of audio frames (byte strings) to write.
   - `nchannels`: Number of audio channels (default is 1).
2. **Open WAV File for Writing**
   - Use the `wave.open()` function to open the file at `path` in write-binary (`'wb'`) mode.
3. **Set WAV File Parameters**
   - Set the number of channels using `setnchannels(nchannels)`.
   - Set the sample width using `setsampwidth(audio.get_sample_size(FORMAT))`.
   - Set the frame rate (sample rate) using `setframerate(RATE)`.
4. **Write Audio Data**
   - Concatenate all frames in the `frames` list into a single bytes object.
   - Write the concatenated bytes to the WAV file using `writeframes()`.
5. **Close the File**
   - The `with` statement ensures the file is properly closed after writing.
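
As a rough sketch of these steps using only the standard library: `RATE` is an assumed value, and the constant `SAMPLE_WIDTH` (hypothetical name) stands in for `audio.get_sample_size(FORMAT)`, assuming 16-bit samples.

```python
import wave
from pathlib import Path

RATE = 44100       # assumed sample rate in Hz
SAMPLE_WIDTH = 2   # bytes per sample for 16-bit audio (assumption)


def save_wav_from_frames(path: Path, frames: list, nchannels: int = 1) -> None:
    """Write a list of raw PCM byte strings to `path` as a WAV file."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(nchannels)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))  # concatenate, then write once
```

Joining the frames before a single `writeframes()` call keeps the header bookkeeping simple; the `with` block closes the file even if writing fails.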

###### `merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path)`

- Merges two mono WAV files (mic and system) into a stereo WAV file.

1. **Check for numpy Availability**
   - If numpy is not available, print a message and exit the function.
2. **Open Input WAV Files**
   - Open the microphone WAV file (`mic_path`) for reading.
   - Open the system audio WAV file (`sys_path`) for reading.
3. **Validate Audio Properties**
   - Assert that both files have the same sample rate (`RATE`).
4. **Read Audio Data**
   - Get the sample width from the mic file.
   - Determine the minimum number of frames available in both files.
   - Read that many frames from both files into `mic_bytes` and `sys_bytes`.
5. **Convert Bytes to Arrays**
   - Convert `mic_bytes` and `sys_bytes` to numpy arrays of type `int16`.
6. **Interleave Channels for Stereo**
   - Create an empty numpy array of size `nframes * 2` (for stereo).
   - Assign mic samples to even indices (left channel).
   - Assign system samples to odd indices (right channel).
7. **Write Stereo WAV File**
   - Open the output WAV file (`out_path`) for writing.
   - Set number of channels to 2 (stereo).
   - Set sample width and frame rate.
   - Write the interleaved stereo data to the file.
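
The interleaving idea can be illustrated with the sketch below. Note that it deliberately swaps numpy for the standard-library `array` module so that it runs without extra dependencies; the slice assignments play the same role as the numpy even/odd index assignments described above. It assumes 16-bit mono inputs and takes the sample rate from the mic file.

```python
import wave
from array import array
from pathlib import Path


def merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path) -> None:
    """Interleave two 16-bit mono WAV files into one stereo WAV file."""
    with wave.open(str(mic_path), "rb") as mic, wave.open(str(sys_path), "rb") as sys_wav:
        assert mic.getnchannels() == sys_wav.getnchannels() == 1, "inputs must be mono"
        assert mic.getsampwidth() == sys_wav.getsampwidth() == 2, "inputs must be 16-bit"
        assert mic.getframerate() == sys_wav.getframerate(), "sample rates differ"
        rate = mic.getframerate()
        # Truncate to the shorter recording so both channels line up.
        nframes = min(mic.getnframes(), sys_wav.getnframes())
        mic_samples = array("h", mic.readframes(nframes))
        sys_samples = array("h", sys_wav.readframes(nframes))

    stereo = array("h", bytes(2 * 2 * nframes))  # nframes * 2 zeroed samples
    stereo[0::2] = mic_samples   # even indices: left channel (mic)
    stereo[1::2] = sys_samples   # odd indices: right channel (system)

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(stereo.tobytes())
```

Because samples are only moved and never modified, the result does not depend on the machine's byte order.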

#### Transcription

###### `Transcriber` class

- Loads the `faster-whisper` model if available.

1. **Initialization (`__init__` method)**
   - Set `self.model` to `None`.
   - If `faster-whisper` is available:
     - Print a message about loading the model.
     - Try to import `torch` and detect device:
       - If CUDA is available, set `device = "cuda"`.
       - Else, set `device = "cpu"`.
     - Set `compute_type`: `"float16"` if device is `"cuda"`, else `"float32"`.
     - Try to instantiate the `WhisperModel` with the selected model name, device, and compute type:
       - If successful, assign the model to `self.model` and print a success message.
       - If failed, print an error and set `self.model = None`.
   - Else (if `faster-whisper` not available):
     - Print a message that transcription is disabled.
2. **Transcription (`transcribe_file` method)**
   - If `self.model` is not set, return `None`.
   - Try to transcribe the given WAV file using `self.model.transcribe()`:
     - Use `beam_size=5` for decoding.
     - Concatenate all segment texts into a single string.
     - Return the transcribed text.
   - If an error occurs, print an error message and return `None`.
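
One way the class described above might look is sketched here. Several details are assumptions rather than facts from the project: falling back to `"cpu"` when `torch` cannot be imported, joining segment texts with a single space, and the exact wording of the printed messages.

```python
class Transcriber:
    """Wraps an optional faster-whisper model; `self.model` stays None on failure."""

    def __init__(self, model_name):
        self.model = None
        try:
            from faster_whisper import WhisperModel  # optional dependency
        except ImportError:
            print("faster-whisper not available; transcription disabled.")
            return

        print(f"Loading faster-whisper model '{model_name}'...")
        try:
            import torch
            device = "cuda" if torch.cuda.is_available() else "cpu"
        except ImportError:
            device = "cpu"  # assumption: no torch means CPU
        compute_type = "float16" if device == "cuda" else "float32"

        try:
            self.model = WhisperModel(model_name, device=device, compute_type=compute_type)
            print("Model loaded.")
        except Exception as exc:
            print(f"Failed to load model: {exc}")
            self.model = None

    def transcribe_file(self, wav_path):
        """Return the transcript of `wav_path`, or None if unavailable or on error."""
        if self.model is None:
            return None
        try:
            # faster-whisper returns (segments, info); segments is iterated lazily.
            segments, _info = self.model.transcribe(str(wav_path), beam_size=5)
            return " ".join(seg.text.strip() for seg in segments)  # joiner is an assumption
        except Exception as exc:
            print(f"Transcription error: {exc}")
            return None
```

Keeping every failure path at `self.model = None` means callers only need one check before deciding whether transcription is enabled.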

#### Diarization

###### `diarization_hook(audio_path: str)`

- Runs speaker diarization and returns a list of (start, end, speaker) tuples.

1. **Check Diarization Availability**
   - If `DIARIZATION_AVAILABLE` is `False`, return `None`.
2. **Run Diarization Pipeline**
   - Use the global `diarization_pipeline` to process the audio file at `audio_path`.
   - Store the result in a variable (e.g., `diarization`).
3. **Extract Speaker Segments**
   - Initialize an empty list called `results`.
   - For each segment in `diarization.itertracks(yield_label=True)`:
     - Extract the segment's start time, end time, and speaker label.
     - Append a tuple `(turn.start, turn.end, speaker)` to `results`.
4. **Return Results**
   - Return the `results` list containing tuples of (start, end, speaker) for each detected speaker segment.
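
A minimal sketch of this hook follows. To stay self-contained it receives the pipeline as an optional `pipeline` argument (an assumption; the description above uses a global `diarization_pipeline` and a `DIARIZATION_AVAILABLE` flag), and it assumes the pipeline result exposes `itertracks(yield_label=True)` yielding `(turn, track, speaker)` triples, as the steps describe.

```python
def diarization_hook(audio_path, pipeline=None):
    """Return [(start, end, speaker), ...] for `audio_path`, or None if disabled."""
    if pipeline is None:  # stands in for DIARIZATION_AVAILABLE being False
        return None

    diarization = pipeline(audio_path)
    results = []
    for turn, _track, speaker in diarization.itertracks(yield_label=True):
        results.append((turn.start, turn.end, speaker))
    return results
```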

#### Recording Threads

###### `record_loop(device_index, out_queue, label="mic")`

- Continuously reads bytes from the device stream and pushes full-second frames to a queue.

1. **Open Audio Stream**
   - Open a PyAudio input stream for the given device index and channel count.
2. **Read Audio Data**
   - Continuously read audio data in chunks.
   - After enough frames for a chunk are collected, put them (with a timestamped filename) into a queue.
   - Runs in a thread for each device (mic and optionally system).
3. **Error Handling**
   - If repeated read errors occur, the thread will stop for that device.
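
The loop might be structured along these lines. This is only a sketch under several assumptions: it takes an already-opened stream plus a `stop_event` instead of a device index, the constant values and the `MAX_READ_ERRORS` threshold are hypothetical, and the filename pattern is invented for illustration. Frames left over when the loop stops are discarded here.

```python
import queue
import threading
import time

RATE = 44100              # assumed sample rate in Hz
CHUNK = 1024              # assumed frames per read
CHUNK_DURATION_SECS = 5   # assumed chunk length in seconds
MAX_READ_ERRORS = 5       # hypothetical limit on consecutive read errors


def record_loop(stream, out_queue, stop_event, label="mic", chunk_secs=CHUNK_DURATION_SECS):
    """Read from `stream` until `stop_event` is set, queueing one item per chunk."""
    reads_per_chunk = max(1, int(RATE * chunk_secs) // CHUNK)
    frames, errors = [], 0
    while not stop_event.is_set():
        try:
            frames.append(stream.read(CHUNK))
            errors = 0
        except Exception as exc:
            errors += 1
            print(f"[{label}] read error: {exc}")
            if errors >= MAX_READ_ERRORS:
                break  # give up on this device
            continue
        if len(frames) >= reads_per_chunk:
            name = f"{label}_{time.strftime('%Y%m%d_%H%M%S')}.wav"
            out_queue.put((name, frames))
            frames = []
```

A caller would typically start it with `threading.Thread(target=record_loop, args=(stream, q, stop_event), daemon=True)` and set `stop_event` to end the recording.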

#### Chunk Writing & Transcription

###### `chunk_writer_and_transcribe_worker(in_queue, final_frames_list, transcriber, single_channel_label)`

- Waits for audio chunks from the queue.
- Saves each chunk as a WAV file.
- Appends frames to a list for final concatenation.
- If transcription is enabled, transcribes the chunk and appends the result to a transcript file.
- Calls diarization on each chunk and aligns speaker segments with transcription.
- Runs in a thread for each device.
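
A simplified sketch of such a worker is shown below. It differs from the signature above in ways that are assumptions made for illustration: the transcription step is passed in as a plain callable named `transcribe`, the output locations are explicit parameters, a `None` item is used as a hypothetical end-of-stream sentinel, and the diarization and alignment step is left out. The audio parameters are assumed values.

```python
import queue
import wave
from pathlib import Path

RATE = 44100       # assumed sample rate in Hz
SAMPLE_WIDTH = 2   # assumed bytes per sample (16-bit)


def chunk_writer_and_transcribe_worker(in_queue, final_frames_list, transcribe,
                                       chunks_dir, transcript_file):
    """Consume (filename, frames) items until a None sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is None:  # hypothetical sentinel: no more chunks
            break
        name, frames = item

        # Save the chunk as its own mono WAV file.
        chunk_path = Path(chunks_dir) / name
        with wave.open(str(chunk_path), "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(SAMPLE_WIDTH)
            wf.setframerate(RATE)
            wf.writeframes(b"".join(frames))

        # Keep the raw frames for the final concatenated recording.
        final_frames_list.extend(frames)

        # Transcribe if enabled and append the text to the transcript.
        if transcribe is not None:
            text = transcribe(chunk_path)
            if text:
                with open(transcript_file, "a", encoding="utf-8") as fh:
                    fh.write(f"[{name}] {text}\n")
```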

#### Main Recording Orchestration

###### `run_recording(mic_index, sys_index=None, chunk_secs=CHUNK_DURATION_SECS, model_name=MODEL_NAME, no_transcribe=False)`

- Sets up and starts the recording and writer threads for mic and (optionally) system audio.
- Handles stopping and joining threads on `KeyboardInterrupt`.
- Saves the final concatenated WAV file(s).
- If both mic and system were recorded, merges them into a stereo WAV.
- Terminates PyAudio and prints a completion message.

#### CLI Wrapper (cli.py)

###### Algorithm of cli.py

- Provides a command-line interface for recording, chunking, and optional transcription.

1. **Argument Parsing**
   - Uses `argparse` to define and parse command-line arguments:
     - `--mic` / `-m`: Device index for microphone (optional)
     - `--sys` / `-s`: Device index for system/loopback (optional)
     - `--chunk-secs`: Chunk length in seconds (default from config)
     - `--model`: Model name for transcription (default from config)
     - `--no-transcribe`: Disable transcription if set
2. **Device Selection**
   - If `--mic` is not provided:
     - Calls `list_input_devices()` to show available devices.
     - Prompts the user to enter a mic device index (or uses default if blank).
   - If `--sys` is not provided:
     - Prompts the user whether to record system audio (loopback).
     - If yes, prompts for system device index.
3. **Run Recording**
   - Calls `run_recording()` with the selected device indexes, chunk length, model, and transcription flag.
4. **Entrypoint**
   - If the script is run directly, calls `main()` to start the CLI workflow.
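
The argument-parsing step could be set up roughly like this. The default values, the help strings, the `build_parser` helper name, and the choice of `int` for `--chunk-secs` are assumptions for illustration; only the flag names come from the list above.

```python
import argparse

CHUNK_DURATION_SECS = 5   # assumed default chunk length
MODEL_NAME = "small"      # assumed default model name


def build_parser():
    """Build the argument parser for the recording CLI."""
    parser = argparse.ArgumentParser(
        description="Record audio in chunks with optional transcription."
    )
    parser.add_argument("--mic", "-m", type=int, default=None,
                        help="device index for the microphone")
    parser.add_argument("--sys", "-s", type=int, default=None,
                        help="device index for system/loopback audio")
    parser.add_argument("--chunk-secs", type=int, default=CHUNK_DURATION_SECS,
                        help="chunk length in seconds")
    parser.add_argument("--model", default=MODEL_NAME,
                        help="model name for transcription")
    parser.add_argument("--no-transcribe", action="store_true",
                        help="disable transcription")
    return parser


# Example: parse the same arguments as the second usage command in this README.
args = build_parser().parse_args(
    ["--mic", "6", "--sys", "8", "--chunk-secs", "5", "--model", "medium"]
)
print(args.mic, args.sys, args.chunk_secs, args.model, args.no_transcribe)
```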

---

#### Usage

**Command-line (interactive):**

```sh
python cli.py
```

You will be prompted to select device indexes for mic and (optionally) system audio.

**Command-line (with arguments):**

```sh
python cli.py --mic 6 --sys 8 --chunk-secs 5 --model medium
```

#### Requirements

- Python 3.8+
- pyaudio
- numpy
- faster-whisper (optional, for transcription)
- pyannote.audio (optional, for diarization)

Install requirements:

```sh
pip install pyaudio numpy
# For transcription:
pip install faster-whisper
# For diarization:
pip install pyannote.audio
```