Spaces: prthm11/AudioTransDiar

prthm11 committed on Sep 2, 2025

Commit

105dda6

verified ·

1 Parent(s): a0b8bc0

Update README.md

Files changed (1)

README.md +238 -10

README.md CHANGED

@@ -1,12 +1,240 @@
 ---
-title: AudioTransDiar
-emoji: 📚
-colorFrom: pink
-colorTo: red
-sdk: docker
-pinned: false
-license: apache-2.0
-short_description: Real Time Transcription with Speaker Diarization
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+#### Initialization & Configuration
+- Defines the module-level constants `FORMAT`, `CHANNELS`, `RATE`, `CHUNK`, `CHUNK_DURATION_SECS`, `OUTPUT_DIR`, `CHUNKS_DIR`, `FINAL_WAV`, `TRANSCRIPT_FILE`, and `MODEL_NAME`.
+#### Device Listing
+###### list_input_devices():
+- Lists all available audio input devices (microphones, loopbacks, etc.) with their indices and channel counts.
+1. **Create a PyAudio Instance**
+   - Initialize a new `PyAudio` object to interact with the audio hardware.
+2. **Print Header**
+   - Print a message indicating that available audio input devices will be listed.
+3. **Iterate Over All Devices**
+   - For each device index from `0` to `get_device_count() - 1`:
+     - Retrieve device information using `get_device_info_by_index(i)`.
+4. **Filter Input Devices**
+   - For each device, check if `"maxInputChannels"` is greater than `0` (i.e., it can record audio).
+5. **Print Device Info**
+   - If the device is an input device, print its index, name, and number of input channels.
+6. **Terminate PyAudio**
+   - After listing, terminate the `PyAudio` instance to free resources.
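The listing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the optional `pa` parameter and the returned list are assumptions added here so the function can be exercised without audio hardware.

```python
def list_input_devices(pa=None):
    """Print and return (index, name, input_channels) for each input-capable device.

    `pa` is a hypothetical parameter: pass any object exposing the PyAudio
    device-query methods, or leave it as None to create a real PyAudio instance.
    """
    own_instance = pa is None
    if own_instance:
        import pyaudio  # imported lazily so a stand-in object can be used instead
        pa = pyaudio.PyAudio()

    print("Available audio input devices:")
    devices = []
    try:
        for i in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(i)
            channels = int(info.get("maxInputChannels", 0))
            if channels > 0:  # only devices that can record
                devices.append((i, info.get("name"), channels))
                print(f"  [{i}] {info.get('name')} (input channels: {channels})")
    finally:
        if own_instance:
            pa.terminate()  # free the PortAudio resources we created
    return devices
```

Returning the list in addition to printing it is a convenience of this sketch; the function described above only prints.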
+#### Audio Stream Handling
+###### open_stream_for_device(device_index, channels):
+- Opens a PyAudio input stream for the given device index and channel count.
+1. **Input Parameters**
+   - `device_index`: The index of the audio input device to use (e.g., microphone or system audio).
+   - `channels`: Number of audio channels to record (default is 1, i.e., mono).
+2. **Open Audio Stream**
+   - Use the global `audio` object (an instance of `pyaudio.PyAudio()`).
+   - Call `audio.open()` with the following parameters:
+     - `format=FORMAT` (audio sample format, e.g., 16-bit int)
+     - `channels=channels` (number of channels)
+     - `rate=RATE` (sample rate, e.g., 44100 Hz)
+     - `input=True` (open for input/recording)
+     - `frames_per_buffer=CHUNK` (buffer size per read)
+     - `input_device_index=device_index` (which device to use)
+3. **Return Stream**
+   - Return the opened stream object to the caller.
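A minimal sketch of the call described above. It assumes illustrative values for `FORMAT`, `RATE`, and `CHUNK`, and it takes the `audio` object as an explicit first argument (the README describes a global instead) so the example stays self-contained.

```python
FORMAT = 8     # numeric value of pyaudio.paInt16; use the named constant in real code
RATE = 44100   # assumed sample rate in Hz
CHUNK = 1024   # assumed number of frames per buffer read


def open_stream_for_device(audio, device_index, channels=1):
    """Open an input stream on `device_index`.

    `audio` is expected to be a pyaudio.PyAudio instance (passed in here
    rather than read from a global, which is an assumption of this sketch).
    """
    return audio.open(
        format=FORMAT,
        channels=channels,
        rate=RATE,
        input=True,                       # open for recording
        frames_per_buffer=CHUNK,
        input_device_index=device_index,  # which device to record from
    )
```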
+#### Audio File Operations
+###### save_wav_from_frames(path: Path, frames: list, nchannels=1):
+- Writes a list of raw audio frames to a WAV file.
+1. **Input Parameters**
+   - `path`: The file path where the WAV file will be saved.
+   - `frames`: A list of audio frames (byte strings) to write.
+   - `nchannels`: Number of audio channels (default is 1).
+2. **Open WAV File for Writing**
+   - Use the `wave.open()` function to open the file at `path` in write-binary (`'wb'`) mode.
+3. **Set WAV File Parameters**
+   - Set the number of channels using `setnchannels(nchannels)`.
+   - Set the sample width using `setsampwidth(audio.get_sample_size(FORMAT))`.
+   - Set the frame rate (sample rate) using `setframerate(RATE)`.
+4. **Write Audio Data**
+   - Concatenate all frames in the `frames` list into a single bytes object.
+   - Write the concatenated bytes to the WAV file using `writeframes()`.
+5. **Close the File**
+   - The `with` statement ensures the file is properly closed after writing.
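The steps above map closely onto the standard-library `wave` module. A minimal sketch, assuming 16-bit samples at 44100 Hz; the `SAMPLE_WIDTH` constant is a stand-in for `audio.get_sample_size(FORMAT)` so the example runs without PyAudio.

```python
import wave
from pathlib import Path

RATE = 44100       # assumed sample rate in Hz
SAMPLE_WIDTH = 2   # bytes per 16-bit sample; stands in for audio.get_sample_size(FORMAT)


def save_wav_from_frames(path: Path, frames: list, nchannels: int = 1) -> None:
    """Write a list of raw PCM byte strings to a WAV file at `path`."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(nchannels)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))  # concatenate, then write in one call
```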
+###### merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path):
+- Merges two mono WAV files (mic and system) into a stereo WAV file.
+1. **Check for numpy Availability**
+   - If numpy is not available, print a message and exit the function.
+2. **Open Input WAV Files**
+   - Open the microphone WAV file (`mic_path`) for reading.
+   - Open the system audio WAV file (`sys_path`) for reading.
+3. **Validate Audio Properties**
+   - Assert that both files have the same sample rate (`RATE`).
+4. **Read Audio Data**
+   - Get the sample width from the mic file.
+   - Determine the minimum number of frames available in both files.
+   - Read that many frames from both files into `mic_bytes` and `sys_bytes`.
+5. **Convert Bytes to Arrays**
+   - Convert `mic_bytes` and `sys_bytes` to numpy arrays of type `int16`.
+6. **Interleave Channels for Stereo**
+   - Create an empty numpy array of size `nframes * 2` (for stereo).
+   - Assign mic samples to even indices (left channel).
+   - Assign system samples to odd indices (right channel).
+7. **Write Stereo WAV File**
+   - Open the output WAV file (`out_path`) for writing.
+   - Set number of channels to 2 (stereo).
+   - Set sample width and frame rate.
+   - Write the interleaved stereo data to the file.
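The interleaving idea can be illustrated without numpy. The sketch below swaps in the standard-library `array` module for the numpy arrays described above, purely to keep the example dependency-free; it assumes 16-bit mono inputs, and the returned frame count is an addition of this sketch.

```python
import wave
from array import array
from pathlib import Path


def merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path) -> int:
    """Interleave two 16-bit mono WAV files into one stereo file.

    Returns the number of stereo frames written (a convenience of this sketch).
    """
    with wave.open(str(mic_path), "rb") as mic, wave.open(str(sys_path), "rb") as sysw:
        assert mic.getframerate() == sysw.getframerate(), "sample rates differ"
        assert mic.getsampwidth() == sysw.getsampwidth() == 2, "expects 16-bit samples"
        rate = mic.getframerate()
        nframes = min(mic.getnframes(), sysw.getnframes())  # truncate to the shorter file
        mic_samples = array("h", mic.readframes(nframes))
        sys_samples = array("h", sysw.readframes(nframes))

    stereo = array("h", bytes(2 * 2 * nframes))  # zero-filled, two samples per frame
    stereo[0::2] = mic_samples  # even indices: left channel (mic)
    stereo[1::2] = sys_samples  # odd indices: right channel (system)

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(stereo.tobytes())
    return nframes
```

With numpy, the same slice assignments (`stereo[0::2] = ...`, `stereo[1::2] = ...`) apply to an `int16` array of length `nframes * 2`.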
+#### Transcription
+###### Transcriber class:
+- Loads the `faster-whisper` model if available.
+1. **Initialization (`__init__` method)**
+   - Set `self.model` to `None`.
+   - If `faster-whisper` is available:
+     - Print a message about loading the model.
+     - Try to import `torch` and detect device:
+       - If CUDA is available, set `device = "cuda"`.
+       - Else, set `device = "cpu"`.
+     - Set `compute_type`: `"float16"` if device is `"cuda"`, else `"float32"`.
+     - Try to instantiate the `WhisperModel` with the selected model name, device, and compute type:
+       - If successful, assign the model to `self.model` and print a success message.
+       - If failed, print an error and set `self.model = None`.
+   - Else (if `faster-whisper` not available):
+     - Print a message that transcription is disabled.
+2. **Transcription (`transcribe_file` method)**
+   - If `self.model` is not set, return `None`.
+   - Try to transcribe the given WAV file using `self.model.transcribe()`:
+     - Use `beam_size=5` for decoding.
+     - Concatenate all segment texts into a single string.
+     - Return the transcribed text.
+   - If an error occurs, print an error message and return `None`.
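A hedged sketch of a class following the steps above. The `FASTER_WHISPER_AVAILABLE` flag name, the printed messages, and the choice to join segment texts with single spaces are assumptions; `WhisperModel(...)` and `model.transcribe(..., beam_size=5)` are the `faster-whisper` calls the README names.

```python
try:
    from faster_whisper import WhisperModel
    FASTER_WHISPER_AVAILABLE = True   # hypothetical flag name
except Exception:                     # any import failure disables transcription
    WhisperModel = None
    FASTER_WHISPER_AVAILABLE = False


class Transcriber:
    def __init__(self, model_name):
        self.model = None
        if not FASTER_WHISPER_AVAILABLE:
            print("faster-whisper not installed; transcription disabled.")
            return
        print(f"Loading faster-whisper model '{model_name}'...")
        try:
            import torch
            device = "cuda" if torch.cuda.is_available() else "cpu"
        except ImportError:
            device = "cpu"
        compute_type = "float16" if device == "cuda" else "float32"
        try:
            self.model = WhisperModel(model_name, device=device, compute_type=compute_type)
            print(f"Model loaded on {device} ({compute_type}).")
        except Exception as exc:
            print(f"Failed to load model: {exc}")
            self.model = None

    def transcribe_file(self, wav_path):
        """Return the transcribed text, or None if disabled or on error."""
        if self.model is None:
            return None
        try:
            # transcribe() returns a lazy iterable of segments plus an info object
            segments, _info = self.model.transcribe(str(wav_path), beam_size=5)
            return " ".join(seg.text.strip() for seg in segments)
        except Exception as exc:
            print(f"Transcription error: {exc}")
            return None
```

Because `transcribe_file` returns `None` whenever no model is loaded, callers can treat transcription as optional without extra checks.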
+#### Diarization
+###### diarization_hook(audio_path: str):
+- Runs speaker diarization and returns a list of (start, end, speaker) tuples.
+1. **Check Diarization Availability**
+   - If `DIARIZATION_AVAILABLE` is `False`, return `None`.
+2. **Run Diarization Pipeline**
+   - Use the global `diarization_pipeline` to process the audio file at `audio_path`.
+   - Store the result in a variable (e.g., `diarization`).
+3. **Extract Speaker Segments**
+   - Initialize an empty list called `results`.
+   - For each segment in `diarization.itertracks(yield_label=True)`:
+     - Extract the segment's start time, end time, and speaker label.
+     - Append a tuple `(turn.start, turn.end, speaker)` to `results`.
+4. **Return Results**
+   - Return the `results` list containing tuples of (start, end, speaker) for each detected speaker segment.
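The extraction loop above might look like the sketch below. It assumes the pipeline result exposes `itertracks(yield_label=True)` as described; the `pipeline` parameter replaces the global `diarization_pipeline` and the `DIARIZATION_AVAILABLE` flag only so the example is self-contained.

```python
def diarization_hook(audio_path, pipeline=None):
    """Return a list of (start, end, speaker) tuples, or None if diarization is off.

    `pipeline` is a hypothetical parameter standing in for the global
    diarization pipeline; None plays the role of DIARIZATION_AVAILABLE == False.
    """
    if pipeline is None:
        return None
    diarization = pipeline(audio_path)
    results = []
    # each track yields (segment, track_id, speaker_label)
    for turn, _track, speaker in diarization.itertracks(yield_label=True):
        results.append((turn.start, turn.end, speaker))
    return results
```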
+#### Recording Threads
+###### record_loop(device_index, out_queue, label="mic"):
+- Continuously reads bytes from the device stream and pushes each completed chunk of frames to a queue.
+1. **Open Audio Stream**
+   - Open a PyAudio input stream for the given device index and channel count.
+2. **Read Audio Data**
+   - Continuously read audio data in chunks.
+   - After enough frames for a chunk are collected, put them (with a timestamped filename) into a queue.
+   - Runs in a thread for each device (mic and optionally system).
+3. **Error Handling**
+   - If repeated read errors occur, the thread will stop for that device.
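One way to structure such a loop is sketched below. The signature differs from the one above: it takes an already opened `stream` and a `stop_event`, and the `chunk_secs` and `max_errors` parameters, the constants, and the filename pattern are all assumptions made for illustration.

```python
import time

RATE = 44100   # assumed sample rate in Hz
CHUNK = 1024   # assumed frames per read


def record_loop(stream, out_queue, stop_event, label="mic", chunk_secs=5, max_errors=5):
    """Read from `stream` until `stop_event` is set, queueing one item per chunk."""
    reads_per_chunk = max(1, int(RATE / CHUNK * chunk_secs))
    frames = []
    errors = 0
    while not stop_event.is_set():
        try:
            frames.append(stream.read(CHUNK))
            errors = 0  # a successful read resets the error counter
        except OSError as exc:
            errors += 1
            if errors >= max_errors:
                print(f"[{label}] too many read errors, stopping: {exc}")
                break
            continue
        if len(frames) >= reads_per_chunk:
            name = f"{label}_{time.strftime('%Y%m%d_%H%M%S')}.wav"
            out_queue.put((name, frames))  # hand the finished chunk to the writer
            frames = []
```

Each device would get its own thread, for example `threading.Thread(target=record_loop, args=(stream, q, stop_event), daemon=True)`.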
+#### Chunk Writing & Transcription
+###### chunk_writer_and_transcribe_worker(in_queue, final_frames_list, transcriber, single_channel_label)
+- Waits for audio chunks from the queue.
+- Saves each chunk as a WAV file.
+- Appends frames to a list for final concatenation.
+- If transcription is enabled, transcribes the chunk and appends the result to a transcript file.
+- Calls diarization on each chunk and aligns speaker segments with transcription.
+- Runs in a thread for each device.
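A simplified sketch of the consumer side, leaving out the diarization alignment. The `None` sentinel for shutdown, the `save_chunk` callback, the `transcript_path` parameter, and the transcript line format are assumptions of this sketch and differ from the signature listed above.

```python
def chunk_writer_and_transcribe_worker(in_queue, final_frames_list, transcriber,
                                       save_chunk, transcript_path):
    """Consume (name, frames) items from `in_queue` until a None sentinel arrives.

    `save_chunk(name, frames)` is a hypothetical callback that writes the chunk
    to disk and returns the path of the WAV file it created.
    """
    while True:
        item = in_queue.get()
        if item is None:  # sentinel: the recorder has finished
            break
        name, frames = item
        wav_path = save_chunk(name, frames)
        final_frames_list.extend(frames)  # kept for the final concatenated WAV
        if transcriber is not None:
            text = transcriber.transcribe_file(wav_path)
            if text:
                with open(transcript_path, "a", encoding="utf-8") as fh:
                    fh.write(f"[{name}] {text}\n")
```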
+#### Main Recording Orchestration
+###### run_recording(mic_index, sys_index=None, chunk_secs=CHUNK_DURATION_SECS, model_name=MODEL_NAME, no_transcribe=False)
+- Sets up and starts the recording and writer threads for mic and (optionally) system audio.
+- Handles stopping and joining threads on KeyboardInterrupt.
+- Saves the final concatenated WAV file(s).
+- If both mic and system were recorded, merges them into a stereo WAV.
+- Terminates PyAudio and prints completion message.
+#### CLI Wrapper (cli.py)
+###### Algorithm of cli.py
+- Provides a command-line interface for recording, chunking, and optional transcription.
+1. **Argument Parsing**
+   - Uses `argparse` to define and parse command-line arguments:
+     - `--mic` / `-m`: Device index for microphone (optional)
+     - `--sys` / `-s`: Device index for system/loopback (optional)
+     - `--chunk-secs`: Chunk length in seconds (default from config)
+     - `--model`: Model name for transcription (default from config)
+     - `--no-transcribe`: Disable transcription if set
+2. **Device Selection**
+   - If `--mic` is not provided:
+     - Calls `list_input_devices()` to show available devices.
+     - Prompts the user to enter a mic device index (or uses default if blank).
+   - If `--sys` is not provided:
+     - Prompts the user whether to record system audio (loopback).
+     - If yes, prompts for system device index.
+3. **Run Recording**
+   - Calls `run_recording()` with the selected device indexes, chunk length, model, and transcription flag.
+4. **Entrypoint**
+   - If the script is run directly, calls `main()` to start the CLI workflow.
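The argument-parsing step could be sketched with `argparse` as follows. The `build_parser` helper, the help strings, and the two default values are placeholders; the real defaults come from `CHUNK_DURATION_SECS` and `MODEL_NAME` in the configuration.

```python
import argparse

DEFAULT_CHUNK_SECS = 5     # placeholder for CHUNK_DURATION_SECS
DEFAULT_MODEL = "medium"   # placeholder for MODEL_NAME


def build_parser():
    """Build a parser with the options listed above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Record audio in chunks with optional transcription."
    )
    parser.add_argument("--mic", "-m", type=int, default=None,
                        help="device index for the microphone")
    parser.add_argument("--sys", "-s", type=int, default=None,
                        help="device index for system/loopback audio")
    parser.add_argument("--chunk-secs", type=int, default=DEFAULT_CHUNK_SECS,
                        help="chunk length in seconds")
    parser.add_argument("--model", default=DEFAULT_MODEL,
                        help="model name for transcription")
    parser.add_argument("--no-transcribe", action="store_true",
                        help="disable transcription")
    return parser
```

When `--mic` is left as `None`, the CLI falls back to the interactive device prompt described in step 2.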
 ---
+#### Usage
+**Command-line (interactive):**
+```sh
+python cli.py
+```
+You will be prompted to select device indexes for mic and (optionally) system audio.
+**Command-line (with arguments):**
+```sh
+python cli.py --mic 6 --sys 8 --chunk-secs 5 --model medium
+```
+#### Requirements
+- Python 3.8+
+- pyaudio
+- numpy
+- faster-whisper (optional, for transcription)
+- pyannote.audio (optional, for diarization)
+Install requirements:
+```sh
+pip install pyaudio numpy
+# For transcription:
+pip install faster-whisper
+# For diarization:
+pip install pyannote.audio
+```