Spaces: prthm11/AudioTransDiar

prthm11 committed on Sep 2, 2025

Commit 3ce7d8c (verified) · 1 Parent(s): 4207399

Delete README.md

Files changed (1)

README.md +0 -240

README.md DELETED

@@ -1,240 +0,0 @@
-#### Initialization & configuration
-- Configuration constants: `FORMAT`, `CHANNELS`, `RATE`, `CHUNK`, `CHUNK_DURATION_SECS`, `OUTPUT_DIR`, `CHUNKS_DIR`, `FINAL_WAV`, `TRANSCRIPT_FILE`, `MODEL_NAME`.
-#### Device Listing
-###### list_input_devices():
-- Lists all available audio input devices (microphones, loopbacks, etc.) with their indices and channel counts.
-1. **Create a PyAudio Instance**
-   - Initialize a new `PyAudio` object to interact with the audio hardware.
-2. **Print Header**
-   - Print a message indicating that available audio input devices will be listed.
-3. **Iterate Over All Devices**
-   - For each device index from `0` to `get_device_count() - 1`:
-     - Retrieve device information using `get_device_info_by_index(i)`.
-4. **Filter Input Devices**
-   - For each device, check if `"maxInputChannels"` is greater than `0` (i.e., it can record audio).
-5. **Print Device Info**
-   - If the device is an input device, print its index, name, and number of input channels.
-6. **Terminate PyAudio**
-   - After listing, terminate the `PyAudio` instance to free resources.
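The listing steps above could be sketched roughly as follows. This is an illustration only, not the project's code: `find_input_devices` and `FakePyAudio` are hypothetical names, and the fake stands in for `pyaudio.PyAudio` so the input-channel filter can be exercised without audio hardware.

```python
def find_input_devices(pa):
    """Return (index, name, input_channels) for each device that can record.

    `pa` is any object exposing PyAudio's get_device_count() and
    get_device_info_by_index() methods.
    """
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        channels = info.get("maxInputChannels", 0)
        if channels > 0:  # output-only devices report 0 input channels
            devices.append((i, info.get("name"), channels))
    return devices


class FakePyAudio:
    """Hypothetical stand-in for pyaudio.PyAudio, for illustration only."""

    _devices = [
        {"name": "Speakers", "maxInputChannels": 0},
        {"name": "Microphone", "maxInputChannels": 1},
        {"name": "Stereo Mix", "maxInputChannels": 2},
    ]

    def get_device_count(self):
        return len(self._devices)

    def get_device_info_by_index(self, i):
        return self._devices[i]


found = find_input_devices(FakePyAudio())
print("Available audio input devices:")
for index, name, channels in found:
    print(f"  [{index}] {name} (input channels: {channels})")
```

With the real library, a `pyaudio.PyAudio()` instance would be passed in place of the fake, and its `terminate()` method called once the listing is done.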
-#### Audio Stream Handling
-###### open_stream_for_device(device_index, channels):
-- Opens a PyAudio input stream for the given device index and channel count.
-1. **Input Parameters**
-   - `device_index`: The index of the audio input device to use (e.g., microphone or system audio).
-   - `channels`: Number of audio channels to record (default is 1, i.e., mono).
-2. **Open Audio Stream**
-   - Use the global `audio` object (an instance of `pyaudio.PyAudio()`).
-   - Call `audio.open()` with the following parameters:
-     - `format=FORMAT` (audio sample format, e.g., 16-bit int)
-     - `channels=channels` (number of channels)
-     - `rate=RATE` (sample rate, e.g., 44100 Hz)
-     - `input=True` (open for input/recording)
-     - `frames_per_buffer=CHUNK` (buffer size per read)
-     - `input_device_index=device_index` (which device to use)
-3. **Return Stream**
-   - Return the opened stream object to the caller.
-#### Audio File Operations
-###### save_wav_from_frames(path: Path, frames: list, nchannels=1):
-1. **Input Parameters**
-   - `path`: The file path where the WAV file will be saved.
-   - `frames`: A list of audio frames (byte strings) to write.
-   - `nchannels`: Number of audio channels (default is 1).
-2. **Open WAV File for Writing**
-   - Use the `wave.open()` function to open the file at `path` in write-binary (`'wb'`) mode.
-3. **Set WAV File Parameters**
-   - Set the number of channels using `setnchannels(nchannels)`.
-   - Set the sample width using `setsampwidth(audio.get_sample_size(FORMAT))`.
-   - Set the frame rate (sample rate) using `setframerate(RATE)`.
-4. **Write Audio Data**
-   - Concatenate all frames in the `frames` list into a single bytes object.
-   - Write the concatenated bytes to the WAV file using `writeframes()`.
-5. **Close the File**
-   - The `with` statement ensures the file is properly closed after writing.
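A minimal sketch of the steps above, using only the standard-library `wave` module. The values of `RATE` and `SAMPLE_WIDTH` are assumptions: the README cites 44100 Hz and 16-bit samples only as examples, and `SAMPLE_WIDTH` stands in for `audio.get_sample_size(FORMAT)` so the snippet runs without PyAudio.

```python
import tempfile
import wave
from pathlib import Path

RATE = 44100      # assumed value; the README cites 44100 Hz only as an example
SAMPLE_WIDTH = 2  # bytes per 16-bit sample; stands in for audio.get_sample_size(FORMAT)


def save_wav_from_frames(path: Path, frames: list, nchannels: int = 1) -> None:
    """Write a list of raw PCM byte strings to `path` as a single WAV file."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(nchannels)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))  # concatenate buffers, then write once


# Round trip: write two 4-sample buffers of silence, then read the header back.
with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "chunk.wav"
    save_wav_from_frames(wav_path, [b"\x00\x00" * 4, b"\x00\x00" * 4])
    with wave.open(str(wav_path), "rb") as wf:
        written = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
```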
-###### merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path):
-- Merges two mono WAV files (mic and system) into a stereo WAV file.
-1. **Check for numpy Availability**
-   - If numpy is not available, print a message and exit the function.
-2. **Open Input WAV Files**
-   - Open the microphone WAV file (`mic_path`) for reading.
-   - Open the system audio WAV file (`sys_path`) for reading.
-3. **Validate Audio Properties**
-   - Assert that both files have the same sample rate (`RATE`).
-4. **Read Audio Data**
-   - Get the sample width from the mic file.
-   - Determine the minimum number of frames available in both files.
-   - Read that many frames from both files into `mic_bytes` and `sys_bytes`.
-5. **Convert Bytes to Arrays**
-   - Convert `mic_bytes` and `sys_bytes` to numpy arrays of type `int16`.
-6. **Interleave Channels for Stereo**
-   - Create an empty numpy array of size `nframes * 2` (for stereo).
-   - Assign mic samples to even indices (left channel).
-   - Assign system samples to odd indices (right channel).
-7. **Write Stereo WAV File**
-   - Open the output WAV file (`out_path`) for writing.
-   - Set number of channels to 2 (stereo).
-   - Set sample width and frame rate.
-   - Write the interleaved stereo data to the file.
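The interleaving described above can be illustrated with a small sketch. Note the substitution: the README's version uses numpy, while this one uses the standard-library `array` module so it runs without extra dependencies. It assumes both inputs are mono 16-bit WAV files, and `_write_mono` is a hypothetical helper that exists only for the demonstration.

```python
import tempfile
import wave
from array import array
from pathlib import Path


def merge_mono_files_to_stereo(mic_path: Path, sys_path: Path, out_path: Path) -> None:
    """Interleave two mono 16-bit WAV files: mic on the left, system on the right."""
    with wave.open(str(mic_path), "rb") as mic, wave.open(str(sys_path), "rb") as sys_wav:
        assert mic.getframerate() == sys_wav.getframerate(), "sample rates differ"
        rate = mic.getframerate()
        sampwidth = mic.getsampwidth()  # assumed to be 2 bytes (16-bit)
        nframes = min(mic.getnframes(), sys_wav.getnframes())
        mic_samples = array("h", mic.readframes(nframes))
        sys_samples = array("h", sys_wav.readframes(nframes))

    stereo = array("h", bytes(nframes * 2 * sampwidth))  # zero-filled, 2 samples per frame
    stereo[0::2] = mic_samples  # even indices: left channel
    stereo[1::2] = sys_samples  # odd indices: right channel

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(sampwidth)
        out.setframerate(rate)
        out.writeframes(stereo.tobytes())


def _write_mono(path, samples, rate=44100):
    """Demo-only helper: write int16 samples as a mono WAV file."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(array("h", samples).tobytes())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    _write_mono(root / "mic.wav", [1, 2, 3])
    _write_mono(root / "sys.wav", [10, 20, 30, 40])  # the longer file is truncated
    merge_mono_files_to_stereo(root / "mic.wav", root / "sys.wav", root / "stereo.wav")
    with wave.open(str(root / "stereo.wav"), "rb") as wf:
        stereo_channels = wf.getnchannels()
        stereo_samples = array("h", wf.readframes(wf.getnframes())).tolist()
```

Reading only the shorter file's frame count from both inputs keeps the two channels the same length, at the cost of dropping the tail of the longer recording.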
-#### Transcription
-###### Transcriber class:
-- Loads the `faster-whisper` model if available.
-1. **Initialization (`__init__` method)**
-   - Set `self.model` to `None`.
-   - If `faster-whisper` is available:
-     - Print a message about loading the model.
-     - Try to import `torch` and detect device:
-       - If CUDA is available, set `device = "cuda"`.
-       - Else, set `device = "cpu"`.
-     - Set `compute_type`: `"float16"` if device is `"cuda"`, else `"float32"`.
-     - Try to instantiate the `WhisperModel` with the selected model name, device, and compute type:
-       - If successful, assign the model to `self.model` and print a success message.
-       - If failed, print an error and set `self.model = None`.
-   - Else (if `faster-whisper` not available):
-     - Print a message that transcription is disabled.
-2. **Transcription (`transcribe_file` method)**
-   - If `self.model` is not set, return `None`.
-   - Try to transcribe the given WAV file using `self.model.transcribe()`:
-     - Use `beam_size=5` for decoding.
-     - Concatenate all segment texts into a single string.
-     - Return the transcribed text.
-   - If an error occurs, print an error message and return `None`.
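A rough sketch of a class following the outline above. The default model name `"small"` and the helper names `pick_device` and `compute_type_for` are assumptions made for illustration. Both `faster-whisper` and `torch` are imported defensively, so the snippet still loads when neither is installed.

```python
try:
    from faster_whisper import WhisperModel
    FASTER_WHISPER_AVAILABLE = True
except ImportError:
    WhisperModel = None
    FASTER_WHISPER_AVAILABLE = False


def pick_device():
    """Return "cuda" when torch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except Exception:
        return "cpu"


def compute_type_for(device):
    """Half precision on GPU, full precision on CPU."""
    return "float16" if device == "cuda" else "float32"


class Transcriber:
    def __init__(self, model_name="small"):  # default name is an assumption
        self.model = None
        if not FASTER_WHISPER_AVAILABLE:
            print("faster-whisper not installed; transcription disabled.")
            return
        device = pick_device()
        print(f"Loading model {model_name!r} on {device}...")
        try:
            self.model = WhisperModel(
                model_name, device=device, compute_type=compute_type_for(device)
            )
            print("Model loaded.")
        except Exception as exc:
            print(f"Failed to load model: {exc}")
            self.model = None

    def transcribe_file(self, wav_path):
        """Return the transcript as one string, or None if unavailable or failed."""
        if self.model is None:
            return None
        try:
            segments, _info = self.model.transcribe(str(wav_path), beam_size=5)
            return " ".join(segment.text.strip() for segment in segments)
        except Exception as exc:
            print(f"Transcription error: {exc}")
            return None
```

Keeping `self.model` as `None` on any failure lets callers treat "library missing", "model failed to load", and "transcription disabled" the same way: `transcribe_file` simply returns `None`.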
-#### Diarization
-###### diarization_hook(audio_path: str):
-- Runs speaker diarization and returns a list of (start, end, speaker) tuples.
-1. **Check Diarization Availability**
-   - If `DIARIZATION_AVAILABLE` is `False`, return `None`.
-2. **Run Diarization Pipeline**
-   - Use the global `diarization_pipeline` to process the audio file at `audio_path`.
-   - Store the result in a variable (e.g., `diarization`).
-3. **Extract Speaker Segments**
-   - Initialize an empty list called `results`.
-   - For each segment in `diarization.itertracks(yield_label=True)`:
-     - Extract the segment's start time, end time, and speaker label.
-     - Append a tuple `(turn.start, turn.end, speaker)` to `results`.
-4. **Return Results**
-   - Return the `results` list containing tuples of (start, end, speaker) for each detected speaker segment.
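The extraction loop above might look roughly like this. Two things differ from the README and are assumptions: the pipeline is passed in as a parameter rather than read from the global `diarization_pipeline`, and `FakeAnnotation` is a hypothetical stand-in for the pipeline's result so the loop can run without `pyannote.audio`. The sketch also assumes the result exposes `itertracks(yield_label=True)` as the README describes.

```python
from types import SimpleNamespace


def diarization_hook(audio_path, pipeline=None):
    """Return [(start, end, speaker), ...], or None when no pipeline is available."""
    if pipeline is None:  # plays the role of DIARIZATION_AVAILABLE being False
        return None
    diarization = pipeline(audio_path)
    results = []
    for turn, _track, speaker in diarization.itertracks(yield_label=True):
        results.append((turn.start, turn.end, speaker))
    return results


class FakeAnnotation:
    """Hypothetical stand-in for the pipeline's result, for illustration only."""

    def itertracks(self, yield_label=False):
        yield SimpleNamespace(start=0.0, end=1.5), "A", "SPEAKER_00"
        yield SimpleNamespace(start=1.5, end=3.0), "B", "SPEAKER_01"


segments = diarization_hook("chunk.wav", pipeline=lambda path: FakeAnnotation())
```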
-#### Recording Threads
-###### record_loop(device_index, out_queue, label="mic"):
-- Continuously reads bytes from the device stream and pushes each full chunk of frames to a queue.
-1. **Open Audio Stream**
-   - Open a PyAudio input stream for the given device index and channel count.
-2. **Read Audio Data**
-   - Continuously read audio data in chunks.
-   - After enough frames for a chunk are collected, put them (with a timestamped filename) into a queue.
-   - Runs in a thread for each device (mic and optionally system).
-3. **Error Handling**
-   - If repeated read errors occur, the thread will stop for that device.
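The accumulate-and-enqueue loop described above can be sketched as follows. The signature differs from the README's `record_loop(device_index, out_queue, label)`: here the stream read is injected as a callable so the loop can run without hardware, and `stop_event`, `reads_per_chunk`, `max_errors`, and the filename pattern are all assumptions made for illustration.

```python
import queue
import threading
import time


def record_loop(read_chunk, out_queue, stop_event, reads_per_chunk, label="mic", max_errors=5):
    """Collect buffers from read_chunk() and enqueue them one full chunk at a time."""
    frames = []
    errors = 0
    while not stop_event.is_set():
        try:
            data = read_chunk()
        except OSError:
            errors += 1
            if errors >= max_errors:  # repeated read errors: give up on this device
                break
            continue
        errors = 0
        frames.append(data)
        if len(frames) >= reads_per_chunk:
            filename = f"{label}_{time.strftime('%Y%m%d_%H%M%S')}.wav"
            out_queue.put((filename, frames))
            frames = []  # start collecting the next chunk


# Demo with a fake reader that requests a stop on its sixth call.
stop = threading.Event()
calls = {"n": 0}


def fake_read():
    calls["n"] += 1
    if calls["n"] >= 6:
        stop.set()
    return b"\x00\x00"


chunks = queue.Queue()
record_loop(fake_read, chunks, stop, reads_per_chunk=3)
first_name, first_frames = chunks.get()
```

With PyAudio, `read_chunk` could be something like `lambda: stream.read(CHUNK, exception_on_overflow=False)`, and the function would run as the target of one `threading.Thread` per device.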
-#### Chunk Writing & Transcription
-###### chunk_writer_and_transcribe_worker(in_queue, final_frames_list, transcriber, single_channel_label)
-- Waits for audio chunks from the queue.
-- Saves each chunk as a WAV file.
-- Appends frames to a list for final concatenation.
-- If transcription is enabled, transcribes the chunk and appends the result to a transcript file.
-- Calls diarization on each chunk and aligns speaker segments with transcription.
-- Runs in a thread for each device.
-#### Main Recording Orchestration
-###### run_recording(mic_index, sys_index=None, chunk_secs=CHUNK_DURATION_SECS, model_name=MODEL_NAME, no_transcribe=False)
-- Sets up and starts the recording and writer threads for mic and (optionally) system audio.
-- Handles stopping and joining threads on KeyboardInterrupt.
-- Saves the final concatenated WAV file(s).
-- If both mic and system were recorded, merges them into a stereo WAV.
-- Terminates PyAudio and prints completion message.
-#### CLI Wrapper (cli.py)
-###### Algorithm of cli.py
-- Provides a command-line interface for recording, chunking, and optional transcription.
-1. **Argument Parsing**
-   - Uses `argparse` to define and parse command-line arguments:
-     - `--mic` / `-m`: Device index for microphone (optional)
-     - `--sys` / `-s`: Device index for system/loopback (optional)
-     - `--chunk-secs`: Chunk length in seconds (default from config)
-     - `--model`: Model name for transcription (default from config)
-     - `--no-transcribe`: Disable transcription if set
-2. **Device Selection**
-   - If `--mic` is not provided:
-     - Calls `list_input_devices()` to show available devices.
-     - Prompts the user to enter a mic device index (or uses default if blank).
-   - If `--sys` is not provided:
-     - Prompts the user whether to record system audio (loopback).
-     - If yes, prompts for system device index.
-3. **Run Recording**
-   - Calls `run_recording()` with the selected device indexes, chunk length, model, and transcription flag.
-4. **Entrypoint**
-   - If the script is run directly, calls `main()` to start the CLI workflow.
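The argument parsing step could be sketched with `argparse` as below. The default values for `CHUNK_DURATION_SECS` and `MODEL_NAME` are assumptions (the README only says they come from the config), and `build_parser` is a hypothetical name.

```python
import argparse

CHUNK_DURATION_SECS = 5  # assumed default; the real value lives in the project's config
MODEL_NAME = "small"     # assumed default model name


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Record audio, split it into chunks, and optionally transcribe it."
    )
    parser.add_argument("--mic", "-m", type=int, default=None,
                        help="device index for the microphone")
    parser.add_argument("--sys", "-s", type=int, default=None,
                        help="device index for system/loopback audio")
    parser.add_argument("--chunk-secs", type=int, default=CHUNK_DURATION_SECS,
                        help="chunk length in seconds")
    parser.add_argument("--model", default=MODEL_NAME,
                        help="model name for transcription")
    parser.add_argument("--no-transcribe", action="store_true",
                        help="disable transcription")
    return parser


# Parse the same arguments as the usage example in this README.
args = build_parser().parse_args(
    ["--mic", "6", "--sys", "8", "--chunk-secs", "5", "--model", "medium"]
)
```

Leaving `--mic` and `--sys` defaulting to `None` is what lets the CLI fall back to the interactive prompts when they are omitted.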
----
-#### Usage
-**Command-line (interactive):**
-```sh
-python cli.py
-```
-You will be prompted to select device indexes for mic and (optionally) system audio.
-**Command-line (with arguments):**
-```sh
-python cli.py --mic 6 --sys 8 --chunk-secs 5 --model medium
-```
-#### Requirements
-- Python 3.8+
-- pyaudio
-- numpy
-- faster-whisper (optional, for transcription)
-- pyannote.audio (optional, for diarization)
-Install requirements:
-```sh
-pip install pyaudio numpy
-# For transcription:
-pip install faster-whisper
-# For diarization:
-pip install pyannote.audio
-```