| --- |
| title: Hearing Visualized |
| emoji: 🎧 |
| colorFrom: purple |
| colorTo: pink |
| sdk: docker |
| pinned: false |
| license: mit |
| --- |
| |
| # Multi-Talker Audio Source Separation |
|
|
| Pipeline for analyzing a 4-channel hearing-aid recording and extracting per-speaker information: |
|
|
| - speaker count |
| - direction of arrival (DoA) |
| - gender estimate from F0 |
| - per-speaker transcription |
| - talker-of-interest (ToI) selection |
|
|
| ## Quick Start |
|
|
| ```bash |
| # install dependencies |
| uv sync |
| |
| # run default approach (ica) |
| uv run python main.py data/mixture.wav --approach ica --output output/ica |
| ``` |
|
|
| ASR is optional. To enable Whisper transcription: |
|
|
| ```bash |
| uv sync --extra asr |
| ``` |
|
|
| ## CLI |
|
|
| ```bash |
| uv run python main.py <input_wav> [options] |
| ``` |
|
|
| Core options: |
|
|
| - `-a, --approach {ica,frankenstein,ica_deeplearning}` |
| - `-o, --output <dir>` |
| - `-w, --whisper-model {tiny,base,small,medium,large}` |
| - `--hf-token <token>` (only relevant for `ica_deeplearning`) |
| - `-v, --verbose` |
|
|
| Example runs: |
|
|
| ```bash |
| uv run python main.py data/mixture.wav --approach ica --output output/ica |
| uv run python main.py data/mixture.wav --approach frankenstein --output output/frankenstein |
| uv run python main.py data/mixture.wav --approach ica_deeplearning --output output/ica_dl |
| ``` |
|
|
| ## Approaches (Current Status) |
|
|
| - `ica` |
| - FastICA separation with a fixed source count of 4 |
| - DoA from ICA mixing matrix |
| - ToI uses weighted scoring (front/language/energy/gender) |
|
|
| - `frankenstein` |
| - FastICA separation with a fixed source count of 4 |
| - language-priority ToI policy (strong English bonus) |
| - does not use DoA for the final ToI decision |
|
|
| - `ica_deeplearning` |
| - pass 1: PCA + ICA (source count derived from a variance threshold) |
| - pass 2: the deep-learning stage is currently a simplified placeholder in the code |
| - useful as an experimental variant; it does not yet perform full deep-learning overlap resolution |
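
The weighted ToI scoring listed for `ica` can be pictured with a small sketch. The weights, the feature encodings, and the `Source`, `toi_score`, and `select_toi` names are hypothetical and chosen only for illustration; the values actually used by the pipeline may differ.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; not taken from the pipeline code.
WEIGHTS = {"front": 0.4, "language": 0.3, "energy": 0.2, "gender": 0.1}


@dataclass
class Source:
    name: str
    front: float     # assumed encoding: 1.0 = directly in front, 0.0 = directly behind
    language: float  # assumed encoding: 1.0 = preferred language detected
    energy: float    # assumed encoding: energy normalized to [0, 1]
    gender: float    # assumed encoding: 1.0 = preferred match, 0.5 = unknown


def toi_score(src: Source) -> float:
    """Weighted sum of the four features."""
    return sum(weight * getattr(src, key) for key, weight in WEIGHTS.items())


def select_toi(sources):
    """Pick the source with the highest weighted score."""
    return max(sources, key=toi_score)


sources = [
    Source("source_1", front=1.0, language=1.0, energy=0.5, gender=0.5),
    Source("source_2", front=0.0, language=0.0, energy=1.0, gender=1.0),
]
print(select_toi(sources).name)
```

With these made-up numbers the frontal, preferred-language source wins even though the other source is louder, which is the kind of trade-off a weighted score is meant to express.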
|
|
| ## Output |
|
|
| Each run writes to the chosen output directory: |
|
|
| ```text |
| <output_dir>/ |
| source_1.wav |
| source_2.wav |
| source_3.wav |
| source_4.wav (or fewer/more for ica_deeplearning) |
| output.wav |
| results.json |
| ``` |
|
|
| `results.json` contains per-source metadata such as direction, energy, gender, language, transcript, selection score, and ToI marker. |
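
A downstream script might read `results.json` along these lines. The schema is an assumption: the `sources` list and the `is_toi`, `id`, and `direction` keys are hypothetical stand-ins, so check an actual `results.json` for the real field names before relying on this.

```python
import json
from pathlib import Path


def load_results(output_dir: str) -> dict:
    """Parse results.json from a run's output directory."""
    return json.loads(Path(output_dir, "results.json").read_text(encoding="utf-8"))


def find_toi(results: dict):
    """Return the first entry flagged as talker of interest, or None.

    Assumed layout (hypothetical): {"sources": [{"is_toi": bool, ...}, ...]}
    """
    for entry in results.get("sources", []):
        if entry.get("is_toi"):
            return entry
    return None


# In-memory stand-in using the assumed layout, so the sketch runs without a file.
example = {
    "sources": [
        {"id": 1, "direction": 90, "is_toi": False},
        {"id": 2, "direction": 0, "is_toi": True},
    ]
}
print(find_toi(example))
```

For a real run, `find_toi(load_results("output/ica"))` would be the equivalent call, provided the field names match.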
|
|
| ## Microphone Geometry |
|
|
| Channel mapping expected by the pipeline: |
|
|
| - channel 0: Left Front (LF) |
| - channel 1: Left Rear (LR) |
| - channel 2: Right Front (RF) |
| - channel 3: Right Rear (RR) |
|
|
| Orientation convention: |
|
|
| - `0°` front |
| - `90°` right |
| - `180°` rear |
| - `270°` left |
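
The orientation convention can be turned into a small helper that maps a DoA angle to a coarse label. This is a sketch only: the `direction_label` name and the choice of 90° sectors centered on each axis are assumptions, not something the pipeline is known to do.

```python
def direction_label(deg: float) -> str:
    """Map an angle in degrees to front/right/rear/left.

    Follows the convention 0° front, 90° right, 180° rear, 270° left.
    Sector boundaries (45°, 135°, 225°, 315°) are an illustrative assumption.
    """
    labels = ["front", "right", "rear", "left"]
    # Wrap into [0, 360), then shift by 45° so each sector is centered on its axis.
    sector = int(((deg % 360.0) + 45.0) // 90.0) % 4
    return labels[sector]


for angle in (0, 90, 180, 270, -90, 350):
    print(angle, direction_label(angle))
```

Negative angles and angles above 360° are wrapped first, so `-90` is treated the same as `270`.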
|
|
| ## Project Layout |
|
|
| ```text |
| . |
| main.py |
| approaches/ |
| pipeline_modules/ |
| scripts/ |
| tests/ |
| archive_solution/ |
| data/ |
| output/ |
| docs/ |
| ``` |
|
|
| - `main.py`: canonical entrypoint |
| - `approaches/`: pipeline variants |
| - `pipeline_modules/`: shared logic (audio loading, DoA, gender, ASR, ToI) |
| - `scripts/`: utility/analysis scripts not required for the main run |
| - `tests/`: lightweight validation scripts |
|
|
| ## Utility Commands |
|
|
| Run the benchmark across all approaches: |
|
|
| ```bash |
| uv run python scripts/benchmark.py --data-dir data --output-dir benchmark_results |
| ``` |
|
|
| Run lightweight checks: |
|
|
| ```bash |
| uv run python tests/test_lazy_loading.py |
| uv run python tests/validate_outputs.py |
| uv run python tests/validate_audio.py |
| ``` |
|
|
| ## Notes and Limitations |
|
|
| - Input must be a 4-channel WAV file. |
| - If Whisper is not installed, transcription is skipped (the pipeline still runs). |
| - F0-based gender estimation can return `unknown` when there are too few voiced frames. |
| - `ica_deeplearning` pass 2 is not fully implemented yet. |
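
Since the input must have exactly four channels, it can help to check a file before starting a run. The sketch below uses only Python's standard `wave` module; the `check_channels` helper is hypothetical and not part of the project. Note that `wave` reads PCM WAV files only, so other encodings would need a different reader.

```python
import wave

EXPECTED_CHANNELS = 4  # LF, LR, RF, RR


def check_channels(path: str, expected: int = EXPECTED_CHANNELS) -> int:
    """Return the channel count, raising ValueError if it is not `expected`."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
    if channels != expected:
        raise ValueError(f"{path}: expected {expected} channels, found {channels}")
    return channels
```

Calling `check_channels("data/mixture.wav")` before `main.py` would surface a mono or stereo file early instead of partway through separation.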
|
|
| ## Docs |
|
|
| - `docs/pipeline-details.md` |
| - `docs/lazy-imports-fix.md` |
|
|