# VAE Audio Evaluation

This directory contains the script and resources for evaluating model performance on audio reconstruction tasks. The primary script, `eval_compare_matrix.py`, computes a suite of objective metrics that compare the audio reconstructed by a model against the original ground-truth audio.

## Features

- **Comprehensive Metrics**: Calculates a wide range of industry-standard and research-grade metrics (a computation sketch follows this list):
  - **Time-Domain**: Scale-Invariant Signal-to-Distortion Ratio (SI-SDR).
  - **Frequency-Domain**: Multi-Resolution STFT Loss and Multi-Resolution Mel-Spectrogram Loss.
  - **Phase**: Multi-Resolution Phase Coherence (both per-channel and inter-channel for stereo).
  - **Loudness**: Integrated Loudness (LUFS-I), Loudness Range (LRA), and True Peak, analyzed using `ffmpeg`.
- **Batch Processing**: Automatically discovers and processes multiple model output directories.
- **File Matching**: Pairs reconstructed audio files (e.g., `*_vae_rec.wav`) with their corresponding ground-truth files (e.g., `*.wav`).
- **Robust & Resilient**: Handles missing files, audio processing errors, and varying sample rates gracefully.
- **Organized Output**: Saves aggregated results in both machine-readable (`.json`) and human-readable (`.txt`) formats for each model.
- **Command-Line Interface**: Easy-to-use CLI for specifying the input directory and other options.
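As a reference point, here is a minimal sketch of how the time- and frequency-domain metrics above can be computed with `torch` and `auraloss`. The loss configurations are illustrative assumptions, not necessarily the settings used by `eval_compare_matrix.py`:

```python
import torch
import auraloss

# Dummy stereo pair with shape (batch, channels, samples)
gen = torch.randn(1, 2, 48000)
gt = torch.randn(1, 2, 48000)

# Time domain: auraloss returns losses to minimize, so the SI-SDR
# value in dB is the negated loss (per auraloss's convention)
sisdr = -auraloss.time.SISDRLoss()(gen, gt)

# Frequency domain: multi-resolution STFT distance
# (FFT/hop/window sizes here are assumptions)
mrstft = auraloss.freq.MultiResolutionSTFTLoss(
    fft_sizes=[512, 1024, 2048],
    hop_sizes=[128, 256, 512],
    win_lengths=[512, 1024, 2048],
)
stft_distance = mrstft(gen, gt)

print(f"SI-SDR: {sisdr.item():.2f} dB | MR-STFT: {stft_distance.item():.4f}")
```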
## Prerequisites

### 1. Python Environment

Ensure you have a Python environment (3.8 or newer recommended) with the required packages installed. You can install them using pip:

```bash
pip install torch torchaudio auraloss numpy
```
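You can verify that the dependencies import correctly with a quick check:

```bash
python -c "import torch, torchaudio, auraloss, numpy; print('dependencies OK')"
```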
### 2. FFmpeg

The script relies on `ffmpeg` for loudness analysis. You must have `ffmpeg` installed and accessible in your system's PATH.

**On Ubuntu/Debian:**

```bash
sudo apt update && sudo apt install ffmpeg
```

**On macOS (using Homebrew):**

```bash
brew install ffmpeg
```

**On Windows:**

Download the executable from the [official FFmpeg website](https://ffmpeg.org/download.html) and add its `bin` directory to your system's PATH environment variable.

**With Conda (any platform):**

```bash
conda install -c conda-forge 'ffmpeg<7'
```

You can verify the installation by running:

```bash
ffmpeg -version
```
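For reference, the loudness statistics the script aggregates (LUFS-I, LRA, True Peak) are the kind reported by `ffmpeg`'s `loudnorm` filter; the exact invocation used by `eval_compare_matrix.py` may differ:

```bash
# Print integrated loudness (input_i), loudness range (input_lra),
# and true peak (input_tp) as JSON, without writing any output audio
ffmpeg -hide_banner -i song1.wav -af loudnorm=print_format=json -f null -
```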
## Directory Structure

The script expects a specific directory structure for the evaluation data. The root input directory should contain subdirectories, where each subdirectory represents a different model or experiment to be evaluated.

Inside each model's subdirectory, place the pairs of ground-truth and reconstructed audio files. The script identifies pairs based on a naming convention:

- **Ground Truth**: `your_audio_file.wav`
- **Reconstructed**: `your_audio_file_vae_rec.wav`

Here is an example structure:

```
/path/to/your/evaluation_data/
├── model_A/
│   ├── song1.wav              # Ground Truth 1
│   ├── song1_vae_rec.wav      # Reconstructed 1
│   ├── song2.wav              # Ground Truth 2
│   ├── song2_vae_rec.wav      # Reconstructed 2
│   └── ...
├── model_B/
│   ├── trackA.wav
│   ├── trackA_vae_rec.wav
│   └── ...
└── ...
```
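A minimal sketch of the pairing logic this convention implies, assuming glob-based discovery (the actual code in `eval_compare_matrix.py` may differ):

```python
from pathlib import Path

def find_pairs(model_dir: Path):
    """Yield (ground_truth, reconstruction) path pairs for one model directory."""
    for rec in sorted(model_dir.glob("*_vae_rec.wav")):
        # Strip the "_vae_rec" suffix to recover the ground-truth filename
        gt = rec.with_name(rec.name.replace("_vae_rec.wav", ".wav"))
        if gt.exists():
            yield gt, rec

for gt, rec in find_pairs(Path("/path/to/your/evaluation_data/model_A")):
    print(f"{gt.name}  <->  {rec.name}")
```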
## Usage

Run the evaluation script from the command line, pointing it to the root directory containing your model outputs.

```bash
python eval_compare_matrix.py --input_dir /path/to/your/evaluation_data/
```

### Command-Line Arguments

- `--input_dir` (required): The path to the root directory containing the model folders (e.g., `/path/to/your/evaluation_data/`).
- `--force` (optional): If specified, the script re-runs the evaluation for all models, even if results files (`evaluation_results.json`) already exist. By default, it skips models that have already been evaluated.
- `--echo` (optional): If specified, the script prints the detailed evaluation metrics for each individual audio pair during processing. By default, only the progress bar and final summary are shown.

### Example

```bash
python eval/eval_compare_matrix.py --input_dir ./results/
```
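To re-run all evaluations and print per-file metrics, combine the optional flags:

```bash
python eval/eval_compare_matrix.py --input_dir ./results/ --force --echo
```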
## Output

After running, the script generates two files inside each model's directory:

1. **`evaluation_results.json`**: A JSON file containing the aggregated average of all computed metrics. This is ideal for programmatic analysis (see the sketch at the end of this section).

```json
{
    "model_name": "model_A",
    "file_count": 50,
    "avg_sisdr": 15.78,
    "avg_mel_distance": 0.45,
    "avg_stft_distance": 0.89,
    "avg_per_channel_coherence": 0.95,
    "avg_interchannel_coherence": 0.92,
    "avg_gen_lufs-i": -14.2,
    "avg_gt_lufs-i": -14.0,
    ...
}
```
2. **`evaluation_summary.txt`**: A human-readable text file summarizing the results.

```
model_name: model_A
file_count: 50
avg_sisdr: 15.78...
avg_mel_distance: 0.45...
...
```

This allows for quick inspection of a model's performance without needing to parse the JSON.
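When you do want to work with the JSON programmatically, here is a minimal sketch of collecting the per-model results and ranking them by average SI-SDR (field names taken from the sample above; assumes the directory layout shown earlier):

```python
import json
from pathlib import Path

root = Path("/path/to/your/evaluation_data")

# Gather one result dict per evaluated model
results = [
    json.loads(p.read_text())
    for p in root.glob("*/evaluation_results.json")
]

# Higher SI-SDR is better, so sort descending
for r in sorted(results, key=lambda r: r["avg_sisdr"], reverse=True):
    print(f'{r["model_name"]}: {r["avg_sisdr"]:.2f} dB SI-SDR')
```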