# VAE Audio Evaluation

This directory contains the script and resources for evaluating the performance of models in audio reconstruction tasks. The primary script, `eval_compare_matrix.py`, computes a suite of objective metrics to compare the quality of audio generated by the model against the original ground truth audio.

## Features

- **Comprehensive Metrics**: Calculates objective metrics in four categories:
  - **Time-Domain**: Scale-Invariant Signal-to-Distortion Ratio (SI-SDR).
  - **Frequency-Domain**: Multi-Resolution STFT Loss and Multi-Resolution Mel-Spectrogram Loss.
  - **Phase**: Multi-Resolution Phase Coherence (both per-channel and inter-channel for stereo).
  - **Loudness**: Integrated Loudness (LUFS-I), Loudness Range (LRA), and True Peak, analyzed using `ffmpeg`.
- **Batch Processing**: Automatically discovers and processes multiple model output directories.
- **File Matching**: Pairs each reconstructed audio file (e.g., `*_vae_rec.wav`) with its ground truth file (e.g., `*.wav`) by filename.
- **Robust & Resilient**: Handles missing files, audio processing errors, and varying sample rates gracefully.
- **Organized Output**: Saves aggregated results in both machine-readable (`.json`) and human-readable (`.txt`) formats for each model.
- **Command-Line Interface**: Easy-to-use CLI for specifying the input directory and other options.
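
To make the time-domain metric concrete, here is a minimal sketch of SI-SDR following its standard definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. This is not taken from `eval_compare_matrix.py`; the function name `si_sdr` and the `eps` parameter are assumptions, and the sketch uses only the standard library on mono lists of samples, whereas a real implementation would more likely operate on `torch` tensors.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Return the scale-invariant SDR in dB for two equal-length mono signals.

    Hypothetical helper for illustration; not the script's actual code.
    """
    if len(reference) == 0 or len(estimate) != len(reference):
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a DC offset does not influence the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference: alpha is the optimal gain.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)

    # Split the estimate into a scaled target and a residual.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because of the projection step, multiplying the reconstruction by a constant gain leaves the score essentially unchanged, which is why the metric is called scale-invariant. Higher values indicate a closer reconstruction.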

## Prerequisites

### 1. Python Environment
Ensure you have a Python environment (3.8 or newer recommended) with the required packages installed. You can install them using pip:
```bash
pip install torch torchaudio auraloss numpy
```

### 2. FFmpeg
The script relies on `ffmpeg` for loudness analysis. You must have `ffmpeg` installed and accessible in your system's PATH.

**On Ubuntu/Debian:**
```bash
sudo apt update && sudo apt install ffmpeg
```

**On macOS (using Homebrew):**
```bash
brew install ffmpeg
```

**On Windows:**
Download the executable from the [official FFmpeg website](https://ffmpeg.org/download.html) and add its `bin` directory to your system's PATH environment variable.

You can verify the installation by running:
```bash
ffmpeg -version
```

**In a Conda environment:**
```bash
conda install -c conda-forge 'ffmpeg<7'
```
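
A script can run the same availability check before starting an evaluation. The sketch below shows one way to do it with the standard library; it is an assumption about how such a check might look, not the code in `eval_compare_matrix.py`, and `ffmpeg_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if it is unavailable.

    Hypothetical helper for illustration.
    """
    # shutil.which searches PATH the same way the shell would.
    path = shutil.which(executable)
    if path is None:
        return None

    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg not found on PATH; loudness metrics cannot be computed.")
    else:
        print(version)
```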

## Directory Structure

The script expects a specific directory structure for the evaluation data. The root input directory should contain subdirectories, where each subdirectory represents a different model or experiment to be evaluated.

Inside each model's subdirectory, you should place the pairs of ground truth and reconstructed audio files. The script identifies pairs based on a naming convention:
- **Ground Truth**: `your_audio_file.wav`
- **Reconstructed**: `your_audio_file_vae_rec.wav`

Here is an example structure:
```
/path/to/your/evaluation_data/
├── model_A/
│   ├── song1.wav           # Ground Truth 1
│   ├── song1_vae_rec.wav   # Reconstructed 1
│   ├── song2.wav           # Ground Truth 2
│   ├── song2_vae_rec.wav   # Reconstructed 2
│   └── ...
├── model_B/
│   ├── trackA.wav
│   ├── trackA_vae_rec.wav
│   └── ...
└── ...
```
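
The naming convention above can be sketched in a few lines of Python. This is an illustration under the assumption that pairing relies only on the `_vae_rec` suffix; `find_pairs` and `REC_SUFFIX` are hypothetical names, and the actual script may match files differently.

```python
from pathlib import Path

REC_SUFFIX = "_vae_rec"  # suffix that marks a reconstructed file


def find_pairs(model_dir):
    """Return sorted (ground_truth, reconstructed) path pairs in one model directory.

    Hypothetical helper for illustration.
    """
    model_dir = Path(model_dir)
    pairs = []
    for rec in sorted(model_dir.glob(f"*{REC_SUFFIX}.wav")):
        # Strip the suffix from the stem to get the ground truth filename.
        gt = rec.with_name(rec.stem[: -len(REC_SUFFIX)] + ".wav")
        if gt.is_file():
            pairs.append((gt, rec))
        # Reconstructions without a ground truth file are skipped.
    return pairs
```

For the example layout, `find_pairs("model_A")` would pair `song1.wav` with `song1_vae_rec.wav` and `song2.wav` with `song2_vae_rec.wav`.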

## Usage

Run the evaluation script from the command line, pointing it to the root directory containing your model outputs.

```bash
python eval_compare_matrix.py --input_dir /path/to/your/evaluation_data/
```

### Command-Line Arguments

- `--input_dir` (required): The path to the root directory containing the model folders (e.g., `/path/to/your/evaluation_data/`).
- `--force` (optional): If specified, the script re-runs the evaluation for all models, even if a results file (`evaluation_results.json`) already exists. By default, it skips models that have already been evaluated.
- `--echo` (optional): If specified, the script will print the detailed evaluation metrics for each individual audio pair during processing. By default, only the progress bar and final summary are shown.
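
For reference, an interface with these three arguments could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction from the descriptions above, not the parser defined in `eval_compare_matrix.py`; the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Evaluate reconstructed audio against ground truth."
    )
    parser.add_argument(
        "--input_dir",
        required=True,
        help="Root directory containing one subdirectory per model.",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="Re-run evaluation even if evaluation_results.json exists.",
    )
    parser.add_argument(
        "--echo",
        action="store_true",
        help="Print detailed metrics for each audio pair.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_dir, args.force, args.echo)
```

Both flags use `store_true`, so they default to `False` and take no value on the command line.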

### Example
```bash
python eval/eval_compare_matrix.py --input_dir ./results/
```

## Output

After running, the script will generate two files inside each model's directory:

1.  **`evaluation_results.json`**: A JSON file containing the aggregated average of all computed metrics. This is ideal for programmatic analysis.
    ```json
    {
        "model_name": "model_A",
        "file_count": 50,
        "avg_sisdr": 15.78,
        "avg_mel_distance": 0.45,
        "avg_stft_distance": 0.89,
        "avg_per_channel_coherence": 0.95,
        "avg_interchannel_coherence": 0.92,
        "avg_gen_lufs-i": -14.2,
        "avg_gt_lufs-i": -14.0,
        ...
    }
    ```

2.  **`evaluation_summary.txt`**: A human-readable text file summarizing the results.
    ```
    model_name: model_A
    file_count: 50
    avg_sisdr: 15.78...
    avg_mel_distance: 0.45...
    ...
    ```
The text summary allows quick inspection of a model's performance without parsing the JSON.
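
Since every model directory receives its own `evaluation_results.json`, the files can be collected to compare models side by side. The sketch below assumes the key names shown in the JSON example (such as `avg_sisdr`); `load_results` and `rank_by` are hypothetical helpers, not part of the evaluation script.

```python
import json
from pathlib import Path


def load_results(input_dir):
    """Map each model directory name to its parsed evaluation_results.json."""
    results = {}
    for path in sorted(Path(input_dir).glob("*/evaluation_results.json")):
        with path.open(encoding="utf-8") as f:
            results[path.parent.name] = json.load(f)
    return results


def rank_by(results, key="avg_sisdr", higher_is_better=True):
    """Return (model, value) tuples sorted by one metric.

    Models whose results lack the key are left out.
    """
    scored = [(name, r[key]) for name, r in results.items() if key in r]
    return sorted(scored, key=lambda item: item[1], reverse=higher_is_better)


if __name__ == "__main__":
    for name, value in rank_by(load_results("./results/")):
        print(f"{name}: {value}")
```

For distance-style metrics such as `avg_mel_distance`, where lower values indicate a closer match, pass `higher_is_better=False`.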