---
title: MAPSS Multi Source Audio Perceptual Separation Scores
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
license: mit
---
# MAPSS: Manifold-based Assessment of Perceptual Source Separation

Granular evaluation of speech and music source separation with the MAPSS measures:

- **Perceptual Matching (PM)**: Measures how closely an output perceptually aligns with its reference. Range: 0-1, higher is better.
- **Perceptual Similarity (PS)**: Measures how well an output is separated from its interfering references. Range: 0-1, higher is better.
## Input Format

Upload a ZIP file containing:

```
your_mixture.zip
├── references/       # Original clean sources
│   ├── speaker1.wav
│   ├── speaker2.wav
│   └── ...
└── outputs/          # Separated outputs from your algorithm
    ├── separated1.wav
    ├── separated2.wav
    └── ...
```
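
The layout above can be assembled with Python's standard `zipfile` module. The sketch below is illustrative only: `build_mapss_zip` and its arguments are hypothetical names, not part of MAPSS, and it assumes the WAV files already exist on disk.

```python
import zipfile
from pathlib import Path


def build_mapss_zip(references, outputs, zip_path):
    """Pack WAV files into references/ and outputs/ folders inside one ZIP.

    `references` and `outputs` are lists of paths to existing WAV files.
    Hypothetical helper for illustration; not part of the MAPSS code base.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for src in references:
            # Store each file under references/ using only its base name.
            zf.write(src, arcname=f"references/{Path(src).name}")
        for src in outputs:
            # Separated signals go under outputs/.
            zf.write(src, arcname=f"outputs/{Path(src).name}")
    return zip_path
```

For example, `build_mapss_zip(["speaker1.wav", "speaker2.wav"], ["separated1.wav", "separated2.wav"], "your_mixture.zip")` would produce an archive with the structure shown in the tree.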
### Audio Requirements

- Format: WAV files
- Sample rate: Any (automatically resampled to 16 kHz)
- Channels: Mono or stereo (converted to mono)
- Number of files: Equal number of references and outputs
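
Before uploading, it can help to check an archive against these requirements locally. The following is a minimal sketch, not the Space's own validation: `inspect_mapss_zip` is a hypothetical helper, it assumes `references/` and `outputs/` sit at the top level of the ZIP, and it relies on the standard-library `wave` module, which only reads uncompressed PCM WAV files.

```python
import io
import wave
import zipfile


def inspect_mapss_zip(zip_path):
    """Return {archive_name: (sample_rate, channels)} for every WAV in the ZIP.

    Raises ValueError if references/ and outputs/ hold different numbers
    of WAV files. Hypothetical helper; not part of the MAPSS code base.
    """
    info = {}
    counts = {"references": 0, "outputs": 0}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            folder = name.split("/", 1)[0]
            if folder not in counts or not name.lower().endswith(".wav"):
                continue  # skip directory entries and unrelated files
            # Read the WAV header from memory to get rate and channel count.
            with wave.open(io.BytesIO(zf.read(name)), "rb") as wav:
                info[name] = (wav.getframerate(), wav.getnchannels())
            counts[folder] += 1
    if counts["references"] != counts["outputs"]:
        raise ValueError(f"Mismatched file counts: {counts}")
    return info
```

The returned sample rates and channel counts are informational only, since the tool resamples and downmixes on its side.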
## Output Format

The tool generates a ZIP file containing:

- `ps_scores_{model}.csv`: PS scores for each speaker/source
- `pm_scores_{model}.csv`: PM scores for each speaker/source
- `params.json`: Experiment parameters used
- `manifest_canonical.json`: File mapping and processing details
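
A downloaded results archive can be read without unpacking it. The sketch below assumes only the file names listed above; the column layout of the CSV files is not documented here, so rows are returned as raw lists of strings. `load_mapss_results` is a hypothetical name.

```python
import csv
import io
import json
import zipfile


def load_mapss_results(zip_path):
    """Load params.json and the raw rows of every PS/PM score CSV in the ZIP.

    Hypothetical helper; it makes no assumption about the CSV columns.
    """
    results = {"params": None, "ps": {}, "pm": {}}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            base = name.rsplit("/", 1)[-1]  # ignore any folder prefix
            if base == "params.json":
                results["params"] = json.loads(zf.read(name))
                continue
            for kind in ("ps", "pm"):
                prefix = f"{kind}_scores_"
                if base.startswith(prefix) and base.endswith(".csv"):
                    # The model name sits between the prefix and ".csv".
                    model = base[len(prefix):-len(".csv")]
                    text = zf.read(name).decode("utf-8")
                    results[kind][model] = list(csv.reader(io.StringIO(text)))
    return results
```

With this shape, `results["ps"]["wavlm"]` would hold the rows of `ps_scores_wavlm.csv`, header row included if the file has one.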
## Available Models

| Model | Description | Default Layer | Use Case |
|-------|-------------|---------------|----------|
| `raw` | Raw waveform features | N/A | Baseline comparison |
| `wavlm` | WavLM Large | 24 | Best overall performance |
| `wav2vec2` | Wav2Vec2 Large | 24 | Strong performance |
| `hubert` | HuBERT Large | 24 | Good for speech |
| `wavlm_base` | WavLM Base | 12 | Faster, good quality |
| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster processing |
| `hubert_base` | HuBERT Base | 12 | Faster for speech |
| `wav2vec2_xlsr` | Wav2Vec2 XLSR-53 | 24 | Multilingual |
| `ast` | Audio Spectrogram Transformer | 12 | General audio |
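
For scripting around the tool, the table can be mirrored as a small lookup. This is a sketch of how a default layer might be resolved, with values copied from the table; `DEFAULT_LAYERS` and `resolve_layer` are hypothetical names and do not reflect how `app.py` is actually written.

```python
# Default layer per model, copied from the table above.
# None marks the raw-waveform baseline, which has no transformer layers.
DEFAULT_LAYERS = {
    "raw": None,
    "wavlm": 24,
    "wav2vec2": 24,
    "hubert": 24,
    "wavlm_base": 12,
    "wav2vec2_base": 12,
    "hubert_base": 12,
    "wav2vec2_xlsr": 24,
    "ast": 12,
}


def resolve_layer(model, layer=None):
    """Return the layer to use: the explicit choice, else the model default."""
    if model not in DEFAULT_LAYERS:
        raise ValueError(f"Unknown model: {model!r}")
    default = DEFAULT_LAYERS[model]
    if default is None:
        return None  # layer selection does not apply to `raw`
    return default if layer is None else layer
```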
## Parameters

- **Model**: Embedding model used for feature extraction
- **Layer**: Transformer layer to take embeddings from (auto-selected by default)
- **Alpha**: Diffusion maps normalization parameter (0.0-1.0, default: 1.0)
  - 0.0 = No normalization
  - 1.0 = Full normalization (recommended)
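
To make the role of alpha concrete, here is a minimal pure-Python sketch of the alpha normalization used in the standard diffusion maps construction. It is an assumption that MAPSS follows this textbook form; the function name `alpha_normalize` and the list-of-lists input are illustrative, and the actual implementation may differ.

```python
def alpha_normalize(kernel, alpha=1.0):
    """Alpha-normalize a symmetric kernel matrix and return a Markov matrix.

    `kernel` is a square list of lists with positive row sums.
    Textbook diffusion-maps form; not confirmed as the MAPSS implementation.
    """
    n = len(kernel)
    # Density estimate at each point: row sums of the raw kernel.
    q = [sum(row) for row in kernel]
    # Divide entry (i, j) by (q_i * q_j) ** alpha.
    # alpha = 0.0 leaves the kernel unchanged; alpha = 1.0 fully removes
    # the influence of sampling density.
    k_alpha = [
        [kernel[i][j] / ((q[i] * q[j]) ** alpha) for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize so that every row sums to 1 (a transition matrix).
    d = [sum(row) for row in k_alpha]
    return [[k_alpha[i][j] / d[i] for j in range(n)] for i in range(n)]
```

With `alpha=0.0` the result is the plain row-normalized kernel, which matches the "No normalization" setting in the sense that no density correction is applied.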
## Citation

If you use MAPSS in your research, please cite:

```bibtex
@article{Ivry2025MAPSS,
  title   = {MAPSS: Manifold-based Assessment of Perceptual Source Separation},
  author  = {Ivry, Amir and Cornell, Samuele and Watanabe, Shinji},
  journal = {arXiv preprint arXiv:2509.09212},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.09212}
}
```
## Limitations

- Processing time scales with the number of sources, the audio length, and the model size
## License

- Code: MIT License
- Paper: CC-BY-4.0
## Support

For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/amir-ivry/MAPSS-measures).