	---
	title: MAPSS Multi Source Audio Perceptual Separation Scores
	emoji: 🎵
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.45.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# MAPSS: Manifold-based Assessment of Perceptual Source Separation

	Granular evaluation of speech and music source separation with the MAPSS measures:
	- Perceptual Matching (PM): Measures how closely an output perceptually aligns with its reference. Range: 0-1, higher is better.
	- Perceptual Separation (PS): Measures how well an output is separated from its interfering references. Range: 0-1, higher is better.

	## Input Format

	Upload a ZIP file containing:
	```
	your_mixture.zip
	├── references/          # Original clean sources
	│   ├── speaker1.wav
	│   ├── speaker2.wav
	│   └── ...
	└── outputs/             # Separated outputs from your algorithm
	    ├── separated1.wav
	    ├── separated2.wav
	    └── ...
	```
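
The layout above can be produced with Python's standard `zipfile` module. This is a minimal sketch, not part of the MAPSS tool: the helper name `build_mixture_zip` is hypothetical, and it assumes `references/` and `outputs/` sit at the top level of the archive, as the tree suggests.

```python
import zipfile
from pathlib import Path


def build_mixture_zip(references, outputs, zip_path):
    """Pack reference and output WAV files into the ZIP layout shown above.

    `references` and `outputs` are lists of paths to existing .wav files.
    Hypothetical helper; the folder names come from the README tree.
    """
    if len(references) != len(outputs):
        raise ValueError("need the same number of references and outputs")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in references:
            # Store each file under references/ using only its base name.
            zf.write(src, arcname=f"references/{Path(src).name}")
        for src in outputs:
            zf.write(src, arcname=f"outputs/{Path(src).name}")
    return zip_path
```

For example, `build_mixture_zip(["speaker1.wav", "speaker2.wav"], ["separated1.wav", "separated2.wav"], "your_mixture.zip")` writes an archive with two entries under each folder.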

	### Audio Requirements
	- Format: WAV files
	- Sample rate: Any (automatically resampled to 16kHz)
	- Channels: Mono or stereo (converted to mono)
	- Number of files: Equal number of references and outputs
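
Before uploading, the file-count and format requirements can be checked locally. The sketch below uses only the standard library; `check_mixture_zip` is a hypothetical name, and the tool's own validation may be stricter or more lenient than what is shown here.

```python
import zipfile


def check_mixture_zip(zip_path):
    """Return (n_references, n_outputs), or raise ValueError on a bad layout.

    Assumes top-level references/ and outputs/ folders (an assumption
    based on the input tree, not on the tool's source code).
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Skip explicit directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    refs = [n for n in names if n.startswith("references/")]
    outs = [n for n in names if n.startswith("outputs/")]
    non_wav = [n for n in refs + outs if not n.lower().endswith(".wav")]
    if non_wav:
        raise ValueError(f"non-WAV files found: {non_wav}")
    if not refs or len(refs) != len(outs):
        raise ValueError(
            f"found {len(refs)} references and {len(outs)} outputs; "
            "counts must match and be non-zero"
        )
    return len(refs), len(outs)
```

Sample rate and channel count are not checked, since the tool resamples to 16 kHz and downmixes to mono on its own.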

	## Output Format

	The tool generates a ZIP file containing:
	- `ps_scores_{model}.csv`: PS scores for each speaker/source
	- `pm_scores_{model}.csv`: PM scores for each speaker/source
	- `params.json`: Experiment parameters used
	- `manifest_canonical.json`: File mapping and processing details
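
The results archive can be read back without unpacking it. This is a hedged sketch: `load_mapss_results` is a hypothetical helper, it assumes the listed files sit at the top level of the ZIP, and because the CSV column layout is not documented here, rows are returned as plain dictionaries keyed by whatever header each file carries.

```python
import csv
import io
import json
import zipfile


def load_mapss_results(zip_path, model):
    """Load PS/PM score tables and run parameters from a results ZIP.

    Assumes the files named in the README are stored at the archive root.
    """
    def read_csv(zf, name):
        with zf.open(name) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            # Materialize the rows before the member file is closed.
            return list(csv.DictReader(text))

    with zipfile.ZipFile(zip_path) as zf:
        return {
            "ps": read_csv(zf, f"ps_scores_{model}.csv"),
            "pm": read_csv(zf, f"pm_scores_{model}.csv"),
            "params": json.loads(zf.read("params.json")),
        }
```

Calling `load_mapss_results("results.zip", "wavlm")` would look for `ps_scores_wavlm.csv` and `pm_scores_wavlm.csv`; the archive name `results.zip` is only a placeholder.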

	## Available Models

	| Model | Description | Default Layer | Use Case |
	|-------|-------------|---------------|----------|
	| `raw` | Raw waveform features | N/A | Baseline comparison |
	| `wavlm` | WavLM Large | 24 | Best overall performance |
	| `wav2vec2` | Wav2Vec2 Large | 24 | Strong performance |
	| `hubert` | HuBERT Large | 24 | Good for speech |
	| `wavlm_base` | WavLM Base | 12 | Faster, good quality |
	| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster processing |
	| `hubert_base` | HuBERT Base | 12 | Faster for speech |
	| `wav2vec2_xlsr` | Wav2Vec2 XLSR-53 | 24 | Multilingual |
	| `ast` | Audio Spectrogram Transformer | 12 | General audio |
	## Parameters

	- Model: Select the embedding model for feature extraction
	- Layer: Which transformer layer to use (auto-selected by default)
	- Alpha: Diffusion maps parameter (0.0-1.0, default: 1.0)
	  - 0.0 = No normalization
	  - 1.0 = Full normalization (recommended)
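
To give a feel for what an alpha setting does, the sketch below applies the alpha normalization from the standard diffusion-maps construction to a small affinity matrix. It assumes that the Alpha parameter plays this textbook role, which this README does not spell out, and it is not the MAPSS implementation. The function name `alpha_normalize` and the toy kernel are illustrative only.

```python
def alpha_normalize(kernel, alpha):
    """Alpha-normalize a symmetric affinity matrix, then row-normalize it.

    Textbook diffusion-maps step (an assumption about what Alpha controls):
      K_alpha[i][j] = K[i][j] / (d[i] * d[j]) ** alpha,  d[i] = sum_j K[i][j]
    followed by row normalization so each row sums to 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(kernel)
    degrees = [sum(row) for row in kernel]
    k_alpha = [
        [kernel[i][j] / (degrees[i] * degrees[j]) ** alpha for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in k_alpha]
    return [[k_alpha[i][j] / row_sums[i] for j in range(n)] for i in range(n)]


# Toy 3x3 affinity matrix (made-up values, for illustration only).
toy_kernel = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
p_none = alpha_normalize(toy_kernel, 0.0)  # plain row normalization
p_full = alpha_normalize(toy_kernel, 1.0)  # full density normalization
```

With `alpha = 0.0` the kernel is only row-normalized, so sampling density still influences the transition probabilities. With `alpha = 1.0` each entry is first divided by the product of the two degrees, which reduces that influence.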

	## Citation

	If you use MAPSS in your research, please cite:

	```bibtex
	@article{Ivry2025MAPSS,
	title = {MAPSS: Manifold-based Assessment of Perceptual Source Separation},
	author = {Ivry, Amir and Cornell, Samuele and Watanabe, Shinji},
	journal = {arXiv preprint arXiv:2509.09212},
	year = {2025},
	url = {https://arxiv.org/abs/2509.09212}
	}
	```

	## Limitations

	- Processing time scales with the number of sources, the audio length, and the model size

	## License

	- Code: MIT License
	- Paper: CC-BY-4.0

	## Support

	For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/amir-ivry/MAPSS-measures).