# Evaluation
This directory contains the evaluation code to reproduce the results from the SAM-Audio paper. The evaluation framework supports multiple datasets, prompting modes (text-only, span, visual), and metrics.
## Setup
Before running evaluation, ensure you have:
1. Installed the SAM-Audio package and its dependencies
2. Authenticated with Hugging Face to access the model checkpoints (see the main [README](../README.md) and the example below)
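One common way to authenticate (shown as an example; it assumes you already have a Hugging Face access token) is via the CLI:

```bash
# Prompts for a Hugging Face access token and stores it locally
huggingface-cli login
```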
## Quick Start
Run evaluation on the default setting (`instr-pro`):
```bash
python main.py
```
You can also use multiple GPUs to speed up evaluation:
```bash
torchrun --nproc_per_node=<ngpus> main.py
```
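For example, to evaluate the `sfx` setting on 8 GPUs (the GPU count here is illustrative):

```bash
torchrun --nproc_per_node=8 main.py --setting sfx
```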
Evaluate on a specific setting:
```bash
python main.py --setting sfx
```
Evaluate on multiple settings:
```bash
python main.py --setting sfx speech music
```
## Available Evaluation Settings
Run `python main.py --help` to see all available settings. The examples in this README use `instr-pro`, `sfx`, `speech`, and `music`.
## Command Line Options
```bash
python main.py [OPTIONS]
```
### Options:
- `-s, --setting` - Which setting(s) to evaluate (default: `instr-pro`)
- Choices: See available settings above
- Can specify multiple settings: `--setting sfx speech music`
- `--cache-path` - Where to cache downloaded datasets (default: `~/.cache/sam_audio`)
- `-p, --checkpoint-path` - Model checkpoint to evaluate (default: `facebook/sam-audio-1b`)
- Can use local path or Hugging Face model ID
- `-b, --batch-size` - Batch size for evaluation (default: `1`)
- `-w, --num-workers` - Number of data loading workers (default: `4`)
- `-c, --candidates` - Number of reranking candidates (default: `8`)
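For example, a run that evaluates two settings with a larger batch size and more reranking candidates might look like this (the flag values are illustrative):

```bash
python main.py \
  --setting sfx speech \
  --checkpoint-path facebook/sam-audio-1b \
  --batch-size 4 \
  --num-workers 8 \
  --candidates 16
```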
## Evaluation Metrics
The evaluation framework computes the following metrics:
- **Judge** - SAM Audio Judge quality assessment metric
- **Aesthetic** - Aesthetic quality metric
- **CLAP** - Audio-text alignment metric (CLAP similarity)
- **ImageBind** - Audio-video alignment metric (for visual settings only)
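As a rough illustration of what the CLAP metric measures, the sketch below scores audio-text alignment as the cosine similarity between CLAP embeddings. It uses the `transformers` CLAP implementation and the `laion/clap-htsat-unfused` checkpoint as assumptions; the evaluation framework may use a different CLAP variant internally.

```python
# Illustrative sketch only: the checkpoint and API are assumptions, not
# necessarily what this evaluation framework uses internally.
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

def clap_similarity(audio, text, sampling_rate=48000):
    """Cosine similarity between one audio clip (1-D numpy array) and one text prompt."""
    inputs = processor(
        text=[text], audios=[audio], sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        audio_emb = model.get_audio_features(input_features=inputs["input_features"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    return torch.nn.functional.cosine_similarity(audio_emb, text_emb).item()
```

The `CLAPSimilarity` value in the results file is such a per-sample score averaged over the setting.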
## Output
Results are saved to the `results/` directory as JSON files, one per setting:
```
results/
β”œβ”€β”€ sfx.json
β”œβ”€β”€ speech.json
└── music.json
```
Each JSON file contains the averaged metric scores across all samples in that setting.
Example output:
```json
{
  "JudgeOverall": "4.386",
  "JudgeFaithfulness": "4.708",
  "JudgeRecall": "4.934",
  "JudgePrecision": "4.451",
  "ContentEnjoyment": "5.296",
  "ContentUsefulness": "6.903",
  "ProductionComplexity": "4.301",
  "ProductionQuality": "7.100",
  "CLAPSimilarity": "0.271"
}
```
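Because the metric values are serialized as strings, a small helper like the following (hypothetical, but matching the layout described above) can load and compare scores across settings:

```python
import json
from pathlib import Path

# Load every per-setting results file (results/<setting>.json, as described above)
results = {}
for path in sorted(Path("results").glob("*.json")):
    with path.open() as f:
        scores = json.load(f)
    # Metric values are serialized as strings in the example output, so cast to float
    results[path.stem] = {metric: float(value) for metric, value in scores.items()}

# Print one headline metric per setting
for setting, scores in results.items():
    print(f"{setting}: JudgeOverall={scores.get('JudgeOverall')}")
```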