	# AutoSpeech

	## Input

	Audio file
	```
	Wav file from The VoxCeleb1 Dataset https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html

	Default input: wav/id10283/oGZsanLiXsY/00004.wav
	```

	To try other samples, download the VoxCeleb1 test set (https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip).

	## Output

	- Identification mode
	Top-5 speaker labels.
	```
	Top5: id10283, id11084, id10200, id11064, id10404
	```
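As a rough illustration of how a top-5 line like the one above can be produced, the sketch below ranks per-class scores and keeps the five best labels. It is not taken from `auto_speech.py`: the helper `top_k_labels`, the score values, and the sixth label `id10001` are made up for the example.

```python
import numpy as np

def top_k_labels(scores, labels, k=5):
    """Return the k labels with the highest scores, best first."""
    scores = np.asarray(scores, dtype=np.float64)
    # argsort sorts ascending; reverse it and keep the first k indices
    order = np.argsort(scores)[::-1][:k]
    return [labels[i] for i in order]

# Hypothetical scores for six classes (values invented for illustration)
labels = ["id10283", "id11084", "id10200", "id11064", "id10404", "id10001"]
scores = [9.1, 7.4, 6.8, 6.5, 6.1, 0.3]

top5 = top_k_labels(scores, labels)
print("Top5: " + ", ".join(top5))
```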

	- Verification mode
	Similarity score between the two inputs, followed by the match decision against the threshold.
	```
	similar: 0.42575997
	verification: match (threshold: 0.260)
	```
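The decision step in verification mode can be sketched as a similarity score compared against the threshold. This README does not state which similarity measure the script uses; cosine similarity between two embedding vectors is assumed here, as is the `>=` comparison. The function names, the `"no match"` string, and the toy vectors are hypothetical; only the threshold value 0.260 comes from the sample output.

```python
import numpy as np

THRESHOLD = 0.260  # value shown in the sample output

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb1, emb2, threshold=THRESHOLD):
    """Return (similarity, decision); the decision strings are illustrative."""
    sim = cosine_similarity(emb1, emb2)
    decision = "match" if sim >= threshold else "no match"
    return sim, decision

# Toy 3-dimensional "embeddings" (real ones would come from the model)
sim, decision = verify([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
print(f"similar: {sim:.8f}")
print(f"verification: {decision} (threshold: {THRESHOLD:.3f})")
```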

	## Usage
	The ONNX and prototxt files are downloaded automatically on the first run,
	so an Internet connection is required at that point.

	To run on the sample wav file:
	```bash
	$ python3 auto_speech.py
	```
	This outputs the top-5 labels (identification mode).

	If you want to specify the input file, put the path after the `--input` option.
	```bash
	$ python3 auto_speech.py --input wav/id10283/oGZsanLiXsY/00004.wav
	```

	When two files are specified with the `--input1` and `--input2` options,
	the script checks whether the two audio files belong to the same speaker (verification mode).
	```bash
	$ python3 auto_speech.py --input1 wav/id10270/8jEAjG6SegY/00008.wav --input2 wav/id10270/x6uYqmx31kE/00001.wav
	```
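When checking verification results, it can help to know the expected answer in advance. The sample paths in this README appear to follow a `wav/<speaker id>/<video id>/<utterance>.wav` layout, so a minimal sketch, assuming that layout holds for the files you use, can read the speaker ID from each path. The helpers `speaker_id` and `same_speaker` are hypothetical and not part of `auto_speech.py`.

```python
from pathlib import PurePosixPath

def speaker_id(wav_path):
    """Return the first path component shaped like a speaker ID, e.g. 'id10270'."""
    for part in PurePosixPath(wav_path).parts:
        if part.startswith("id") and part[2:].isdigit():
            return part
    raise ValueError(f"no speaker ID found in path: {wav_path}")

def same_speaker(path1, path2):
    """True when both paths carry the same speaker ID (assumed directory layout)."""
    return speaker_id(path1) == speaker_id(path2)

# The two files from the verification example share the speaker ID id10270
expected_match = same_speaker(
    "wav/id10270/8jEAjG6SegY/00008.wav",
    "wav/id10270/x6uYqmx31kE/00001.wav",
)
print("expected:", "match" if expected_match else "no match")
```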

	## Reference

	[AutoSpeech: Neural Architecture Search for Speaker Recognition](https://github.com/VITA-Group/AutoSpeech)

	## Framework

	PyTorch

	## Model Format

	ONNX opset=11

	## Netron

	[proposed_iden.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/auto_speech/proposed_iden.onnx.prototxt)
	[proposed_classifier.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/auto_speech/proposed_classifier.onnx.prototxt)
	[proposed_veri.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/auto_speech/proposed_veri.onnx.prototxt)