Qwen-Audio (code, models, paper)

111ec90 verified 3 months ago

1.33 kB

	# Qwen-Audio

	## Input

	- Audio file

	https://github.com/QwenLM/Qwen-Audio/blob/main/assets/audio/1272-128104-0000.flac

	- Prompt

	what does the person say?

	## Output

	The person says: "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel".

	## Requirements

	This model requires additional module.
	```
	pip3 install transformers
	pip3 install tiktoken
	pip3 install librosa
	```


	## Usage
	Automatically downloads the onnx and prototxt files on the first run.
	It is necessary to be connected to the Internet while downloading.

	For the sample wav,
	```bash
	$ python3 qwen_audio.py
	```

	If you want to specify the audio, put the file path after the `--input` option.
	```bash
	$ python3 qwen_audio.py --input AUDIO_FILE
	```

	If you want to specify the prompt, put the prompt after the `--prompt` option.
	```bash
	$ python3 qwen_audio.py --prompt PROMPT
	```

	## Reference

	- [Qwen-Audio](https://github.com/QwenLM/Qwen-Audio)

	## Framework

	Pytorch

	## Model Format

	ONNX opset=17

	## Netron

	[Qwen-Audio-Chat_encode.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat_encode.onnx.prototxt)
	[Qwen-Audio-Chat.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat.onnx.prototxt)