# Qwen-Audio

## Input

- Audio file

  https://github.com/QwenLM/Qwen-Audio/blob/main/assets/audio/1272-128104-0000.flac

- Prompt

  what does the person say?

## Output

The person says: "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel".

## Requirements

This model requires additional modules.

```
pip3 install transformers
pip3 install tiktoken
pip3 install librosa
```

## Usage

The onnx and prototxt files are downloaded automatically on the first run.
An Internet connection is required while downloading.

To run with the sample audio,
```bash
$ python3 qwen_audio.py
```

If you want to specify the audio, put the file path after the `--input` option.
```bash
$ python3 qwen_audio.py --input AUDIO_FILE
```

If you want to specify the prompt, put the prompt after the `--prompt` option.
```bash
$ python3 qwen_audio.py --prompt PROMPT
```

## Reference

- [Qwen-Audio](https://github.com/QwenLM/Qwen-Audio)

## Framework

PyTorch

## Model Format

ONNX opset=17

## Netron

[Qwen-Audio-Chat_encode.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat_encode.onnx.prototxt)

[Qwen-Audio-Chat.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat.onnx.prototxt)
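
Before running `qwen_audio.py`, it may help to confirm that the packages listed under Requirements can be imported. The following is a minimal sketch and not part of this repository: the helper `missing_modules` and the `REQUIRED_MODULES` list are hypothetical names introduced for illustration, and the check assumes each package's import name matches its `pip3 install` name.

```python
import importlib.util

# Import names of the packages listed under Requirements.
# Assumption: the import name equals the pip package name for all three.
REQUIRED_MODULES = ["transformers", "tiktoken", "librosa"]


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        # Report what is absent and suggest the matching install command.
        print("Missing modules: " + ", ".join(missing))
        print("Install with: pip3 install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```

The check only looks up each module without importing it, so it returns quickly even for large packages such as `transformers`.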