File size: 1,329 Bytes

111ec90

# Qwen-Audio

## Input

- Audio file

  https://github.com/QwenLM/Qwen-Audio/blob/main/assets/audio/1272-128104-0000.flac

- Prompt

  what does the person say?

## Output

The person says: "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel".

## Requirements

This model requires additional module.
```
pip3 install transformers
pip3 install tiktoken
pip3 install librosa
```


## Usage
Automatically downloads the onnx and prototxt files on the first run.
It is necessary to be connected to the Internet while downloading.

For the sample wav,
```bash
$ python3 qwen_audio.py
```

If you want to specify the audio, put the file path after the `--input` option.
```bash
$ python3 qwen_audio.py --input AUDIO_FILE
```

If you want to specify the prompt, put the prompt after the `--prompt` option.  
```bash
$ python3 qwen_audio.py --prompt PROMPT
```

## Reference

- [Qwen-Audio](https://github.com/QwenLM/Qwen-Audio)

## Framework

Pytorch

## Model Format

ONNX opset=17

## Netron

[Qwen-Audio-Chat_encode.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat_encode.onnx.prototxt)  
[Qwen-Audio-Chat.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat.onnx.prototxt)