# Qwen-Audio

## Input

- Audio file

  https://github.com/QwenLM/Qwen-Audio/blob/main/assets/audio/1272-128104-0000.flac

- Prompt

  what does the person say?

## Output

The person says: "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel".

## Requirements

This model requires the following additional modules:
```bash
pip3 install transformers
pip3 install tiktoken
pip3 install librosa
```


## Usage
The ONNX and prototxt files are downloaded automatically on the first run.
An Internet connection is required during the download.
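
The first-run download can be pictured roughly as the sketch below. It is not the script's actual code: `missing_files` and `fetch_missing` are hypothetical helpers, and the `.onnx` file names and their location are assumptions inferred from the `.prototxt` URLs in the Netron section. Any additional weight files the models may need are not covered.

```python
import os
import urllib.request

# Assumption: the .onnx files sit next to the .prototxt files linked under "Netron".
REMOTE_PATH = "https://storage.googleapis.com/ailia-models/qwen_audio/"
MODEL_FILES = [
    "Qwen-Audio-Chat_encode.onnx",
    "Qwen-Audio-Chat_encode.onnx.prototxt",
    "Qwen-Audio-Chat.onnx",
    "Qwen-Audio-Chat.onnx.prototxt",
]


def missing_files(directory, names=MODEL_FILES):
    """Return the file names that are not yet present in `directory`."""
    return [n for n in names if not os.path.exists(os.path.join(directory, n))]


def fetch_missing(directory="."):
    """Download every missing file (hypothetical helper; needs Internet access)."""
    for name in missing_files(directory):
        urllib.request.urlretrieve(REMOTE_PATH + name, os.path.join(directory, name))
```

Calling `missing_files(".")` before going offline is a quick way to check whether a first run has already fetched everything on this list.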

To run on the sample audio file:
```bash
$ python3 qwen_audio.py
```

If you want to specify the audio, put the file path after the `--input` option.
```bash
$ python3 qwen_audio.py --input AUDIO_FILE
```

If you want to specify the prompt, put the prompt after the `--prompt` option.  
```bash
$ python3 qwen_audio.py --prompt PROMPT
```

## Reference

- [Qwen-Audio](https://github.com/QwenLM/Qwen-Audio)

## Framework

PyTorch

## Model Format

ONNX opset=17

## Netron

[Qwen-Audio-Chat_encode.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat_encode.onnx.prototxt)  
[Qwen-Audio-Chat.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/qwen_audio/Qwen-Audio-Chat.onnx.prototxt)