Add pipeline tag
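This PR adds the `audio-to-audio` pipeline tag to the model metadata so that the model is correctly categorized and discoverable under audio tasks on the Hugging Face Hub. The existing documentation, including benchmarks and usage examples, is kept. The remaining changes are minor:

- the `license` and `license_name` fields move below `language` in the YAML front matter;
- the model description is joined into one line and gains the short name "UniAudio-Token";
- the inference snippet imports `snapshot_download` from `huggingface_hub`;
- a missing period is added to the downstream-performance sentence.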
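The metadata change can be checked locally. Below is a minimal sketch, not part of this PR. It uses a hypothetical `parse_front_matter` helper that handles only the flat `key: value` and `- item` forms used in this README's front matter. A real YAML parser such as PyYAML would be the safer choice in practice. The helper is applied to the before and after front matter visible in the diff. Line 10 of the front matter is not shown in the diff, so it is omitted here.

```python
# Sketch only: a hypothetical, minimal front-matter reader. It assumes the
# flat "key: value" and "- item" forms seen in this README, not full YAML.


def parse_front_matter(readme_text: str) -> dict:
    """Parse the block between the first two '---' lines into a dict.

    Scalars become strings; a key with no inline value collects the
    following '- item' lines into a list.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata: dict = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of the front matter block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_key is not None:
                metadata[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            metadata[key] = value
            current_key = None
        else:
            metadata[key] = []
            current_key = key
    return metadata


# Front matter as visible in the diff (line 10 is elided there and omitted here).
OLD_FRONT_MATTER = """---
license: other
license_name: license-term-of-universal-audio-tokenizer
language:
- en
- zh
tags:
- audio
- audio-tokenizer
- speech
- sound
- music
---
"""

NEW_FRONT_MATTER = """---
language:
- en
- zh
license: other
license_name: license-term-of-universal-audio-tokenizer
tags:
- audio
- audio-tokenizer
- speech
- sound
- music
pipeline_tag: audio-to-audio
---
"""

old_meta = parse_front_matter(OLD_FRONT_MATTER)
new_meta = parse_front_matter(NEW_FRONT_MATTER)

# Keys present only after the change, and shared keys whose values differ.
added = {k: new_meta[k] for k in new_meta.keys() - old_meta.keys()}
changed = {k for k in old_meta.keys() & new_meta.keys() if old_meta[k] != new_meta[k]}

print(added)    # {'pipeline_tag': 'audio-to-audio'}
print(changed)  # set()
```

Because the comparison is on parsed key/value pairs rather than raw lines, moving the `license` fields below `language` shows up as no change. Only `pipeline_tag` is reported as added.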
README.md
CHANGED
@@ -1,9 +1,9 @@
 ---
-license: other
-license_name: license-term-of-universal-audio-tokenizer
 language:
 - en
 - zh
+license: other
+license_name: license-term-of-universal-audio-tokenizer
 tags:
 - audio
 - audio-tokenizer
@@ -11,11 +11,12 @@ tags:
 - speech
 - sound
 - music
+pipeline_tag: audio-to-audio
 ---
+
 # Universal Audio Tokenizer: Empowering Semantic Speech Tokenizers with General Audio Perception
 
-**Universal Audio Tokenizer** is a compact single-codebook audio tokenizer that unifies general audio perception and
-linguistic alignment for downstream Audio-LLMs.
+**Universal Audio Tokenizer** (UniAudio-Token) is a compact single-codebook audio tokenizer that unifies general audio perception and linguistic alignment for downstream Audio-LLMs.
 
 📄 [Paper](https://arxiv.org/abs/2605.31521) | 💻 [GitHub](https://github.com/Tencent/Universal_Audio_Tokenizer)
 
@@ -108,6 +109,7 @@ Also, you can directly run the inference code snippet below:
 ```python
 import os
 import torch
+from huggingface_hub import snapshot_download
 from transformers import WhisperFeatureExtractor
 from src.model.modeling_whisper import WhisperVQEncoder
 from src.model.flow_inference import AudioDecoder
@@ -167,7 +169,7 @@ Our Universal Audio Tokenizer achieves high-quality speech reconstruction with a
 
 ### Superior Downstream Audio-LLM Performance
 
-When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks
+When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks.
 
 #### Audio Understanding
 
@@ -208,4 +210,4 @@ If you find our code or model useful for your research, please cite:
 
 ## License
 
-This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).
+This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).