Add pipeline tag

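This PR adds the `audio-to-audio` pipeline tag to the model card metadata so that the model is categorized and discoverable under audio tasks on the Hugging Face Hub. The existing documentation, including benchmarks and usage examples, is kept. The diff also moves the license fields below `language`, adds the short name UniAudio-Token to the introduction, imports `snapshot_download` in the inference snippet, and trims one sentence in the downstream performance section.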
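For reviewers who want to check the metadata locally, here is a minimal sketch of reading the tag back out of the README front matter. It assumes the front matter contains only scalars and flat lists, as in this model card; `read_front_matter` is a hypothetical helper written for illustration, not part of this repository or of `huggingface_hub`, and the tag list is limited to the entries visible in the diff.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse simple YAML front matter (scalars and flat lists only)."""
    lines = readme_text.splitlines()
    # Front matter must start on the first line with a '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the metadata block
            break
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(metadata.get(current_key), list):
                metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A key with no inline value opens a list; otherwise store the scalar.
            metadata[current_key] = value if value else []
    return metadata


# Illustrative README text mirroring the metadata on the new side of the diff.
README = """---
language:
- en
- zh
license: other
license_name: license-term-of-universal-audio-tokenizer
tags:
- audio
- audio-tokenizer
- speech
- sound
- music
pipeline_tag: audio-to-audio
---
# Universal Audio Tokenizer
"""

meta = read_front_matter(README)
print(meta["pipeline_tag"])
```

A real check would more likely use a full YAML parser; the hand-rolled parser above only keeps the sketch free of third-party dependencies.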

Files changed (1)

README.md +8 -6

@@ -1,9 +1,9 @@
 ---
-license: other
-license_name: license-term-of-universal-audio-tokenizer
 language:
 - en
 - zh
+license: other
+license_name: license-term-of-universal-audio-tokenizer
 tags:
 - audio
 - audio-tokenizer
@@ -11,11 +11,12 @@ tags:
 - speech
 - sound
 - music
+pipeline_tag: audio-to-audio
 ---
 # Universal Audio Tokenizer: Empowering Semantic Speech Tokenizers with General Audio Perception
-**Universal Audio Tokenizer** is a compact single-codebook audio tokenizer that unifies general audio perception and
-linguistic alignment for downstream Audio-LLMs.
+**Universal Audio Tokenizer** (UniAudio-Token) is a compact single-codebook audio tokenizer that unifies general audio perception and linguistic alignment for downstream Audio-LLMs.
 📄 [Paper](https://arxiv.org/abs/2605.31521) | 💻 [GitHub](https://github.com/Tencent/Universal_Audio_Tokenizer)
@@ -108,6 +109,7 @@ Also, you can directly run the inference code snippet below:
 ```python
 import os
 import torch
+from huggingface_hub import snapshot_download
 from transformers import WhisperFeatureExtractor
 from src.model.modeling_whisper import WhisperVQEncoder
 from src.model.flow_inference import AudioDecoder
@@ -167,7 +169,7 @@ Our Universal Audio Tokenizer achieves high-quality speech reconstruction with a
 ### Superior Downstream Audio-LLM Performance
-When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks, demonstrating its effectiveness as a unified audio input/output interface for Audio-LLMs.
+When integrated with the Qwen2.5 LLM backbone, our Universal Audio Tokenizer yields superior performance on a wide range of downstream audio understanding benchmarks and controllable TTS synthesis tasks.
 #### Audio Understanding
@@ -208,4 +210,4 @@ If you find our code or model useful for your research, please cite:
 ## License
-This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).
+This project is licensed under the [License Term of Universal_Audio_Tokenizer](LICENSE).