Commit ba91eda (verified) by nielsr (HF Staff) · 1 parent: 2038c43

Improve model card and add metadata for UniAudio 2.0

Hi! I'm Niels from the Hugging Face community science team. I've updated the model card for UniAudio 2.0 to include:
- Metadata for the `audio-to-audio` pipeline tag and the MIT license.
- Links to the research paper, project demo page, and official GitHub repository.
- A summary of supported tasks across speech, sound, and music.
- Sample usage instructions for both understanding (ASR) and generation (TTS) based on the official documentation.
- The BibTeX citation for the paper.

Files changed (1): README.md (+85 -3)
@@ -1,3 +1,85 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: audio-to-audio
+ tags:
+ - audio
+ - speech
+ - music
+ - audio-generation
+ ---
+
+ # UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization
+
+ UniAudio 2.0 is a unified audio foundation model for speech, sound, and music. It uses **ReasoningCodec** (reasoning tokens and reconstruction tokens) and a unified autoregressive architecture trained on 100B text and 60B audio tokens.
+
+ - **Paper:** [UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization](https://huggingface.co/papers/2602.04683)
+ - **Project Page:** [Demo 🎶](https://dongchaoyang.top/UniAudio2Demo/)
+ - **Code:** [GitHub Repository](https://github.com/yangdongchao/UniAudio2)
+
+ ## Supported Tasks
+
+ - **Speech:** TTS (EN/ZH/Yue), Audio-Instructed TTS, InstructTTS, ASR, Dysarthric Speech Recognition, S2S Q&A, S2T Q&A
+ - **Sound:** Text-to-Sound, Audio Captioning, Audio Question Answering
+ - **Music:** Song Generation (EN/ZH) and Recognition, Text-to-Music Generation, Music Question Answering
+
+ ## Installation
+
+ ```bash
+ # Clone the repo
+ git clone https://github.com/yangdongchao/UniAudio2
+ cd UniAudio2
+
+ # Create environment (Python 3.10)
+ conda create -n uniaudio2 python=3.10
+ conda activate uniaudio2
+
+ # Editable install
+ pip install -e .
+ ```
+
+ ## Sample Usage
+
+ All tasks are run via the `multi_task_inference.py` script. You need to download the checkpoints and update paths in `tools/tokenizer/ReasoningCodec_film/codec_infer_config.yaml`.
+
+ ### Understanding (Audio → Text) - ASR Example
+
+ ```bash
+ python multi_task_inference.py \
+   --task ASR \
+   --audio samples/p225_002.wav \
+   --output_dir ./ASR_output \
+   --llm_train_config <LLM_CONFIG> \
+   --exp_dir <EXP_DIR> \
+   --resume <RESUME> \
+   --text_tokenizer_path tools/tokenizer/Text2ID/llama3_2_tokenizer \
+   --prompt_text "Transcribe the provided audio recording into accurate text." \
+   --audio_tokenizer_config tools/tokenizer/ReasoningCodec_film/infer_config.yaml \
+   --codec_config tools/tokenizer/ReasoningCodec_film/infer_config.yaml \
+   --codec_ckpt <CODEC_CKPT>
+ ```
+
+ ### Generation (Text → Audio) - TTS Example
+
+ ```bash
+ python multi_task_inference.py \
+   --task TTS \
+   --stage all \
+   --text "Hello, this is a test." \
+   --output_dir ./TTS_output \
+   --llm_train_config <LLM_CONFIG> --exp_dir <EXP_DIR> --resume <RESUME> \
+   --text_tokenizer_path tools/tokenizer/Text2ID/llama3_2_tokenizer \
+   --prompt_text "Convert the given text into natural speech." \
+   --audio_tokenizer_config tools/tokenizer/ReasoningCodec_film/infer_config.yaml \
+   --codec_config tools/tokenizer/ReasoningCodec_film/infer_config.yaml \
+   --codec_ckpt <CODEC_CKPT> --codec_steps 10
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{uniaudio2,
+   title={UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization},
+   author={Dongchao Yang and Yuanyuan Wang and Dading Chong and Songxiang Liu and Xixin Wu and Helen Meng},
+   year={2026}
+ }
+ ```
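
The ASR and TTS commands in the README repeat the same checkpoint and tokenizer arguments. As a rough sketch only, the shared arguments could be collected once in a Bash array and reused per task. The variable names, the `COMMON_ARGS` array, and the `show_cmd` helper are hypothetical and not part of the UniAudio2 repository; the flags and relative paths are copied from the README commands, and the `<...>` placeholders still have to be replaced with real paths. The script only prints the commands (a dry run) and does not invoke `multi_task_inference.py`.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the README's ASR and TTS commands with shared arguments
# factored out. Names here are illustrative, not part of the UniAudio2 repo.
set -euo pipefail

# Placeholders, as in the README; override via environment variables.
LLM_CONFIG="${LLM_CONFIG:-<LLM_CONFIG>}"
EXP_DIR="${EXP_DIR:-<EXP_DIR>}"
RESUME="${RESUME:-<RESUME>}"
CODEC_CKPT="${CODEC_CKPT:-<CODEC_CKPT>}"

# The README passes the same YAML file to both config flags.
CODEC_YAML="tools/tokenizer/ReasoningCodec_film/infer_config.yaml"

# Arguments that appear in both README commands.
COMMON_ARGS=(
  --llm_train_config "$LLM_CONFIG"
  --exp_dir "$EXP_DIR"
  --resume "$RESUME"
  --text_tokenizer_path tools/tokenizer/Text2ID/llama3_2_tokenizer
  --audio_tokenizer_config "$CODEC_YAML"
  --codec_config "$CODEC_YAML"
  --codec_ckpt "$CODEC_CKPT"
)

# Print the full, shell-quoted command for a task instead of running it.
show_cmd() {
  printf '%q ' python multi_task_inference.py "$@" "${COMMON_ARGS[@]}"
  printf '\n'
}

show_cmd --task ASR --audio samples/p225_002.wav --output_dir ./ASR_output \
  --prompt_text "Transcribe the provided audio recording into accurate text."

show_cmd --task TTS --stage all --text "Hello, this is a test." \
  --output_dir ./TTS_output \
  --prompt_text "Convert the given text into natural speech." --codec_steps 10
```

To actually run a task from the repository root, the printed command can be copied into a shell once the placeholders point at downloaded checkpoints.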