Improve model card: Add pipeline tag, library name, and update links

#2
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +10 -12
README.md CHANGED
@@ -1,23 +1,24 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
+license: apache-2.0
 tags:
 - MoE
 - Unified Generation
 - Speech and Music
 - Multi-modal
+pipeline_tag: text-to-audio
+library_name: transformers
 ---
 
-
 <h1 align="center">UniMoE-Audio</h1>
 
-**UniMoE-Audio** is a unified framework that seamlessly combines speech and music generation. Powered by a novel Dynamic-Capacity Mixture-of-Experts architecture.
+**UniMoE-Audio** is a unified framework that seamlessly combines speech and music generation, powered by a novel Dynamic-Capacity Mixture-of-Experts architecture.
 
 <div align="center" style="display: flex; justify-content: center; margin-top: 10px;">
 <a href="https://mukioxun.github.io/Uni-MoE-site/home.html"><img src="https://img.shields.io/badge/📰 -Website-228B22" style="margin-right: 5px;"></a>
-<a href="https://arxiv.org/abs/2510.13344"><img src="https://img.shields.io/badge/📄-Paper-8A2BE2" style="margin-right: 5px;"></a>
+<a href="https://huggingface.co/papers/2510.13344"><img src="https://img.shields.io/badge/📄-Paper-8A2BE2" style="margin-right: 5px;"></a>
 </div>
 
 ---
@@ -28,8 +29,8 @@ tags:
 - [x] Model Checkpoint
 - [x] [UniMoE-Audio-preview](https://huggingface.co/foggyforest/UniMoE-Audio-preview)
 - [ ] [UniMoE-Audio]()
-- [x] Training and Inference Code: [HITsz-TMG/UniMoE-Audio](https://github.com/HITsz-TMG/UMOE-Scaling-Unified-Multimodal-LLMs/tree/master/UniMoE-Audio)
-- [x] Technical Report: [UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE](https://arxiv.org/abs/2510.13344)
+- [x] Training and Inference Code: [HITsz-TMG/UniMoE-Audio](https://github.com/HITsz-TMG/Uni-MoE/tree/master/UniMoE-Audio)
+- [x] Technical Report: [UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE](https://huggingface.co/papers/2510.13344)
 
 ## Evaluation
 
@@ -56,7 +57,7 @@ pip install qwen-vl-utils
 ```
 
 
-We use the Descript Audio Codec (DAC) for audio compression. You can install it using the following command:
+We use the Descript Audio Codec (DAC) for audio compression. You can install it using the following command:
 ```
 pip install descript-audio-codec
 ```
@@ -187,7 +188,7 @@ video = [
     "/path/to/your/video/path.mp4",
 ]
 
-text_input, video_inputs, fps_inputs, v2m_generation_kwargs = v2m_preprocess(caption, video)
+text_input, video_inputs, fps_inputs, v2m_generation_kwargs = v2m_preprocess(caption, video)
 
 source_input = processor(text=text_input, images=None, videos=video_inputs, fps=fps_inputs, padding=True, return_tensors="pt", do_resize=False)
 source_input = source_input.to(model.device)
@@ -219,9 +220,6 @@ for i in range(len(audios)):
     dac.decode(audios[i].transpose(0, 1).unsqueeze(0), save_path=output_path, min_duration=1)
 ```
 
-
-
-
 # Citation
 
 Please cite the repo if you use the model or code in this repo.
@@ -232,7 +230,7 @@ Please cite the repo if you use the model or code in this repo.
 author={Zhenyu Liu and Yunxin Li and Xuanyu Zhang and Qixun Teng and Shenyuan Jiang and Xinyu Chen and Haoyuan Shi and Jinchao Li and Qi Wang and Haolan Chen and Fanbo Meng and Mingjun Zhao and Yu Xu and Yancheng He and Baotian Hu and Min Zhang},
 year={2025},
 journal={arXiv preprint arXiv:2510.13344},
-url={https://arxiv.org/abs/2510.13344},
+url={https://huggingface.co/papers/2510.13344},
 }
 ```
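For reference, applying the metadata changes in the first hunk to the card's front matter yields the following YAML (reconstructed from the diff; field order as shown there). `pipeline_tag` and `library_name` are the fields the Hub uses to surface the model under the text-to-audio task filter and to pick a loading-snippet library:

```yaml
---
language:
- en
- zh
license: apache-2.0
tags:
- MoE
- Unified Generation
- Speech and Music
- Multi-modal
pipeline_tag: text-to-audio
library_name: transformers
---
```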