shaunxsyang committed · Commit dd8a016 · verified · Parent: 69ccd7e · Create README.md (README.md, +72 lines)
# U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation

[![githubio](https://img.shields.io/badge/GitHub.io-Demo_Page-blue?logo=Github&style=flat-square)](https://yangxusheng-yxs.github.io/U-Codec/)
[![GitHub](https://img.shields.io/badge/Github-Code_Release-pink?logo=Github&style=flat-square)](https://github.com/YangXusheng-yxs/CodecFormer_5Hz)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Stable_Release-blue?style=flat-square)](https://huggingface.co/shaunxsyang/U-Codec)

### News
This paper is currently under review. We have released the U-Codec (5 Hz) checkpoint, which can be used directly for inference.

### To-do list
- Provide the full training code for the U-Codec framework.
- Release the public code of the TTS models built on top of U-Codec.

If you are interested in U-Codec, feel free to contact us!

## Overview

We propose **U-Codec**, an **U**ltra low frame-rate neural speech **Codec** that achieves high-fidelity reconstruction and fast generation at an extremely low frame rate of 5 Hz (5 frames per second).
Extreme compression at 5 Hz typically causes severe loss of intelligibility and spectral detail; we overcome this by integrating a Transformer-based inter-frame long-term dependency module and systematically optimizing the residual vector quantization (RVQ) depth and codebook size.
Moreover, we apply U-Codec to a large language model (LLM)-based auto-regressive TTS model, which leverages a global-local hierarchical architecture to effectively capture dependencies across multi-layer tokens.

The figure below shows an overview of U-Codec.
![Overview of U-Codec](fig/fig2.png)
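The RVQ depth and codebook-size trade-off mentioned above can be illustrated with a minimal residual vector quantization sketch. This is toy NumPy code under assumed dimensions, not the U-Codec implementation; all names and sizes here are illustrative.

```python
import numpy as np

def rvq_quantize(x, codebooks):
    """Greedy residual vector quantization of one vector.

    Each stage picks the codeword nearest to the current residual,
    then subtracts it; the per-stage indices form the token stack.
    """
    residual = x.astype(np.float64)
    indices = []
    for cb in codebooks:                          # one codebook per RVQ layer
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        residual = residual - cb[idx]             # pass residual to next layer
    return indices, x - residual                  # indices + reconstruction

def rvq_decode(indices, codebooks):
    # Reconstruction is the sum of the selected codewords across layers.
    return sum(cb[i] for cb, i in zip(codebooks, indices))

rng = np.random.default_rng(0)
dim, layers, codebook_size = 8, 4, 16             # toy sizes; U-Codec tunes these
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(layers)]
x = rng.normal(size=dim)
idx, recon = rvq_quantize(x, codebooks)
```

Deeper RVQ (more layers) or larger codebooks shrink the final residual at the cost of more tokens per frame, which is exactly the design space the paper explores at 5 Hz.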


## How to run inference with U-Codec
We provide an example to demonstrate how to run U-Codec (5 Hz) for audio tokenization and reconstruction.

### Environment Setup
First, create a Python environment following a setup similar to the [UniAudio project page](https://github.com/yangdongchao/UniAudio).
```bash
conda create -n ucodec python=3.8
conda init
source ~/.bashrc
conda activate ucodec
```
Then install the dependencies:
```bash
cd U-Codec
bash requirements.sh
```
### Run Inference
If you need the pretrained weights, please download them from the [Checkpoint](https://huggingface.co/shaunxsyang/U-Codec) page.

We provide an example script, `AudioTokenizer_UCodec.py`, for tokenizing audio into discrete codes and reconstructing audio from those codes.

```bash
cd tools/tokenizer/soundstream
python AudioTokenizer_HY.py
```

You can directly use the released U-Codec 5 Hz checkpoint for inference. More examples (e.g., TTS pipeline integration) will be released soon.
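The round trip performed by the tokenizer script follows the usual encode/decode pattern for neural codecs. The sketch below uses a dummy stand-in codec to show the 5 Hz framing arithmetic; the class and method names are illustrative assumptions, not the actual U-Codec API.

```python
import numpy as np

class DummyCodec:
    """Stand-in showing the tokenize/reconstruct round trip at 5 Hz.

    A real 5 Hz codec maps each 0.2 s frame to a stack of RVQ indices;
    here each frame is mean-pooled into a single fake 'code'.
    """
    def __init__(self, sample_rate=16000, frame_rate=5):
        self.hop = sample_rate // frame_rate      # samples per frame (3200)

    def encode(self, wav):
        n = len(wav) // self.hop * self.hop       # drop the ragged tail
        frames = wav[:n].reshape(-1, self.hop)
        return frames.mean(axis=1)                # one 'code' per frame

    def decode(self, codes):
        return np.repeat(codes, self.hop)         # crude frame-level upsampling

codec = DummyCodec()
t = np.arange(16000) / 16000                      # 1 s of audio at 16 kHz
wav = np.sin(2 * np.pi * 440 * t)
codes = codec.encode(wav)                         # 5 codes for 1 s of audio
audio = codec.decode(codes)
```

At 5 Hz, one second of 16 kHz audio yields only five frames of tokens, which is what makes downstream autoregressive generation fast.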

### Citation
If you find this code useful in your research, please cite our work and give us a star!
```bib
@inproceedings{U-Codec,
  title     = {U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation},
  author    = {Xusheng Yang and Long Zhou and Wenfu Wang and Kai Hu and Shulin Feng and Chenxing Li and Meng Yu and Dong Yu and Yuexian Zou},
  booktitle = {arXiv},
  year      = {2025}
}
```

### Contact us
If you have any problems with our code, please contact Xusheng (yangxs@stu.pku.edu.cn).

### License
This code is released under the MIT License.