Commit 5562789 by primepake (1 parent: 0238bb4)
update learnable speech:
README.md
CHANGED
@@ -1,4 +1,4 @@
-#
+# Learnable-Speech Technical Implementation
 
 An unofficial implementation based on improvements of cosyvoice with learnable encoder and dac-vae, with core components adapted from [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice).
 
@@ -6,7 +6,7 @@ An unofficial implementation based on improvements of cosyvoice with learnable e
 
 ## Overview
 
-This repository provides an implementation of the
+This repository provides an implementation of the Learnable-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
 
 ## Key Features
 
@@ -168,7 +168,7 @@ This implementation builds upon several key projects:
 
 - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: Core model architectures and training pipelines
 - **[Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)**: Audio tokenization framework
-- **
+- **Learnable-Speech**: Original technical report and methodology
 
 ## Citation
 
@@ -176,8 +176,8 @@ If you use this code in your research, please cite:
 
 ```bibtex
 @article{minimax-speech,
-title={
-author={[
+title={Learnable-Speech},
+author={[Learnable team]},
 year={[2025]}
 url={https://arxiv.org/pdf/2505.07916}
 }
@@ -200,7 +200,7 @@ This project follows the licensing terms of its dependencies:
 
 - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: This implementation extensively uses code and architectures from CosyVoice2
 - **[FSQ](https://github.com/xingchensong/S3Tokenizer)**: For the FSQ implementation
-- **
+- **Learnable team**: For the technical report and methodology
 - **FunAudioLLM team**: For the excellent CosyVoice2 codebase
 
 ## Contributing