primepake committed on
Commit 5562789 · 1 Parent(s): 0238bb4

update learnable speech:

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -1,4 +1,4 @@
- # MiniMax-Speech Technical Implementation
+ # Learnable-Speech Technical Implementation
 
  An unofficial implementation based on improvements of cosyvoice with learnable encoder and dac-vae, with core components adapted from [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice).
 
@@ -6,7 +6,7 @@ An unofficial implementation based on improvements of cosyvoice with learnable e
 
  ## Overview
 
- This repository provides an implementation of the MiniMax-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
+ This repository provides an implementation of the Learnable-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
 
  ## Key Features
 
@@ -168,7 +168,7 @@ This implementation builds upon several key projects:
 
  - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: Core model architectures and training pipelines
  - **[Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)**: Audio tokenization framework
- - **MiniMax-Speech**: Original technical report and methodology
+ - **Learnable-Speech**: Original technical report and methodology
 
  ## Citation
 
@@ -176,8 +176,8 @@ If you use this code in your research, please cite:
 
  ```bibtex
  @article{minimax-speech,
- title={MiniMax-Speech},
- author={[MiniMax team]},
+ title={Learnable-Speech},
+ author={[Learnable team]},
  year={[2025]}
  url={https://arxiv.org/pdf/2505.07916}
  }
@@ -200,7 +200,7 @@ This project follows the licensing terms of its dependencies:
 
  - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: This implementation extensively uses code and architectures from CosyVoice2
  - **[FSQ](https://github.com/xingchensong/S3Tokenizer)**: For the FSQ implementation
- - **MiniMax team**: For the technical report and methodology
+ - **Learnable team**: For the technical report and methodology
  - **FunAudioLLM team**: For the excellent CosyVoice2 codebase
 
  ## Contributing