primepake committed on
Commit 1c33894 · 2 Parent(s): 95af23d 5562789

Merge branch 'main' of https://github.com/primepake/learnable-speech

Files changed (2)
  1. README.md +9 -9
  2. assets/image.png +2 -2
README.md CHANGED
@@ -1,12 +1,12 @@
- # MiniMax-Speech Technical Implementation
+ # Learnable-Speech Technical Implementation
 
- An unofficial implementation based on the MiniMax-Speech technical report, with core components adapted from [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice).
+ An unofficial implementation based on improvements of cosyvoice with learnable encoder and dac-vae, with core components adapted from [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice).
 
- ![MiniMax-Speech Architecture](assets/image.png)
+ ![Architecture](assets/image.png)
 
  ## Overview
 
- This repository provides an implementation of the MiniMax-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
+ This repository provides an implementation of the Learnable-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
 
  ## Key Features
 
@@ -168,7 +168,7 @@ This implementation builds upon several key projects:
 
  - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: Core model architectures and training pipelines
  - **[Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)**: Audio tokenization framework
- - **MiniMax-Speech**: Original technical report and methodology
+ - **Learnable-Speech**: Original technical report and methodology
 
  ## Citation
 
@@ -176,8 +176,8 @@ If you use this code in your research, please cite:
 
  ```bibtex
  @article{minimax-speech,
- title={MiniMax-Speech},
- author={[MiniMax team]},
+ title={Learnable-Speech},
+ author={[Learnable team]},
  year={[2025]}
  url={https://arxiv.org/pdf/2505.07916}
  }
@@ -200,7 +200,7 @@ This project follows the licensing terms of its dependencies:
 
  - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: This implementation extensively uses code and architectures from CosyVoice2
  - **[FSQ](https://github.com/xingchensong/S3Tokenizer)**: For the FSQ implementation
- - **MiniMax team**: For the technical report and methodology
+ - **Learnable team**: For the technical report and methodology
  - **FunAudioLLM team**: For the excellent CosyVoice2 codebase
 
  ## Contributing
@@ -212,4 +212,4 @@ The content provided above is for academic purposes only and is intended to demo
 
  ## Contact
 
- [nguyennhutsam.math@gmail.com, https://www.linkedin.com/in/primepake/]
+ [nguyennhutsam.math@gmail.com, https://www.linkedin.com/in/primepake/]
assets/image.png CHANGED

Git LFS Details

  • SHA256: f10f503661fd5331b31f6a2450391c12df4042ae1b7333d8b4c8646852d2ebae
  • Pointer size: 130 Bytes
  • Size of remote file: 32.6 kB

Git LFS Details

  • SHA256: e29c4acf00e8658092632cb17ba76cd5873fad1be0c0024bbb6356e8e6c045f4
  • Pointer size: 130 Bytes
  • Size of remote file: 50.7 kB