Spaces: mnhatdaous/learnable-speech


primepake committed on Aug 13

Commit 1c33894

2 Parent(s): 95af23d, 5562789

Merge branch 'main' of https://github.com/primepake/learnable-speech

Files changed (2)

README.md +9 -9
assets/image.png +2 -2

README.md CHANGED

@@ -1,12 +1,12 @@
-# MiniMax-Speech Technical Implementation
-An unofficial implementation based on the MiniMax-Speech technical report, with core components adapted from [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice).
-![MiniMax-Speech Architecture](assets/image.png)
 ## Overview
-This repository provides an implementation of the MiniMax-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
 ## Key Features
@@ -168,7 +168,7 @@ This implementation builds upon several key projects:
 - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: Core model architectures and training pipelines
 - **[Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)**: Audio tokenization framework
-- **MiniMax-Speech**: Original technical report and methodology
 ## Citation
@@ -176,8 +176,8 @@ If you use this code in your research, please cite:
 ```bibtex
 @article{minimax-speech,
-  title={MiniMax-Speech},
-  author={[MiniMax team]},
   year={[2025]}
   url={https://arxiv.org/pdf/2505.07916}
 }
@@ -200,7 +200,7 @@ This project follows the licensing terms of its dependencies:
 - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: This implementation extensively uses code and architectures from CosyVoice2
 - **[FSQ](https://github.com/xingchensong/S3Tokenizer)**: For the FSQ implementation
-- **MiniMax team**: For the technical report and methodology
 - **FunAudioLLM team**: For the excellent CosyVoice2 codebase
 ## Contributing
@@ -212,4 +212,4 @@ The content provided above is for academic purposes only and is intended to demo
 ## Contact
-[nguyennhutsam.math@gmail.com, https://www.linkedin.com/in/primepake/]

+# Learnable-Speech Technical Implementation
+An unofficial implementation based on improvements of cosyvoice with learnable encoder and dac-vae, with core components adapted from [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice).
+![Architecture](assets/image.png)
 ## Overview
+This repository provides an implementation of the Learnable-Speech model, featuring a two-stage training approach for high-quality 24kHz audio generation.
 ## Key Features
 - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: Core model architectures and training pipelines
 - **[Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)**: Audio tokenization framework
+- **Learnable-Speech**: Original technical report and methodology
 ## Citation
 ```bibtex
 @article{minimax-speech,
+  title={Learnable-Speech},
+  author={[Learnable team]},
   year={[2025]}
   url={https://arxiv.org/pdf/2505.07916}
 }
 - **[CosyVoice2](https://github.com/FunAudioLLM/CosyVoice)**: This implementation extensively uses code and architectures from CosyVoice2
 - **[FSQ](https://github.com/xingchensong/S3Tokenizer)**: For the FSQ implementation
+- **Learnable team**: For the technical report and methodology
 - **FunAudioLLM team**: For the excellent CosyVoice2 codebase
 ## Contributing
 ## Contact
+[nguyennhutsam.math@gmail.com, https://www.linkedin.com/in/primepake/]

assets/image.png CHANGED

Git LFS Details

SHA256: f10f503661fd5331b31f6a2450391c12df4042ae1b7333d8b4c8646852d2ebae
Pointer size: 130 Bytes
Size of remote file: 32.6 kB

Git LFS Details

SHA256: e29c4acf00e8658092632cb17ba76cd5873fad1be0c0024bbb6356e8e6c045f4
Pointer size: 130 Bytes
Size of remote file: 50.7 kB
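
For readers unfamiliar with the "Pointer size" field: with Git LFS, the repository stores a small text pointer in place of the image, and the binary itself lives in remote storage. The sketch below rebuilds such a pointer in the Git LFS v1 layout and checks its length. The helper name `lfs_pointer` and the byte count `32_600` are illustrative assumptions; the page only reports the rounded size 32.6 kB, so the exact value is not known.

```python
def lfs_pointer(sha256_hex: str, size_bytes: int) -> bytes:
    """Build the text of a Git LFS v1 pointer file for one object."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    int(sha256_hex, 16)  # raises ValueError if the digest is not hex
    text = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )
    return text.encode("utf-8")


# SHA256 copied from the first "Git LFS Details" block above.
OID = "f10f503661fd5331b31f6a2450391c12df4042ae1b7333d8b4c8646852d2ebae"

# Hypothetical exact size: the page shows only "32.6 kB".
ASSUMED_SIZE = 32_600

pointer = lfs_pointer(OID, ASSUMED_SIZE)
print(pointer.decode("utf-8"), end="")
print(len(pointer), "bytes")
```

Under these assumptions the pointer is 130 bytes long, which agrees with the reported pointer size for any file whose byte count has five digits. Both 32.6 kB and 50.7 kB fall in that range, which would explain why the two blocks show the same pointer size despite different file sizes.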