
	---
	license: cc-by-nc-4.0
	language:
	- eng
	- zho
	- kor
	- jpn
	- fra
	- deu
	tags:
	- tts
	- text-to-speech
	- speech-synthesis
	- voice-cloning
	library_name: ttsdb
	pipeline_tag: text-to-speech
	---

	# MaskGCT

	> This is a mirror of the original weights for use with [TTSDB](https://github.com/ttsds/ttsdb).
	>
	> Original weights: [https://huggingface.co/amphion/MaskGCT](https://huggingface.co/amphion/MaskGCT)
	>
	> Original code: [https://github.com/open-mmlab/Amphion](https://github.com/open-mmlab/Amphion)


	MaskGCT is a zero-shot text-to-speech model built on a masked generative codec transformer and released as part of the [Amphion](https://github.com/open-mmlab/Amphion) toolkit.



	## Original Work

	MaskGCT was created by the authors of the papers cited below, not by the maintainers of this mirror. Please cite their work if you use this model:


	```bibtex
	@article{wang2024maskgct,
	title={MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer},
	author={Wang, Yuancheng and Zhan, Haoyue and Liu, Liwei and Zeng, Ruihong and Guo, Haotian and Zheng, Jiachen and Zhang, Qiang and Zhang, Xueyao and Zhang, Shunsi and Wu, Zhizheng},
	journal={arXiv preprint arXiv:2409.00750},
	year={2024}
	}
	```

	```bibtex
	@inproceedings{amphion,
	author={Zhang, Xueyao and Xue, Liumeng and Gu, Yicheng and Wang, Yuancheng and Li, Jiaqi and He, Haorui and Wang, Chaoren and Song, Ting and Chen, Xi and Fang, Zihao and Chen, Haopeng and Zhang, Junan and Tang, Tze Ying and Zou, Lexiao and Wang, Mingxuan and Han, Jun and Chen, Kai and Li, Haizhou and Wu, Zhizheng},
	title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
	booktitle={{IEEE} Spoken Language Technology Workshop, {SLT} 2024},
	year={2024}
	}
	```



	Papers:

	- https://arxiv.org/abs/2409.00750

	- https://ieeexplore.ieee.org/abstract/document/10832255



	## Installation

	```bash
	pip install ttsdb-maskgct
	```
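
To confirm that the install succeeded, you can check that the module is importable. The sketch below uses only the standard library. The module name `ttsdb_maskgct` is taken from the usage example in this card, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str = "ttsdb_maskgct") -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("ttsdb_maskgct is available")
    else:
        print("ttsdb_maskgct not found; run: pip install ttsdb-maskgct")
```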

	## Usage

	```python
	from ttsdb_maskgct import MaskGCT

	# Load the model (downloads weights automatically)
	model = MaskGCT(model_id="ttsds/MaskGCT")

	# Synthesize speech
	audio, sample_rate = model.synthesize(
	    text="Hello, this is a test of MaskGCT.",
	    reference_audio="path/to/reference.wav",
	    text_reference="Transcript of the reference audio.",
	    language="en",
	)

	# Save the output
	model.save_audio(audio, sample_rate, "output.wav")
	```
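
Because loading the model is slow, it can help to validate the inputs before calling `synthesize`. The following is a minimal pre-flight sketch, assuming `synthesize` accepts the keyword arguments shown above. The helper `build_synthesis_kwargs` and the `ISO3_TO_SHORT` table are hypothetical and not part of `ttsdb_maskgct`. Only `"en"` appears in this card; the other two-letter codes are the ISO 639-1 equivalents of the codes in the card metadata, and whether the package accepts them is an assumption you should verify.

```python
from pathlib import Path

# ISO 639-3 codes from this card's metadata mapped to ISO 639-1 codes.
# Only "en" appears in the usage example; the rest are assumptions.
ISO3_TO_SHORT = {
    "eng": "en", "zho": "zh", "kor": "ko",
    "jpn": "ja", "fra": "fr", "deu": "de",
}


def build_synthesis_kwargs(text, reference_audio, text_reference, language="en"):
    """Validate inputs and return keyword arguments for model.synthesize()."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if not text_reference.strip():
        raise ValueError("text_reference must not be empty")
    ref_path = Path(reference_audio)
    if not ref_path.is_file():
        raise FileNotFoundError(f"reference audio not found: {ref_path}")
    # Accept either code style and normalise to the two-letter form.
    short = ISO3_TO_SHORT.get(language, language)
    if short not in ISO3_TO_SHORT.values():
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "text": text,
        "reference_audio": str(ref_path),
        "text_reference": text_reference,
        "language": short,
    }
```

The result can then be unpacked into the call, for example `model.synthesize(**build_synthesis_kwargs(...))`.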

	## Model Details

	| Property | Value |
	|----------|-------|
	| Sample Rate | 24000 Hz |
	| Parameters | 1010M |
	| Architecture | Non-Autoregressive Masked Transformer |
	| Languages | English, Chinese, Korean, Japanese, French, German |
	| Release Date | 2024-10-17 |
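
Two of these values lend themselves to quick arithmetic: the parameter count gives a lower bound on weight memory, and the sample rate converts sample counts to seconds. The sketch below assumes 4 bytes per parameter for float32 and 2 bytes for float16. The dtype of the released checkpoint is not stated in this card, and memory for activations is not counted.

```python
PARAMETERS = 1_010_000_000   # 1010M, from the table above
SAMPLE_RATE_HZ = 24_000      # from the table above


def weight_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


def duration_seconds(n_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of a mono waveform in seconds."""
    return n_samples / sample_rate


print(weight_memory_gb(PARAMETERS, 4))  # float32 -> 4.04
print(weight_memory_gb(PARAMETERS, 2))  # float16 -> 2.02
print(duration_seconds(120_000))        # -> 5.0
```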


	### Training Data


	- [Emilia Dataset](https://huggingface.co/datasets/amphion/Emilia-Dataset) (100,000 hours)




	## License

	- Weights: Creative Commons Attribution-NonCommercial 4.0
	- Code: MIT License

	Please refer to the original repositories for full license terms.

	## Links

	- Original Code: [https://github.com/open-mmlab/Amphion](https://github.com/open-mmlab/Amphion)
	- Original Weights: [https://huggingface.co/amphion/MaskGCT](https://huggingface.co/amphion/MaskGCT)
	- TTSDB Package: [ttsdb-maskgct](https://pypi.org/project/ttsdb-maskgct/)
	- TTSDB GitHub: [https://github.com/ttsds/ttsdb](https://github.com/ttsds/ttsdb)