Respair
/

RiFornet_Vocoder



	---
	license: mit
	tags:
	- vocoder
	- audio
	- speech
	- tts
	---

	# Model Card for RiFornet_Vocoder

	This vocoder combines [HiFTNet](https://github.com/yl4579/HiFTNet) and [RingFormer](https://github.com/seongho608/RingFormer). It supports Ring Attention, Conformer blocks, and Neural Source Filtering, among other features.
	This repository is experimental; expect some bugs and some hardcoded parameters.

	The default setting is 44.1 kHz with 128 mel bins. To switch to 24 kHz, copy the config from HiFTNet (make sure to also copy its pitch extractor, both the model and the checkpoint), then change 128 to 80 on line 384 of `models.py`, and uncomment `multiscale_subband_cfg` for the 24 kHz version.
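
Since a mismatch between the mel-bin count and the chosen setting is an easy mistake when switching, here is a minimal sketch of a sanity check you could run on your input features. It is not part of this repository: `PRESETS` and `check_mel_input` are hypothetical names, the `(batch, n_mels, frames)` layout is an assumption, and only the sample rates and mel-bin counts come from the text above.

```python
# Hypothetical presets; only the sample rates and mel-bin counts
# are taken from the description above.
PRESETS = {
    "44.1khz": {"sampling_rate": 44100, "n_mels": 128},  # default setting
    "24khz": {"sampling_rate": 24000, "n_mels": 80},     # HiFTNet-style setting
}


def check_mel_input(mel_shape, preset_name):
    """Validate a mel-spectrogram shape against a preset.

    Assumes a (batch, n_mels, frames) layout, which is an assumption
    and may not match the layout this repository expects.
    Returns the preset's sampling rate if the shape is consistent.
    """
    preset = PRESETS[preset_name]
    if len(mel_shape) != 3:
        raise ValueError(
            f"expected (batch, n_mels, frames), got {tuple(mel_shape)}"
        )
    n_mels = mel_shape[1]
    if n_mels != preset["n_mels"]:
        raise ValueError(
            f"preset {preset_name!r} expects {preset['n_mels']} mel bins, "
            f"got {n_mels}"
        )
    return preset["sampling_rate"]


# Example: a single 200-frame utterance with 128 mel bins.
print(check_mel_input((1, 128, 200), "44.1khz"))
```

Passing the same shape with `"24khz"` raises a `ValueError`, which is the kind of mismatch you would hit if the config were swapped without also changing 128 to 80 in `models.py`.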

	Huge thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly based this implementation on his [StyleTTS2 fork](https://github.com/duerig/StyleTTS2/tree/main).