	---
	block_size: 2048
	sample_rate: 44100
	latent_size: 12
	vocoder: "042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts"
	dataset: "John Van Stan (Hi-Fi TTS speaker 9017)"
	vocoder_type: "RAVE"
	alignment_type: "DCA"
	likelihood_type: "NSF"
	text_encoder_type: "CANINE"
	---

	# tungnaa_117_jvs

	### dimensions

	block size: 2048

	sample rate: 44100

	latent size: 12
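The three numbers above fix the latent frame rate the model runs at. A minimal sketch of the arithmetic, assuming `block size` is the number of audio samples the vocoder produces per latent frame (the helper `frames_for` is hypothetical, for illustration only):

```python
import math

# Values copied from the "dimensions" section of this card.
block_size = 2048     # audio samples per latent frame (assumed meaning)
sample_rate = 44100   # Hz
latent_size = 12      # channels per latent frame

frame_rate = sample_rate / block_size        # latent frames per second
frame_ms = 1000 * block_size / sample_rate   # duration of one frame in ms


def frames_for(seconds: float) -> int:
    """Number of latent frames needed to cover `seconds` of audio."""
    return math.ceil(seconds * sample_rate / block_size)


print(f"{frame_rate:.2f} frames/s, {frame_ms:.2f} ms per frame")
print(f"10 s of audio -> latent array of shape ({latent_size}, {frames_for(10)})")
```

Under that assumption, the text-to-latent model emits roughly 21.5 frames of 12 values each per second of audio.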

	### dataset

	JVS: John Van Stan (Hi-Fi TTS speaker 9017)

	### vocoder

	`models/vocoder/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts`

	### training

	tungnaa commit `09ecdcd532eac3d454a8b4e28e896bca5bccbf9f`

	```bash
	tungnaa trainer \
	  --experiment 117-jvs-e2emulti-mask-ends \
	  --model-dir /data/users/victor/ivoice-models \
	  --log-dir /data/users/victor/ivoice-logs \
	  --manifest /data/users/victor/tmp/ivoice_prep_100m_0abe_multi/9017_manifest_clean_train.json \
	  --rave-model /data/users/victor/rave-v2/runs/042-jvs-100m-xfermulti_0abe2b072b/version_0/checkpoints/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts \
	  --lr 3e-4 \
	  --lr-text 3e-5 \
	  --epoch-size 200 \
	  --save-epochs 20 \
	  --device cuda:0 \
	  train
	```

	### notes

	trained on the full JVS dataset, without annotations.

	uses a 12-dimensional vocoder fine-tuned from a multivoice model on a subset of JVS.

	this model uses a neural spline flow (NSF) likelihood.
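The YAML front matter at the top of this card carries the dimensions and component types in machine-readable form. A minimal sketch of reading it with only the standard library, assuming flat `key: value` lines as in this card (the function `read_front_matter` is hypothetical and not part of tungnaa; a real YAML parser is the safer choice for anything nested):

```python
def read_front_matter(text: str) -> dict:
    """Parse flat `key: value` front matter delimited by `---` lines."""
    lines = [line.strip() for line in text.splitlines()]
    try:
        start = lines.index("---")
        end = lines.index("---", start + 1)
    except ValueError:
        return {}  # no complete front matter block found
    meta = {}
    for line in lines[start + 1:end]:
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        # Keep plain integers (e.g. block_size) as ints, everything else as str.
        meta[key.strip()] = int(value) if value.isdigit() else value
    return meta


# A shortened copy of this card's front matter, used as sample input.
card = """---
block_size: 2048
sample_rate: 44100
latent_size: 12
vocoder_type: "RAVE"
---
"""

meta = read_front_matter(card)
print(meta["block_size"], meta["sample_rate"], meta["vocoder_type"])
```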