tortoise-base / models

Commit History

remove redundant phonemize for vall-e (oops), quantize all files and then phonemize all files for cope optimization, load alignment model once instead of for every transcription (speedup with whisperx)
d2a9ab9

mrq commited on

oops
da96161

mrq commited on

cleanups, realigning vall-e training
f822c87

mrq commited on

VALL-E config edits
34ef046

mrq commited on

added japanese tokenizer (experimental)
b17260c

mrq commited on

cleanup, metrics are grabbed for vall-e trainer
249c601

mrq commited on

forgot to separate phonemes by spaces for [redacted]
1b72d0b

mrq commited on

cleaned up some prepare dataset code
d4c5096

mrq commited on

unk hunting
1a8c5de

mrq commited on

oops
da4f926

mrq commited on

preparations for training an IPA-based finetune
ee8270b

mrq commited on

added options to pick tokenizer json and diffusion model (so I don't have to add it in later when I get bored and add in diffusion training)
363d0b0

mrq commited on

removed redundant training data (they exist within tortoise itself anyways), added utility: view tokenized text
07b684c

mrq commited on

;)
7b16b3e

mrq commited on

(:
c85e32f

mrq commited on

:)
54036fd

mrq commited on

added mel LR weight (as I finally understand when to adjust the text), added text validation on dataset creation
66ac8ba

mrq commited on

cleanups and fixes, fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription
2feb6da

ecker commited on

only God knows why the YAML spec lets you specify string values without quotes
d318400

ecker commited on

added the mysterious tortoise_compat flag mentioned in DLAS repo
b8867a5

ecker commited on

forgot template
b0baa19

ecker commited on

big cleanup to make my life easier when i add more parameters
3f321fe

ecker commited on

actually make using adamw_zero optimizer for multi-gpus work
34dcb84

ecker commited on

disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset)
ff07f70

ecker commited on

made validation working (will document later)
b4098dc

ecker commited on

set validation to save rate and validation file if exists (need to test later)
e862169

ecker commited on

added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing
3e220ed

ecker commited on

renamed mega batch factor to an actual real term: gradient accumulation factor, fixed halting training not actually killing the training process and freeing up resources, some logic cleanup for gradient accumulation (so many brain worms and wrong assumptions from testing on low batch sizes) (read the training section in the wiki for more details)
df24827

ecker commited on

added new training tunable: loss_text_ce_loss weight, added option to specify source model in case you want to finetune a finetuned model (for example, train a Japanese finetune on a large dataset, then finetune for a specific voice, need to truly validate if it produces usable output), some bug fixes that came up for some reason now and not earlier
c2726fa

ecker commited on

huge success
225dee2

ecker commited on

Added very experimental float16 training for cards with not enough VRAM (10GiB and below, maybe) !NOTE! this is VERY EXPERIMENTAL, I have zero free time to validate it right now, I'll do it later
8a1a48f

ecker commited on

added more safeties and parameters to training yaml generator, I think I tested it extensively enough
092dd7b

ecker commited on

oops
cf758f4

ecker commited on

added dropdown to select autoregressive model for TTS, fixed a bug where the settings saver constantly fires (I hate gradio so much, why is dropdown.change broken to continuously fire and send an empty array)
2615caf

ecker commited on

a bit of UI cleanup, import multiple audio files at once, actually shows progress when importing voices, hides audio metadata / latents if no generated settings are detected, preparing datasets shows its progress, saving a training YAML shows a message when done, training now works within the web UI, training output shows to web UI, provided notebook is cleaned up and uses a venv, etc.
d5c1433

ecker commited on

almost
229be0b

ecker commited on