Duplicated from bharatgenai/sooktam2

Backbones quick introduction

unett.py

Flat U-Net transformer.
Same structure as in the E2 TTS and Voicebox papers, except that it uses rotary positional embeddings.
Optional absolute positional embedding and ConvNeXt V2 blocks for the embedded text before concatenation.
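
As a rough illustration of what "flat U-Net" means here, the sketch below wires skip connections between equal-width layers: the first half of the stack saves its inputs, and the second half pops them in reverse order. It is a minimal sketch under assumptions, not the code in unett.py. The names `flat_unet_forward` and `make_scale` are hypothetical, plain Python lists stand in for tensors, and the additive skip is only one possible way to merge (concatenation followed by a projection is another).

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def flat_unet_forward(x: Vector, layers: List[Layer]) -> Vector:
    """Run equal-width layers with U-Net style skips (additive variant)."""
    depth = len(layers)
    assert depth % 2 == 0, "this sketch pairs layers, so depth must be even"
    skips: List[Vector] = []
    for idx, layer in enumerate(layers):
        if idx < depth // 2:
            # First half: remember the input of each layer.
            skips.append(x)
        else:
            # Second half: pop the most recent skip (LIFO), so layer i is
            # paired with layer depth - 1 - i, as in a U-Net.
            skip = skips.pop()
            x = [a + b for a, b in zip(x, skip)]
        x = layer(x)
    return x


def make_scale(factor: float) -> Layer:
    """Hypothetical stand-in for a transformer layer: scales every feature."""
    return lambda v: [factor * a for a in v]


layers = [make_scale(2.0) for _ in range(4)]
out = flat_unet_forward([1.0], layers)
print(out)
```

Unlike a convolutional U-Net, nothing is downsampled or upsampled: every layer sees the same sequence length and width, which is why the structure is called flat.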

dit.py

DiT with adaLN-Zero conditioning.
Embedded timestep as the condition.
Input is noised_input + masked_cond + embedded_text, concatenated and passed through a linear projection.
Optional absolute positional embedding and ConvNeXt V2 blocks for the embedded text before concatenation.
Optional long skip connection (first layer to last layer).
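
The two ideas above, the concatenated input and adaLN-Zero conditioning, can be sketched for a single frame as follows. This is an illustrative sketch under assumptions, not the code in dit.py. The helper names (`linear`, `layer_norm`, `adaln_zero_block`, `double`) and all sizes and weights are made up, and plain Python lists stand in for tensors. The point it shows is that when the modulation projection is zero-initialized, the gate is zero and the block starts out as the identity.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]  # list of rows


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute W @ x + b for a single feature vector."""
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """Normalize to zero mean and unit variance, with no learned affine."""
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]


def adaln_zero_block(x: Vector, t_emb: Vector, mod_weight: Matrix,
                     mod_bias: Vector,
                     inner: Callable[[Vector], Vector]) -> Vector:
    """x + gate * inner(norm(x) * (1 + scale) + shift), modulated by t_emb."""
    dim = len(x)
    mod = linear(t_emb, mod_weight, mod_bias)  # 3 * dim modulation values
    shift, scale, gate = mod[:dim], mod[dim:2 * dim], mod[2 * dim:]
    h = [n * (1.0 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]
    h = inner(h)
    return [a + g * b for a, g, b in zip(x, gate, h)]


def double(v: Vector) -> Vector:
    """Hypothetical stand-in for the attention / feed-forward sub-layer."""
    return [2.0 * a for a in v]


# One frame: concatenate along the feature axis, then project to model width.
noised_input = [0.5, -1.0]
masked_cond = [0.0, 0.0]
embedded_text = [0.1, 0.2, 0.3]
frame = noised_input + masked_cond + embedded_text  # 2 + 2 + 3 = 7 features

dim = 2
proj_weight = [[1.0] * len(frame), [0.5] * len(frame)]  # made-up weights
x = linear(frame, proj_weight, [0.0] * dim)

t_emb = [1.0, -1.0]  # made-up timestep embedding
zero_weight = [[0.0] * len(t_emb) for _ in range(3 * dim)]

# Zero-initialized modulation: gate is 0, so the block returns x unchanged.
y_init = adaln_zero_block(x, t_emb, zero_weight, [0.0] * (3 * dim), double)

# Opening the gate (bias of 1 on the gate slots) lets the sub-layer through.
open_bias = [0.0] * (2 * dim) + [1.0] * dim
y_open = adaln_zero_block(x, t_emb, zero_weight, open_bias, double)
```

In a real model the modulation weights are learned, so the gate moves away from zero during training and each block gradually starts to contribute.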

mmdit.py

Stable Diffusion 3 block structure.
Timestep as the condition.
Left stream: embedded text with an absolute positional embedding applied.
Right stream: masked_cond and noised_input, concatenated, with the same convolutional positional embedding as in unett.py.
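
In a Stable Diffusion 3 style block, each stream keeps its own weights, but attention runs over the concatenation of both token sequences, which is then split back into the two streams. The sketch below illustrates only that joint-attention step. It is not the code in mmdit.py: `joint_attention` and its `text_scale` / `audio_scale` parameters are hypothetical, with scalar scales standing in for the per-stream learned projections, and it uses a single head on plain Python lists.

```python
import math
from typing import List, Tuple

Vector = List[float]
Tokens = List[Vector]  # a sequence of token vectors


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Tokens, k: Tokens, v: Tokens) -> Tokens:
    """Single-head scaled dot-product attention."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for query in q:
        weights = softmax(
            [scale * sum(a * b for a, b in zip(query, key)) for key in k]
        )
        out.append(
            [sum(w * row[j] for w, row in zip(weights, v))
             for j in range(len(v[0]))]
        )
    return out


def joint_attention(text: Tokens, audio: Tokens,
                    text_scale: float = 1.0,
                    audio_scale: float = 1.0) -> Tuple[Tokens, Tokens]:
    """Attend over both streams at once, then split back per stream."""
    # Stand-ins for stream-specific projections (assumption: scalar scales
    # replace the separate learned weight matrices of each stream).
    text_proj = [[text_scale * a for a in tok] for tok in text]
    audio_proj = [[audio_scale * a for a in tok] for tok in audio]
    joint = text_proj + audio_proj  # concatenate along the sequence axis
    mixed = attention(joint, joint, joint)
    return mixed[:len(text)], mixed[len(text):]


text = [[1.0, 0.0]]               # left stream: one text token
audio = [[0.0, 1.0], [0.0, 1.0]]  # right stream: two audio frames
text_out, audio_out = joint_attention(text, audio)
```

Because every query attends over both sequences, the text token picks up information from the audio frames and vice versa, while the sequence lengths of the two streams stay unchanged.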