indicF5

ashishkblink committed on Jan 5

Commit e4f6994 (verified), 1 parent: 295e206

Upload f5_tts/model/backbones/README.md with huggingface_hub

Files changed (1)

f5_tts/model/backbones/README.md ADDED

+## Backbones quick introduction
+### unett.py
+- flat UNet transformer
+- same structure as in the E2 TTS & Voicebox papers, except that it uses rotary pos emb
+- update: optionally allows abs pos emb & ConvNeXt V2 blocks for the embedded text before concat
+### dit.py
+- adaLN-Zero DiT
+- embedded timestep as condition
+- noised_input, masked_cond & embedded_text are concatenated, then passed through a linear input projection
+- optional abs pos emb & ConvNeXt V2 blocks for the embedded text before concat
+- optional long skip connection (first layer to last layer)
+### mmdit.py
+- SD3 structure
+- timestep as condition
+- left stream: text is embedded, then an abs pos emb is applied
+- right stream: masked_cond & noised_input are concatenated, with the same conv pos emb as in unett
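
The `dit.py` notes above describe two mechanisms that are easier to see in code: the per-frame concatenation of `noised_input`, `masked_cond` and `embedded_text` followed by a linear input projection, and adaLN-Zero conditioning. The following is a minimal, dependency-free sketch in plain Python and is not the code in `dit.py`. The function names (`layer_norm`, `linear`, `concat_inputs`, `adaln_zero_block`), the toy dimensions and the stand-in sublayer are all hypothetical. In an actual adaLN-Zero block the shift, scale and gate vectors are produced by a projection of the timestep embedding that is initialized to zero; here they are passed in directly.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize one feature vector to zero mean, unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """y = W x + b for a single vector; `weight` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def concat_inputs(noised_input, masked_cond, embedded_text):
    """Concatenate the three per-frame feature vectors along the feature axis."""
    return [n + c + t for n, c, t in zip(noised_input, masked_cond, embedded_text)]

def adaln_zero_block(x, shift, scale, gate, sublayer):
    """Per frame: x + gate * sublayer(norm(x) * (1 + scale) + shift)."""
    out = []
    for frame in x:
        h = [v * (1 + s) + b for v, s, b in zip(layer_norm(frame), scale, shift)]
        h = sublayer(h)
        out.append([v + g * u for v, g, u in zip(frame, gate, h)])
    return out

# Toy example (hypothetical sizes): 2 frames, 2 mel dims, 1 text dim, model dim 3.
noised_input = [[0.1, 0.2], [0.3, 0.4]]
masked_cond = [[1.0, 0.0], [0.0, 0.0]]
embedded_text = [[0.5], [0.7]]

frames = concat_inputs(noised_input, masked_cond, embedded_text)  # 5 features per frame
proj_w = [[0.1] * 5, [0.2] * 5, [0.3] * 5]  # stand-in weights for "linear proj in"
proj_b = [0.0, 0.0, 0.0]
x = [linear(f, proj_w, proj_b) for f in frames]

def double(h):
    # Stand-in for the attention or feed-forward sublayer.
    return [2 * v for v in h]

# With a zero gate (the state at initialization) the block leaves x unchanged.
y = adaln_zero_block(x, shift=[0.0] * 3, scale=[0.0] * 3, gate=[0.0] * 3, sublayer=double)
```

With the gate at zero the residual branch contributes nothing, so each block starts out as the identity and the conditioning is learned gradually during training.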