Any-to-Any
Transformers
Safetensors
English
xoron
multimodal
Mixture of Experts
text-to-image
image editing
image to video
text-to-video
video editing
text-to-speech
speech-to-text
speech-to-speech
image-to-text
video-to-text
agentic
tool-use
flow-matching
3d-rope
titok
vidtok
dual-stream-attention
zero-shot-voice-cloning
bigvgan
snake-activation
multi-receptive-field-fusion
custom_code
model.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
training_state.pt filter=lfs diff=lfs merge=lfs -text
audio_decoder.safetensors filter=lfs diff=lfs merge=lfs -text
audio_encoder.safetensors filter=lfs diff=lfs merge=lfs -text
audio_projector.safetensors filter=lfs diff=lfs merge=lfs -text
cross_attention.safetensors filter=lfs diff=lfs merge=lfs -text
generator.safetensors filter=lfs diff=lfs merge=lfs -text
llm.safetensors filter=lfs diff=lfs merge=lfs -text
projector.safetensors filter=lfs diff=lfs merge=lfs -text
video_encoder.safetensors filter=lfs diff=lfs merge=lfs -text
video_generator.safetensors filter=lfs diff=lfs merge=lfs -text
vision_encoder.safetensors filter=lfs diff=lfs merge=lfs -text
waveform_decoder.safetensors filter=lfs diff=lfs merge=lfs -text
assets/IMG_2925.PNG filter=lfs diff=lfs merge=lfs -text
assets/IMG_2967.png filter=lfs diff=lfs merge=lfs -text
assets/IMG_2970.png filter=lfs diff=lfs merge=lfs -text