Foley-Omni

Overview

This repository packages the public inference checkpoints for Foley-Omni. The release focuses on Video-to-Soundtrack (V2ST) generation, in which the model jointly generates synchronized speech, sound effects, and music from a video and an optional text prompt.

Model Size

5.5B

Repository Contents

ckpts/
├── Foley-Omni/
│   └── v2st.pth
├── Wan2.2-TI2V-5B/
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   └── google/
│       └── umt5-xxl/
│           ├── special_tokens_map.json
│           ├── spiece.model
│           ├── tokenizer.json
│           └── tokenizer_config.json
└── mmaudio/
    └── ext_weights/
        ├── v1-16.pth
        ├── best_netG.pt
        └── synchformer_state_dict.pth

What each part is used for:

  • ckpts/Foley-Omni/v2st.pth: released inference-only Foley-Omni weights
  • ckpts/Wan2.2-TI2V-5B/*: text encoder and tokenizer for text conditioning
  • ckpts/mmaudio/ext_weights/v1-16.pth: audio VAE for the 16 kHz inference path
  • ckpts/mmaudio/ext_weights/best_netG.pt: vocoder for waveform decoding
  • ckpts/mmaudio/ext_weights/synchformer_state_dict.pth: online visual feature extraction
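
Before running inference, it can help to confirm that the layout above is complete. The following is a minimal sketch, not part of the Foley-Omni code base: the helper name missing_checkpoints and the default root "ckpts" are assumptions chosen for illustration, and the file list is copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the repository tree above.
EXPECTED_FILES = [
    "Foley-Omni/v2st.pth",
    "Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.2-TI2V-5B/google/umt5-xxl/special_tokens_map.json",
    "Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model",
    "Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer.json",
    "Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer_config.json",
    "mmaudio/ext_weights/v1-16.pth",
    "mmaudio/ext_weights/best_netG.pt",
    "mmaudio/ext_weights/synchformer_state_dict.pth",
]


def missing_checkpoints(ckpt_root="ckpts"):
    """Return the expected files that are absent under ckpt_root (hypothetical helper)."""
    root = Path(ckpt_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint files are present.")
```

The check only tests for file presence; it does not validate file sizes or contents.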

Online Feature Extraction

This release supports both:

  • direct V2ST inference with pre-extracted features, supplied via clip_feature_path and sync_feature_path
  • V2ST inference without pre-extracted features, using online visual feature extraction

Notes:

  • synchformer_state_dict.pth is included in this repository because it is required for online Sync feature extraction.
  • The CLIP image encoder is loaded by open_clip from apple/DFN5B-CLIP-ViT-H-14-384 on first use. The current code path does not use a separate local CLIP checkpoint file.
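
The choice between the two modes can be sketched as a small pre-flight check. This is an illustrative sketch only and does not reflect how the Foley-Omni inference code is organized: the function select_feature_mode, its return values, and the decision to reject a single supplied path are all assumptions. Only the argument names clip_feature_path and sync_feature_path and the Synchformer checkpoint location come from this README.

```python
from pathlib import Path
from typing import Optional

# Location taken from the repository tree in this README.
SYNCHFORMER_CKPT = "mmaudio/ext_weights/synchformer_state_dict.pth"


def select_feature_mode(
    clip_feature_path: Optional[str] = None,
    sync_feature_path: Optional[str] = None,
    ckpt_root: str = "ckpts",
) -> str:
    """Hypothetical helper: return "precomputed" or "online"."""
    if clip_feature_path and sync_feature_path:
        # Both feature files were supplied, so no online extraction is needed.
        return "precomputed"
    if clip_feature_path or sync_feature_path:
        # Assumption: a partial set of features is treated as a user error.
        raise ValueError(
            "Provide both clip_feature_path and sync_feature_path, or neither."
        )
    # Online extraction needs the Synchformer weights shipped in this repository.
    ckpt = Path(ckpt_root) / SYNCHFORMER_CKPT
    if not ckpt.is_file():
        raise FileNotFoundError(f"Synchformer checkpoint not found: {ckpt}")
    return "online"
```

In the online case the CLIP weights are not checked here, since they are fetched by open_clip on first use rather than read from a local checkpoint.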

Source Attribution

This repository redistributes a small subset of files from the following upstream releases for convenience:

  • Wan2.2-TI2V-5B: text encoder and tokenizer files
  • MMAudio: audio VAE, vocoder, and Synchformer files

Please refer to the original upstream repositories for their licenses, usage terms, and project details.
