Missing VAE / Text Encoder files for inference (only model.pt provided)

by rikunarita - opened Mar 19


I’m trying to use SongGeneration-v2-medium, but I noticed that the repository only provides a model.pt file.

For running inference, it seems that additional components are required, such as:

VAE (for audio latent decoding)

Text encoder (for prompt processing)

Possibly a tokenizer or config files

However, I could not find these files in the repository.

Could you clarify:

Are these components included inside model.pt, or should they be provided separately?

If they are separate, where can I download the correct versions?

Is there an official inference pipeline or example (e.g., an ACE-Step-like workflow)?
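In case it helps with the first question, here is a rough sketch of how I would check what model.pt actually contains, by grouping its parameter names by their top-level prefix. This assumes the file is a plain PyTorch state dict (or a dict nesting one under a "state_dict" key), which I have not confirmed. The helper names and the example key names below are made up for illustration and are not taken from the real checkpoint.

```python
from collections import Counter


def summarize_prefixes(keys):
    """Count parameter names by their first dotted component."""
    return Counter(key.split(".", 1)[0] for key in keys)


def checkpoint_keys(path):
    """Return the parameter names stored in a PyTorch checkpoint.

    Assumption: the file is a plain state dict, or a dict that nests one
    under a "state_dict" key. The real layout of model.pt may differ.
    """
    import torch  # imported lazily so summarize_prefixes works without PyTorch

    # weights_only=True refuses to unpickle arbitrary objects; loading may
    # fail if the checkpoint stores more than tensors and basic containers.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    if isinstance(ckpt, dict) and isinstance(ckpt.get("state_dict"), dict):
        ckpt = ckpt["state_dict"]
    return list(ckpt.keys())


# Hypothetical key names, for illustration only (not from the real file).
example_keys = [
    "vae.decoder.conv_in.weight",
    "vae.decoder.conv_in.bias",
    "text_encoder.embed.weight",
    "lm.layers.0.attn.q_proj.weight",
]
print(summarize_prefixes(example_keys))
```

On the real file I would run `summarize_prefixes(checkpoint_keys("model.pt"))`. If no prefix looks like a VAE or a text encoder, that would suggest those components are meant to be downloaded separately.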

Any guidance would be greatly appreciated.

Thank you!
