Missing VAE / Text Encoder files for inference (only model.pt provided)

#1
by rikunarita - opened

I'm trying to use SongGeneration-v2-medium, but I noticed that the repository only provides a model.pt file.

For running inference, it seems that additional components are required, such as:

VAE (for audio latent decoding)

Text encoder (for prompt processing)

Possibly tokenizer or config files

However, I could not find these files in the repository.

Could you clarify:

Are these components included inside model.pt, or should they be provided separately?

If they are separate, where can I download the correct versions?

Is there an official inference pipeline or example (e.g., with an ACE-Step-like workflow)?
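On the first question, one way to check locally would be to list the top-level name prefixes stored in the checkpoint. The following is only a minimal sketch: it assumes model.pt is a regular PyTorch checkpoint whose weights sit either at the top level or under a "state_dict" key, which I have not confirmed for this repository. The helper names and the example key names are hypothetical.

```python
from collections import Counter


def summarize_prefixes(keys, depth=1):
    """Count parameter names by their first `depth` dot-separated components."""
    counts = Counter(".".join(k.split(".")[:depth]) for k in keys)
    return dict(counts)


def checkpoint_keys(path):
    """Return the parameter names stored in a PyTorch checkpoint.

    Assumption: the file is a dict holding the weights directly, or a dict
    with the weights under a "state_dict" key. Other layouts return [].
    """
    import torch  # imported lazily so summarize_prefixes works without torch

    # weights_only=True avoids unpickling arbitrary objects; it may fail
    # if the checkpoint stores more than plain tensors and containers.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    if not isinstance(ckpt, dict):
        return []
    state = ckpt.get("state_dict", ckpt)
    return list(state.keys())


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    example = [
        "vae.decoder.0.weight",
        "vae.decoder.0.bias",
        "lm.layers.0.attn.weight",
    ]
    print(summarize_prefixes(example))
```

Calling `summarize_prefixes(checkpoint_keys("model.pt"))` would then show whether any prefixes look like a VAE or text encoder, although the actual names depend on how the checkpoint was saved.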

Any guidance would be greatly appreciated.

Thank you!
