---
license: mit
base_model:
- FunAudioLLM/PrismAudio
tags:
- audio
- video2audio
- generation
- safetensors
pipeline_tag: text-to-audio
---

# PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).

All weights have been converted from PyTorch `.ckpt`/`.pth` to **SafeTensors** format for:

- ✅ Faster loading
- ✅ Memory-mapped I/O
- ✅ No arbitrary code execution risk

## Files

| File | Description |
|------|-------------|
| `prismaudio.safetensors` | Main PrismAudio model weights (518M params) |
| `synchformer_state_dict.safetensors` | Synchformer temporal alignment encoder |
| `vae.safetensors` | Oobleck VAE decoder |

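
The files listed above can be inspected without loading any tensor data. The following is a minimal sketch, assuming the files have already been downloaded to a local directory. It relies only on the published SafeTensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes) and the Python standard library. The helper names `read_safetensors_header` and `count_parameters` are hypothetical and are not part of any PrismAudio or MAESTRO API.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the tensor entries from the JSON header of a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of every tensor described in the header."""
    # prod([]) is 1, so scalar tensors (empty shape) count as one element.
    return sum(prod(info["shape"]) for info in header.values())
```

Calling `count_parameters(read_safetensors_header("prismaudio.safetensors"))` should give a total that can be compared against the 518M figure in the table. The sum covers every stored tensor, so it may differ slightly from a trainable-parameter count if the checkpoint also stores buffers. Because the header is plain JSON and the payload is raw bytes, reading the file never unpickles anything.
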
## Usage

These weights are used by the MAESTRO AI Workstation's PrismAudio panel for decomposed Chain-of-Thought video-to-audio generation.

## Citation

```bibtex
@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}
```