---
license: mit
base_model:
- FunAudioLLM/PrismAudio
tags:
- audio
- video2audio
- generation
- safetensors
pipeline_tag: text-to-audio
---

# PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).

All weights have been converted from PyTorch `.ckpt`/`.pth` to **SafeTensors** format for:
- ✅ Faster loading
- ✅ Memory-mapped I/O
- ✅ No arbitrary code execution risk

## Files

| File | Description |
|------|-------------|
| `prismaudio.safetensors` | Main PrismAudio model weights (518M params) |
| `synchformer_state_dict.safetensors` | Synchformer temporal alignment encoder |
| `vae.safetensors` | Oobleck VAE decoder |

## Usage

These weights are used by the MAESTRO AI Workstation's PrismAudio panel for decomposed Chain-of-Thought video-to-audio generation.

## Citation

```bibtex
@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}
```
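
## Inspecting the files

Because SafeTensors stores a plain JSON header ahead of the tensor data, the listed `.safetensors` files can be inspected without PyTorch and without loading any weights. The sketch below is not part of the upstream project or the MAESTRO tooling. The function names `read_safetensors_header` and `count_parameters` are illustrative, and it assumes a file has already been downloaded to a local path. It relies only on the published SafeTensors layout: an 8-byte little-endian header length, followed by a UTF-8 JSON header that maps tensor names to their `dtype`, `shape`, and `data_offsets`.

```python
import json
import struct
from math import prod
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor entries from a .safetensors header without reading tensor data."""
    with Path(path).open("rb") as f:
        # The first 8 bytes hold the header length as a little-endian unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def count_parameters(path):
    """Sum the element counts of every tensor described in the header."""
    header = read_safetensors_header(path)
    # prod([]) is 1, so scalar tensors (empty shape) count as one element.
    return sum(prod(entry["shape"]) for entry in header.values())
```

Calling `count_parameters("prismaudio.safetensors")` on a local copy gives a quick check against the parameter count quoted for the main model. To load the tensors themselves, `safetensors.torch.load_file` from the `safetensors` package returns a plain name-to-tensor dictionary. How that dictionary maps onto the PrismAudio model classes depends on the upstream code and is not covered here.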