---
license: mit
base_model:
- FunAudioLLM/PrismAudio
tags:
- audio
- video2audio
- generation
- safetensors
pipeline_tag: text-to-audio
---
# PrismAudio Models (SafeTensors Mirror)
Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).
All weights have been converted from PyTorch `.ckpt`/`.pth` to **SafeTensors** format for:
- ✅ Faster loading
- ✅ Memory-mapped I/O
- ✅ No arbitrary code execution on load (unlike pickle-based `.ckpt`/`.pth` checkpoints)
## Files
| File | Description |
|------|-------------|
| `prismaudio.safetensors` | Main PrismAudio model weights (518M params) |
| `synchformer_state_dict.safetensors` | Synchformer temporal alignment encoder |
| `vae.safetensors` | Oobleck VAE decoder |
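
A downloaded file can be checked without loading any tensor data, because a SafeTensors file begins with an 8-byte little-endian header length followed by a JSON header describing each tensor. The sketch below uses only the Python standard library. The function names `read_safetensors_header` and `count_parameters` are illustrative and not part of any PrismAudio or MAESTRO API.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors described in the header."""
    # Each entry has "dtype", "shape", and "data_offsets"; a scalar has shape [].
    return sum(prod(entry["shape"]) for entry in header.values())
```

Calling `count_parameters(read_safetensors_header("prismaudio.safetensors"))` on a local copy gives the number of stored elements, which should be close to the figure in the table if that figure counts every stored tensor.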
## Usage
These weights are used by the MAESTRO AI Workstation's PrismAudio panel for
decomposed Chain-of-Thought video-to-audio generation.
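
Outside MAESTRO, the files can also be read directly as plain state dicts. The following is a minimal sketch, assuming the `safetensors` and `torch` packages are installed and the three files listed above have been downloaded to one local directory. The helper names `missing_files` and `load_state_dicts` are hypothetical, and this does not reflect how MAESTRO itself loads the weights.

```python
from pathlib import Path

# File names taken from the Files table in this README.
EXPECTED_FILES = (
    "prismaudio.safetensors",
    "synchformer_state_dict.safetensors",
    "vae.safetensors",
)


def missing_files(weights_dir):
    """Return the expected weight files that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_state_dicts(weights_dir, device="cpu"):
    """Load each file into a dict of tensors keyed by parameter name."""
    # Imported lazily so the file check above works without PyTorch installed.
    from safetensors.torch import load_file

    missing = missing_files(weights_dir)
    if missing:
        raise FileNotFoundError(f"Missing weight files: {missing}")
    root = Path(weights_dir)
    return {name: load_file(str(root / name), device=device) for name in EXPECTED_FILES}
```

This yields raw tensors only. Building the actual networks and running inference still requires the model definitions from the upstream PrismAudio code.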
## Citation
```bibtex
@misc{liu2025thinksound,
title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
year={2025},
eprint={2506.21448},
archivePrefix={arXiv},
}
```