---
license: mit
base_model:
  - FunAudioLLM/PrismAudio
tags:
  - audio
  - video2audio
  - generation
  - safetensors
pipeline_tag: text-to-audio
---

# PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).

All weights have been converted from PyTorch `.ckpt`/`.pth` checkpoints to the **SafeTensors** format, which provides:
- ✅ Faster loading
- ✅ Memory-mapped I/O
- ✅ No arbitrary code execution on load (unlike pickle-based checkpoints)
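
These properties follow from the file layout: a SafeTensors file is an 8-byte little-endian header length, a JSON header, and then the raw tensor bytes. As a minimal sketch using only the Python standard library (the helper names `read_header` and `tensor_bytes` are illustrative and not part of any library), a single tensor can be sliced out through `mmap` without unpickling anything or reading the rest of the file:

```python
import json
import mmap
import struct


def read_header(buf):
    """Return (header dict, byte offset where tensor data starts)."""
    # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
    (header_len,) = struct.unpack("<Q", buf[:8])
    header = json.loads(bytes(buf[8:8 + header_len]).decode("utf-8"))
    return header, 8 + header_len


def tensor_bytes(path, name):
    """Return the raw bytes of one tensor via a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            header, data_start = read_header(mm)
            # data_offsets are relative to the start of the data section.
            begin, end = header[name]["data_offsets"]
            return bytes(mm[data_start + begin:data_start + end])
```

In practice the `safetensors` library handles this parsing and returns framework tensors; the sketch only illustrates why loading involves no code execution.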

## Files

| File | Description |
|------|-------------|
| `prismaudio.safetensors` | Main PrismAudio model weights (518M params) |
| `synchformer_state_dict.safetensors` | Synchformer temporal alignment encoder |
| `vae.safetensors` | Oobleck VAE decoder |
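
To check what a file holds before loading it, the JSON header alone is enough. The following is a sketch under stated assumptions: it uses only the standard library, `count_parameters` is a hypothetical helper name, and the file is assumed to have been downloaded locally.

```python
import json
import math
import struct


def count_parameters(path):
    """Sum the element counts of every tensor listed in a SafeTensors header."""
    with open(path, "rb") as f:
        # 8-byte little-endian header length, followed by the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor.
    header.pop("__metadata__", None)
    return sum(math.prod(info["shape"]) for info in header.values())
```

Calling `count_parameters("prismaudio.safetensors")` on a local copy returns the total number of stored elements. That total may differ somewhat from the parameter count quoted in the table if the file also stores non-trainable buffers.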

## Usage

These weights are used by the MAESTRO AI Workstation's PrismAudio panel for
decomposed Chain-of-Thought video-to-audio generation.

## Citation

```bibtex
@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}
```