---
license: mit
base_model:
- FunAudioLLM/PrismAudio
tags:
- audio
- video2audio
- generation
- safetensors
pipeline_tag: text-to-audio
---

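# PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).

All weights have been converted from PyTorch pickle checkpoints (`.ckpt`/`.pth`) to SafeTensors, which gives:
- ✅ Faster loading, with no unpickling step
- ✅ Memory-mapped I/O, so individual tensors can be read lazily instead of loading the whole file into RAM
- ✅ No arbitrary code execution risk on load, since the format holds only a JSON header and raw tensor bytes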
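
The benefits above follow from the file layout: a SafeTensors file is an 8-byte little-endian header length, a JSON header that maps each tensor name to its dtype, shape, and byte offsets, and then a flat buffer of raw tensor bytes. The stdlib-only sketch below writes and memory-maps such a file to illustrate why reading a tensor needs no unpickling. It is an illustration of the published format, not the converter used to build this mirror; `write_safetensors`, `read_tensor_bytes`, and the `DTYPE_SIZES` subset are hypothetical names. In practice the `safetensors` library (for example `safetensors.torch.load_file` or `safetensors.safe_open`) does this for you.

```python
import json
import mmap
import struct

# Bytes per element for a few SafeTensors dtype codes (a subset, for illustration).
DTYPE_SIZES = {"F32": 4, "F16": 2, "BF16": 2, "I64": 8, "U8": 1}


def write_safetensors(path, tensors):
    """Write {name: (dtype, shape, raw_little_endian_bytes)} in SafeTensors layout."""
    header, offset, chunks = {}, 0, []
    for name, (dtype, shape, raw) in sorted(tensors.items()):
        expected = DTYPE_SIZES[dtype]
        for dim in shape:
            expected *= dim
        if expected != len(raw):
            raise ValueError(f"{name}: expected {expected} bytes, got {len(raw)}")
        # Offsets are relative to the start of the data buffer, not the file.
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        chunks.append(raw)
    blob = json.dumps(header, separators=(",", ":")).encode("utf-8")
    blob += b" " * (-len(blob) % 8)  # pad the header with spaces to an 8-byte boundary
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))  # u64 little-endian header length
        f.write(blob)
        for raw in chunks:
            f.write(raw)


def read_tensor_bytes(path, name):
    """Return (dtype, shape, raw bytes) for one tensor through a read-only memory map.

    Only the JSON header is parsed; nothing is unpickled, so loading runs no code.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (n,) = struct.unpack("<Q", mm[:8])
        header = json.loads(mm[8:8 + n])
        info = header[name]
        begin, end = info["data_offsets"]
        start = 8 + n  # the data buffer begins right after the header
        # Slicing copies only this tensor's bytes; the OS only has to page in that region.
        raw = mm[start + begin:start + end]
    return info["dtype"], info["shape"], raw
```

Because the header is plain JSON with explicit offsets, a loader can jump straight to one tensor's bytes, which is what makes lazy, memory-mapped access cheap compared with unpickling an entire `.ckpt`.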

## Files

| File | Description |
|------|-------------|
| `prismaudio.safetensors` | Main PrismAudio model weights (518M parameters) |
| `synchformer_state_dict.safetensors` | Synchformer encoder for audio-visual temporal alignment |
| `vae.safetensors` | Oobleck VAE decoder |
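
To check what each file in the table contains, for example the 518M-parameter figure for the main model, it is enough to read the JSON header of each file without touching the tensor data. The sketch below is a stdlib-only illustration under the assumption that the files have already been downloaded locally; `read_header`, `param_count`, and `summarize` are hypothetical helper names. The reported totals depend on the actual files and may not match the table exactly if a checkpoint also stores non-parameter tensors such as buffers.

```python
import json
import math
import struct


def read_header(path):
    """Parse only the JSON header of a SafeTensors file; tensor data is never read."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header length
        header = json.loads(f.read(n))
    header.pop("__metadata__", None)  # optional string-to-string metadata, not a tensor
    return header


def param_count(header):
    """Total number of elements across all tensors (a scalar's shape [] counts as 1)."""
    return sum(math.prod(info["shape"]) for info in header.values())


def summarize(paths):
    """Print tensor count, parameter count, and dtypes for each file."""
    for path in paths:
        header = read_header(path)
        dtypes = sorted({info["dtype"] for info in header.values()})
        print(f"{path}: {len(header)} tensors, "
              f"{param_count(header) / 1e6:.1f}M params, dtypes={dtypes}")
```

For this repository that would be called as `summarize(["prismaudio.safetensors", "synchformer_state_dict.safetensors", "vae.safetensors"])` from the download directory.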

## Usage

These weights are used by the MAESTRO AI Workstation's PrismAudio panel for
decomposed Chain-of-Thought video-to-audio generation.

## Citation

```bibtex
@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}
```