Spaces: FunAudioLLM/ThinkSound
liuhuadai committed on Jul 1

Commit 2056f15 (verified) · 1 Parent(s): 0690e84

Update app.py

Files changed (1)

app.py CHANGED

@@ -329,7 +329,33 @@ def synthesize_video_with_audio(video_file, caption):
 # Gradio interface
 with gr.Blocks() as demo:
-    gr.Markdown("# ThinkSound\nupload video and caption(optional), and get video with audio!")
+    gr.Markdown(
+        """
+# ThinkSound\n
+ThinkSound is a unified Any2Audio generation framework with flow matching guided by Chain-of-Thought (CoT) reasoning.
+Upload video and caption (optional), and get video with audio!
+[Project page is here](https://huggingface.co/spaces/FunAudioLLM/ThinkSound)
+[Model weights are here](https://huggingface.co/liuhuadai/ThinkSound)
+## Citation
+If you find our work useful, please cite our paper:
+```bibtex
+@misc{liu2025thinksoundchainofthoughtreasoningmultimodal,
+    title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
+    author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
+    year={2025},
+    eprint={2506.21448},
+    archivePrefix={arXiv},
+    primaryClass={eess.AS},
+    url={https://arxiv.org/abs/2506.21448},
+}
+```
+"""
+    )
     with gr.Row():
         video_input = gr.Video(label="upload video")
         caption_input = gr.Textbox(label="caption(optional)", placeholder="can be empty", lines=1)
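
Review note: in the added Markdown, the description, the upload hint, and the two links sit on consecutive lines with no blank line between them, so a CommonMark-style renderer will typically flow them into a single paragraph. The sketch below shows one way to assemble the same header text with explicit blank lines between blocks. `build_header` and `HEADER_MD` are hypothetical names introduced for illustration and are not part of `app.py`; the commit itself inlines the text directly in `gr.Markdown(...)`.

```python
import textwrap


def build_header(title: str, paragraphs: list[str], links: dict[str, str]) -> str:
    """Hypothetical helper: join Markdown blocks with blank lines between them."""
    blocks = [f"# {title}"]
    # Strip stray indentation so no paragraph is mistaken for a code block.
    blocks.extend(textwrap.dedent(p).strip() for p in paragraphs)
    # One list item per link keeps adjacent links from merging into one paragraph.
    blocks.append("\n".join(f"- [{label}]({url})" for label, url in links.items()))
    return "\n\n".join(blocks) + "\n"


HEADER_MD = build_header(
    "ThinkSound",
    [
        "ThinkSound is a unified Any2Audio generation framework with flow "
        "matching guided by Chain-of-Thought (CoT) reasoning.",
        "Upload video and caption (optional), and get video with audio!",
    ],
    {
        "Project page": "https://huggingface.co/spaces/FunAudioLLM/ThinkSound",
        "Model weights": "https://huggingface.co/liuhuadai/ThinkSound",
    },
)

# Assumption: inside `with gr.Blocks() as demo:` this string would be passed
# as gr.Markdown(HEADER_MD) in place of the inline triple-quoted literal.
print(HEADER_MD)
```

Keeping the text in a module-level constant also leaves the layout code inside the `gr.Blocks()` context short, which may make later diffs to the interface easier to read. The citation block could be appended as one more entry in `paragraphs` if desired.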