Add pipeline tag, library name and content from the Github README

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +137 -23
README.md CHANGED
@@ -1,7 +1,9 @@
  ---
- license: apache-2.0
  datasets:
  - omniaudio/Sphere360
  tags:
  - 360V2SA
  - audio
@@ -14,51 +16,163 @@ paper: arxiv:2504.14906
  # 🎧 [ICML 2025] OmniAudio: Generating Spatial Audio from 360-Degree Video

-
-
-
  <p align="center"> If you find this project useful, a star ⭐ on GitHub would be greatly appreciated! </p>

  <p align="center">
-
  <a href="https://OmniAudio-360v2sa.github.io/">🌐 Online Demo</a>
  <a href="https://github.com/liuhuadai/OmniAudio">🌐 GitHub Repository</a>
  </p>

- ## 📦 Code
-
- For code, models, and additional resources, please visit our GitHub repository:
- 👉 [**OmniAudio GitHub Repository**](https://github.com/liuhuadai/OmniAudio)

  ---

- ## 📑 Citation
-
- If **OmniAudio** contributes to your research or applications, please cite it using the following BibTeX entry:
-
- ```bibtex
- @misc{liu2025omniaudio,
-   title = {OmniAudio: Generating Spatial Audio from 360-Degree Video},
-   author = {Huadai Liu and Tianyi Luo and Qikai Jiang and Kaicheng Luo and Peiwen Sun and Jialei Wan and Rongjie Huang and Qian Chen and Wen Wang and Xiangtai Li and Shiliang Zhang and Zhijie Yan and Zhou Zhao and Wei Xue},
-   year = {2025},
-   eprint = {2504.14906},
-   archivePrefix = {arXiv},
-   primaryClass = {eess.AS},
-   url = {https://arxiv.org/abs/2504.14906}
- }
  ```

  ---

  ---

- 💡 *Have fun experimenting with OmniAudio! 🛠️💖*

  ---
- license: apache-2.0
- ---
  ---
  datasets:
  - omniaudio/Sphere360
+ license: apache-2.0
+ library_name: diffusers
+ pipeline_tag: audio-to-audio
  tags:
  - 360V2SA
  - audio
 
  # 🎧 [ICML 2025] OmniAudio: Generating Spatial Audio from 360-Degree Video

  <p align="center"> If you find this project useful, a star ⭐ on GitHub would be greatly appreciated! </p>

  <p align="center">
  <a href="https://OmniAudio-360v2sa.github.io/">🌐 Online Demo</a>
  <a href="https://github.com/liuhuadai/OmniAudio">🌐 GitHub Repository</a>
  </p>

+ [![Demo Video](https://img.shields.io/badge/Demo-Video-blue?style=for-the-badge)](https://streamable.com/pqwvji)
+
+ ---
+ ## 🗞️ News
+
+ * **\[2025.02]** 🔥 [Online Demo](https://OmniAudio-360v2sa.github.io/) is live; try it now!
+ * **\[2025.04]** 🔥 The [OmniAudio paper](https://arxiv.org/pdf/2504.14906) is released on arXiv.
+ * **\[2025.05]** 🎉 OmniAudio has been accepted to **ICML 2025**. See you in Vancouver!
+ * **\[2025.05]** 🔥 Released inference code and the OmniAudio dataset.
+ * **\[2025.05]** 📦 Released pretrained model weights and the dataset on Hugging Face.

  ---

+ ✨🔊 Transform your 360-degree videos into immersive spatial audio! 🌍🎶
+
+ <img src="assets/figure1-a.png" width="45%"> <img src="assets/figure1-b.png" width="45%">
+
+ A PyTorch implementation of **OmniAudio**, a model for generating spatial audio from 360-degree videos.
+
+ The **[checkpoints](https://huggingface.co/OmniAudio/OmniAudio360V2SA)** and the **[Sphere360 dataset](https://huggingface.co/datasets/OmniAudio/Sphere360)** are now publicly available on Hugging Face.
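The checkpoint snapshot can also be fetched programmatically. A minimal sketch using `huggingface_hub` (the repo id comes from the link above; the local directory name is an assumption):

```python
REPO_ID = "OmniAudio/OmniAudio360V2SA"  # checkpoint repo linked above

def fetch_checkpoint(local_dir="checkpoints"):
    """Download the full model snapshot (requires `pip install huggingface_hub`)."""
    # Imported lazily; calling this needs network access.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```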

+ ---
+
+ ## 🧠 Model Architecture & Demo
+
+ The overall architecture of OmniAudio is shown below:
+
+ <img src="assets/framework.png" width="100%">
+
+ Curious about the results? 🎧🌐
+ 👉 **[Try our demo page here!](https://OmniAudio-360v2sa.github.io/)**
+
+ ---
+
+ ## 🎬 Quick Start
+
+ We provide an example of how you can perform inference with OmniAudio.
+
+ ### 🏃 Inference with the Pretrained Model
+
+ To run inference, follow these steps:
+
+ 1️⃣ **Navigate to the root directory.** 📂
+ 2️⃣ **Create the inference environment.**
+
+ To set up the environment, ensure you have **Python >= 3.8.20** installed, then run:
+
+ ```bash
+ pip install -r requirements.txt
+ pip install git+https://github.com/patrick-kidger/torchcubicspline.git
  ```
+
+ 3️⃣ **Run inference with the provided script:**
+ ```bash
+ bash demo.sh video_path cuda_id
+ ```
+ 💡 *You can also modify `demo.sh` to change the output directory.* The `cases` folder contains sample 360-degree videos in the equirectangular format; make sure your videos follow the same format! 🎥✨
+
+ By default, the script will automatically **download the pretrained model checkpoint** from our [Hugging Face repository](https://huggingface.co/OmniAudio/OmniAudio360V2SA) if no custom checkpoint is specified.
+
+ If you wish to use your **own trained model**, modify `demo.sh` to pass `--ckpt-path` pointing to your checkpoint directory.
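For readers unfamiliar with the equirectangular layout used by the `cases` clips, the sketch below shows the common pixel-to-direction convention. The exact convention OmniAudio assumes is not stated here, so treat this as illustrative:

```python
import math

def pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to (azimuth, elevation) in radians.

    Assumes columns span azimuth [-pi, pi) left to right and rows span
    elevation [pi/2, -pi/2] top to bottom, the usual equirectangular layout.
    """
    azimuth = (u / width) * 2.0 * math.pi - math.pi
    elevation = math.pi / 2.0 - (v / height) * math.pi
    return azimuth, elevation

# The image centre looks straight ahead (azimuth 0, elevation 0).
az, el = pixel_to_direction(960, 480, 1920, 960)
```

Because every pixel maps to a direction on the sphere, visual events can be associated with spatial audio directions, which is the core of the 360V2SA task.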

  ---

+ ## 📦 Dataset: Sphere360
+
+ We provide **Sphere360**, a large-scale, high-quality dataset of paired 360-degree video and spatial audio clips, specifically curated to support training and evaluation of spatial audio generation models like OmniAudio.
+
+ The dataset includes:
+
+ * **Over 103,000** 10-second clips
+ * **288 hours** of total spatial content
+ * Paired **equirectangular 360-degree video** and **first-order ambisonics (FOA)** 4-channel audio (W, X, Y, Z)
+
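To give a feel for the FOA format, the sketch below downmixes the horizontal B-format channels to stereo with two virtual cardioid microphones. Ambisonic conventions vary (FuMa vs. ACN ordering, SN3D vs. N3D normalization), and the dataset's exact convention is not restated here, so this is only illustrative:

```python
import math

def foa_to_stereo(w, x, y, spread=math.radians(30)):
    """Downmix FOA channels to stereo using two virtual cardioid mics.

    w, x, y are equal-length sample sequences for the omnidirectional (W),
    front-back (X) and left-right (Y) channels; the height channel (Z) is
    ignored for a flat stereo render. `spread` is the virtual mic azimuth.
    """
    cos_a, sin_a = math.cos(spread), math.sin(spread)
    left = [0.5 * (ws + xs * cos_a + ys * sin_a) for ws, xs, ys in zip(w, x, y)]
    right = [0.5 * (ws + xs * cos_a - ys * sin_a) for ws, xs, ys in zip(w, x, y)]
    return left, right
```

A source panned to the left (energy in +Y) comes out louder in the left channel, which is a quick sanity check for channel ordering when loading the 4-channel WAVs.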
+ ### 📁 Access and Structure
+
+ To explore or use the dataset, follow these steps:
+
+ 1️⃣ **Navigate to the dataset folder**:
+
+ ```bash
+ cd Sphere360
+ ```
+
+ 2️⃣ **Refer to the detailed usage guide** in the README:
+ 📖 [Sphere360 Dataset README](Sphere360/README.md)
+
+ Inside the directory, you'll find:
+
+ * `dataset/`: split configurations, metadata, and channel information
+ * `toolset/`: crawling and cleaning tools used for dataset construction
+ * `docs/`: figures and documentation describing the pipeline

  ---


+ ### 🔀 Dataset Split
+
+ The dataset is split as follows (see `dataset/split/`):
+
+ * **Training set**: \~100.5k samples
+ * **Test set**: \~3k samples
+ * **Each sample**: 10 seconds of paired video and audio
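The split sizes are consistent with the headline figures above: roughly 103.5k clips of 10 seconds each works out to about 288 hours of content:

```python
clips = 100_500 + 3_000          # approx. train + test samples
total_hours = clips * 10 / 3600  # each sample is a 10-second clip
# roughly 287.5 hours, matching the "288 hours" figure above
```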

  ---

+ ### 🛠️ Data Collection & Cleaning
+
+ The dataset is constructed via a two-stage crawling and filtering pipeline:
+
+ * **Crawling**
+   * Uses the [YouTube API](https://developers.google.com/youtube/v3/)
+   * Retrieves videos via channel and keyword-based queries
+   * Employs `yt-dlp` and `FFmpeg` to download and process audio/video streams
+   * Details: [docs/crawl.md](Sphere360/docs/crawl.md)
+
+ * **Cleaning**
+   * Filters out content using the following criteria:
+     * **Silent audio**
+     * **Static frames**
+     * **Audio-visual mismatches**
+     * **Human voice presence**
+   * Relies on models such as [ImageBind](https://github.com/facebookresearch/ImageBind) and [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
+   * Details: [docs/clean.md](Sphere360/docs/clean.md)
+
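As a toy illustration of the first cleaning criterion, a clip can be flagged as silent when its RMS level falls below a threshold. The threshold and the exact rule used for Sphere360 are assumptions here; see docs/clean.md for the real filter:

```python
import math

def is_silent(samples, threshold_db=-60.0):
    """Flag a clip as silent when its RMS level is below threshold_db dBFS."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return True  # log10(0) is undefined; all-zero audio is silent
    return 20.0 * math.log10(rms) < threshold_db
```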
+ ---
+
+ ### ⚠️ Legal Notice & Licensing
+
+ * All videos are collected from YouTube under terms consistent with fair use for academic research.
+ * Videos under Creative Commons licenses are properly attributed.
+ * No video is used for commercial purposes.
+ * All channel metadata is recorded in `dataset/channels.csv`.
+
+ ---
+
+ ## 📑 Citation
+
+ If OmniAudio contributes to your research or applications, please cite it using the following BibTeX entry:
+
+ ```bibtex
+ @misc{liu2025omniaudiogeneratingspatialaudio,
+   title={OmniAudio: Generating Spatial Audio from 360-Degree Video},
+   author={Huadai Liu and Tianyi Luo and Qikai Jiang and Kaicheng Luo and Peiwen Sun and Jialei Wan and Rongjie Huang and Qian Chen and Wen Wang and Xiangtai Li and Shiliang Zhang and Zhijie Yan and Zhou Zhao and Wei Xue},
+   year={2025},
+   eprint={2504.14906},
+   archivePrefix={arXiv},
+   primaryClass={eess.AS},
+   url={https://arxiv.org/abs/2504.14906},
+ }
+ ```
+
+ ---
+
+ 💡 *Have fun experimenting with OmniAudio! If you have any comments or questions, feel free to contact liuhuadai@zju.edu.cn. 🛠️💖*