Add pipeline tag, library name and content from the Github README
#1
by nielsr (HF Staff), opened

README.md CHANGED
---
datasets:
- omniaudio/Sphere360
license: apache-2.0
library_name: diffusers
pipeline_tag: audio-to-audio
tags:
- 360V2SA
- audio
---

# 🎧 [ICML 2025] OmniAudio: Generating Spatial Audio from 360-Degree Video

<p align="center"> If you find this project useful, a star ⭐ on GitHub would be greatly appreciated! </p>

<p align="center">
<a href="https://OmniAudio-360v2sa.github.io/">🌐 Online Demo</a>
<a href="https://github.com/liuhuadai/OmniAudio">💻 GitHub Repository</a>
</p>

[▶️ Demo Video](https://streamable.com/pqwvji)

---

## 🗞️ News

* **\[2025.02]** 🔥 The [Online Demo](https://OmniAudio-360v2sa.github.io/) is live - try it now!
* **\[2025.04]** 🔥 The [OmniAudio paper](https://arxiv.org/pdf/2504.14906) is released on arXiv.
* **\[2025.05]** 🎉 OmniAudio has been accepted at **ICML 2025**. See you in Vancouver!
* **\[2025.05]** 🔥 Released the inference code and the OmniAudio dataset.
* **\[2025.05]** 📦 Released pretrained model weights and the dataset on Hugging Face.

---

✨ Transform your 360-degree videos into immersive spatial audio! 🎶

<img src="assets/figure1-a.png" width="45%"> <img src="assets/figure1-b.png" width="45%">

PyTorch implementation of **OmniAudio**, a model for generating spatial audio from 360-degree videos.

The **[checkpoints](https://huggingface.co/OmniAudio/OmniAudio360V2SA)** and the **[Sphere360 dataset](https://huggingface.co/datasets/OmniAudio/Sphere360)** are now publicly available on Hugging Face.

---

## 🧠 Model Architecture & Demo

The overall architecture of OmniAudio is shown below:

<img src="assets/framework.png" width="100%">

Curious about the results? 🎧
👉 **[Try our demo page here!](https://OmniAudio-360v2sa.github.io/)**

---

## 🎬 Quick Start

We provide an example of how to perform inference with OmniAudio.

### 🚀 Inference with Pretrained Model

To run inference, follow these steps:

1️⃣ **Navigate to the root directory.**

2️⃣ **Create the inference environment.**

Ensure you have **Python >= 3.8.20** installed, then run the following commands:

```bash
pip install -r requirements.txt
pip install git+https://github.com/patrick-kidger/torchcubicspline.git
```

3️⃣ **Run inference with the provided script:**

```bash
bash demo.sh video_path cuda_id
```

💡 *You can also modify `demo.sh` to change the output directory.* The `cases` folder contains sample 360-degree videos in the equirectangular format; make sure your videos follow the same format! 🎥✨

By default, the script automatically **downloads the pretrained model checkpoint** from our [Hugging Face repository](https://huggingface.co/OmniAudio/OmniAudio360V2SA) if no custom checkpoint is specified.

If you wish to use your **own trained model**, modify `demo.sh` to pass `--ckpt-path` explicitly, pointing to your checkpoint directory.
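As a quick sanity check before running the demo script, you can verify that a video's resolution matches the 2:1 aspect ratio typical of equirectangular projections. This is a hypothetical helper for illustration, not part of the repository:

```python
def looks_equirectangular(width: int, height: int) -> bool:
    """Heuristic check: an equirectangular 360-degree frame covers
    360x180 degrees, so its pixel dimensions should be exactly 2:1."""
    return height > 0 and width == 2 * height

print(looks_equirectangular(3840, 1920))  # True  (typical 4K 360 video)
print(looks_equirectangular(1920, 1080))  # False (ordinary 16:9 video)
```

This only catches the most common mismatch; fisheye or cubemap footage would need re-projection before use.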
---

## 📦 Dataset: Sphere360

We provide **Sphere360**, a large-scale, high-quality dataset of paired 360-degree video and spatial audio clips, specifically curated to support training and evaluation of spatial audio generation models like OmniAudio.

The dataset includes:

* **Over 103,000** 10-second clips
* **288 hours** of total spatial content
* Paired **equirectangular 360-degree video** and **first-order ambisonics (FOA)** 4-channel audio (W, X, Y, Z)
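To get a feel for the FOA format: in B-format ordering (W, X, Y, Z), W is the omnidirectional component and Y the left-right figure-of-eight, so `W ± Y` gives a crude left/right stereo preview. A minimal sketch, illustrative only and not part of the Sphere360 tooling:

```python
import numpy as np

def foa_to_stereo_preview(foa: np.ndarray) -> np.ndarray:
    """Crude stereo preview of B-format FOA audio of shape (4, n_samples).

    Channels are (W, X, Y, Z): W is omnidirectional, Y points left,
    so a simple decode takes left = W + Y and right = W - Y.
    """
    w, x, y, z = foa
    return np.stack([w + y, w - y])

# One second of dummy 4-channel audio at 16 kHz
foa = np.random.randn(4, 16000).astype(np.float32)
stereo = foa_to_stereo_preview(foa)
print(stereo.shape)  # (2, 16000)
```

A proper binaural rendering would instead convolve the FOA channels with HRTF-based decoder filters; the sketch above is only a quick way to audition clips.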
### 📂 Access and Structure

To explore or use the dataset, follow these steps:

1️⃣ **Navigate to the dataset folder**:

```bash
cd Sphere360
```

2️⃣ **Refer to the detailed usage guide** in the README file:
📄 [Sphere360 Dataset README](Sphere360/README.md)

Inside the directory, you'll find:

* `dataset/`: split configurations, metadata, and channel information
* `toolset/`: crawling and cleaning tools used for dataset construction
* `docs/`: figures and documentation describing the pipeline

---

### 📊 Dataset Split

The dataset is split as follows (see `dataset/split/`):

* **Training set**: \~100.5k samples
* **Test set**: \~3k samples
* **Each sample**: 10 seconds of paired video and audio
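These figures are mutually consistent: 288 hours of 10-second clips corresponds to 288 × 3600 / 10 = 103,680 clips, in line with the "over 103,000" total and the published split sizes:

```python
total_hours = 288
clip_seconds = 10

# Number of 10-second clips in 288 hours of audio
total_clips = total_hours * 3600 // clip_seconds
print(total_clips)  # 103680

# Roughly matches ~100.5k train + ~3k test samples
print(100_500 + 3_000)  # 103500
```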
---

### 🛠️ Data Collection & Cleaning

The dataset is constructed via a two-stage crawling and filtering pipeline:

* **Crawling**

  * Uses the [YouTube API](https://developers.google.com/youtube/v3/)
  * Retrieves videos by channel and by keyword-based queries
  * Employs `yt-dlp` and `FFmpeg` to download and process audio/video streams
  * Details: [docs/crawl.md](Sphere360/docs/crawl.md)

* **Cleaning**

  * Filters out content using the following criteria:

    * **Silent audio**
    * **Static frames**
    * **Audio-visual mismatches**
    * **Human voice presence**
  * Relies on models like [ImageBind](https://github.com/facebookresearch/ImageBind) and [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)
  * Details: [docs/clean.md](Sphere360/docs/clean.md)
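As an illustration of the first criterion, a silent-audio filter can be as simple as an RMS threshold in dBFS. This is a hypothetical sketch; the criteria actually used for Sphere360 are described in `docs/clean.md`:

```python
import numpy as np

def is_silent(audio: np.ndarray, threshold_db: float = -60.0) -> bool:
    """Return True if the clip's RMS level falls below threshold_db dBFS.

    audio: float waveform with samples in [-1, 1].
    """
    rms = np.sqrt(np.mean(np.square(audio)))
    level_db = 20.0 * np.log10(max(rms, 1e-12))  # floor avoids log10(0)
    return bool(level_db < threshold_db)

silence = np.zeros(16000)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
print(is_silent(silence))  # True
print(is_silent(tone))     # False
```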
---

### ⚠️ Legal Notice & Licensing

* All videos are collected from YouTube under terms consistent with fair use for academic research.
* Videos under Creative Commons licenses are properly attributed.
* No video is used for commercial purposes.
* All channel metadata is recorded in `dataset/channels.csv`.

---

## 📚 Citation

If OmniAudio contributes to your research or applications, please cite it using the following BibTeX entry:

```bibtex
@misc{liu2025omniaudiogeneratingspatialaudio,
  title={OmniAudio: Generating Spatial Audio from 360-Degree Video},
  author={Huadai Liu and Tianyi Luo and Qikai Jiang and Kaicheng Luo and Peiwen Sun and Jialei Wan and Rongjie Huang and Qian Chen and Wen Wang and Xiangtai Li and Shiliang Zhang and Zhijie Yan and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2504.14906},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2504.14906},
}
```

---

💡 *Have fun experimenting with OmniAudio! If you have any comments or questions, feel free to contact liuhuadai@zju.edu.cn* 🛠️🚀