Improve model card with full details and tags

This PR significantly expands the model card by incorporating the detailed information from the project's GitHub README. This includes the abstract, comprehensive setup instructions, usage examples for various functionalities (reconstruction, prediction, and simulation), and a more complete acknowledgements section.

It also adds the `image-to-3d` pipeline tag, ensuring the model can be found at https://huggingface.co/models?pipeline_tag=image-to-3d, and correctly identifies `diffusers` as a dependency.
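
As a quick sanity check of the metadata change, the sketch below reads one field from the YAML front matter of a model card. The helper name `card_field` and the scratch file are illustrative assumptions rather than anything in the repository, and the parser only handles flat `key: value` lines between the first two `---` markers.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a flat `key: value` field from the
# YAML front matter (the lines between the first two `---` markers) of a file.
card_field() {
    awk -v key="$1" '
        /^---[ \t]*$/ { fences++; next }
        fences == 1 && index($0, key ":") == 1 {
            sub(/^[^:]*:[ \t]*/, "")
            print
            exit
        }
        fences >= 2 { exit }
    ' "$2"
}

# Scratch card mirroring the front matter this PR produces.
card="$(mktemp)"
cat > "$card" <<'EOF'
---
license: mit
pipeline_tag: image-to-3d
library_name: diffusers
---
# FluidNexus: 3D Fluid Reconstruction and Prediction From a Single Video
EOF

card_field pipeline_tag "$card"
card_field library_name "$card"
```

Running `card_field` against the real `README.md` instead of the scratch file would confirm that both new fields are present after the merge.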

Files changed (1)

README.md +424 -5

@@ -1,8 +1,10 @@
 ---
 license: mit
 ---
-# FluidNexus: 3D Fluid Reconstruction and Prediction From a Single Video
 [![arXiv](https://img.shields.io/badge/arXiv-2503.04720-b31b1b)](https://arxiv.org/abs/2503.04720)
 [![Paper PDF](https://img.shields.io/badge/Paper-PDF-blue)](https://arxiv.org/pdf/2503.04720)
@@ -11,16 +13,433 @@ license: mit
 [![Hugging Face Datasets](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Datasets-orange)](https://huggingface.co/datasets/yuegao/FluidNexusDatasets)
 [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-gold)](https://huggingface.co/yuegao/FluidNexusModels)
-[Yue Gao*](https://yuegao.me/), [Hong-Xing "Koven" Yu*](https://kovenyu.com/), [Bo Zhu](https://faculty.cc.gatech.edu/~bozhu/), [Jiajun Wu](https://jiajunwu.com/)
 [Stanford University](https://svl.stanford.edu/); [Microsoft](https://microsoft.com/); [Georgia Institute of Technology](https://www.gatech.edu/)
 \* denotes equal contribution
-## Citation
-If you find these datasets useful for your research, please cite our paper:
 ```bibtex
 @inproceedings{gao2025fluidnexus,
@@ -30,4 +449,4 @@ If you find these datasets useful for your research, please cite our paper:
     month     = {June},
     year      = {2025},
 }
-```

 ---
 license: mit
+pipeline_tag: image-to-3d
+library_name: diffusers
 ---
+# FluidNexus: 3D Fluid Reconstruction and Prediction From a Single Video
 [![arXiv](https://img.shields.io/badge/arXiv-2503.04720-b31b1b)](https://arxiv.org/abs/2503.04720)
 [![Paper PDF](https://img.shields.io/badge/Paper-PDF-blue)](https://arxiv.org/pdf/2503.04720)
 [![Hugging Face Datasets](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Datasets-orange)](https://huggingface.co/datasets/yuegao/FluidNexusDatasets)
 [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-gold)](https://huggingface.co/yuegao/FluidNexusModels)
+[**Yue Gao**\*](https://yuegao.me/), [**Hong-Xing "Koven" Yu**\*](https://kovenyu.com/), [**Bo Zhu**](https://faculty.cc.gatech.edu/~bozhu/), [**Jiajun Wu**](https://jiajunwu.com/)
 [Stanford University](https://svl.stanford.edu/); [Microsoft](https://microsoft.com/); [Georgia Institute of Technology](https://www.gatech.edu/)
 \* denotes equal contribution
+![FluidNexus Teaser](https://github.com/ueoo/FluidNexus/raw/main/assets/teaser_vel.gif)
+## Abstract
+We study reconstructing and predicting 3D fluid appearance and velocity from a single video. Current methods require multi-view videos for fluid reconstruction. We present FluidNexus, a novel framework that bridges video generation and physics simulation to tackle this task. Our key insight is to synthesize multiple novel-view videos as references for reconstruction. FluidNexus consists of two key components: (1) a novel-view video synthesizer that combines frame-wise view synthesis with video diffusion refinement for generating realistic videos, and (2) a physics-integrated particle representation coupling differentiable simulation and rendering to simultaneously facilitate 3D fluid reconstruction and prediction. To evaluate our approach, we collect two new real-world fluid datasets featuring textured backgrounds and object interactions. Our method enables dynamic novel view synthesis, future prediction, and interaction simulation from a single fluid video.
+## 🚀 Get Started
+> Don’t forget to update all `/path/to/FluidNexusRoot` to your real path. Find & Replace is your friend!
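The Find & Replace step can also be scripted. The following is a minimal sketch, assuming the commands have been copied into a text file; `fill_root` and the scratch file are hypothetical names introduced here, and the real path must not contain `|`, `&`, or `\` because they are special to `sed`.

```shell
#!/bin/sh
# Placeholder string used throughout the instructions.
PLACEHOLDER="/path/to/FluidNexusRoot"

# Hypothetical helper: print the file with the placeholder root replaced.
# Output goes to stdout, so the input file is never modified in place.
fill_root() {
    # $1 = real root directory, $2 = file containing the commands
    sed "s|$PLACEHOLDER|$1|g" "$2"
}

# Example: two commands copied from the instructions into a scratch file.
cmds="$(mktemp)"
cat > "$cmds" <<'EOF'
mkdir -p /path/to/FluidNexusRoot
cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
EOF

fill_root "$HOME/FluidNexusRoot" "$cmds"
```

Redirecting the output to a new file keeps the original commands available for comparison.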
+### Set Up Root Folder and Python Environment
+```shell
+mkdir -p /path/to/FluidNexusRoot
+cd /path/to/FluidNexusRoot
+git clone https://github.com/ueoo/FluidNexus.git
+cd FluidNexus
+conda env create -f fluid_nexus.yml
+conda activate fluid_nexus
+# Install the 3D Gaussian Splatting submodules
+pip install git+https://github.com/graphdeco-inria/diff-gaussian-rasterization.git
+pip install git+https://github.com/facebookresearch/pytorch3d.git@stable
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+pip install submodules/gaussian_rasterization_ch3
+pip install submodules/gaussian_rasterization_ch1
+pip install submodules/simple-knn
+pip install git+https://github.com/openai/CLIP.git
+pip install xformers --index-url https://download.pytorch.org/whl/cu124
+```
+### Download the Datasets
+Our **FluidNexus-Smoke** and **FluidNexus-Ball** datasets each include 120 scenes. Every scene contains 5 synchronized multi-view videos, with cameras arranged along a horizontal arc of approximately 120°.
+*   **FluidNexusSmoke** and **FluidNexusBall**: Processed datasets containing one example sample used in our paper.
+*   **FluidNexusSmokeAll** and **FluidNexusBallAll**: All samples processed into frames, usable within the FluidNexus framework.
+*   **FluidNexusSmokeAllRaw** and **FluidNexusBallAllRaw**: Raw videos of all samples as originally captured.
+> For a quick start, just download either FluidNexusSmoke or FluidNexusBall. The ones labeled ‘All’ contain all the datasets we collected; you should use them only if you want to finetune Zero123 or CogVideo-X, or perform a thorough evaluation on the entire dataset.
+For **ScalarFlow**, please refer to the original [website](https://ge.in.tum.de/publications/2019-scalarflow-eckert/).
+```shell
+cd /path/to/FluidNexusRoot
+# Download the FluidNexus-Smoke, FluidNexus-Ball, and ScalarReal datasets from Hugging Face
+# To use the full dataset, you can clone it directly from HF:
+# git clone https://huggingface.co/datasets/yuegao/FluidNexusDatasets
+cd FluidNexusDatasets
+# conda install -c conda-forge git-lfs
+# git lfs install
+# git lfs pull
+# To download only the two archives without 'All', use the commands below.
+# -O sets the local file name; without it wget keeps "?download=true" in the name.
+# wget -O FluidNexusBall.zip "https://huggingface.co/datasets/yuegao/FluidNexusDatasets/resolve/main/FluidNexusBall.zip?download=true"
+# wget -O FluidNexusSmoke.zip "https://huggingface.co/datasets/yuegao/FluidNexusDatasets/resolve/main/FluidNexusSmoke.zip?download=true"
+unzip FluidNexusBall.zip
+# unzip FluidNexusBallAll.zip
+# unzip FluidNexusBallAllRaw.zip
+unzip FluidNexusSmoke.zip
+# unzip FluidNexusSmokeAll.zip
+# unzip FluidNexusSmokeAllRaw.zip
+unzip ScalarReal.zip
+mv FluidNexusBall /path/to/FluidNexusRoot
+# mv FluidNexusBallAll /path/to/FluidNexusRoot
+# mv FluidNexusBallAllRaw /path/to/FluidNexusRoot
+mv FluidNexusSmoke /path/to/FluidNexusRoot
+# mv FluidNexusSmokeAll /path/to/FluidNexusRoot
+# mv FluidNexusSmokeAllRaw /path/to/FluidNexusRoot
+mv ScalarReal /path/to/FluidNexusRoot
+```
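After the `mv` commands, a short check can confirm that the quick-start folders ended up under the root. This is a sketch only: `missing_datasets` is a hypothetical helper, and the folder names are taken from the `mv` lines above.

```shell
#!/bin/sh
# Hypothetical helper: list the quick-start dataset folders that are missing
# under the given root. Prints nothing when all three are present.
missing_datasets() {
    for name in FluidNexusBall FluidNexusSmoke ScalarReal; do
        [ -d "$1/$name" ] || printf '%s\n' "$name"
    done
}

# Example with a scratch root that only has two of the three folders.
root="$(mktemp -d)"
mkdir -p "$root/FluidNexusBall" "$root/FluidNexusSmoke"
missing_datasets "$root"
```

In practice the argument would be the real root directory rather than a scratch one.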
+### Frame-wise Novel View Synthesis
+#### 1. Convert the frames to Zero123 input frames and create the cameras
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+python convert_original_to_zero123.py
+# note: update the dataset_name in create_zero123_cams.py first
+python create_zero123_cams.py
+```
+#### 2. Download the pretrained Zero123 and CogVideoX models
+```shell
+cd /path/to/FluidNexusRoot
+# Zero123 base models
+mkdir -p zero123_weights
+cd zero123_weights
+wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
+# CogVideoX base models
+mkdir -p cogvideox-sat
+# Please refer to the CogVideoX repo; we use the 1.0 version
+# https://github.com/THUDM/CogVideo/blob/main/sat/README.md
+# Our finetuned models
+git clone https://huggingface.co/yuegao/FluidNexusModels
+cd FluidNexusModels
+mv zero123_finetune_logs /path/to/FluidNexusRoot
+mv cogvideox_lora_ckpts /path/to/FluidNexusRoot
+```
+#### 3. Run inference with the frame-wise novel view synthesis model
+Taking `FluidNexus-Smoke` as an example, we assume camera 2 is the middle camera and use it as the input view; the remaining four cameras are the targets:
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/Zero123
+python inference/infer_fluid_nexus_smoke.py --tgt_cam 0
+python inference/infer_fluid_nexus_smoke.py --tgt_cam 1
+python inference/infer_fluid_nexus_smoke.py --tgt_cam 3
+python inference/infer_fluid_nexus_smoke.py --tgt_cam 4
+```
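The four invocations above differ only in the target camera, so they could be generated by a loop. The sketch below assumes 5 cameras with camera 2 as input, as in the example; `target_cams` is a hypothetical helper, and the loop only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print every camera index in 0..(total-1) except the
# input camera, one per line.
target_cams() {
    total="$1"
    input="$2"
    cam=0
    while [ "$cam" -lt "$total" ]; do
        [ "$cam" -eq "$input" ] || printf '%s\n' "$cam"
        cam=$((cam + 1))
    done
}

# Dry run for 5 cameras with camera 2 as input: print the commands instead
# of executing them. Remove `echo` to actually run inference.
for cam in $(target_cams 5 2); do
    echo python inference/infer_fluid_nexus_smoke.py --tgt_cam "$cam"
done
```

Keeping the input camera as a parameter makes it easy to adapt if a different view is used as input.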
+### Generative Video Refinement
+#### 1. Convert Zero123 output frames to CogVideoX input frames
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+python convert_zero123_to_cogvideox.py
+```
+#### 2. Run inference with the video generative models
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/CogVideoX
+bash tools_gen/gen_zero123_pi2v_long_fluid_nexus_smoke.sh
+bash tools_gen/gen_zero123_pi2v_long_fluid_nexus_ball.sh
+bash tools_gen/gen_zero123_pi2v_long_scalar_real.sh
+```
+#### 3. Convert the video generation output frames to the original frame format
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+python convert_cogvideox_to_original.py
+```
+### Fluid Dynamics Reconstruction
+#### 1. Optimize the background
+Skip this step for the ScalarReal dataset.
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+# For FluidNexus-Smoke
+bash tools_fluid_nexus/smoke_train_background.sh
+# For FluidNexus-Ball
+bash tools_fluid_nexus/ball_train_background.sh
+```
+#### 2. Optimize the physical particles
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+# For FluidNexus-Smoke
+bash tools_fluid_nexus/smoke_train_dynamics_physical.sh
+# For FluidNexus-Ball
+bash tools_fluid_nexus/ball_train_dynamics_physical.sh
+# For ScalarReal
+bash tools_scalar_real/train_physical_particle.sh
+```
+#### 3. Optimize the visual particles
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+# For FluidNexus-Smoke
+bash tools_fluid_nexus/smoke_train_dynamics_visual.sh
+# For FluidNexus-Ball
+bash tools_fluid_nexus/ball_train_dynamics_visual.sh
+# For ScalarReal
+bash tools_scalar_real/train_visual_particle.sh
+```
+🎊🎊 The results are located in `training_render`! 🎊🎊
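The three reconstruction steps run in a fixed order, with the background step skipped for ScalarReal. A minimal sketch of that ordering is shown below; the script paths are copied from the steps above, while `recon_steps` and the dataset keys (`smoke`, `ball`, `scalar_real`) are labels introduced for this example.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the reconstruction scripts for one
# dataset. Intended to be used from the FluidDynamics directory.
recon_steps() {
    case "$1" in
        smoke|ball)
            printf 'tools_fluid_nexus/%s_train_background.sh\n' "$1"
            printf 'tools_fluid_nexus/%s_train_dynamics_physical.sh\n' "$1"
            printf 'tools_fluid_nexus/%s_train_dynamics_visual.sh\n' "$1"
            ;;
        scalar_real)
            # No background step for ScalarReal.
            printf 'tools_scalar_real/train_physical_particle.sh\n'
            printf 'tools_scalar_real/train_visual_particle.sh\n'
            ;;
        *)
            echo "unknown dataset: $1" >&2
            return 1
            ;;
    esac
}

# Dry run: show what would be executed for FluidNexus-Smoke.
# Replace `echo bash` with `bash` to run the steps for real.
recon_steps smoke | while read -r script; do
    echo bash "$script"
done
```

Each later step depends on the output of the previous one, so a real driver should stop at the first failing script.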
+## 🕰️ Future Prediction
+### Physics simulation
+Physics simulation is used to render rough multi-view future prediction frames.
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+# For FluidNexus-Smoke
+bash tools_fluid_nexus/smoke_future_simulation.sh
+# For FluidNexus-Ball
+bash tools_fluid_nexus/ball_future_simulation.sh
+# For ScalarReal
+bash tools_scalar_real/future_simulation.sh
+```
+### Convert the simulation results to CogVideoX input format
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+# FluidNexus-Smoke
+# update the experiment name first
+python convert_simulation_original_to_cogvideox.py
+# FluidNexus-Ball
+# update the experiment name first
+python convert_simulation_original_to_cogvideox.py
+# ScalarReal
+python convert_simulation_original_to_cogvideox_unshift.py
+```
+### Generative video refinement on future prediction
+Refine the rough multi-view frames.
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/CogVideoX
+bash tools_gen/gen_future_pi2v_fluid_nexus_smoke.sh
+bash tools_gen/gen_future_pi2v_fluid_nexus_ball.sh
+bash tools_gen/gen_future_pi2v_scalar_real.sh
+```
+### Fluid dynamics reconstruction with future prediction
+#### 1. Optimize the physical particles with future prediction
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+# For FluidNexus-Smoke
+bash tools_fluid_nexus/smoke_train_dynamics_physical_future.sh
+# For FluidNexus-Ball
+bash tools_fluid_nexus/ball_train_dynamics_physical_future.sh
+# For ScalarReal
+bash tools_scalar_real/train_physical_particle_future.sh
+```
+#### 2. Optimize the visual particles with future prediction
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+# For FluidNexus-Smoke
+bash tools_fluid_nexus/smoke_train_dynamics_visual_future.sh
+# For FluidNexus-Ball
+bash tools_fluid_nexus/ball_train_dynamics_visual_future.sh
+# For ScalarReal
+bash tools_scalar_real/train_visual_particle_future.sh
+```
+## 💨 Counterfactual Interaction Simulation - Wind
+### Physics simulation with wind
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+bash tools_fluid_nexus/smoke_wind_simulation.sh
+```
+### Convert the simulation results to CogVideoX format
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+# FluidNexus-Smoke wind interaction
+# update the experiment name first
+python convert_simulation_original_to_cogvideox.py
+```
+### Generative video refinement with wind
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/CogVideoX
+bash tools_gen/gen_future_pi2v_fluid_nexus_smoke_wind.sh
+```
+### Fluid dynamics reconstruction with wind
+#### 1. Optimize the physical particles with wind
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+bash tools_fluid_nexus/smoke_train_dynamics_physical_wind.sh
+```
+#### 2. Optimize the visual particles with wind
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+bash tools_fluid_nexus/smoke_train_dynamics_visual_wind.sh
+```
+## 🔮 Counterfactual Interaction Simulation - Object
+### Fluid dynamics reconstruction with object
+#### 1. Optimize the physical particles with object
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+bash tools_fluid_nexus/object_train_dynamics_physical.sh
+```
+#### 2. Optimize the visual particles with object
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/FluidDynamics
+bash tools_fluid_nexus/object_train_dynamics_visual.sh
+```
+## 🚞 Zero123 Finetuning
+### Create Zero123 datasets
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+# FluidNexus-Smoke
+bash create_zero123_fluid_nexus_smoke.sh
+# FluidNexus-Ball
+bash create_zero123_fluid_nexus_ball.sh
+# ScalarFlow
+bash create_zero123_scalar_flow.sh
+```
+### Finetune Zero123 models
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/Zero123
+# FluidNexus-Smoke
+bash tools/train_fluid_nexus_smoke.sh
+# FluidNexus-Ball
+bash tools/train_fluid_nexus_ball.sh
+# ScalarFlow
+bash tools/train_scalar_flow.sh
+```
+## 🚂 CogVideoX LoRA Finetuning
+### Create CogVideoX datasets
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/DataProcessing
+# FluidNexus-Smoke
+bash create_cogvideox_fluid_nexus_smoke.sh
+# FluidNexus-Ball
+bash create_cogvideox_fluid_nexus_ball.sh
+# ScalarFlow
+bash create_cogvideox_scalar_flow.sh
+```
+### Finetune CogVideoX models
+```shell
+cd /path/to/FluidNexusRoot/FluidNexus/CogVideoX
+# FluidNexus-Smoke
+bash tools_finetune/finetune_pi2v_fluid_nexus_smoke.sh
+# FluidNexus-Ball
+bash tools_finetune/finetune_pi2v_fluid_nexus_ball.sh
+# ScalarFlow
+bash tools_finetune/finetune_pi2v_scalar_flow.sh
+```
+## 🌴 Acknowledgements
+Thanks to these great repositories: [SpacetimeGaussians](https://github.com/oppo-us-research/SpacetimeGaussians), [3DGS](https://github.com/graphdeco-inria/gaussian-splatting), [HyFluid](https://github.com/y-zheng18/HyFluid), [CogVideo](https://github.com/THUDM/CogVideo), [Zero123](https://github.com/cvlab-columbia/zero123), [diffusers](https://github.com/huggingface/diffusers) and many other inspiring works in the community.
+We sincerely thank the anonymous reviewers of CVPR 2025 for their helpful feedback.
+## ⭐️ Citation
+If you find this code useful for your research, please cite our paper:
 ```bibtex
 @inproceedings{gao2025fluidnexus,
     month     = {June},
     year      = {2025},
 }
+```