Image-to-Video
Diffusers
Safetensors

Improve model card metadata and links

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +8 -3
README.md CHANGED
@@ -1,6 +1,9 @@
  ---
  license: cc-by-4.0
  ---
  <div id="top" align="center">
 
  # SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation (MICCAI 2026 - Early Accept)
@@ -9,10 +12,12 @@ license: cc-by-4.0
  [![arXiv](https://img.shields.io/badge/arXiv-2605.16530-b31b1b.svg)](https://arxiv.org/abs/2605.16530)
  [![Project Page](https://img.shields.io/badge/Project-Page-green)](https://ssharvienkumar.github.io/SWoMo/)
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SWoMo)
- [![GitHub](https://img.shields.io/badge/GitHub-IntrekSAM-Yellow?logo=github)](https://github.com/MECLabTUDA/IntrekSAM)
 
  </div>
 
  ***This framework provides ability to use any combination of text, graph, image and video as conditioning for video synthesisation. We have provided sample configs to run training and inference for all these combinations. Feel free to use our work for comparisons and to cite it!***
 
  ## 🔑 Key Features
@@ -33,7 +38,7 @@ conda activate swomo
  ## 💾 Dataset Preparation and Annotation Tools
  We released our interactive SAM2-based annotation tool in a separate repository: [IntrekSAM](https://github.com/MECLabTUDA/IntrekSAM). In our research, we found that there was no existing tool for video segmentation annotation that is free, open-source, locally deployable, easily modifiable, supports multi-class segmentation, and is simple to set up. Therefore, we rewrote the GUI in Python while still keeping the original SAM2 backend.
 
- We also make our processed Cataract-1k data available on [Hugging Face](https://huggingface.co/SsharvienKumar/SWoMo/tree/main/datasets), including real videos, simulated videos, simulated segmentations, and scene graphs. If you would like to use our **manually annotated segmentations of the real videos (at 16 fps)** for the 1,068 videos from Cataract-1K and 50 videos from CATARACTS, please contact me via the email address in the paper. I would also be happy to share additional annotations described in the paper, such as phase labels and tracking point annotation, upon request.
 
 
  ## 🏁 Checkpoints
@@ -102,7 +107,7 @@ python train.py --config configs/training/training_img_graph_vid_cataracts -n sw
 
  ## 📜 Citations
  If you are using SWoMo for your paper, please cite the following paper:
- ```
  @article{sivakumar2026swomo,
  title={SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation},
  author={Sivakumar, Ssharvien Kumar and Johnson, Akwele and Dhingra, Anirudh and Frisch, Yannik and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
 
  ---
  license: cc-by-4.0
+ library_name: diffusers
+ pipeline_tag: image-to-video
  ---
+
  <div id="top" align="center">
 
  # SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation (MICCAI 2026 - Early Accept)
 
  [![arXiv](https://img.shields.io/badge/arXiv-2605.16530-b31b1b.svg)](https://arxiv.org/abs/2605.16530)
  [![Project Page](https://img.shields.io/badge/Project-Page-green)](https://ssharvienkumar.github.io/SWoMo/)
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SWoMo)
+ [![GitHub](https://img.shields.io/badge/GitHub-SWoMo-Yellow?logo=github)](https://github.com/MECLabTUDA/SWoMo)
 
  </div>
 
+ SWoMo is a neuro-symbolic world model for cataract surgery simulation that decouples motion generation from visual realism. It was presented in the paper [SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation](https://huggingface.co/papers/2605.16530).
+
  ***This framework provides ability to use any combination of text, graph, image and video as conditioning for video synthesisation. We have provided sample configs to run training and inference for all these combinations. Feel free to use our work for comparisons and to cite it!***
 
  ## 🔑 Key Features
 
  ## 💾 Dataset Preparation and Annotation Tools
  We released our interactive SAM2-based annotation tool in a separate repository: [IntrekSAM](https://github.com/MECLabTUDA/IntrekSAM). In our research, we found that there was no existing tool for video segmentation annotation that is free, open-source, locally deployable, easily modifiable, supports multi-class segmentation, and is simple to set up. Therefore, we rewrote the GUI in Python while still keeping the original SAM2 backend.
 
+ We also make our processed Cataract-1k data available on [Hugging Face](https://huggingface.co/SsharvienKumar/SWoMo/tree/main/datasets), including real videos, simulated videos, simulated segmentations, and scene graphs. If you would like to use our **manually annotated segmentations of the real videos (at 16 fps)** for the 1,068 videos from Cataract-1K and 50 videos from CATARACTS, please contact the authors via the email address in the paper. We would also be happy to share additional annotations described in the paper, such as phase labels and tracking point annotation, upon request.
 
 
  ## 🏁 Checkpoints
 
 
  ## 📜 Citations
  If you are using SWoMo for your paper, please cite the following paper:
+ ```bibtex
  @article{sivakumar2026swomo,
  title={SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation},
  author={Sivakumar, Ssharvien Kumar and Johnson, Akwele and Dhingra, Anirudh and Frisch, Yannik and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
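The two keys this PR adds to the README front matter (`library_name: diffusers` and `pipeline_tag: image-to-video`) are Hub model-card metadata fields, which is how the repository picks up the Diffusers and Image-to-Video tags. As a minimal sketch, assuming the metadata stays as flat `key: value` pairs, the hypothetical helper `read_front_matter` below (not part of SWoMo or `huggingface_hub`) reads them back from the README text; for real model cards, `huggingface_hub.ModelCard.load` or a full YAML parser is the safer choice.

```python
# Hypothetical helper (not part of SWoMo or huggingface_hub): a minimal sketch that
# reads flat `key: value` pairs from a README's leading YAML front matter block.
# Assumes no nesting, lists, or multi-line values; use a real YAML parser otherwise.


def read_front_matter(readme: str) -> dict:
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # README does not start with a front matter block
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon, such as blank lines
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat it as not front matter


# README header as it reads after this PR.
README_AFTER_PR = """---
license: cc-by-4.0
library_name: diffusers
pipeline_tag: image-to-video
---

<div id="top" align="center">
"""

print(read_front_matter(README_AFTER_PR))
# {'license': 'cc-by-4.0', 'library_name': 'diffusers', 'pipeline_tag': 'image-to-video'}
```

Run against the pre-PR header, which only declares `license: cc-by-4.0`, the same helper returns no `library_name` or `pipeline_tag` entry, which is the metadata gap this change fills.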