SsharvienKumar committed
Commit 1474dbd · verified · 1 Parent(s): 8ed8b89

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -14,8 +14,6 @@ license: cc-by-4.0
 
 ***This framework provides ability to use any combination of text, graph, image and video as conditioning for video synthesisation. We have provided sample configs to run training and inference for all these combinations. Feel free to use our work for comparisons and to cite it!***
 
-***STILL NEED TO UPDATE ARXIV, HUGGINGFACE, GIVE CONTACT INFO***
-
 ## 🔑 Key Features
 - SWoMo is a neuro-symbolic world model for surgical simulation that decouples interaction dynamics from visual appearance.
 - Using an inverse pairing strategy, real surgical videos are reconstructed in a simulator to create paired data for training a video diffusion model for sim-to-real translation, with intermediate scene graphs serving as a constraint regularizer.
@@ -34,7 +32,7 @@ conda activate swomo
 ## 💾 Dataset Preparation and Annotation Tools
 We released our interactive SAM2-based annotation tool in a separate repository: [IntrekSAM](https://github.com/MECLabTUDA/IntrekSAM). In our research, we found that there was no existing tool for video segmentation annotation that is free, open-source, locally deployable, easily modifiable, supports multi-class segmentation, and is simple to set up. Therefore, we rewrote the GUI in Python while still keeping the original SAM2 backend.
 
-We also make our processed Cataract-1k data available on [Hugging Face](https://huggingface.co/SsharvienKumar/SWoMo/tree/main/datasets), including real videos, simulated videos, simulated segmentations, and scene graphs. If you would like to use our **manually annotated segmentations of the real videos (at 16 fps)** for the 1,068 videos from Cataract-1K and 50 videos from CATARACTS, please contact me here [TODO]. I would also be happy to share additional annotations described in the paper, such as phase labels and tracking point annotation, upon request.
+We also make our processed Cataract-1k data available on [Hugging Face](https://huggingface.co/SsharvienKumar/SWoMo/tree/main/datasets), including real videos, simulated videos, simulated segmentations, and scene graphs. If you would like to use our **manually annotated segmentations of the real videos (at 16 fps)** for the 1,068 videos from Cataract-1K and 50 videos from CATARACTS, please contact me via the email address in the paper. I would also be happy to share additional annotations described in the paper, such as phase labels and tracking point annotation, upon request.
 
 
 ## 🏁 Checkpoints