gvide committed on
Commit 94150bb · verified · 1 Parent(s): bf41e1c

Update README.md

Files changed (1)
  1. README.md +101 -6
README.md CHANGED
@@ -1,8 +1,103 @@
  ---
- license: mit
- tags:
- - SLAM
- - Visual Odometry
- - Computer Vision
- - Relative Camera Pose Estimation
  ---
+ # 🔄 CycleVO
+
+ <div align="center">
+
+ ![license: mit](https://img.shields.io/badge/license-MIT-blue)
+ ![tags: SLAM](https://img.shields.io/badge/tags-SLAM-brightgreen)
+ ![tags: Visual Odometry](https://img.shields.io/badge/tags-Visual%20Odometry-brightgreen)
+ ![tags: Computer Vision](https://img.shields.io/badge/tags-Computer%20Vision-brightgreen)
+ ![tags: Relative Camera Pose Estimation](https://img.shields.io/badge/tags-Relative%20Camera%20Pose%20Estimation-brightgreen)
+
+ **Part of the BodySLAM Framework for Endoscopic Surgical Applications**
+
+ [Paper](https://arxiv.org/abs/2408.03078) | [GitHub](https://github.com/GuidoManni/BodySLAM)
+
+ </div>
+
  ---
+
+ ## 📌 Overview
+
+ CycleVO is an unsupervised monocular pose estimation model designed to robustly estimate the relative camera pose between consecutive frames from endoscopic video. It addresses challenges such as low-texture surfaces and significant illumination variations common in surgical environments.
+
+ <div align="center">
+ <img src="https://via.placeholder.com/800x400?text=CycleVO+Architecture" alt="CycleVO Architecture Diagram" width="80%"/>
+ </div>
+
+ ## ✨ Key Features
+
+ - **🔄 Unsupervised Learning via Cycle Consistency**: Inspired by CycleGAN and InfoGAN
+ - **⚡ Competitive Performance and Speed**: Low inference time compared to state-of-the-art methods
+ - **🔌 Easy Integration with SLAM Pipelines**: Provides ready-to-use motion matrices
+
+ ## 🧠 Model Details
+
+ CycleVO learns to estimate the relative motion (i.e., camera pose) between consecutive endoscopic frames. The model predicts a motion matrix M = [R, t<sub>unscaled</sub>; 0, 1], that is, a rotation R and an unscaled translation t<sub>unscaled</sub> in homogeneous form, using a generator-encoder architecture augmented with a pose estimation tail.
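
To make the motion matrix concrete, here is a minimal NumPy sketch of how a rotation and an unscaled translation could be packed into a 4×4 homogeneous matrix and chained into a trajectory. The helper names (`motion_matrix`, `accumulate_poses`), the composition convention, and the example values are assumptions for illustration; they are not taken from the CycleVO code base.

```python
import numpy as np


def motion_matrix(R, t_unscaled):
    """Assemble a 4x4 homogeneous matrix [R, t; 0, 1] (hypothetical helper)."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t_unscaled
    return M


def accumulate_poses(relative_motions):
    """Chain frame-to-frame motions into camera poses, starting at identity.

    Assumed convention: each M expresses frame k+1 in the coordinates of
    frame k, so poses are composed by right-multiplication.
    """
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for M in relative_motions:
        pose = pose @ M
        trajectory.append(pose.copy())
    return trajectory


# Hypothetical motion: a 90-degree rotation about z plus a unit step along x.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
step = motion_matrix(Rz, np.array([1.0, 0.0, 0.0]))
trajectory = accumulate_poses([step, step])
final_position = trajectory[-1][:3, 3]
```

Because the translation is unscaled, positions obtained this way are only defined up to a global scale factor.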
+
+ | **Developed by** | Guido Manni, Clemente Lauretti, Francesco Prata, Rocco Papalia, Loredana Zollo, Paolo Soda |
+ |:-----------------|:--------------------------------------------------------------------------------------------|
+ | **Model Type** | Unsupervised Monocular Visual Odometry / Relative Camera Pose Estimation |
+ | **License** | MIT |
+ | **Training** | From scratch using a large-scale internal endoscopic dataset |
+
+ ## 🚀 Getting Started
+
+ For complete documentation, please refer to the [GitHub repository](https://github.com/GuidoManni/BodySLAM).
+
+
+ ## 🔍 Use Cases
+
+ ### ✅ Ideal Applications
+
+ - **Surgical Navigation**: Real-time guidance during minimally invasive procedures
+ - **3D Reconstruction**: Enhanced mapping of surgical scenes
+ - **Depth Perception**: Accurate pose estimates to complement monocular depth predictors
+
+ ### ⛔ Out-of-Scope Applications
+
+ - General-purpose visual odometry without proper domain adaptation
+
+ ## 📈 Training Details
+
+ - **Dataset**: 300+ hours of endoscopic videos from 100 patients (gastroscopy and prostatectomy)
+ - **Preprocessing**: Frame extraction with 128×128 pixel center crop
+ - **Loss Function**: Combined adversarial, image cycle consistency, and pose cycle consistency losses
+ - **Optimizer**: Adam with standard learning rate schedules
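
The preprocessing step listed above can be sketched as follows. This is a minimal illustration of a 128×128 center crop on a NumPy image array; the function name `center_crop` and the example frame size are assumptions, and the README does not say whether frames are resized before cropping, so no resizing is done here.

```python
import numpy as np


def center_crop(frame, size=128):
    """Return the central size x size window of an H x W (x C) frame."""
    h, w = frame.shape[:2]
    if h < size or w < size:
        raise ValueError(f"frame {h}x{w} is smaller than crop size {size}")
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]


# Hypothetical 480x640 RGB frame standing in for an extracted video frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = center_crop(frame)
```

The crop is a view into the original array, so call `.copy()` on the result if the source frame will be modified afterwards.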
+
+ ## 🛡️ Limitations & Recommendations
+
+ - **Inherent Scale Ambiguity**: Common in monocular systems
+ - **Domain Specificity**: Trained solely on endoscopic data
+ - **Clinical Deployment**: Requires thorough validation and clinical trials
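
Scale ambiguity means the predicted translations are correct only up to an unknown positive factor. If a metric reference is available for a few frames, one common way to recover that factor is a least-squares fit. The sketch below assumes such reference translations exist; `fit_scale` and the sample values are hypothetical and not part of CycleVO.

```python
import numpy as np


def fit_scale(est_translations, ref_translations):
    """Least-squares scale s minimizing ||s * est - ref||^2 over all frames."""
    est = np.asarray(est_translations, dtype=float)
    ref = np.asarray(ref_translations, dtype=float)
    denom = np.sum(est * est)
    if denom == 0.0:
        raise ValueError("estimated translations are all zero")
    return float(np.sum(est * ref) / denom)


# Hypothetical unscaled estimates and a metric reference that is 25x larger.
est = np.array([[0.1, 0.0, 0.0],
                [0.0, 0.2, 0.0]])
ref = 25.0 * est
scale = fit_scale(est, ref)
```

Multiplying every predicted translation by `scale` then expresses the trajectory in the units of the reference.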
+
+ **We recommend**:
+ - Validating the model thoroughly in your target environment
+ - Integrating additional sensors when possible
+ - Collaborating with clinical experts before surgical deployment
+
+ ## 📚 Citation
+
+ ```bibtex
+ @misc{manni2024bodyslamgeneralizedmonocularvisual,
+   title={BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications},
+   author={G. Manni and C. Lauretti and F. Prata and R. Papalia and L. Zollo and P. Soda},
+   year={2024},
+   eprint={2408.03078},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2408.03078}
+ }
+ ```
+
+ ## 📖 Glossary
+
+ - **Cycle Consistency Loss**: Enforces agreement between original and reconstructed inputs after transformations
+ - **Motion Matrix (M)**: Composed of rotation (R) and unscaled translation vector (t<sub>unscaled</sub>)
+ - **ATE/RTE/RRE**: Absolute Trajectory Error, Relative Trajectory Error, Relative Rotation Error
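
As an illustration of the cycle-consistency idea applied to poses, the sketch below measures how far a forward motion followed by a backward motion is from the identity. It is a generic example under assumed names (`pose_cycle_residual`, `invert_motion`) and an assumed Frobenius-norm residual; the loss actually used to train CycleVO may be formulated differently.

```python
import numpy as np


def invert_motion(M):
    """Closed-form inverse of a rigid motion [R, t; 0, 1]."""
    R, t = M[:3, :3], M[:3, 3]
    M_inv = np.eye(4)
    M_inv[:3, :3] = R.T
    M_inv[:3, 3] = -R.T @ t
    return M_inv


def pose_cycle_residual(M_forward, M_backward):
    """Frobenius norm of (M_forward @ M_backward - I); zero for a perfect cycle."""
    return float(np.linalg.norm(M_forward @ M_backward - np.eye(4)))


# Hypothetical forward motion: 10-degree rotation about z plus a small shift.
theta = np.deg2rad(10.0)
M_ab = np.eye(4)
M_ab[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]]
M_ab[:3, 3] = [0.5, 0.0, 0.1]

consistent = pose_cycle_residual(M_ab, invert_motion(M_ab))  # exact inverse
inconsistent = pose_cycle_residual(M_ab, np.eye(4))          # wrong backward motion
```

A residual near zero indicates that the two predicted motions undo each other, which is the agreement a pose cycle-consistency term rewards.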
+
+ ## 📫 Contact
+
+ For questions or further information, please contact:
+ **Guido Manni** - [guido.manni@unicampus.it](mailto:guido.manni@unicampus.it)
+
  ---