Update README.md
README.md
CHANGED
@@ -1,8 +1,103 @@
# CycleVO

<div align="center">

**Part of the BodySLAM Framework for Endoscopic Surgical Applications**

[Paper](https://arxiv.org/abs/2408.03078) | [GitHub](https://github.com/GuidoManni/BodySLAM)

</div>

---

## Overview

CycleVO is an unsupervised monocular pose estimation model designed to robustly estimate the relative camera pose between consecutive frames of endoscopic video. It addresses challenges such as low-texture surfaces and significant illumination variations, which are common in surgical environments.

<div align="center">
<img src="https://via.placeholder.com/800x400?text=CycleVO+Architecture" alt="CycleVO Architecture Diagram" width="80%"/>
</div>

## Key Features

- **Unsupervised Learning via Cycle Consistency**: Inspired by CycleGAN and InfoGAN
- **Competitive Performance and Speed**: Low inference time compared to state-of-the-art methods
- **Easy Integration with SLAM Pipelines**: Provides ready-to-use motion matrices
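
To illustrate how per-frame motion matrices might feed a SLAM pipeline, the sketch below chains relative motions into a trajectory. It assumes each motion matrix is a 4×4 homogeneous transform stored as nested lists, and that it maps coordinates of the newer frame into the previous one, so poses accumulate by right-multiplication. The helper names (`matmul4`, `identity4`, `accumulate_trajectory`) are hypothetical and not part of the CycleVO code base.

```python
from typing import List

Matrix = List[List[float]]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def identity4() -> Matrix:
    """Return the 4x4 identity, used as the pose of the first frame."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def accumulate_trajectory(relative_motions: List[Matrix]) -> List[Matrix]:
    """Chain frame-to-frame motions into poses relative to the first frame.

    Assumption: right-multiplication matches the pose convention in use.
    """
    poses = [identity4()]
    for motion in relative_motions:
        poses.append(matmul4(poses[-1], motion))
    return poses


# Hypothetical input: two identical steps of one unscaled unit along z.
step = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 1.0]]
trajectory = accumulate_trajectory([step, step])
print(trajectory[-1][2][3])  # z-translation after two steps: 2.0
```

Because the predicted translations are unscaled, a trajectory built this way is only defined up to a global scale factor.
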
## Model Details

CycleVO learns to estimate the relative motion (i.e., camera pose) between consecutive endoscopic frames. The model predicts a motion matrix M = [R, t<sub>unscaled</sub>, 1, 0], where R is the rotation and t<sub>unscaled</sub> is the translation vector up to an unknown scale factor, using a generator encoder architecture augmented with a pose estimation tail.

| **Developed by** | Guido Manni, Clemente Lauretti, Francesco Prata, Rocco Papalia, Loredana Zollo, Paolo Soda |
|:-----------------|:--------------------------------------------------------------------------------------------|
| **Model Type** | Unsupervised Monocular Visual Odometry / Relative Camera Pose Estimation |
| **License** | MIT |
| **Training** | From scratch using a large-scale internal endoscopic dataset |
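
A common way to store a rotation and a translation together is a 4×4 homogeneous matrix whose last row is `[0, 0, 0, 1]`. The sketch below assumes that layout, which is one interpretation of the motion matrix described above rather than something this README confirms, and shows how such a matrix could be built and inverted. The function names and the sample values are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def make_motion_matrix(rotation: Sequence[Sequence[float]],
                       translation: Sequence[float]) -> Matrix:
    """Stack a 3x3 rotation and a 3-vector into a 4x4 homogeneous matrix."""
    rows = [[float(v) for v in rotation[i]] + [float(translation[i])]
            for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def invert_motion_matrix(motion: Matrix) -> Matrix:
    """Invert a rigid transform: R becomes R^T and t becomes -R^T t."""
    rot_t = [[motion[j][i] for j in range(3)] for i in range(3)]
    trans = [motion[i][3] for i in range(3)]
    inv_trans = [-sum(rot_t[i][k] * trans[k] for k in range(3))
                 for i in range(3)]
    return make_motion_matrix(rot_t, inv_trans)


# Hypothetical prediction: 90-degree rotation about z plus an unscaled shift.
rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
translation = [1, 2, 3]
forward = make_motion_matrix(rotation, translation)
backward = invert_motion_matrix(forward)
print([row[3] for row in backward[:3]])  # [-2.0, 1.0, -3.0]
```

The closed-form inverse avoids a general matrix inversion and keeps the result a valid rigid transform, provided the rotation block is orthonormal.
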
## Getting Started

For complete documentation, please refer to the [GitHub repository](https://github.com/GuidoManni/BodySLAM).
## Use Cases

### Ideal Applications

- **Surgical Navigation**: Real-time guidance during minimally invasive procedures
- **3D Reconstruction**: Enhanced mapping of surgical scenes
- **Depth Perception**: Accurate pose estimates to complement monocular depth predictors

### Out-of-Scope Applications

- General-purpose visual odometry without proper domain adaptation
## Training Details

- **Dataset**: 300+ hours of endoscopic videos from 100 patients (gastroscopy and prostatectomy)
- **Preprocessing**: Frame extraction with 128×128 pixel center crop
- **Loss Function**: Combined adversarial, image cycle consistency, and pose cycle consistency losses
- **Optimizer**: Adam with standard learning rate schedules
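
The preprocessing step above mentions a 128×128 center crop. As a minimal sketch of what that could look like, the code below computes the crop box for a frame and applies it to a frame stored as a list of pixel rows. Whether the original pipeline resizes frames before cropping is not stated here, so this sketch crops directly; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def center_crop_box(width: int, height: int,
                    crop: int = 128) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    if width < crop or height < crop:
        raise ValueError("frame is smaller than the requested crop")
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def center_crop(frame: Sequence[Sequence[int]], crop: int = 128) -> List[list]:
    """Crop a frame given as a list of pixel rows around its center."""
    height, width = len(frame), len(frame[0])
    left, top, right, bottom = center_crop_box(width, height, crop)
    return [list(row[left:right]) for row in frame[top:bottom]]


print(center_crop_box(640, 480))  # (256, 176, 384, 304)

# Tiny synthetic frame (4 rows, 6 columns) cropped to 2x2 for illustration.
tiny = [[r * 10 + c for c in range(6)] for r in range(4)]
print(center_crop(tiny, crop=2))  # [[12, 13], [22, 23]]
```
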
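
The loss listed above combines three terms. The sketch below shows one plausible reading of the pose cycle consistency idea: composing the predicted forward motion with the predicted backward motion should give the identity. The exact formulation and the loss weights used to train CycleVO are not given in this README, so the residual definition, the weights, and all function names here are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose_cycle_residual(forward: Matrix, backward: Matrix) -> float:
    """Mean absolute deviation of forward @ backward from the identity."""
    product = matmul4(forward, backward)
    total = sum(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(4) for j in range(4))
    return total / 16.0


def combined_loss(adversarial: float, image_cycle: float, pose_cycle: float,
                  w_adv: float = 1.0, w_img: float = 1.0,
                  w_pose: float = 1.0) -> float:
    """Weighted sum of the three terms; the weights are placeholders."""
    return w_adv * adversarial + w_img * image_cycle + w_pose * pose_cycle


# Hypothetical predictions: one unit forward along z, then one unit back.
forward = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
backward = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, -1.0], [0.0, 0.0, 0.0, 1.0]]
residual = pose_cycle_residual(forward, backward)
print(residual)                             # 0.0 for a perfectly consistent pair
print(combined_loss(0.5, 0.25, residual))   # 0.75
```
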
## Limitations & Recommendations

- **Inherent Scale Ambiguity**: Translations are predicted only up to an unknown scale factor, as is common in monocular systems
- **Domain Specificity**: Trained solely on endoscopic data
- **Clinical Deployment**: Requires thorough validation and clinical trials

**We recommend**:
- Validating the model thoroughly in your target environment
- Integrating additional sensors when possible
- Collaborating with clinical experts before surgical deployment
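
One way an additional sensor can help with the scale ambiguity is by providing metric translations for a few frame pairs, from which a single scale factor can be fitted. The sketch below is an illustration under that assumption: it fits the scale by least squares on translation magnitudes. The function names and sample values are hypothetical, and this is not a procedure described by the CycleVO authors.

```python
import math
from typing import List, Sequence


def norm(vector: Sequence[float]) -> float:
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in vector))


def estimate_scale(pred_translations: Sequence[Sequence[float]],
                   ref_translations: Sequence[Sequence[float]]) -> float:
    """Least-squares scale s minimizing sum((s * |t_pred| - |t_ref|)^2)."""
    pred = [norm(v) for v in pred_translations]
    ref = [norm(v) for v in ref_translations]
    denom = sum(p * p for p in pred)
    if denom == 0.0:
        raise ValueError("predicted translations are all zero")
    return sum(p * r for p, r in zip(pred, ref)) / denom


def apply_scale(translations: Sequence[Sequence[float]],
                scale: float) -> List[List[float]]:
    """Rescale unscaled translations into the reference units."""
    return [[scale * c for c in v] for v in translations]


# Hypothetical data: unscaled predictions and metric references.
predicted = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
reference = [(0.0, 0.0, 0.5), (0.0, 0.0, 1.0)]
scale = estimate_scale(predicted, reference)
print(scale)                          # 0.5
print(apply_scale(predicted, scale))  # [[0.0, 0.0, 0.5], [0.0, 0.0, 1.0]]
```

A single global scale is the simplest model; if scale drifts over a long sequence, fitting it over sliding windows may be more appropriate.
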
## Citation

```bibtex
@misc{manni2024bodyslamgeneralizedmonocularvisual,
      title={BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications},
      author={G. Manni and C. Lauretti and F. Prata and R. Papalia and L. Zollo and P. Soda},
      year={2024},
      eprint={2408.03078},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.03078}
}
```
## Glossary

- **Cycle Consistency Loss**: Enforces agreement between original and reconstructed inputs after transformations
- **Motion Matrix (M)**: Composed of rotation (R) and unscaled translation vector (t<sub>unscaled</sub>)
- **ATE/RTE/RRE**: Absolute Trajectory Error, Relative Trajectory Error, Relative Rotation Error
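
The error metrics in the glossary can be sketched in a few lines. The code below computes a relative rotation error as the angle of the rotation separating ground truth from prediction, and an absolute trajectory error as the root-mean-square distance between corresponding positions. Definitions of these metrics vary between papers (for example, in how trajectories are aligned first), so treat this as an illustrative assumption rather than the evaluation protocol used for CycleVO; all names are hypothetical.

```python
import math
from typing import Sequence


def relative_rotation_error_deg(r_gt: Sequence[Sequence[float]],
                                r_pred: Sequence[Sequence[float]]) -> float:
    """Angle in degrees of the rotation R_gt^T @ R_pred."""
    delta = [[sum(r_gt[k][i] * r_pred[k][j] for k in range(3))
              for j in range(3)] for i in range(3)]
    cos_angle = (delta[0][0] + delta[1][1] + delta[2][2] - 1.0) / 2.0
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def absolute_trajectory_error(gt_positions: Sequence[Sequence[float]],
                              pred_positions: Sequence[Sequence[float]]) -> float:
    """RMSE between corresponding positions, assumed already aligned."""
    squared = [sum((g - p) ** 2 for g, p in zip(gt, pred))
               for gt, pred in zip(gt_positions, pred_positions)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: prediction is off by a 90-degree rotation about z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
rre = relative_rotation_error_deg(identity, quarter_turn)
ate = absolute_trajectory_error([(0, 0, 0), (0, 0, 1)],
                                [(0, 0, 0), (0, 0, 2)])
print(round(rre, 6), round(ate, 6))  # 90.0 0.707107
```

A relative trajectory error can be computed analogously by comparing the translation parts of corresponding frame-to-frame motions.
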
## Contact

For questions or further information, please contact:
**Guido Manni** - [guido.manni@unicampus.it](mailto:guido.manni@unicampus.it)

---