---
license: mit
language:
- en
tags:
- Visual Odometry
- Deep Learning
- Computer Vision
---

# CycleVO

<div align="center">

**Part of the BodySLAM Framework for Endoscopic Surgical Applications**

[Paper](https://arxiv.org/abs/2408.03078) | [GitHub](https://github.com/GuidoManni/BodySLAM)

</div>

---

## Overview

CycleVO is an unsupervised monocular pose estimation model designed to robustly estimate the relative camera pose between consecutive frames from endoscopic video. It addresses challenges such as low-texture surfaces and significant illumination variations common in surgical environments.

<div align="center">
<img src="CycleVO_architecture.png" alt="CycleVO Architecture Diagram" width="80%"/>
</div>

## Key Features

- **Unsupervised Learning via Cycle Consistency**: Inspired by CycleGAN and InfoGAN
- **Competitive Performance and Speed**: Low inference time compared to state-of-the-art methods
- **Easy Integration with SLAM Pipelines**: Provides ready-to-use motion matrices
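
As a rough illustration of what cycle consistency can mean for poses, the sketch below checks that a forward motion (frame A to B) composed with a backward motion (frame B to A) returns the identity. The function names `make_motion` and `pose_cycle_residual` and the Frobenius-norm residual are assumptions made for this example; the loss actually used by CycleVO is defined in the paper and may differ.

```python
import numpy as np

def make_motion(rotation_z_rad, translation):
    """Build a 4x4 homogeneous motion matrix from a rotation about z and a translation."""
    c, s = np.cos(rotation_z_rad), np.sin(rotation_z_rad)
    motion = np.eye(4)
    motion[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    motion[:3, 3] = translation
    return motion

def pose_cycle_residual(motion_ab, motion_ba):
    """Frobenius norm of (M_ba @ M_ab - I); zero when the two motions are exact inverses."""
    return float(np.linalg.norm(motion_ba @ motion_ab - np.eye(4)))

# Hypothetical forward motion between two frames.
motion_ab = make_motion(0.1, [0.02, 0.0, 0.05])

# A backward motion that exactly undoes the forward one gives a near-zero residual.
consistent = pose_cycle_residual(motion_ab, np.linalg.inv(motion_ab))

# A backward motion that ignores the forward one (identity) leaves a clear residual.
inconsistent = pose_cycle_residual(motion_ab, np.eye(4))

print(f"consistent: {consistent:.2e}, inconsistent: {inconsistent:.2e}")
```

In training, a residual of this kind would be computed on predicted motions and minimized alongside the other loss terms.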

## Model Details

CycleVO learns to estimate the relative motion (i.e., camera pose) between consecutive endoscopic frames. The model predicts a 4×4 homogeneous motion matrix M = [R, t<sub>unscaled</sub>; 0, 1], where R is the rotation and t<sub>unscaled</sub> is the translation up to an unknown scale, using a generator encoder architecture augmented with a pose estimation tail.
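
To make the matrix layout concrete, the sketch below assembles a 4×4 homogeneous matrix from a rotation and an unscaled translation, then chains relative motions into a camera trajectory. The helper names (`to_motion_matrix`, `accumulate_trajectory`) and the convention that each relative motion is right-multiplied onto the current pose are assumptions for illustration; check the BodySLAM repository for the convention the model actually uses.

```python
import numpy as np

def to_motion_matrix(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into M = [R, t; 0, 1]."""
    motion = np.eye(4)
    motion[:3, :3] = rotation
    motion[:3, 3] = translation
    return motion

def accumulate_trajectory(relative_motions):
    """Chain relative motions into absolute poses, starting from the identity."""
    pose = np.eye(4)
    poses = [pose.copy()]
    for motion in relative_motions:
        # Assumed convention: each motion is expressed in the previous camera frame.
        pose = pose @ motion
        poses.append(pose.copy())
    return np.stack(poses)

# Two identical steps: no rotation, 0.1 units along z (units are arbitrary, since t is unscaled).
step = to_motion_matrix(np.eye(3), [0.0, 0.0, 0.1])
trajectory = accumulate_trajectory([step, step])
print(trajectory[-1][:3, 3])
```

The last column of each accumulated pose gives the camera position, still up to the unknown global scale.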

| **Developed by** | Guido Manni, Clemente Lauretti, Francesco Prata, Rocco Papalia, Loredana Zollo, Paolo Soda |
|:-----------------|:--------------------------------------------------------------------------------------------|
| **Model Type** | Unsupervised Monocular Visual Odometry / Relative Camera Pose Estimation |
| **License** | MIT |
| **Training** | From scratch using a large-scale internal endoscopic dataset |

## Getting Started

For complete documentation, please refer to the [GitHub repository](https://github.com/GuidoManni/BodySLAM).

## Use Cases

### Ideal Applications

- **Surgical Navigation**: Real-time guidance during minimally invasive procedures
- **3D Reconstruction**: Enhanced mapping of surgical scenes
- **Depth Perception**: Accurate pose estimates to complement monocular depth predictors

### Out-of-Scope Applications

- General-purpose visual odometry without proper domain adaptation

## Training Details

- **Dataset**: 300+ hours of endoscopic videos from 100 patients (gastroscopy and prostatectomy)
- **Preprocessing**: Frame extraction with 128×128 pixel center crop
- **Loss Function**: Combined adversarial, image cycle consistency, and pose cycle consistency losses
- **Optimizer**: Adam with standard learning rate schedules
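
The center-crop step in the preprocessing can be sketched as follows. The source only states the 128×128 crop size; the function name `center_crop`, the H×W×C array layout, and the example frame size are assumptions.

```python
import numpy as np

def center_crop(frame, size=128):
    """Return the central size x size window of an H x W x C frame."""
    height, width = frame.shape[:2]
    if height < size or width < size:
        raise ValueError(f"frame {height}x{width} is smaller than the {size}x{size} crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return frame[top:top + size, left:left + size]

# Hypothetical input frame; real endoscopic frame sizes depend on the video source.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = center_crop(frame)
print(crop.shape)
```

Applying the same crop at inference time keeps the input distribution close to what the model saw during training.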

## Limitations & Recommendations

- **Inherent Scale Ambiguity**: Common in monocular systems
- **Domain Specificity**: Trained solely on endoscopic data
- **Clinical Deployment**: Requires thorough validation and clinical trials

**We recommend**:
- Validating the model thoroughly in your target environment
- Integrating additional sensors when possible
- Collaborating with clinical experts before surgical deployment
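
Because the predicted translation is unscaled, a metric reference (for example, from an additional sensor) is needed to recover absolute distances. One common approach, not necessarily the one used in BodySLAM, is to fit a single least-squares scale factor between predicted and reference translations. The function name `fit_scale` and the synthetic data below are assumptions for illustration.

```python
import numpy as np

def fit_scale(predicted, reference):
    """Least-squares scalar s minimizing ||s * predicted - reference||^2."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    denominator = np.sum(predicted * predicted)
    if denominator == 0.0:
        raise ValueError("predicted translations are all zero; scale is undefined")
    return float(np.sum(predicted * reference) / denominator)

# Synthetic example: reference translations are exactly 2.5x the unscaled predictions.
predicted = np.array([[0.0, 0.0, 0.1], [0.01, 0.0, 0.2], [0.0, 0.02, 0.15]])
reference = 2.5 * predicted
scale = fit_scale(predicted, reference)
print(round(scale, 6))
```

With real data the fit will not be exact, so it is worth inspecting the residual after scaling before trusting metric distances.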

## Citation

```bibtex
@misc{manni2024bodyslamgeneralizedmonocularvisual,
      title={BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications},
      author={G. Manni and C. Lauretti and F. Prata and R. Papalia and L. Zollo and P. Soda},
      year={2024},
      eprint={2408.03078},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.03078}
}
```

## Glossary

- **Cycle Consistency Loss**: Enforces agreement between original and reconstructed inputs after transformations
- **Motion Matrix (M)**: Composed of rotation (R) and unscaled translation vector (t<sub>unscaled</sub>)
- **ATE/RTE/RRE**: Absolute Trajectory Error, Relative Trajectory Error, Relative Rotation Error
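
The two relative metrics can be sketched for a single pair of predicted and ground-truth relative motions. Exact definitions vary between papers; the version below uses the translation norm and the rotation angle of the error transform, and the function names `relative_errors` and `rotation_z` are hypothetical. ATE is computed over a whole aligned trajectory and is not shown here.

```python
import numpy as np

def relative_errors(motion_pred, motion_gt):
    """Return (translation error, rotation error in degrees) for one frame pair."""
    error = np.linalg.inv(motion_gt) @ motion_pred
    translation_error = float(np.linalg.norm(error[:3, 3]))
    # Rotation angle from the trace; clip guards against round-off outside [-1, 1].
    cos_angle = np.clip((np.trace(error[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    rotation_error_deg = float(np.degrees(np.arccos(cos_angle)))
    return translation_error, rotation_error_deg

def rotation_z(angle_deg):
    """4x4 motion matrix with a pure rotation about the z axis."""
    angle = np.radians(angle_deg)
    c, s = np.cos(angle), np.sin(angle)
    motion = np.eye(4)
    motion[:2, :2] = [[c, -s], [s, c]]
    return motion

# Synthetic pair: the prediction over-rotates by 2 degrees and adds a small z translation.
gt = rotation_z(10.0)
pred = rotation_z(12.0)
pred[:3, 3] = [0.0, 0.0, 0.05]

rte, rre = relative_errors(pred, gt)
print(f"RTE={rte:.3f}, RRE={rre:.2f} deg")
```

Averaging these per-pair values over a sequence gives sequence-level RTE and RRE figures under this definition.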

## Contact

For questions or further information, please contact:
**Guido Manni** - [guido.manni@unicampus.it](mailto:guido.manni@unicampus.it)

---