---
license: mit
language:
- en
tags:
- Visual Odometry
- Deep Learning
- Computer Vision
---
# 🔄 CycleVO

<div align="center">

![license: mit](https://img.shields.io/badge/license-MIT-blue)
![tags: SLAM](https://img.shields.io/badge/tags-SLAM-brightgreen)
![tags: Visual Odometry](https://img.shields.io/badge/tags-Visual%20Odometry-brightgreen)
![tags: Computer Vision](https://img.shields.io/badge/tags-Computer%20Vision-brightgreen)
![tags: Relative Camera Pose Estimation](https://img.shields.io/badge/tags-Relative%20Camera%20Pose%20Estimation-brightgreen)

**Part of the BodySLAM Framework for Endoscopic Surgical Applications**

[Paper](https://arxiv.org/abs/2408.03078) | [GitHub](https://github.com/GuidoManni/BodySLAM)

</div>

---

## 📌 Overview

CycleVO is an unsupervised monocular pose estimation model designed to robustly estimate the relative camera pose between consecutive frames from endoscopic video. It addresses challenges such as low-texture surfaces and significant illumination variations common in surgical environments.

<div align="center">
  <img src="CycleVO_architecture.png" alt="CycleVO Architecture Diagram" width="80%"/>
</div>

## ✨ Key Features

- **🔄 Unsupervised Learning via Cycle Consistency**: Inspired by CycleGAN and InfoGAN
- **⚡ Competitive Performance and Speed**: Low inference time compared to state-of-the-art methods
- **🔌 Easy Integration with SLAM Pipelines**: Provides ready-to-use motion matrices

## 🧠 Model Details

CycleVO learns to estimate the relative motion (i.e., camera pose) between consecutive endoscopic frames. The model predicts a motion matrix 𝑀 = [𝑅, 𝑡<sub>unscaled</sub>; 0, 1], composed of a rotation 𝑅 and an unscaled translation vector 𝑡<sub>unscaled</sub>, using a generator encoder architecture augmented with a pose estimation tail.
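
As an illustration of how such motion matrices might be consumed downstream, the sketch below assembles a 4×4 homogeneous matrix from a rotation and an unscaled translation, then chains frame-to-frame motions into a trajectory. The function names `make_motion_matrix` and `accumulate_trajectory` are hypothetical and not part of the BodySLAM code base, and the composition order assumes each matrix maps coordinates of frame k+1 into frame k; check the repository for the convention actually used.

```python
import numpy as np


def make_motion_matrix(R, t_unscaled):
    """Assemble a 4x4 homogeneous matrix from a 3x3 rotation and a 3-vector."""
    M = np.eye(4)
    M[:3, :3] = np.asarray(R, dtype=float)
    M[:3, 3] = np.asarray(t_unscaled, dtype=float).reshape(3)
    return M


def accumulate_trajectory(relative_motions):
    """Chain frame-to-frame motions into poses expressed in the first frame."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for M in relative_motions:
        # Assumption: M maps frame k+1 coordinates into frame k.
        pose = pose @ M
        trajectory.append(pose.copy())
    return trajectory


# Toy input (not model output): 90 degree rotation about z, unit step along x.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
M = make_motion_matrix(Rz, [1.0, 0.0, 0.0])
trajectory = accumulate_trajectory([M, M])
```

Because the translation is unscaled, the resulting trajectory is only defined up to a global scale factor.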

| **Developed by** | Guido Manni, Clemente Lauretti, Francesco Prata, Rocco Papalia, Loredana Zollo, Paolo Soda |
|:-----------------|:--------------------------------------------------------------------------------------------|
| **Model Type**   | Unsupervised Monocular Visual Odometry / Relative Camera Pose Estimation                    |
| **License**      | MIT                                                                                         |
| **Training**     | From scratch using a large-scale internal endoscopic dataset                                |

## 🚀 Getting Started

For complete documentation, please refer to the [GitHub repository](https://github.com/GuidoManni/BodySLAM).


## 🔍 Use Cases

### ✅ Ideal Applications

- **Surgical Navigation**: Real-time guidance during minimally invasive procedures
- **3D Reconstruction**: Enhanced mapping of surgical scenes
- **Depth Perception**: Accurate pose estimates to complement monocular depth predictors

### ⛔ Out-of-Scope Applications

- General-purpose visual odometry without proper domain adaptation

## 📈 Training Details

- **Dataset**: 300+ hours of endoscopic videos from 100 patients (gastroscopy and prostatectomy)
- **Preprocessing**: Frame extraction with 128×128 pixel center crop
- **Loss Function**: Combined adversarial, image cycle consistency, and pose cycle consistency losses
- **Optimizer**: Adam with standard learning rate schedules
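
The center-crop step listed above could look roughly like the following. This is a minimal sketch, not the authors' preprocessing code: `center_crop` is a hypothetical helper, the 480×640 placeholder frame is arbitrary, and the card does not say whether frames are resized before cropping.

```python
import numpy as np


def center_crop(frame, size=128):
    """Return the central size x size window of an H x W (x C) frame."""
    h, w = frame.shape[:2]
    if h < size or w < size:
        raise ValueError(f"frame {h}x{w} is smaller than the {size}x{size} crop")
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]


# Placeholder frame standing in for one extracted video frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = center_crop(frame)
```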

## 🛡️ Limitations & Recommendations

- **Inherent Scale Ambiguity**: Common in monocular systems
- **Domain Specificity**: Trained solely on endoscopic data
- **Clinical Deployment**: Requires thorough validation and clinical trials

**We recommend**:
- Validating the model thoroughly in your target environment
- Integrating additional sensors when possible
- Collaborating with clinical experts before surgical deployment
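
One way an additional sensor might help with the scale ambiguity is to fit a single scale factor that aligns the predicted unscaled translations with metric reference translations. The sketch below shows a closed-form least-squares fit; `estimate_scale` is a hypothetical helper, the numbers are made up, and this is not the procedure described in the paper.

```python
import numpy as np


def estimate_scale(t_pred, t_ref):
    """Least-squares scalar s minimising sum ||s * t_pred - t_ref||^2."""
    t_pred = np.asarray(t_pred, dtype=float)
    t_ref = np.asarray(t_ref, dtype=float)
    denom = np.sum(t_pred * t_pred)
    if denom == 0.0:
        raise ValueError("predicted translations are all zero")
    return float(np.sum(t_pred * t_ref) / denom)


# Made-up data: the reference translations are 5x the predicted ones.
t_pred = np.array([[0.1, 0.0, 0.0],
                   [0.0, 0.2, 0.0]])
t_ref = 5.0 * t_pred
scale = estimate_scale(t_pred, t_ref)
t_metric = scale * t_pred  # predicted translations rescaled to reference units
```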

## 📚 Citation

```bibtex
@misc{manni2024bodyslamgeneralizedmonocularvisual,
      title={BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications}, 
      author={G. Manni and C. Lauretti and F. Prata and R. Papalia and L. Zollo and P. Soda},
      year={2024},
      eprint={2408.03078},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.03078}
}
```

## 📖 Glossary

- **Cycle Consistency Loss**: Enforces agreement between original and reconstructed inputs after transformations
- **Motion Matrix (M)**: Composed of rotation (R) and unscaled translation vector (t<sub>unscaled</sub>)
- **ATE/RTE/RRE**: Absolute Trajectory Error, Relative Trajectory Error, Relative Rotation Error
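
To make the error metrics concrete, here is a rough sketch of two of them under common definitions: rotation error as the angle of the residual rotation, and trajectory error as the RMSE over positions. Exact definitions (alignment, per-frame versus per-segment averaging) vary between papers, so treat these hypothetical helpers as illustrative rather than as the evaluation protocol used for CycleVO.

```python
import numpy as np


def relative_rotation_error_deg(R_pred, R_gt):
    """Angle in degrees of the residual rotation R_gt^T @ R_pred."""
    R_err = np.asarray(R_gt, dtype=float).T @ np.asarray(R_pred, dtype=float)
    # Clip guards against values slightly outside [-1, 1] from round-off.
    cos_angle = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def absolute_trajectory_error(p_pred, p_gt):
    """RMSE between N x 3 position arrays (no alignment step applied here)."""
    diff = np.asarray(p_pred, dtype=float) - np.asarray(p_gt, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Toy inputs: a 90 degree rotation about z compared against the identity.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
rre = relative_rotation_error_deg(Rz, np.eye(3))
ate = absolute_trajectory_error([[0, 0, 0], [1, 0, 0]],
                                [[0, 0, 0], [1, 1, 0]])
```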

## 📫 Contact

For questions or further information, please contact: 
**Guido Manni** - [guido.manni@unicampus.it](mailto:guido.manni@unicampus.it)

---