---
license: mit
language:
- en
tags:
- Visual Odometry
- Deep Learning
- Computer Vision
---
# 🔄 CycleVO

<div align="center">

![license: mit](https://img.shields.io/badge/license-MIT-blue)
![tags: SLAM](https://img.shields.io/badge/tags-SLAM-brightgreen)
![tags: Visual Odometry](https://img.shields.io/badge/tags-Visual%20Odometry-brightgreen)
![tags: Computer Vision](https://img.shields.io/badge/tags-Computer%20Vision-brightgreen)
![tags: Relative Camera Pose Estimation](https://img.shields.io/badge/tags-Relative%20Camera%20Pose%20Estimation-brightgreen)

Part of the BodySLAM Framework for Endoscopic Surgical Applications

[Paper](https://arxiv.org/abs/2408.03078) | [GitHub](https://github.com/GuidoManni/BodySLAM)

</div>

---

## 📌 Overview

CycleVO is an unsupervised monocular pose estimation model that estimates the relative camera pose between consecutive frames of endoscopic video. It is designed to stay robust on low-texture surfaces and under strong illumination variations, both of which are common in surgical environments.

<div align="center">
<img src="CycleVO_architecture.png" alt="CycleVO Architecture Diagram" width="80%"/>
</div>

## ✨ Key Features

- 🔄 Unsupervised Learning via Cycle Consistency: Inspired by CycleGAN and InfoGAN
- ⚡ Competitive Performance and Speed: Low inference time compared to state-of-the-art methods
- 🔌 Easy Integration with SLAM Pipelines: Provides ready-to-use motion matrices
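Each motion matrix describes a single frame-to-frame step. A downstream SLAM or odometry back end therefore usually chains these matrices to recover a trajectory. The following dependency-free Python sketch shows that accumulation. The helper names are hypothetical, and so is the frame convention (each matrix maps frame k+1 into frame k). They are illustrative assumptions, not the BodySLAM API.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous matrix as nested lists


def identity4() -> Matrix:
    """Return the 4x4 identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def accumulate_trajectory(relative_motions: List[Matrix]) -> List[Matrix]:
    """Chain frame-to-frame motion matrices into poses expressed in the first frame.

    Assumption: each matrix maps coordinates of frame k+1 into frame k. If your
    convention is the opposite, invert each matrix before chaining.
    """
    poses = [identity4()]
    for m in relative_motions:
        poses.append(matmul4(poses[-1], m))
    return poses


# Usage: a 90-degree yaw followed by a unit step along the new x axis.
yaw_90 = [[0.0, -1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
step_x = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
trajectory = accumulate_trajectory([yaw_90, step_x])
print([row[3] for row in trajectory[-1][:3]])  # [0.0, 1.0, 0.0]
```

Because the translation is unscaled, the accumulated trajectory has the right shape but an arbitrary global scale.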

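## 🧠 Model Details

CycleVO learns to estimate the relative motion (i.e., camera pose) between consecutive endoscopic frames. The architecture is a generator encoder augmented with a pose estimation tail. From this the model predicts a 4×4 homogeneous motion matrix

$$
M = \begin{bmatrix} R & t_{\text{unscaled}} \\ \mathbf{0}^\top & 1 \end{bmatrix}
$$

where $R$ is the 3×3 rotation matrix and $t_{\text{unscaled}}$ is the translation vector, which is recovered only up to scale.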
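The sketch below makes this layout concrete. It is dependency-free Python that assembles $M$ from $R$ and $t_{\text{unscaled}}$, then inverts it in closed form. The inverse is useful when a pipeline expects the motion in the opposite direction. The function names are hypothetical, and BodySLAM may store these matrices differently (for example, as tensors).

```python
from typing import List, Sequence

Matrix = List[List[float]]


def motion_matrix(r: Sequence[Sequence[float]], t: Sequence[float]) -> Matrix:
    """Assemble M = [[R, t], [0^T, 1]] from a 3x3 rotation R and a 3-vector t."""
    return [
        [r[0][0], r[0][1], r[0][2], t[0]],
        [r[1][0], r[1][1], r[1][2], t[1]],
        [r[2][0], r[2][1], r[2][2], t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def invert_motion(m: Matrix) -> Matrix:
    """Invert a rigid motion in closed form: M^-1 = [[R^T, -R^T t], [0^T, 1]].

    Valid because R is orthonormal (R^-1 = R^T); it also holds for an unscaled t.
    """
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]  # transpose of the rotation block
    t = [m[i][3] for i in range(3)]
    minus_r_t_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return motion_matrix(r_t, minus_r_t_t)


# Usage: 90-degree rotation about z with an (unscaled) translation.
rz = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
m = motion_matrix(rz, [1.0, 2.0, 3.0])
m_inv = invert_motion(m)
print([row[3] for row in m_inv[:3]])  # [-2.0, 1.0, -3.0]
```

The closed-form inverse avoids a general 4×4 matrix inversion. It relies only on the rotation block being orthonormal.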

| Property | Details |
|:-----------------|:--------------------------------------------------------------------------------------------|
| Developed by | Guido Manni, Clemente Lauretti, Francesco Prata, Rocco Papalia, Loredana Zollo, Paolo Soda |
| Model Type | Unsupervised Monocular Visual Odometry / Relative Camera Pose Estimation |
| License | MIT |
| Training | From scratch using a large-scale internal endoscopic dataset |

## 🚀 Getting Started

For complete documentation, please refer to the [GitHub repository](https://github.com/GuidoManni/BodySLAM).


## 🔍 Use Cases

### ✅ Ideal Applications

- Surgical Navigation: Real-time guidance during minimally invasive procedures
- 3D Reconstruction: Enhanced mapping of surgical scenes
- Depth Perception: Accurate pose estimates to complement monocular depth predictors

### ⛔ Out-of-Scope Applications

- General-purpose visual odometry without proper domain adaptation

## 📈 Training Details

- Dataset: 300+ hours of endoscopic videos from 100 patients (gastroscopy and prostatectomy)
- Preprocessing: Frames extracted from the videos and center-cropped to 128×128 pixels
- Loss Function: Combined adversarial, image cycle-consistency, and pose cycle-consistency losses
- Optimizer: Adam with standard learning rate schedules
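The list above states only that the objective combines adversarial, image cycle-consistency, and pose cycle-consistency terms. The sketch below is one plausible reading, not the CycleVO training code, and rests on these assumptions:

- The adversarial term arrives as a precomputed scalar.
- Images are flattened pixel lists.
- The image term is an L1 distance, as in CycleGAN.
- The pose term penalizes how far $M_{ab} M_{ba}$ is from the identity.
- The weights `w_adv`, `w_img`, and `w_pose` are hypothetical placeholders.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def image_cycle_loss(x: Sequence[float], x_rec: Sequence[float]) -> float:
    """L1 (mean absolute) difference between a frame and its cycle reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)


def pose_cycle_loss(m_ab: Matrix, m_ba: Matrix) -> float:
    """Mean absolute deviation of M_ab @ M_ba from the 4x4 identity.

    Moving from frame a to b and then back from b to a should compose to "no motion".
    """
    total = 0.0
    for i in range(4):
        for j in range(4):
            prod_ij = sum(m_ab[i][k] * m_ba[k][j] for k in range(4))
            total += abs(prod_ij - (1.0 if i == j else 0.0))
    return total / 16.0


def total_loss(adv: float, x: Sequence[float], x_rec: Sequence[float],
               m_ab: Matrix, m_ba: Matrix,
               w_adv: float = 1.0, w_img: float = 1.0, w_pose: float = 1.0) -> float:
    """Weighted sum of the three terms; the weights are placeholders, not published values."""
    return (w_adv * adv
            + w_img * image_cycle_loss(x, x_rec)
            + w_pose * pose_cycle_loss(m_ab, m_ba))


def translation_x(tx: float) -> Matrix:
    """Hypothetical helper: a pure translation of tx along x."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


# Usage: forward and backward motions cancel, so only the adversarial and image terms remain.
print(total_loss(0.5, [0.0, 1.0], [0.0, 0.5], translation_x(1.0), translation_x(-1.0)))  # 0.75
```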

## 🛡️ Limitations & Recommendations

- Inherent Scale Ambiguity: As in any monocular system, translation is recovered only up to an unknown scale factor
- Domain Specificity: Trained solely on endoscopic data
- Clinical Deployment: Requires thorough validation and clinical trials

We recommend:
- Validating the model thoroughly in your target environment
- Integrating additional sensors when possible
- Collaborating with clinical experts before surgical deployment
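CycleVO's translation is unscaled. One common way to use an additional sensor, or ground truth during validation, is to fit a single scale factor by least squares. The sketch below assumes you already have metric reference translations matched one-to-one with the estimates. The function names are hypothetical, and this is a generic technique rather than a documented BodySLAM feature.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def least_squares_scale(est: Sequence[Vec3], ref: Sequence[Vec3]) -> float:
    """Scale s minimizing sum_i ||s * est_i - ref_i||^2.

    Setting the derivative to zero gives s = sum_i <ref_i, est_i> / sum_i <est_i, est_i>.
    """
    num = sum(sum(r * e for r, e in zip(r_i, e_i)) for r_i, e_i in zip(ref, est))
    den = sum(sum(e * e for e in e_i) for e_i in est)
    if den == 0.0:
        raise ValueError("all estimated translations are zero; scale is undefined")
    return num / den


def rescale(est: Sequence[Vec3], s: float) -> List[List[float]]:
    """Apply the recovered scale to every unscaled translation."""
    return [[s * c for c in e_i] for e_i in est]


# Usage: the estimates point in the right direction but are half the metric length.
est = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ref = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
s = least_squares_scale(est, ref)
print(s, rescale(est, s))  # 2.0 [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
```

A single global scale assumes the scale does not drift over the sequence. For long sequences, fitting it over a sliding window is a reasonable variant.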

## 📚 Citation

```bibtex
@misc{manni2024bodyslamgeneralizedmonocularvisual,
  title={BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications},
  author={G. Manni and C. Lauretti and F. Prata and R. Papalia and L. Zollo and P. Soda},
  year={2024},
  eprint={2408.03078},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.03078}
}
```

## 📖 Glossary

- Cycle Consistency Loss: Penalizes the difference between an input and its reconstruction after it is mapped forward and then back
- Motion Matrix (M): Composed of rotation (R) and unscaled translation vector (t<sub>unscaled</sub>)
- ATE/RTE/RRE: Absolute Trajectory Error, Relative Trajectory Error, Relative Rotation Error
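Exact definitions of these metrics differ between benchmarks, for example in the alignment method and in per-frame versus per-segment errors. The dependency-free Python sketch below is therefore one common formulation, not necessarily the protocol used in the BodySLAM paper. It computes ATE as the RMSE of aligned positions and RRE as the mean geodesic angle between estimated and ground-truth relative rotations. RTE can be computed analogously from relative translations. All function names are hypothetical.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]
Rot3 = Sequence[Sequence[float]]


def ate_rmse(est_positions: Sequence[Vec3], gt_positions: Sequence[Vec3]) -> float:
    """Absolute Trajectory Error as RMSE of per-frame position differences.

    Assumes the estimated trajectory is already aligned (and scaled) to the ground truth.
    """
    sq_errs = [sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
               for p_est, p_gt in zip(est_positions, gt_positions)]
    return math.sqrt(sum(sq_errs) / len(sq_errs))


def rotation_error_deg(r_est: Rot3, r_gt: Rot3) -> float:
    """Angle of the residual rotation R_gt^T R_est, in degrees.

    Uses trace(R_gt^T R_est) = sum_ij R_gt[i][j] * R_est[i][j] and theta = arccos((trace - 1) / 2).
    """
    tr = sum(r_gt[i][j] * r_est[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp to guard against rounding
    return math.degrees(math.acos(c))


def rre_deg(rel_est: Sequence[Rot3], rel_gt: Sequence[Rot3]) -> float:
    """Relative Rotation Error: mean angle error over frame-to-frame relative rotations."""
    errs = [rotation_error_deg(e, g) for e, g in zip(rel_est, rel_gt)]
    return sum(errs) / len(errs)


# Usage
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rz_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(ate_rmse([[3.0, 4.0, 0.0]], [[0.0, 0.0, 0.0]]))  # 5.0
print(round(rre_deg([rz_90, identity], [identity, identity]), 6))  # 45.0
```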

## 📫 Contact

For questions or further information, please contact:
Guido Manni - [guido.manni@unicampus.it](mailto:guido.manni@unicampus.it)

---