---
license: apache-2.0
pipeline_tag: image-classification
tags:
- medical
- surgical
- endoscopy
---
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/cE7UgFfJJ2gUHJr0SSEhc.png"> </img>
</div>
[📚 Paper](https://arxiv.org/abs/2503.19740) - [🤖 GitHub](https://github.com/visurg-ai/LEMON)
This repository provides the models used in the data curation pipeline for the paper [LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings](https://arxiv.org/abs/2503.19740). These models assist in constructing the LEMON dataset by filtering and processing surgical video content.
For more details about the LEMON dataset and our LemonFM foundation model, please visit our [GitHub repository](https://github.com/visurg-ai/LEMON).
## Citation
If you use our dataset, model, or code in your research, please cite our paper:
```bibtex
@misc{che2025lemonlargeendoscopicmonocular,
title={LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings},
author={Chengan Che and Chao Wang and Tom Vercauteren and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
year={2025},
eprint={2503.19740},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.19740},
}
```
## Model Overview
This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader file can be found at [model_loader.py](https://huggingface.co/visurg/Surg3M_curation_models/blob/main/model_loader.py).
<div align="center">
<table style="margin-left: auto; margin-right: auto;">
<tr>
<th>Model</th>
<th>Architecture</th>
<th>Download</th>
</tr>
<tr>
<td>Video storyboard classification models</td>
<td>ResNet-18</td>
<td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/video_storyboard_classification">Full ckpt</a></td>
</tr>
<tr>
<td>Frame classification models</td>
<td>ResNet-18</td>
<td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/frame_classification">Full ckpt</a></td>
</tr>
<tr>
<td>Non-surgical object detection models</td>
<td>YOLOv8-Nano</td>
<td><a href="https://huggingface.co/visurg/Surg3M_curation_models/tree/main/nonsurgical_object_detection">Full ckpt</a></td>
</tr>
</table>
</div>
The data curation pipeline leading to the clean videos in the LEMON dataset is as follows:
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/jzw36jlPT-V_I-Vm01OzO.png"> </img>
</div>
## Usage
### Video storyboard classification models
**Video storyboard classification models** are used in step **2** of the data curation pipeline to classify a video storyboard as either surgical or non-surgical:
```python
import torch
import torchvision
from PIL import Image
from model_loader import build_model

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model
net = build_model(mode='classify')
# Replace with the path to a checkpoint from video_storyboard_classification
model_path = 'Video storyboard classification models'

# Enable multi-GPU support
net = torch.nn.DataParallel(net)
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cpu'))
net.load_state_dict(state['net'])
net.to(device)  # the model and the input must live on the same device
net.eval()

# Load the video storyboard and convert it to a PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')  # Normalize expects 3 channels
img = img.resize((224, 224))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.4299694, 0.29676908, 0.27707579),
        (0.24373249, 0.20208984, 0.19319402)
    )
])
img_tensor = transform(img).unsqueeze(0).to(device)

# Run the classifier on the storyboard without tracking gradients
with torch.no_grad():
    outputs = net(img_tensor)
```
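The snippet above stops at the raw `outputs` tensor. As a minimal sketch of how such scores might be turned into a decision, assume the classifier emits one logit per class and that index 1 corresponds to "surgical" (an assumption; check the class order used in training). The helper name `surgical_probability` and the 0.5 threshold are illustrative and not part of this repository.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def surgical_probability(logits, surgical_index=1):
    """Probability assigned to the (assumed) surgical class index."""
    return softmax(logits)[surgical_index]

# Hypothetical logits, e.g. obtained via outputs[0].tolist()
example_logits = [-1.2, 2.3]
prob = surgical_probability(example_logits)
is_surgical = prob >= 0.5  # illustrative threshold, not taken from the paper
print(f"surgical probability: {prob:.3f}, keep: {is_surgical}")
```

The pure-Python softmax keeps the example dependency-free; with PyTorch available, `torch.softmax(outputs, dim=1)` computes the same probabilities directly on the tensor.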
### Frame classification models
**Frame classification models** are used in step **3** of the data curation pipeline to classify a frame as either surgical or non-surgical:
```python
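import torch
import torchvision
from PIL import Image
from model_loader import build_model

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model
net = build_model(mode='classify')
# Replace with the path to a checkpoint from frame_classification
model_path = 'Frame classification models'

# Enable multi-GPU support
net = torch.nn.DataParallel(net)
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cpu'))
net.load_state_dict(state['net'])
net.to(device)  # the model and the input must live on the same device
net.eval()

# Load the frame and convert it to a PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')  # Normalize expects 3 channels
img = img.resize((224, 224))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.4299694, 0.29676908, 0.27707579),
        (0.24373249, 0.20208984, 0.19319402)
    )
])
img_tensor = transform(img).unsqueeze(0).to(device)

# Run the classifier on the frame without tracking gradients
with torch.no_grad():
    outputs = net(img_tensor)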
```
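Steps 2 and 3 can be chained: the storyboard check decides whether a video is worth processing, and the frame check then filters its individual frames. The sketch below illustrates that control flow only. It assumes a video is dropped entirely when its storyboard is rejected, and that each classifier has already been reduced to a "surgical" probability. `curate_video`, its parameters, and the 0.5 threshold are hypothetical names and values, not part of this repository.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")

def curate_video(
    storyboard_prob: float,
    frames: Iterable[Frame],
    frame_prob: Callable[[Frame], float],
    threshold: float = 0.5,
) -> List[Frame]:
    """Return the frames kept after the storyboard and frame checks."""
    # Step 2: discard the whole video if its storyboard looks non-surgical
    if storyboard_prob < threshold:
        return []
    # Step 3: keep only the frames classified as surgical
    return [f for f in frames if frame_prob(f) >= threshold]

# Toy usage: frames are stand-in strings with made-up probabilities
fake_scores = {"frame_a": 0.92, "frame_b": 0.08, "frame_c": 0.71}
kept = curate_video(0.88, fake_scores, lambda f: fake_scores[f])
print(kept)
```

In practice `frame_prob` would wrap the frame classifier shown above, and `storyboard_prob` would come from the storyboard classifier.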
### Non-surgical object detection models
**Non-surgical object detection models** are used to mask out non-surgical regions in surgical frames (e.g., user interface information):
```python
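import torch
import torchvision
from PIL import Image
from model_loader import build_model

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model
net = build_model(mode='mask')
# Replace with the path to a checkpoint from nonsurgical_object_detection
model_path = 'Non-surgical object detection models'

# Enable multi-GPU support
net = torch.nn.DataParallel(net)
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cpu'))
net.load_state_dict(state['net'])
net.to(device)  # the model and the input must live on the same device
net.eval()

# Load the frame and convert it to a PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')  # Normalize expects 3 channels
img = img.resize((224, 224))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.4299694, 0.29676908, 0.27707579),
        (0.24373249, 0.20208984, 0.19319402)
    )
])
img_tensor = transform(img).unsqueeze(0).to(device)

# Run the detector on the frame without tracking gradients
with torch.no_grad():
    outputs = net(img_tensor)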
```
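Once non-surgical regions have been located, removing them amounts to overwriting the corresponding pixels. The following is a minimal sketch. It assumes the detections have already been converted to `(x1, y1, x2, y2)` pixel boxes (the detector's actual output format is not documented here), and it represents the frame as a nested list so the example runs without extra dependencies. `mask_regions` is a hypothetical helper, not a function from `model_loader.py`.

```python
def mask_regions(frame, boxes, fill=0):
    """Return a copy of frame with every box overwritten by fill.

    frame: list of rows, each a list of pixel values.
    boxes: iterable of (x1, y1, x2, y2), with x2 and y2 exclusive.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    masked = [row[:] for row in frame]  # leave the input untouched
    for x1, y1, x2, y2 in boxes:
        # Clamp the box to the frame so out-of-range detections are safe
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        for y in range(y1, y2):
            for x in range(x1, x2):
                masked[y][x] = fill
    return masked

# Toy 4x4 frame with a hypothetical overlay detected in the top-left corner
frame = [[255] * 4 for _ in range(4)]
masked = mask_regions(frame, [(0, 0, 2, 1)])
print(masked[0])
```

On real images the same idea applies to array slices, for example assigning the fill value to `image[y1:y2, x1:x2]` on a NumPy array.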