
license: apache-2.0
pipeline_tag: image-classification
tags:
  - medical
  - surgical
  - endoscopy

📚 Paper - 🤖 GitHub

This repository provides the models used in the data curation pipeline for the paper LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings. These models were used to build the LEMON dataset. They filter out non-surgical video storyboards and frames, and they mask non-surgical regions (such as user-interface elements) in the remaining surgical frames.

For more details about the LEMON dataset and our LemonFM foundation model, please visit our GitHub repository.

Citation

If you use our dataset, model, or code in your research, please cite our paper:

@misc{che2025lemonlargeendoscopicmonocular,
      title={LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings}, 
      author={Chengan Che and Chao Wang and Tom Vercauteren and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
      year={2025},
      eprint={2503.19740},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.19740}, 
}

Model Overview

This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader file can be found at model_loader.py.

Model	Architecture	Download
Video storyboard classification models	ResNet-18	Full ckpt
Frame classification models	ResNet-18	Full ckpt
Non-surgical object detection models	YOLOv8-Nano	Full ckpt
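In the usage examples, each network is wrapped in torch.nn.DataParallel before load_state_dict is called. This suggests, though the card does not confirm it, that the checkpoints were saved from a DataParallel-wrapped model, so their keys start with "module.". To load a checkpoint into a plain, unwrapped model, you can strip that prefix first. The sketch below is a minimal helper written under that assumption. strip_module_prefix is a hypothetical name and is not part of model_loader.py.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed.

    Assumes (unverified) that checkpoint keys carry DataParallel's "module."
    prefix; keys without the prefix are copied unchanged.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Usage, assuming the checkpoint layout from the examples (weights under state['net']):
#   state = torch.load(model_path, map_location='cpu')
#   net = build_model(mode='classify')
#   net.load_state_dict(strip_module_prefix(state['net']))
```

Keys that lack the prefix pass through unchanged, so the helper is also safe on a checkpoint that was saved without DataParallel.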

The data curation pipeline leading to the clean videos in the LEMON dataset is as follows:

Usage

Video storyboard classification models

Video storyboard classification models are used in step 2 of the data curation pipeline to classify a video storyboard as either surgical or non-surgical:

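import torch
import torchvision
from PIL import Image
from model_loader import build_model

# Use the GPU when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model
net = build_model(mode='classify')
# Placeholder: local path to the downloaded video storyboard classification checkpoint
model_path = 'path/to/video_storyboard_classification_checkpoint'

# Enable multi-GPU support and move the model to the same device as the input
net = torch.nn.DataParallel(net).to(device)
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cpu'))
net.load_state_dict(state['net'])
net.eval()

# Load the video storyboard and convert it to a normalized PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')  # ensure 3 channels to match the normalization
img = img.resize((224, 224))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.4299694, 0.29676908, 0.27707579),   # per-channel mean
        (0.24373249, 0.20208984, 0.19319402)   # per-channel std
    )
])
img_tensor = transform(img).unsqueeze(0).to(device)  # shape: (1, 3, 224, 224)

# Run inference without tracking gradients; outputs holds the classifier's scores
with torch.no_grad():
    outputs = net(img_tensor)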
Frame classification models

Frame classification models are used in step 3 of the data curation pipeline to classify a frame as either surgical or non-surgical:

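import torch
import torchvision
from PIL import Image
from model_loader import build_model

# Use the GPU when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model
net = build_model(mode='classify')
# Placeholder: local path to the downloaded frame classification checkpoint
model_path = 'path/to/frame_classification_checkpoint'

# Enable multi-GPU support and move the model to the same device as the input
net = torch.nn.DataParallel(net).to(device)
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cpu'))
net.load_state_dict(state['net'])
net.eval()

# Load the frame and convert it to a normalized PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')  # ensure 3 channels to match the normalization
img = img.resize((224, 224))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.4299694, 0.29676908, 0.27707579),   # per-channel mean
        (0.24373249, 0.20208984, 0.19319402)   # per-channel std
    )
])
img_tensor = transform(img).unsqueeze(0).to(device)  # shape: (1, 3, 224, 224)

# Run inference without tracking gradients; outputs holds the classifier's scores
with torch.no_grad():
    outputs = net(img_tensor)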
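Frame-level predictions are often easier to use after consecutive frames with the same label have been grouped together. The card does not say how the LEMON pipeline aggregates per-frame decisions, so the following is only a hypothetical post-processing sketch and not the authors' method. It run-length encodes a list of per-frame booleans (True = surgical, for example obtained by thresholding the classifier's scores) into half-open frame ranges. Surgical runs shorter than a hypothetical min_length are dropped.

```python
def surgical_segments(labels, min_length=1):
    """Group consecutive True labels into half-open (start, end) frame ranges.

    labels: sequence of booleans, one per frame (True = surgical).
    min_length: hypothetical minimum run length (in frames) for a segment to be kept.
    """
    segments = []
    start = None
    for i, surgical in enumerate(labels):
        if surgical and start is None:
            start = i  # a surgical run begins
        elif not surgical and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the last frame
    if start is not None and len(labels) - start >= min_length:
        segments.append((start, len(labels)))
    return segments


labels = [False, True, True, True, False, True, False, True, True]
print(surgical_segments(labels))                # [(1, 4), (5, 6), (7, 9)]
print(surgical_segments(labels, min_length=2))  # [(1, 4), (7, 9)]
```

The min_length filter suppresses isolated single-frame flips, which are a common source of noise in per-frame classification.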

Non-surgical object detection models

Non-surgical object detection models are used to mask out non-surgical regions in surgical frames (e.g., user-interface elements):

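import torch
import torchvision
from PIL import Image
from model_loader import build_model

# Use the GPU when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model
net = build_model(mode='mask')
# Placeholder: local path to the downloaded non-surgical object detection checkpoint
model_path = 'path/to/non_surgical_object_detection_checkpoint'

# Enable multi-GPU support and move the model to the same device as the input
net = torch.nn.DataParallel(net).to(device)
torch.backends.cudnn.benchmark = True
state = torch.load(model_path, map_location=torch.device('cpu'))
net.load_state_dict(state['net'])
net.eval()

# Load the frame and convert it to a normalized PyTorch tensor
img_path = 'path/to/your/image.jpg'
img = Image.open(img_path).convert('RGB')  # ensure 3 channels to match the normalization
img = img.resize((224, 224))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.4299694, 0.29676908, 0.27707579),   # per-channel mean
        (0.24373249, 0.20208984, 0.19319402)   # per-channel std
    )
])
img_tensor = transform(img).unsqueeze(0).to(device)  # shape: (1, 3, 224, 224)

# Run inference without tracking gradients; see model_loader.py for the
# structure of outputs in 'mask' mode
with torch.no_grad():
    outputs = net(img_tensor)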
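The card does not specify what the mask-mode model returns. The sketch below assumes that the detections can be reduced to bounding boxes (x1, y1, x2, y2) in pixel coordinates of the 224 × 224 input. That box format is hypothetical. Because the frame was resized before inference, each box must be rescaled to the original resolution before the region is blanked out. rescale_box and mask_regions are hypothetical helper names and are not part of model_loader.py.

```python
import numpy as np


def rescale_box(box, src_size, dst_size):
    """Map an (x1, y1, x2, y2) box from a src_size=(w, h) image to dst_size=(w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return x1 * sx, y1 * sy, x2 * sx, y2 * sy


def mask_regions(frame, boxes, input_size=(224, 224), fill=0):
    """Return a copy of an H x W x C frame with every box filled with `fill`.

    Boxes are assumed (hypothetically) to be in input_size pixel coordinates.
    """
    h, w = frame.shape[:2]
    masked = frame.copy()
    for box in boxes:
        x1, y1, x2, y2 = rescale_box(box, input_size, (w, h))
        # Round outward and clip to the frame so the whole detected region is covered
        c1, r1 = max(0, int(np.floor(x1))), max(0, int(np.floor(y1)))
        c2, r2 = min(w, int(np.ceil(x2))), min(h, int(np.ceil(y2)))
        masked[r1:r2, c1:c2] = fill
    return masked


# Example: a 448 x 448 white frame with one made-up detection in 224 x 224 coordinates
frame = np.full((448, 448, 3), 255, dtype=np.uint8)
out = mask_regions(frame, [(0, 0, 56, 28)])  # covers rows 0-55, columns 0-111 after rescaling
```

If the frames are loaded with PIL, np.asarray(img) gives the H x W x C array that this helper expects, and Image.fromarray(out) converts the result back to an image.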