PubMed-Ophtha Detection Models

This repository contains the three detection and classification models used in the PubMed-Ophtha dataset pipeline for parsing ophthalmological figures from scientific publications.

Paper: Hallitschke V.J., Eickhoff C., Berens P. PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature. arXiv:2605.02720 (2026).

Models

The repository contains three model checkpoints under models/:

  • imaging_type_detection_1515892632/model_0003909.pth: PyTorch (Detectron2), RetinaNet + ResNet FPN. Task: image type detection. Classes: CFP, OCT, Retinal Imaging, Other.
  • panel_detection_1020880423/model_0026865.pth: PyTorch (Detectron2), RetinaNet + ResNet FPN. Task: panel and identifier detection. Classes: Panel, Label.
  • mark_status_classifier_482239176/model_epoch_7.pth: PyTorch, ResNet-50. Task: mark status classification. Classes: Plain, Annotated.

Each Detectron2 model directory also contains a config.yaml required for inference.
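If you want to load a Detectron2 checkpoint directly instead of through the pubmed-ophtha package, you need to pair each directory's config.yaml with its .pth file. The sketch below does that. It assumes each directory holds exactly one .pth checkpoint and that config.yaml can be merged into Detectron2's default config from get_cfg(). The helper names find_detectron2_files and build_predictor are hypothetical. Only the file lookup runs without Detectron2 installed.

```python
from pathlib import Path
from typing import Tuple, Union


def find_detectron2_files(model_dir: Union[str, Path]) -> Tuple[Path, Path]:
    """Return (config.yaml, checkpoint) for one Detectron2 model directory."""
    model_dir = Path(model_dir)
    config = model_dir / "config.yaml"
    if not config.is_file():
        raise FileNotFoundError(f"missing {config}")
    checkpoints = sorted(model_dir.glob("*.pth"))
    if len(checkpoints) != 1:
        raise FileNotFoundError(
            f"expected exactly one .pth file in {model_dir}, found {len(checkpoints)}"
        )
    return config, checkpoints[0]


def build_predictor(model_dir: Union[str, Path], device: str = "cpu"):
    """Build a Detectron2 DefaultPredictor from config.yaml plus weights.

    Requires detectron2; it is imported lazily so the lookup above works without it.
    """
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    config, weights = find_detectron2_files(model_dir)
    cfg = get_cfg()
    cfg.merge_from_file(str(config))  # fails if config.yaml uses keys get_cfg() lacks
    cfg.MODEL.WEIGHTS = str(weights)
    cfg.MODEL.DEVICE = device
    return DefaultPredictor(cfg)
```

With Detectron2 installed, something like build_predictor("models/panel_detection_1020880423")(cv2.imread("figure.png")) should return a dict whose "instances" entry carries pred_boxes, scores, and pred_classes. DefaultPredictor expects BGR input by default, which is what cv2.imread returns. If the shipped config.yaml relies on custom keys, the merge fails. In that case the package's own loader is the safer route.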

Panel Detection Model

Detects panels and panel identifier labels (e.g. "A", "B") within multi-panel figures. Trained on the PubMed-Ophtha-Annotation dataset merged with PanelSeg and ImageCLEF2016, starting from an ImageCLEF2016-pretrained checkpoint.

  • mAP@0.50: 0.909 (panels), 0.903 (panel identifiers)
  • mAP@0.95: 0.532 (panels), 0.018 (panel identifiers)
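Intersection over union (IoU) makes the drop from 0.903 to 0.018 for identifiers easier to see. The sketch below reads mAP@0.50 and mAP@0.95 as AP at IoU thresholds of 0.50 and 0.95. That reading is our interpretation, since the metrics are not defined here. It compares a hypothetical 20x20 px identifier box and a 400x400 px panel box, each predicted 2 px off. The same absolute error costs the small box far more IoU, which is one plausible reason identifier scores collapse at the strict threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Hypothetical ground-truth and predicted boxes, each prediction shifted 2 px right.
label_gt, label_pred = (0, 0, 20, 20), (2, 0, 22, 20)
panel_gt, panel_pred = (0, 0, 400, 400), (2, 0, 402, 400)

for name, gt, pred in [("identifier", label_gt, label_pred), ("panel", panel_gt, panel_pred)]:
    score = iou(gt, pred)
    print(f"{name}: IoU={score:.3f}, match@0.50={score >= 0.5}, match@0.95={score >= 0.95}")
# identifier: IoU=0.818, match@0.50=True, match@0.95=False
# panel: IoU=0.990, match@0.50=True, match@0.95=True
```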

Image Type Detection Model

Detects individual images within a panel and assigns each a retinal imaging modality: color fundus photography (CFP), optical coherence tomography (OCT), retinal imaging (ultra-wide field / fluorescein angiography), or other (graphs, ultrasound, etc.).

Mark Status Classifier

A ResNet-50 binary classifier applied to image regions cropped from the detections of the image type model. It predicts whether an image contains annotation marks such as arrows, dots, or bounding boxes (Annotated) or not (Plain).

  • Accuracy: 89.5% on the held-out test set
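Because the classifier works on crops, each float detection box has to become an integer pixel region inside the figure. The sketch below shows one way to do that and to read a prediction from two logits. It assumes boxes in (x1, y1, x2, y2) pixel coordinates, an optional padding margin, and a two-logit output ordered (Plain, Annotated). The padding, the label order, and the helper names are hypothetical and are not taken from the pubmed-ophtha code.

```python
import math

# Assumed order of the classifier's two outputs (hypothetical).
LABELS = ("Plain", "Annotated")


def to_crop_box(box, width, height, pad=0):
    """Turn a float (x1, y1, x2, y2) box into an integer crop box clamped to the image.

    The result follows PIL's (left, upper, right, lower) convention, so it can be
    passed to PIL.Image.Image.crop.
    """
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - pad)
    upper = max(0, math.floor(y1) - pad)
    right = min(width, math.ceil(x2) + pad)
    lower = min(height, math.ceil(y2) + pad)
    if right <= left or lower <= upper:
        raise ValueError(f"box {box} does not overlap the {width}x{height} image")
    return left, upper, right, lower


def mark_status(logits):
    """Softmax over the logits; return the predicted label and its probability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


print(to_crop_box((10.4, 5.6, 50.2, 80.9), width=40, height=100))  # (10, 5, 40, 81)
print(mark_status([0.0, 2.0]))
```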

Usage

Models are consumed by the pubmed-ophtha Python package. Download all weights with:

pip install pubmed-ophtha
pubmed-ophtha-split pull-models --local-dir .

Or download directly via huggingface_hub:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="pubmed-ophtha/detection-models", local_dir=".")
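If you only need one model, snapshot_download's allow_patterns argument can restrict the download to a subdirectory. The sketch below assumes the files live under models/<directory>/ as listed under Models. It also assumes huggingface_hub matches the patterns with fnmatch-style globbing. The helper names are hypothetical, and only download_subset touches the network.

```python
from fnmatch import fnmatch

REPO_ID = "pubmed-ophtha/detection-models"

# Directory names from the Models section.
MODEL_DIRS = {
    "image_type": "imaging_type_detection_1515892632",
    "panel": "panel_detection_1020880423",
    "mark_status": "mark_status_classifier_482239176",
}


def allow_patterns_for(*names):
    """Glob patterns that select only the named model directories."""
    return [f"models/{MODEL_DIRS[name]}/*" for name in names]


def would_download(path, patterns):
    """Check locally whether a repo file path matches any of the patterns."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_subset(*names, local_dir="."):
    """Fetch only the selected models (requires huggingface_hub and network access)."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=allow_patterns_for(*names),
    )


patterns = allow_patterns_for("panel")
print(would_download("models/panel_detection_1020880423/config.yaml", patterns))  # True
print(would_download("models/mark_status_classifier_482239176/model_epoch_7.pth", patterns))  # False
```

Under these assumptions, download_subset("panel", "image_type") would fetch the two Detectron2 models and skip the classifier.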

After downloading, run inference via the DetectronFigureSplitter:

from pubmed_ophtha.figure_splitting.detectron_figure_splitter import DetectronFigureSplitter
from pubmed_ophtha.const.models import get_default_model_args

splitter = DetectronFigureSplitter(**get_default_model_args())

with open("figure.png", "rb") as f:
    image_bytes = f.read()

predictions = splitter.predict(image_bytes)
# Keys: pred_boxes, pred_classes, scores,
#       secondary_pred_classes, secondary_scores, keep_after_nms

pred_classes lists the Panel/Label detections from the panel detection model, followed by the CFP/OCT/Retinal Imaging/Other detections from the image type model. secondary_pred_classes holds the Plain/Annotated mark status for each image detection and is set to "None" for the Panel/Label detections from the panel detection model.
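Because the two models' outputs are concatenated, downstream code usually has to separate them again. The sketch below shows one way to do that, under several assumptions. It assumes the returned keys hold parallel per-detection sequences and that boxes are (x1, y1, x2, y2) pixel coordinates. It also assumes class entries are the string names used above and that keep_after_nms is a per-detection boolean. If the package returns tensors or integer class indices, convert them first. The Detection type, both helpers, the 0.5 score cut-off, and the center-in-panel rule for linking images to panels are hypothetical choices for illustration, not the package's logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Classes emitted by the panel detection model; everything else is an image-type class.
PANEL_MODEL_CLASSES = {"Panel", "Label"}


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels
    cls: str
    score: float
    mark_status: Optional[str]  # "Plain"/"Annotated", or None for Panel/Label


def to_detections(pred, min_score=0.5):
    """Zip the parallel prediction fields into records, keeping NMS survivors above min_score."""
    rows = zip(
        pred["pred_boxes"], pred["pred_classes"], pred["scores"],
        pred["secondary_pred_classes"], pred["keep_after_nms"],
    )
    detections = []
    for box, cls, score, status, keep in rows:
        if not keep or score < min_score:
            continue
        status = None if status in (None, "None") else status
        detections.append(Detection(tuple(box), cls, float(score), status))
    return detections


def assign_to_panels(detections) -> List[Tuple[Detection, Optional[int]]]:
    """Pair each image detection with the index of the first Panel containing its center."""
    panels = [d for d in detections if d.cls == "Panel"]
    pairs = []
    for det in detections:
        if det.cls in PANEL_MODEL_CLASSES:
            continue
        cx = (det.box[0] + det.box[2]) / 2
        cy = (det.box[1] + det.box[3]) / 2
        owner = next(
            (i for i, p in enumerate(panels)
             if p.box[0] <= cx <= p.box[2] and p.box[1] <= cy <= p.box[3]),
            None,
        )
        pairs.append((det, owner))
    return pairs


# Synthetic input: two side-by-side panels, one identifier, three image detections.
example = {
    "pred_boxes": [(0, 0, 100, 100), (100, 0, 200, 100), (2, 2, 12, 12),
                   (10, 10, 90, 90), (120, 10, 190, 90), (0, 0, 50, 50)],
    "pred_classes": ["Panel", "Panel", "Label", "CFP", "OCT", "Other"],
    "scores": [0.99, 0.98, 0.90, 0.95, 0.90, 0.20],
    "secondary_pred_classes": ["None", "None", "None", "Plain", "Annotated", "Plain"],
    "keep_after_nms": [True] * 6,
}
for det, panel_index in assign_to_panels(to_detections(example)):
    print(det.cls, det.mark_status, panel_index)
# CFP Plain 0
# OCT Annotated 1
```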

Training

Both RetinaNet models use a ResNet backbone with an FPN and were fine-tuned from an ImageCLEF2016-pretrained Detectron2 checkpoint on the PubMed-Ophtha-Annotation dataset. The ResNet-50 classifier was trained from an ImageNet-pretrained checkpoint for 35 epochs with random cropping, flips, affine transformations, and color augmentation.

Dataset

The ground-truth annotations used for training are available as part of the PubMed-Ophtha dataset: huggingface.co/datasets/pubmed-ophtha/PubMed-Ophtha

Citation

@article{hallitschke2026pubmed,
  title={{PubMed-Ophtha}: An open resource for training ophthalmology vision-language models on scientific literature},
  author={Hallitschke, Verena Jasmin and Eickhoff, Carsten and Berens, Philipp},
  journal={arXiv preprint arXiv:2605.02720},
  year={2026}
}

License

MIT (see LICENSE).
