---
license: cc-by-4.0
library_name: erdes
tags:
- ocular-ultrasound
- medical-imaging
- 3d-classification
- retinal-detachment
pipeline_tag: image-classification
---

# ViT: Normal vs. RD

Trained model weights for **retinal detachment classification (normal vs. RD)** from ocular ultrasound videos.

| Resource | Link |
|----------|------|
| Paper | [arXiv:2508.04735](https://arxiv.org/abs/2508.04735) |
| Dataset | [Hugging Face](https://huggingface.co/datasets/pcvlab/erdes), [Zenodo](https://zenodo.org/records/18644370) |
| Checkpoints | [Zenodo](https://zenodo.org/records/18821031) |
| Code | [GitHub](https://github.com/OSUPCVLab/ERDES) |

## Model Details

| Property | Value |
|----------|-------|
| Architecture | ViT (`img_size=(96, 128, 128)`, `patch_size=7`, `hidden_size=768`, `num_layers=4`, `num_heads=4`) |
| Input modality | 3D ocular ultrasound video |
| Input shape | `[1, 96, 128, 128]` (C, D, H, W) |
| Pooling | Global Average Pooling |
| Output | Binary classification (sigmoid) |

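
As a rough sanity check on the architecture settings, the sketch below estimates how many patch tokens a `(96, 128, 128)` volume yields with `patch_size=7`. It assumes non-overlapping patches with floor division along each axis, as in a strided convolutional patch embedding. The actual patch embedding inside `ViTClassifier` is not described in this card, so `patch_grid` is a hypothetical helper and the numbers are an estimate only.

```python
from math import prod

IMG_SIZE = (96, 128, 128)  # (D, H, W), from the Model Details table
PATCH_SIZE = 7             # from the Model Details table


def patch_grid(img_size, patch_size):
    """Patches per axis, ASSUMING non-overlapping patches and floor division."""
    return tuple(dim // patch_size for dim in img_size)


grid = patch_grid(IMG_SIZE, PATCH_SIZE)
num_tokens = prod(grid)                        # sequence length seen by the transformer
covered = tuple(g * PATCH_SIZE for g in grid)  # voxels per axis covered by whole patches

print(grid, num_tokens, covered)
# (13, 18, 18) 4212 (91, 126, 126)
```

Under this assumption, the last few voxels along each axis (5 in depth, 2 in height and width) would fall outside any whole patch.
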
## Labels

| Label | Class |
|-------|-------|
| 0 | Normal |
| 1 | Retinal Detachment |

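
Because the model emits a single logit, the label comes from thresholding its sigmoid. The sketch below shows one way to map a raw logit to the label table above using only the standard library. `decode_logit` and `sigmoid` are hypothetical helpers, not part of the `erdes` package, and the 0.5 threshold is an assumption you may want to tune for your own sensitivity/specificity trade-off.

```python
import math

LABELS = {0: "Normal", 1: "Retinal Detachment"}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_logit(logit: float, threshold: float = 0.5):
    """Return (label_id, class_name, p_rd) for a raw model logit.

    p_rd is the estimated probability of retinal detachment (label 1).
    """
    p_rd = sigmoid(logit)
    label_id = int(p_rd > threshold)
    return label_id, LABELS[label_id], p_rd
```

For example, `decode_logit(3.0)` gives label `1` ("Retinal Detachment"), while a logit of exactly `0.0` gives `p_rd = 0.5`, which does not exceed the threshold and therefore maps to "Normal".
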
## Usage

```bash
pip install git+https://github.com/OSUPCVLab/ERDES.git ultralytics huggingface_hub safetensors
```

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from ultralytics import YOLO

from erdes.data.components.utils import resize
from erdes.models.components.cls_model import ViTClassifier

# --- 1. Load YOLO for ocular globe detection ---
yolo = YOLO(
    hf_hub_download(
        "pcvlab/yolov8_ocular_ultrasound_globe_detection",
        "yolov8_ocular_ultrasound_globe_detection.pt",
    )
)


# --- 2. Crop your POCUS ultrasound video using YOLO (finds largest globe bbox across all frames) ---
def crop_video(video_path, model, conf=0.8):
    # First pass: find the largest bounding box across all frames
    area_max, cropping_bbox = 0, None
    for frame in model.predict(video_path, stream=True, verbose=False, conf=conf):
        if len(frame.boxes.xywhn):
            # Normalized (x_center, y_center, width, height) of the first detection
            bbox = frame.boxes.xywhn[0].cpu().numpy()
            area = bbox[2] * bbox[3]
            if area > area_max:
                area_max, cropping_bbox = area, bbox

    if cropping_bbox is None:
        raise ValueError("YOLO could not detect ocular globe in video.")

    # Second pass: crop every frame with the largest bbox
    frames = []
    for frame in model.predict(video_path, stream=True, verbose=False, conf=conf):
        img = frame.orig_img  # [H, W, C] BGR
        h, w, _ = img.shape
        x_c, y_c, bw, bh = cropping_bbox
        x1, y1 = int((x_c - bw / 2) * w), int((y_c - bh / 2) * h)
        x2, y2 = int((x_c + bw / 2) * w), int((y_c + bh / 2) * h)
        frames.append(img[y1:y2, x1:x2])

    return np.stack(frames)  # [D, H, W, C]


frames = crop_video("your_video.mp4", yolo)  # [D, H, W, C]

# --- 3. Preprocess ---
video = torch.from_numpy(frames).float()  # [D, H, W, C]
video = video.permute(3, 0, 1, 2)  # [C, D, H, W]
if video.shape[0] == 3:
    video = video.mean(dim=0, keepdim=True)  # grayscale [1, D, H, W]
video = resize((96, 128, 128))(video) / 255.0  # pad + resize + normalize
video = video.unsqueeze(0)  # [1, 1, 96, 128, 128]

# --- 4. Load model and run inference ---
model = ViTClassifier(
    in_channels=1,
    num_classes=1,
    img_size=[96, 128, 128],
    patch_size=7,
    hidden_size=768,
    num_layers=4,
    num_heads=4,
)
weights = load_file(hf_hub_download("pcvlab/vit_normal_vs_rd", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

with torch.no_grad():
    logit = model(video)
    prob = torch.sigmoid(logit).item()  # probability of retinal detachment (label 1)
    pred = int(prob > 0.5)

labels = {0: "Normal", 1: "Retinal Detachment"}
print(f"Prediction: {labels[pred]} (RD probability: {prob:.3f})")
```
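
The cropping step converts YOLO's normalized `(x_center, y_center, width, height)` box into pixel corners. The sketch below isolates that conversion as a standalone function and adds clamping to the image bounds. `xywhn_to_pixel_box` is a hypothetical helper, not part of `ultralytics` or `erdes`. Detections should normally already lie inside the frame, so the clamp is a defensive assumption rather than a required fix: a negative start index in a NumPy slice would otherwise wrap around and produce a wrong or empty crop.

```python
def xywhn_to_pixel_box(bbox, width, height):
    """Convert a normalized (x_c, y_c, w, h) box to clamped pixel corners.

    Returns (x1, y1, x2, y2), suitable for slicing as img[y1:y2, x1:x2].
    """
    x_c, y_c, bw, bh = (float(v) for v in bbox)
    x1 = max(0, int((x_c - bw / 2) * width))
    y1 = max(0, int((y_c - bh / 2) * height))
    x2 = min(width, int((x_c + bw / 2) * width))
    y2 = min(height, int((y_c + bh / 2) * height))
    return x1, y1, x2, y2


# A centered box covering half of a 640x480 frame in each dimension
print(xywhn_to_pixel_box((0.5, 0.5, 0.5, 0.5), 640, 480))
# (160, 120, 480, 360)
```

Inside a cropping loop this could be used as `x1, y1, x2, y2 = xywhn_to_pixel_box(cropping_bbox, w, h)` before slicing each frame.
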

## Citation

If you use this model, please cite the ERDES paper:

```bibtex
@misc{ozkut2026erdes,
  title={ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound},
  author={Yasemin Ozkut and Pouyan Navard and Srikar Adhikari and Elaine Situ-LaCasse and Josie Acu{\~n}a and Adrienne Yarnish and Alper Yilmaz},
  year={2026},
  eprint={2508.04735},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.04735}
}
```