Upload README.md with huggingface_hub

5bbf65d verified 16 days ago

4.71 kB

	---
	license: cc-by-4.0
	library_name: erdes
	tags:
	- ocular-ultrasound
	- medical-imaging
	- 3d-classification
	- retinal-detachment
	pipeline_tag: image-classification
	---

	# VIT — Normal Vs Rd

	Trained model weights for retinal detachment classification (normal vs. RD) using ocular ultrasound videos.

	\| Resource \| Link \|
	\|----------\|------\|
	\| Paper \| [![arXiv](https://img.shields.io/badge/arXiv-2508.04735-b31b1b.svg)](https://arxiv.org/abs/2508.04735) \|
	\| Dataset \| [![HF Dataset](https://img.shields.io/badge/🤗-Dataset-yellow)](https://huggingface.co/datasets/pcvlab/erdes) [![Zenodo](https://img.shields.io/badge/Zenodo-Dataset-blue)](https://zenodo.org/records/18644370) \|
	\| Checkpoints \| [![Zenodo](https://img.shields.io/badge/Zenodo-Checkpoints-blue)](https://zenodo.org/records/18821031) \|
	\| Code \| [![GitHub](https://img.shields.io/badge/GitHub-OSUPCVLab/ERDES-black?logo=github)](https://github.com/OSUPCVLab/ERDES) \|

	## Model Details

	\| Property \| Value \|
	\|----------\|-------\|
	\| Architecture \| ViT (img_size=(96,128,128), patch_size=7, hidden_size=768, num_layers=4, num_heads=4) \|
	\| Input modality \| 3D ocular ultrasound video \|
	\| Input shape \| `[1, 96, 128, 128]` (C, D, H, W) \|
	\| Pooling \| Global Average Pooling \|
	\| Output \| Binary classification (sigmoid) \|

	## Labels

	\| Label \| Class \|
	\|-------\|-------\|
	\| 0 \| Normal \|
	\| 1 \| Retinal Detachment \|

	## Usage

	```bash
	pip install git+https://github.com/OSUPCVLab/ERDES.git ultralytics
	```

	```python
	import torch
	import numpy as np
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file
	from ultralytics import YOLO
	from erdes.models.components.cls_model import ViTClassifier
	from erdes.data.components.utils import resize

	# --- 1. Load YOLO for ocular globe detection ---
	yolo = YOLO(hf_hub_download("pcvlab/yolov8_ocular_ultrasound_globe_detection", "yolov8_ocular_ultrasound_globe_detection.pt"))

	# --- 2. Crop your POCUS ultrasound video using YOLO (finds largest globe bbox across all frames) ---
	def crop_video(video_path, model, conf=0.8):
	# First pass: find the largest bounding box across all frames
	area_max, cropping_bbox = 0, None
	for frame in model.predict(video_path, stream=True, verbose=False, conf=conf):
	if len(frame.boxes.xywhn):
	bbox = frame.boxes.xywhn[0].cpu().numpy()
	area = bbox[2] * bbox[3]
	if area > area_max:
	area_max, cropping_bbox = area, bbox

	if cropping_bbox is None:
	raise ValueError("YOLO could not detect ocular globe in video.")

	# Second pass: crop every frame with the largest bbox
	frames = []
	for frame in model.predict(video_path, stream=True, verbose=False, conf=conf):
	img = frame.orig_img # [H, W, C] BGR
	h, w, _ = img.shape
	x_c, y_c, bw, bh = cropping_bbox
	x1, y1 = int((x_c - bw/2) * w), int((y_c - bh/2) * h)
	x2, y2 = int((x_c + bw/2) * w), int((y_c + bh/2) * h)
	frames.append(img[y1:y2, x1:x2])

	return np.stack(frames) # [D, H, W, C]

	frames = crop_video("your_video.mp4", yolo) # [D, H, W, C]

	# --- 3. Preprocess ---
	video = torch.from_numpy(frames).float() # [D, H, W, C]
	video = video.permute(3, 0, 1, 2) # [C, D, H, W]
	if video.shape[0] == 3:
	video = video.mean(dim=0, keepdim=True) # grayscale [1, D, H, W]
	video = resize((96, 128, 128))(video) / 255.0 # pad + resize + normalize
	video = video.unsqueeze(0) # [1, 1, 96, 128, 128]

	# --- 4. Load model and run inference ---
	model = ViTClassifier(in_channels=1, num_classes=1, img_size=[96, 128, 128], patch_size=7, hidden_size=768, num_layers=4, num_heads=4)
	weights = load_file(hf_hub_download("pcvlab/vit_normal_vs_rd", "model.safetensors"))
	model.load_state_dict(weights)
	model.eval()

	with torch.no_grad():
	logit = model(video)
	prob = torch.sigmoid(logit).item()
	pred = int(prob > 0.5)

	labels = {'0': 'Normal', '1': 'Retinal Detachment'}
	print(f"Prediction: {labels[str(pred)]} (confidence: {prob:.3f})")
	```

	## Citation

	If you use this model, please cite the ERDES paper:

	```bibtex
	@misc{ozkut2026erdes,
	title={ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound},
	author={Yasemin Ozkut and Pouyan Navard and Srikar Adhikari and Elaine Situ-LaCasse and Josie Acu{\~n}a and Adrienne Yarnish and Alper Yilmaz},
	year={2026},
	eprint={2508.04735},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2508.04735}
	}
	```