---
pipeline_tag: image-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- OCT
- classification
- retinal-imaging
---

## RetFiner-VisionFM

This repo contains the weights of RetFiner-VisionFM from the paper [RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models](https://arxiv.org/abs/2506.22149).

Project page: [RetFiner](https://github.com/ronnief1/RetFiner)

### Required Libraries

This model requires the following Python library versions:

```bash
torch==2.4.1+cu118
timm==0.4.12
torchvision==0.19.1+cu118
```

To use the model, download the ViT implementation, [visionfm_hf.py](https://github.com/ronnief1/RetFiner/blob/main/RetFiner/visionfm_hf.py), and place it where Python can import it.

Note: for downstream inference, set `num_classes` to match the target dataset.

```python
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

# visionfm_hf.py must be importable (e.g. in the working directory).
from visionfm_hf import VisionTransformer


class RetFiner(nn.Module, PyTorchModelHubMixin):
    """RetFiner: Fine-tuned ViT models for retinal image analysis"""

    def __init__(self, model_name: str = "RetFiner-VisionFM", num_classes: int = 2, **kwargs):
        super().__init__()
        # Note: in this snippet, num_classes is not passed to VisionTransformer.
        self.model = VisionTransformer(return_all_tokens=True, qkv_bias=True)
        self.config = {**kwargs}

    def forward(self, x):
        return self.model(x)


# Download the weights from the Hugging Face Hub and instantiate the model.
model = RetFiner.from_pretrained('ronnief1/RetFiner-VisionFM')
```

## Citation

Please cite the original paper if you use this model:

```bibtex
@misc{fecso2025retfinervisionlanguagerefinementscheme,
      title={RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models},
      author={Ronald Fecso and José Morano and Ursula Schmidt-Erfurth and Hrvoje Bogunović},
      year={2025},
      eprint={2506.22149},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.22149},
}
```
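
## Example: downstream classification head

The note about adjusting `num_classes` for a target dataset can be illustrated with a minimal sketch of a linear head on top of a token-returning backbone. Everything here is an assumption made for illustration, not the authors' method: `TokenClassifier` and `DummyBackbone` are hypothetical names, the output shape `(batch, tokens, embed_dim)` and the CLS token at index 0 are assumed, and the sizes 768, 197, and 224 are placeholders. The dummy backbone stands in for the real model so that the sketch runs without `visionfm_hf.py` or the released weights.

```python
import torch
import torch.nn as nn


class TokenClassifier(nn.Module):
    """Hypothetical wrapper: linear head on the first token of a token-returning backbone."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        # num_classes is chosen to match the target dataset.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.backbone(x)   # assumed shape: (batch, tokens, embed_dim)
        cls_token = tokens[:, 0]    # assumed: CLS token sits at index 0
        return self.head(cls_token)


class DummyBackbone(nn.Module):
    """Stand-in for the pretrained model; returns zeros with placeholder sizes."""

    def __init__(self, embed_dim: int = 768, num_tokens: int = 197):
        super().__init__()
        self.embed_dim = embed_dim
        self.num_tokens = num_tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.zeros(x.shape[0], self.num_tokens, self.embed_dim)


backbone = DummyBackbone()
classifier = TokenClassifier(backbone, embed_dim=768, num_classes=5)

with torch.no_grad():
    logits = classifier(torch.randn(2, 3, 224, 224))

print(tuple(logits.shape))  # (2, 5): one logit per class for each of the 2 inputs
```

To try this with the released weights, one would pass the loaded `RetFiner` model as `backbone` and set `embed_dim` to the backbone's actual embedding width, after checking what shape its forward pass returns.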