---
language:
- en
license: apache-2.0
tags:
- medical
- vision
- image-classification
- multi-label
- fundus-imaging
- ophthalmology
- efficientnet
datasets:
- ODIR-5K
- RFMiD
metrics:
- f1
- map
---

# OcuNet v4 - Multi-Label Retinal Disease Classification

OcuNet v4 is a multi-label deep learning model for ophthalmic disease screening from retinal fundus images. Built on the **EfficientNet-B3** architecture, it classifies each image into 30 distinct categories: 28 specific diseases, a general “Disease Risk” class, and a “Normal” class. The model is intended as a clinical decision support tool capable of detecting multiple co-occurring pathologies in a single image.

## Model Details

- **Model Type:** Multi-Label Image Classifier
- **Architecture:** EfficientNet-B3 (pre-trained on ImageNet)
- **Input Resolution:** 384x384 RGB
- **Loss Function:** Asymmetric Loss (optimized for heavily imbalanced multi-label datasets)
- **Framework:** PyTorch
- **Version:** OcuNet Phase 2 (v4.2.0)

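Asymmetric Loss down-weights easy negatives so the many absent labels per image do not dominate the gradient. Below is a minimal NumPy sketch of the idea only; the training loss itself is the PyTorch implementation, and the function name and default hyperparameters here are illustrative assumptions:

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Illustrative Asymmetric Loss (ASL) for multi-label targets.

    Positives use focusing exponent gamma_pos, negatives gamma_neg;
    `clip` shifts negative probabilities so easy negatives contribute ~0.
    """
    p = 1.0 / (1.0 + np.exp(-logits))           # per-label sigmoid probability
    p_neg = np.clip(p - clip, 0.0, 1.0)         # probability shifting for negatives
    # Focal-style modulation, asymmetric between positive and negative labels
    loss_pos = targets * ((1 - p) ** gamma_pos) * np.log(np.clip(p, 1e-8, 1.0))
    loss_neg = (1 - targets) * (p_neg ** gamma_neg) * np.log(np.clip(1 - p_neg, 1e-8, 1.0))
    return -np.mean(loss_pos + loss_neg)
```

Because `gamma_neg > gamma_pos`, confident false positives are penalized far more sharply than easy true negatives, which is the property that matters when most of the 30 labels are absent in any given image.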
## Intended Use

- **Primary Use Case:** Automated screening and diagnosis support for ophthalmic conditions from retinal fundus imagery.
- **Target Audience:** Ophthalmologists, medical practitioners, and researchers in medical imaging.
- **Out of Scope:** This model is intended for clinical *decision support only* and must not replace professional medical diagnosis.

## Dataset

The model was trained on a compilation of datasets comprising **23,659 images** in total:
1. **Phase 1 Dataset:** ODIR-5K (Ocular Disease Intelligent Recognition)
2. **Phase 2 Dataset:** RFMiD (Retinal Fundus Multi-Disease Image Dataset)
3. **Phase 3 Dataset:** Proprietary augmented dataset addressing class imbalances

**Data Splits:**
- **Train:** 16,240 images
- **Validation:** 3,709 images
- **Test:** 3,710 images

## Class Distribution (30 Labels)

The model predicts the following conditions:
* `Disease_Risk`, `DR` (Diabetic Retinopathy), `ARMD` (Age-related Macular Degeneration), `MH` (Macular Hole), `DN` (Diabetic Neuropathy), `MYA` (Myopia), `BRVO`, `TSLN`, `ERM`, `LS`, `MS`, `CSR`, `ODC` (Optic Disc Cupping), `CRVO`, `AH`, `ODP`, `ODE`, `AION`, `PT`, `RT`, `RS`, `CRS`, `EDN`, `RPEC`, `MHL`, `CATARACT`, `GLAUCOMA`, `NORMAL`, `RD` (Retinal Detachment), `RP` (Retinitis Pigmentosa)

## Training Configuration

- **Batch Size:** 16 (gradient accumulation steps = 1)
- **Epochs:** 200 maximum (early stopping triggered at epoch 104)
- **Learning Rate:** warmup from 1.00e-07 to a peak of 3.00e-04
- **Warmup:** 5 epochs
- **Hardware Profile:** Trained on an NVIDIA GeForce RTX 4050 Laptop GPU (6 GB VRAM), using EMA (Exponential Moving Average) of weights and class-specific threshold tuning (calibration mapping).

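With 30 imbalanced labels, a single global cutoff of 0.5 is rarely optimal, so per-class decision thresholds are calibrated on validation predictions. The sketch below shows one common way to do this; the function name, threshold grid, and per-class F1 criterion are assumptions for illustration, not necessarily the repository's exact procedure:

```python
import numpy as np

def tune_class_thresholds(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """For each label, pick the decision threshold that maximizes F1
    on held-out validation predictions (probs: [N, C], labels: [N, C])."""
    n_classes = probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for c in range(n_classes):
        best_f1 = -1.0
        for t in grid:
            pred = probs[:, c] >= t
            tp = np.sum(pred & (labels[:, c] == 1))
            fp = np.sum(pred & (labels[:, c] == 0))
            fn = np.sum(~pred & (labels[:, c] == 1))
            f1 = 2 * tp / max(2 * tp + fp + fn, 1)   # F1 = 2TP / (2TP + FP + FN)
            if f1 > best_f1:
                best_f1, thresholds[c] = f1, t
    return thresholds
```

Rare classes typically end up with lower thresholds than prevalent ones, which recovers sensitivity that a uniform 0.5 cutoff would lose.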
## Preprocessing & Augmentation

- **Preprocessing:** Fundus ROI crop (removes black borders) and CLAHE (Contrast Limited Adaptive Histogram Equalization) applied to the green channel.
- **Augmentation:** RandAugment, Random Erasing, color jittering, and geometric transformations (tuned to avoid unrealistic medical artifacts).

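The ROI crop step can be sketched in pure NumPy: find the bounding box of the non-black circular fundus region and trim the borders. The function name and threshold value here are illustrative assumptions, not the repository's exact pipeline; CLAHE would typically follow on the green channel, e.g. via OpenCV's `cv2.createCLAHE`:

```python
import numpy as np

def crop_fundus_roi(img, thresh=10):
    """Trim the black borders around the circular fundus region.

    `thresh` (assumed value) separates near-black background pixels
    from the illuminated retina.
    """
    gray = img.mean(axis=2) if img.ndim == 3 else img
    mask = gray > thresh
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing fundus pixels
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing fundus pixels
    if rows.size == 0 or cols.size == 0:      # fully black image: nothing to keep
        return img
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```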
## Evaluation Results (Validation Set)

The model's best performance was achieved at **Epoch 74**:
- **Best Validation mean Average Precision (mAP):** 0.4914
- **Best Validation F1-Score (Macro):** 0.2517

*(Note: multi-label classification over 30 classes with extremely rare and concurrent pathologies typically yields lower raw F1/mAP scores than binary classification. Per-class metrics show higher reliability on prevalent diseases such as DR, Glaucoma, and Myopia.)*

## How to run inference

You can use the model with the custom prediction pipeline included in the OcuNet repository:

```python
from predict import ImprovedMultiLabelClassifier

# Initialize the model with confidence thresholds
classifier = ImprovedMultiLabelClassifier(
    checkpoint_path="models/ocunetv4.pth",
    config_path="config/config.yaml"
)

# Run inference on a single fundus image
result = classifier.predict("path/to/retinal_image.jpg")

# Print detected diseases and per-class probabilities
print(f"Detected: {result['detected_diseases']}")
for disease, prob in result['probabilities'].items():
    print(f"{disease}: {prob:.2%}")
```

## Limitations and Bias

- **Class Imbalance:** Despite Asymmetric Loss and augmentation, extremely rare anomalies (e.g., MHL, ERM, RT) have few training examples, which may reduce sensitivity for those classes.
- **Image Quality Reliance:** Performance may degrade significantly on images with uncorrected poor illumination or inadequate anatomical visibility of the fundus.
- **Generalization:** The model is trained on adult fundus images; efficacy on pediatric populations is untested.

## Disclaimer

This model is developed for research and educational purposes. It must undergo thorough clinical validation and obtain appropriate regulatory approval before deployment in real-world clinical environments.