---
license: mit
tags:
- text-classification
- setfit
- sentence-embedding
- eye-imaging
- ophthalmology
- medical-imaging
- fair-data
- eyeact
---

# Envision Eye Imaging Classifier

A SetFit few-shot classifier for identifying eye imaging datasets from scientific metadata.

**Developed by**: FAIR Data Innovations Hub in collaboration with the EyeACT Study

## Model Description

Uses `Alibaba-NLP/gte-large-en-v1.5` as the backbone, with 4-class classification:

- **EYE_IMAGING (3)**: Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
- **EYE_SOFTWARE (2)**: Code, tools, and models for eye imaging
- **EDGE_CASE (1)**: Eye research papers, reviews, and non-imaging data
- **NEGATIVE (0)**: Not eye-related

## Results on Zenodo

Tested on 515 Zenodo datasets (filtered to `resource_type=dataset` only):

| Class | Count |
|-------|-------|
| EYE_IMAGING | 120 |
| EYE_SOFTWARE | 66 |
| EDGE_CASE | 3 |
| NEGATIVE | 325 |

### Confidence Distribution (EYE_IMAGING)

| Confidence | Count | % |
|------------|-------|---|
| High (≥0.95) | 117 | 97.5% |
| Medium (0.80-0.95) | 2 | 1.7% |
| Lower (<0.80) | 1 | 0.8% |

### Data Pipeline

- Records scraped with a datasets-only filter
- ZIP contents inspected via HTTP Range requests (31,958 files catalogued)
- Genomics files excluded (.fasta, .h5ad, .vcf, etc.)

## Training

- **Examples**: 452 (99 positive, 30 software, 90 edge case, 233 negative)
- **Epochs**: 2
- **Batch Size**: 16

## Usage

```python
from sentence_transformers import SentenceTransformer
import joblib

# Load the embedding backbone; the gte model ships custom code,
# hence trust_remote_code=True.
model = SentenceTransformer("jimnoneill/envision-eye-imaging-classifier", trust_remote_code=True)

# Load the classification head from a local copy of model_head.pkl.
head = joblib.load("model_head.pkl")

# Embed the metadata text, then predict a class id (0-3) per input.
embeddings = model.encode(["Retinal OCT dataset for diabetic retinopathy"])
predictions = head.predict(embeddings)
```

## Citation

- EyeACT Envision project
- FAIR Data Innovations Hub (fairdataihub.org)
- Alibaba-NLP/gte-large-en-v1.5

## Contact

EyeACT team: [eyeactstudy.org](https://eyeactstudy.org)
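
## Example: Reading Predictions

The Usage snippet returns integer class ids. As a minimal sketch, the helper below maps those ids to the class names listed under Model Description and bins a confidence score into the bands used in the Confidence Distribution table. It assumes the head can also return one probability per class, ordered by class id 0 to 3 (for example via a scikit-learn style `predict_proba`, which this card does not confirm). The names `ID_TO_LABEL`, `confidence_band`, and `summarize` are hypothetical and not part of the released model.

```python
# Class ids as listed under "Model Description".
ID_TO_LABEL = {0: "NEGATIVE", 1: "EDGE_CASE", 2: "EYE_SOFTWARE", 3: "EYE_IMAGING"}


def confidence_band(p: float) -> str:
    """Bin a probability into the bands from the Confidence Distribution table."""
    if p >= 0.95:
        return "High"
    if p >= 0.80:
        return "Medium"
    return "Lower"


def summarize(probabilities):
    """Turn one row of per-class probabilities into (label, confidence, band).

    Assumption: probabilities are ordered by class id 0..3.
    """
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    p = float(probabilities[best])
    return ID_TO_LABEL[best], p, confidence_band(p)


# Hypothetical probability row for a single input, not real model output.
example = [0.01, 0.02, 0.01, 0.96]
print(summarize(example))
```

With a real head, each row of the probability output would be passed to `summarize` in place of the hypothetical `example` row. A score of exactly 0.95 is placed in the High band here, following the `≥0.95` boundary in the table.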