---
license: mit
tags:
- text-classification
- setfit
- sentence-embedding
- eye-imaging
- ophthalmology
- medical-imaging
- fair-data
- eyeact
---
# Envision Eye Imaging Classifier
A SetFit few-shot classifier that identifies eye imaging datasets from their scientific metadata.
**Developed by**: FAIR Data Innovations Hub in collaboration with the EyeACT Study
## Model Description
The model uses `Alibaba-NLP/gte-large-en-v1.5` as its embedding backbone and assigns each record to one of four classes (label ID in parentheses):
- **EYE_IMAGING (3)**: Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
- **EYE_SOFTWARE (2)**: Code, tools, models for eye imaging
- **EDGE_CASE (1)**: Eye research papers, reviews, non-imaging data
- **NEGATIVE (0)**: Not eye-related
## Results on Zenodo
The model was run on 515 Zenodo records (filtered to `resource_type=dataset` only). Predicted class counts:
| Class | Count |
|-------|-------|
| EYE_IMAGING | 120 |
| EYE_SOFTWARE | 66 |
| EDGE_CASE | 3 |
| NEGATIVE | 325 |
### Confidence Distribution (EYE_IMAGING)
| Confidence | Count | % |
|------------|-------|---|
| High (≥0.95) | 117 | 97.5% |
| Medium (0.80 to <0.95) | 2 | 1.7% |
| Lower (<0.80) | 1 | 0.8% |
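
The confidence bands in the table can be expressed as a small helper. This is a minimal sketch: the function name `confidence_band` is hypothetical, and it assumes the Medium band is half-open (0.80 inclusive, 0.95 exclusive), which the table implies but does not state.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence score in [0, 1] to the bands used in the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= 0.95:
        return "High"
    if confidence >= 0.80:  # assumption: 0.80 inclusive, 0.95 exclusive
        return "Medium"
    return "Lower"


# Counts reported in the table for the 120 EYE_IMAGING predictions
counts = {"High": 117, "Medium": 2, "Lower": 1}
total = sum(counts.values())
# Percentage share per band, rounded to one decimal place as in the table
shares = {band: round(100 * n / total, 1) for band, n in counts.items()}
```

Recomputing `shares` from the counts is a quick way to check that the percentage column matches the count column.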
### Data Pipeline
- Zenodo records scraped with a datasets-only filter (`resource_type=dataset`)
- ZIP archive contents listed via HTTP Range requests rather than full downloads (31,958 files catalogued)
- Genomics files excluded (`.fasta`, `.h5ad`, `.vcf`, etc.)
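
The ZIP inspection and genomics filter might look roughly like the sketch below. It is not the project's actual pipeline code. All names (`RangeReader`, `http_fetcher`, `list_non_genomics`, `GENOMICS_EXTENSIONS`) are hypothetical, the extension set holds only the three extensions named above, and the approach assumes the server honours `Range` headers. The idea is that `zipfile` only needs to read the central directory at the end of the archive, so a seekable file object that fetches bytes on demand avoids downloading the whole file.

```python
import io
import urllib.request
import zipfile
from typing import Callable, List

# Extensions named in this card; its list ends with "etc.", so this set is incomplete.
GENOMICS_EXTENSIONS = (".fasta", ".h5ad", ".vcf")


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object that pulls bytes on demand via fetch(start, end)."""

    def __init__(self, fetch: Callable[[int, int], bytes], size: int):
        super().__init__()
        self._fetch = fetch  # returns bytes start..end, both inclusive
        self._size = size    # total size of the remote file in bytes
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def read(self, n: int = -1) -> bytes:
        end = self._size if n is None or n < 0 else min(self._pos + n, self._size)
        if end <= self._pos:
            return b""
        data = self._fetch(self._pos, end - 1)
        self._pos += len(data)
        return data


def http_fetcher(url: str) -> Callable[[int, int], bytes]:
    """Build a fetch function that issues one HTTP Range request per call."""
    def fetch(start: int, end: int) -> bytes:
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # anything but Partial Content: Range was ignored
                raise OSError(f"range request not honoured (HTTP {resp.status})")
            return resp.read()
    return fetch


def list_non_genomics(reader: RangeReader) -> List[str]:
    """List archive members, dropping files with a genomics extension."""
    with zipfile.ZipFile(reader) as archive:
        return [name for name in archive.namelist()
                if not name.lower().endswith(GENOMICS_EXTENSIONS)]
```

For a real archive, the size would come from the `Content-Length` header of a `HEAD` request, and the reader would be built as `RangeReader(http_fetcher(url), size)`. Passing the fetch function in separately keeps the reader testable against in-memory bytes.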
## Training
- **Examples**: 452 (99 EYE_IMAGING, 30 EYE_SOFTWARE, 90 EDGE_CASE, 233 NEGATIVE)
- **Epochs**: 2
- **Batch Size**: 16
## Usage
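```python
from sentence_transformers import SentenceTransformer
import joblib

# Backbone encoder; gte-large-en-v1.5 ships custom modeling code, hence trust_remote_code=True
model = SentenceTransformer("jimnoneill/envision-eye-imaging-classifier", trust_remote_code=True)
# Classification head; assumes model_head.pkl has been downloaded from the model repository
head = joblib.load("model_head.pkl")

embeddings = model.encode(["Retinal OCT dataset for diabetic retinopathy"])
predictions = head.predict(embeddings)  # class IDs as listed under Model Description
```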
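
Predictions come back as class IDs. The sketch below shows one way to turn them into class names with a confidence score. It assumes the head exposes scikit-learn-style `predict_proba` and `classes_` (true for a `LogisticRegression` head, but not confirmed by this card) and that confidence means the highest class probability. `LABELS` and `decode` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Label IDs as listed under Model Description
LABELS = {0: "NEGATIVE", 1: "EDGE_CASE", 2: "EYE_SOFTWARE", 3: "EYE_IMAGING"}


def decode(probabilities: Sequence[Sequence[float]],
           classes: Sequence[int]) -> List[Tuple[str, float]]:
    """Return (class name, confidence) for each row of class probabilities."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])  # index of top probability
        results.append((LABELS[int(classes[best])], float(row[best])))
    return results


# With the objects from the usage snippet (assumption: the head supports predict_proba):
# labelled = decode(head.predict_proba(embeddings), head.classes_)
```

Reading the class order from `classes_` instead of assuming column `i` is class `i` keeps the mapping correct even if the head was fitted on a subset of labels.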
## Citation
- EyeACT Envision project
- FAIR Data Innovations Hub (fairdataihub.org)
- Alibaba-NLP/gte-large-en-v1.5
## Contact
EyeACT team: [eyeactstudy.org](https://eyeactstudy.org)