---
license: mit
tags:
- text-classification
- setfit
- sentence-embedding
- eye-imaging
- ophthalmology
- medical-imaging
- fair-data
- eyeact
---
# Envision Eye Imaging Classifier
SetFit few-shot classifier for identifying eye imaging datasets from scientific metadata.
**Developed by**: FAIR Data Innovations Hub in collaboration with the EyeACT Study
## Model Description
Uses `Alibaba-NLP/gte-large-en-v1.5` as the embedding backbone, with a 4-class classification head on top:
- **EYE_IMAGING (3)**: Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
- **EYE_SOFTWARE (2)**: Code, tools, models for eye imaging
- **EDGE_CASE (1)**: Eye research papers, reviews, non-imaging data
- **NEGATIVE (0)**: Not eye-related
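
Downstream code usually needs the class IDs and names in one place. The following is a minimal sketch of that mapping, assuming the classification head emits the integer IDs listed above. `ID_TO_LABEL` and `decode` are illustrative names, not part of the released model.

```python
# Class IDs and names as listed in the model description.
# ID_TO_LABEL and decode() are hypothetical helpers, not shipped with the model.
ID_TO_LABEL = {
    0: "NEGATIVE",
    1: "EDGE_CASE",
    2: "EYE_SOFTWARE",
    3: "EYE_IMAGING",
}


def decode(class_ids):
    """Translate an iterable of integer class IDs into class names."""
    return [ID_TO_LABEL[int(i)] for i in class_ids]


print(decode([3, 0, 2]))
```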
## Results on Zenodo
Tested on 515 Zenodo datasets (filtered to `resource_type=dataset` only):
| Class | Count |
|-------|-------|
| EYE_IMAGING | 120 |
| EYE_SOFTWARE | 66 |
| EDGE_CASE | 3 |
| NEGATIVE | 325 |
### Confidence Distribution (EYE_IMAGING)
| Confidence | Count | % |
|------------|-------|---|
| High (≥0.95) | 117 | 97.5% |
| Medium (0.80 to <0.95) | 2 | 1.7% |
| Lower (<0.80) | 1 | 0.8% |
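
The bucketing behind this table can be sketched as follows. The thresholds come from the table; assigning exactly 0.80 to the Medium bucket is an assumption, and the function names are illustrative only.

```python
def confidence_bucket(p):
    """Assign a predicted probability to one of the three reporting buckets."""
    if p >= 0.95:
        return "High"
    if p >= 0.80:  # assumption: the lower bound of Medium is inclusive
        return "Medium"
    return "Lower"


def bucket_shares(counts):
    """Return each bucket's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts for the 120 EYE_IMAGING predictions reported in the table.
shares = bucket_shares({"High": 117, "Medium": 2, "Lower": 1})
print(shares)
```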
### Data Pipeline
- Zenodo records scraped with a datasets-only filter (`resource_type=dataset`)
- ZIP contents inspected via HTTP Range requests (31,958 files catalogued)
- Genomics files excluded (.fasta, .h5ad, .vcf, etc.)
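
The exclusion step in the pipeline can be illustrated with a simple extension filter. This sketch covers only the three extensions named above, since the full exclusion list is not given here, and the helper names are hypothetical. It does not show the remote ZIP inspection.

```python
from pathlib import PurePosixPath

# Only the extensions named in the list above; the real pipeline excludes more.
GENOMICS_EXTENSIONS = {".fasta", ".h5ad", ".vcf"}


def is_genomics_file(name):
    """True if the file name ends in one of the known genomics extensions."""
    # Only the final suffix is checked, so "sample.vcf.gz" would not match.
    return PurePosixPath(name).suffix.lower() in GENOMICS_EXTENSIONS


def drop_genomics(names):
    """Keep only the file names that are not genomics files."""
    return [n for n in names if not is_genomics_file(n)]


kept = drop_genomics(
    ["scans/oct_001.png", "variants/sample.VCF", "cells.h5ad", "readme.txt"]
)
print(kept)
```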
## Training
- **Examples**: 452 (99 positive, 30 software, 90 edge case, 233 negative)
- **Epochs**: 2
- **Batch Size**: 16
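
The training set is imbalanced across classes. A short calculation makes the proportions explicit, using only the counts reported above; the variable names are illustrative.

```python
# Per-class training example counts as reported above.
training_counts = {
    "positive": 99,
    "software": 30,
    "edge case": 90,
    "negative": 233,
}

total = sum(training_counts.values())

# Share of each class in percent, rounded to one decimal place.
class_shares = {
    name: round(100 * n / total, 1) for name, n in training_counts.items()
}

print(total)
print(class_shares)
```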
## Usage
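```python
from sentence_transformers import SentenceTransformer
import joblib

# The gte-large-en-v1.5 backbone ships custom modeling code, hence trust_remote_code=True.
model = SentenceTransformer("jimnoneill/envision-eye-imaging-classifier", trust_remote_code=True)

# Expects model_head.pkl (the classification head) in the current working directory.
head = joblib.load("model_head.pkl")

# Embed the metadata text, then classify the embedding.
embeddings = model.encode(["Retinal OCT dataset for diabetic retinopathy"])
predictions = head.predict(embeddings)  # one predicted class per input text
```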
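
To get a confidence score alongside each prediction, one option is to read the class probabilities from the head. The sketch below assumes the head is a scikit-learn style classifier exposing `predict_proba` and `classes_`, which this card does not confirm. `predict_with_confidence` is a hypothetical helper.

```python
def predict_with_confidence(head, embeddings):
    """Return (class, confidence) pairs from the head's class probabilities.

    Assumes `head` follows the scikit-learn classifier interface
    (predict_proba and classes_); this is not confirmed by the model card.
    """
    results = []
    for row in head.predict_proba(embeddings):
        probs = [float(p) for p in row]
        # Index of the most probable class in this row.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((head.classes_[best], probs[best]))
    return results
```

With the objects from the usage example, this would be called as `predict_with_confidence(head, embeddings)`.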
## Citation
- EyeACT Envision project
- FAIR Data Innovations Hub (fairdataihub.org)
- Alibaba-NLP/gte-large-en-v1.5
## Contact
EyeACT team: [eyeactstudy.org](https://eyeactstudy.org)