---
license: mit
tags:
- text-classification
- setfit
- sentence-embedding
- eye-imaging
- ophthalmology
- medical-imaging
- fair-data
- eyeact
---

# Envision Eye Imaging Classifier

SetFit few-shot classifier for identifying eye imaging datasets from scientific metadata.

**Developed by**: FAIR Data Innovations Hub in collaboration with the EyeACT Study

## Model Description

Uses `Alibaba-NLP/gte-large-en-v1.5` as the sentence-embedding backbone, with a four-class classification head:

- **EYE_IMAGING (3)**: Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
- **EYE_SOFTWARE (2)**: Code, tools, and models for eye imaging
- **EDGE_CASE (1)**: Eye research papers, reviews, and non-imaging data
- **NEGATIVE (0)**: Not eye-related

## Results on Zenodo

Applied to 515 Zenodo records (filtered to `resource_type=dataset` only), with the following predicted class counts:

| Class | Count |
|-------|-------|
| EYE_IMAGING | 120 |
| EYE_SOFTWARE | 66 |
| EDGE_CASE | 3 |
| NEGATIVE | 325 |

### Confidence Distribution (EYE_IMAGING)

| Confidence | Count | % of EYE_IMAGING |
|------------|-------|------------------|
| High (≥0.95) | 117 | 97.5% |
| Medium (0.80 to <0.95) | 2 | 1.7% |
| Low (<0.80) | 1 | 0.8% |
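The bands can be reproduced from per-record confidence scores with a small helper. This is a sketch, not the pipeline used for this card: it assumes the confidence is the top class probability for each record and that each band includes its lower edge (0.95 counts as High, 0.80 as Medium), which matches the ≥0.95 and <0.80 labels. The function names `confidence_band` and `band_summary` are hypothetical.

```python
from collections import Counter


def confidence_band(p: float) -> str:
    """Assign a confidence score to High / Medium / Low (lower edge inclusive)."""
    if p >= 0.95:
        return "High"
    if p >= 0.80:
        return "Medium"
    return "Low"


def band_summary(scores):
    """Return {band: (count, percent rounded to 0.1)} for a list of scores."""
    counts = Counter(confidence_band(p) for p in scores)
    total = len(scores)
    return {
        band: (counts[band], round(100 * counts[band] / total, 1))
        for band in ("High", "Medium", "Low")
    }
```

Feeding it 117 scores above 0.95, two between 0.80 and 0.95, and one below 0.80 yields the counts and percentages in the table (97.5%, 1.7%, 0.8%).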

### Data Pipeline

- Records scraped from Zenodo with the datasets-only filter (`resource_type=dataset`)
- ZIP contents inspected via HTTP Range requests (31,958 files catalogued)
- Genomics files excluded (`.fasta`, `.h5ad`, `.vcf`, etc.)
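Listing a ZIP's contents without downloading it works because the archive's file index (the central directory) sits at the end of the file and is located by a fixed-format trailer. The sketch below shows one way to do this with two Range requests: fetch the tail to find the end-of-central-directory record, then fetch only the central directory and read the filenames. It is an illustration under stated assumptions, not the scraper used for this card: it assumes the server honors `Range` (HTTP 206), handles only standard non-ZIP64 archives, and the genomics extension list contains just the three examples the card names. All function names are hypothetical.

```python
import struct
import urllib.request

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header

# Partial list: only the extensions named in the card (which says "etc.")
GENOMICS_EXTS = (".fasta", ".h5ad", ".vcf")


def http_range(url: str, spec: str) -> bytes:
    """GET only the requested bytes, e.g. spec="-65557" or "100-199"."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={spec}"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError("server did not honor the Range header")
        return resp.read()


def parse_eocd(tail: bytes) -> tuple:
    """Return (entry_count, cd_size, cd_offset) from the archive's tail bytes."""
    i = tail.rfind(EOCD_SIG)
    if i < 0 or len(tail) < i + 22:
        raise ValueError("end of central directory record not found")
    _, _, _, total, cd_size, cd_offset, _ = struct.unpack("<HHHHIIH", tail[i + 4:i + 22])
    if total == 0xFFFF or cd_offset == 0xFFFFFFFF:
        raise NotImplementedError("ZIP64 archives are not handled in this sketch")
    return total, cd_size, cd_offset


def parse_central_directory(cd: bytes, entries: int) -> list:
    """Return [(filename, uncompressed_size), ...] from central directory bytes."""
    files, pos = [], 0
    for _ in range(entries):
        if cd[pos:pos + 4] != CDH_SIG:
            raise ValueError("malformed central directory header")
        fields = struct.unpack("<HHHHHHIIIHHHHHII", cd[pos + 4:pos + 46])
        flags, size = fields[2], fields[8]
        name_len, extra_len, comment_len = fields[9], fields[10], fields[11]
        raw = cd[pos + 46:pos + 46 + name_len]
        # Flag bit 11 marks UTF-8 names; otherwise the ZIP spec says CP437
        files.append((raw.decode("utf-8" if flags & 0x800 else "cp437"), size))
        pos += 46 + name_len + extra_len + comment_len
    return files


def list_remote_zip(url: str) -> list:
    """Catalogue a remote ZIP using two small Range requests."""
    # The EOCD record is 22 bytes plus an optional comment of up to 65,535 bytes
    tail = http_range(url, f"-{22 + 65535}")
    entries, cd_size, cd_offset = parse_eocd(tail)
    if entries == 0:
        return []
    cd = http_range(url, f"{cd_offset}-{cd_offset + cd_size - 1}")
    return parse_central_directory(cd, entries)


def is_genomics(name: str) -> bool:
    """True if the filename ends with one of the excluded genomics extensions."""
    return name.lower().endswith(GENOMICS_EXTS)
```

A catalogue with genomics files dropped would then be `[f for f in list_remote_zip(url) if not is_genomics(f[0])]`, where `url` is a direct download link to the ZIP; the card does not show which Zenodo URL pattern was used.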

## Training

- **Examples**: 452 (99 EYE_IMAGING, 30 EYE_SOFTWARE, 90 EDGE_CASE, 233 NEGATIVE)
- **Epochs**: 2
- **Batch Size**: 16

## Usage

```python
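import joblib
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

REPO_ID = "jimnoneill/envision-eye-imaging-classifier"

# gte-large-en-v1.5 ships custom modeling code, so trust_remote_code is required
model = SentenceTransformer(REPO_ID, trust_remote_code=True)

# Download the classification head from the Hub instead of assuming a local copy
head = joblib.load(hf_hub_download(repo_id=REPO_ID, filename="model_head.pkl"))

embeddings = model.encode(["Retinal OCT dataset for diabetic retinopathy"])
predictions = head.predict(embeddings)  # class ids as listed under Model Description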
```
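To turn the head's output into readable labels with a confidence score, something like the helper below could be used. It is a sketch that assumes the head is a scikit-learn classifier exposing `predict_proba` and `classes_` (SetFit's default head is a scikit-learn `LogisticRegression`, but this card does not confirm which head was saved) and that labels are stored as the integer ids listed above. The `review_below=0.80` cutoff simply mirrors the lowest confidence band and is a hypothetical choice; `decode_probabilities` and `ID2LABEL` are illustrative names.

```python
import numpy as np

# Label ids as listed under "Model Description" in this card
ID2LABEL = {0: "NEGATIVE", 1: "EDGE_CASE", 2: "EYE_SOFTWARE", 3: "EYE_IMAGING"}


def decode_probabilities(probs, classes, review_below=0.80):
    """Map a predict_proba matrix to (label, confidence, needs_review) tuples.

    `classes` should be the head's `classes_` attribute, because scikit-learn
    orders predict_proba columns by `classes_`, not by position in ID2LABEL.
    """
    probs = np.asarray(probs, dtype=float)
    results = []
    for row in probs:
        col = int(row.argmax())          # column of the most probable class
        confidence = float(row[col])
        label = ID2LABEL[int(classes[col])]
        results.append((label, confidence, confidence < review_below))
    return results
```

With the objects from the usage example, this would be called as `decode_probabilities(head.predict_proba(embeddings), head.classes_)`. If the head was trained on string labels instead of integer ids, the `ID2LABEL` lookup should be dropped and `classes[col]` used directly.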

## Citation

- EyeACT Envision project
- FAIR Data Innovations Hub (fairdataihub.org)
- Alibaba-NLP/gte-large-en-v1.5

## Contact

EyeACT team: [eyeactstudy.org](https://eyeactstudy.org)