Text Classification
setfit
Safetensors
mpnet
sentence-embedding
eye-imaging
ophthalmology
medical-imaging
fair-data
eyeact
Update ENVISION eye imaging classifier v2.0
- README.md +47 -61
- model.safetensors +1 -1
- model_head.pkl +1 -1
README.md
CHANGED
@@ -9,95 +9,81 @@ tags:

Before (lines 9–103):

- medical-imaging
- fair-data
- eyeact

---

# Envision Eye Imaging Classifier

SetFit

**Developed by**: FAIR Data Innovations Hub

## Model Description

Uses `sentence-transformers/all-mpnet-base-v2` as backbone with binary classification:

- **EYE_IMAGING (1)**: Actual ophthalmic imaging datasets (fundus
- **NEGATIVE (0)**: Everything else (

## Results on Zenodo

Tested on 515 Zenodo datasets (filtered to `resource_type=dataset` only):

| Class | Count |
|-------|-------|
| EYE_IMAGING | 60 |
| NEGATIVE | 455 |

## Training

- **Base Model**: `sentence-transformers/all-mpnet-base-v2` (768-dimensional embeddings)
- **Training Examples**: 891 (262 EYE_IMAGING, 629 NEGATIVE)
- **Positive Data Sources**: Multi-repository (Zenodo, Figshare, Dryad, Kaggle, NEI) — LLM-verified
- **Negative Data Sources**: Real dataset records from discovery pipelines + targeted hard negatives (non-eye medical imaging, non-eye OCT, eye-adjacent non-imaging)
- **Epochs**: 2
- **Batch Size**: 16

## Validation

###

| Metric |
|--------|-------|
| Accuracy | 0.
| Macro F1 | 0.
| EYE_IMAGING F1 | 0.
|
| EYE_IMAGING Recall | 0.962 |

###

| Metric |
|--------|-------|
| Accuracy |
|
| EYE_IMAGING
|

## Usage

```python
from
import joblib

model = SentenceTransformer("fairdataihub/envision-eye-imaging-classifier")
head = joblib.load("model_head.pkl")


predictions = head.predict(embeddings)
probabilities = head.predict_proba(embeddings)


print(f"Label: {labels[predictions[0]]}")
print(f"Confidence: {max(probabilities[0]):.3f}")
```

## Data Pipeline

- Harvests metadata from multiple scientific data repositories (Zenodo, Figshare, DataCite, Kaggle, Dryad, NEI)
- Classifies records as eye imaging or not
- Identified eye imaging datasets are registered on the [Envision Portal](https://envisionportal.org)

## Citation

-
- EyeACT Study ([eyeactstudy.org](https://eyeactstudy.org))
- Tunstall et al. (2022). "Efficient Few-Shot Learning Without Prompts" (SetFit)
- `sentence-transformers/all-mpnet-base-v2`

## Contact

- Bhavesh Patel (bpatel@calmi2.org)

After (lines 9–89):

- medical-imaging
- fair-data
- eyeact
datasets:
- fairdataihub/envision-eye-imaging-training-data
---

# Envision Eye Imaging Classifier

SetFit binary classifier for identifying eye imaging datasets from scientific metadata.

**Developed by**: FAIR Data Innovations Hub in collaboration with the EyeACT Study

## Model Description

Uses `sentence-transformers/all-mpnet-base-v2` as backbone with binary classification:

- **EYE_IMAGING (1)**: Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
- **NEGATIVE (0)**: Everything else (software, non-imaging eye data, unrelated)

## Validation

### Spot-check (33 expert-verified Zenodo records)

| Metric | Score |
|--------|-------|
| Accuracy | 0.939 (31/33) |
| Macro F1 | 0.923 |
| EYE_IMAGING F1 | 0.889 (P=0.889, R=0.889) |
| NEGATIVE F1 | 0.958 (P=0.958, R=0.958) |
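
The spot-check figures are tight enough to pin down the confusion matrix behind them: accuracy is 31/33, and each class has equal precision and recall. As an illustrative consistency check (not part of the released tooling), the sketch below enumerates every 2×2 confusion matrix over 33 records and keeps those whose metrics match the table to within 0.001, the rounding of the reported values. Treating EYE_IMAGING as the positive class follows the label scheme above; the resulting counts are inferred, not reported in the card.

```python
"""Recover confusion matrices consistent with the spot-check table (illustrative sketch)."""

TOTAL = 33   # expert-verified Zenodo records in the spot-check
TOL = 0.001  # reported values are rounded to three decimals

# Values copied from the spot-check table; EYE_IMAGING is treated as the positive class.
REPORTED = {
    "accuracy": 0.939,
    "macro_f1": 0.923,
    "eye_precision": 0.889,
    "eye_recall": 0.889,
    "neg_precision": 0.958,
    "neg_recall": 0.958,
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Per-class precision/recall, accuracy and macro F1 for a 2x2 confusion matrix."""
    eye_p, eye_r = tp / (tp + fp), tp / (tp + fn)
    neg_p, neg_r = tn / (tn + fn), tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "macro_f1": (f1(eye_p, eye_r) + f1(neg_p, neg_r)) / 2,
        "eye_precision": eye_p,
        "eye_recall": eye_r,
        "neg_precision": neg_p,
        "neg_recall": neg_r,
    }


def consistent_matrices(total: int = TOTAL) -> list:
    """Every (tp, fn, fp, tn) summing to `total` whose metrics match REPORTED within TOL."""
    found = []
    for tp in range(1, total + 1):
        for fn in range(total - tp + 1):
            for fp in range(total - tp - fn + 1):
                tn = total - tp - fn - fp
                if tn + fn == 0 or tn + fp == 0:
                    continue  # NEGATIVE precision or recall would be undefined
                m = metrics(tp, fn, fp, tn)
                if all(abs(m[key] - value) <= TOL for key, value in REPORTED.items()):
                    found.append((tp, fn, fp, tn))
    return found


print(consistent_matrices())  # → [(8, 1, 1, 23)]
```

Only one matrix survives: 8 true positives, 1 false negative, 1 false positive and 23 true negatives. That would mean 9 of the 33 records were eye imaging datasets and the two errors were one miss in each direction.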

### Held-out test set (20% stratified split)

| Metric | Score |
|--------|-------|
| Accuracy | 0.940 |
| Macro F1 | 0.936 |
| EYE_IMAGING F1 | 0.922 (P=0.887, R=0.959) |
| NEGATIVE F1 | 0.951 (P=0.975, R=0.929) |
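
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and macro F1 is the unweighted mean of the per-class F1 scores, so the held-out rows can be cross-checked against each other. A minimal sketch with the values copied from the table; differences of a few ten-thousandths are expected because P and R are themselves rounded.

```python
"""Cross-check held-out F1 scores against the reported precision and recall (sketch)."""

# label: (precision, recall, reported F1), copied from the held-out table
HELD_OUT = {
    "EYE_IMAGING": (0.887, 0.959, 0.922),
    "NEGATIVE": (0.975, 0.929, 0.951),
}
REPORTED_MACRO_F1 = 0.936


def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_class = {label: f1(p, r) for label, (p, r, _) in HELD_OUT.items()}
# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class.values()) / len(per_class)

for label, (_, _, reported) in HELD_OUT.items():
    print(f"{label}: recomputed F1 {per_class[label]:.4f}, reported {reported:.3f}")
print(f"Macro F1: recomputed {macro_f1:.4f}, reported {REPORTED_MACRO_F1:.3f}")
```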

### Multi-repository spot-check (6,833 records across 6 sources)

| Source | Records | EYE_IMAGING F1 | EYE_IMAGING Precision | EYE_IMAGING Recall |
|--------|---------|----------------|-----------------------|--------------------|
| Zenodo | 514 | 0.677 | 0.537 | 0.917 |
| DataCite | 1,836 | 0.866 | 0.858 | 0.874 |
| Figshare | 2,000 | 0.833 | 0.788 | 0.884 |
| Kaggle | 732 | 0.739 | 0.939 | 0.610 |
| Dryad | 89 | 0.764 | 0.750 | 0.778 |
| NEI | 1,662 | 0.814 | 0.931 | 0.724 |
| **Overall** | **6,833** | **0.822** | **0.845** | **0.800** |
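
The multi-repository table can be sanity-checked the same way: the six record counts should add up to the 6,833 in the heading, and each F1 should be the harmonic mean of the precision and recall beside it. The sketch hard-codes the table values. The Overall row is only checked against its own precision and recall, since the card does not give the per-source positive counts that would be needed to recompute it from the rows.

```python
"""Sanity-check the multi-repository spot-check table (illustrative sketch)."""

# source: (records, reported EYE_IMAGING F1, precision, recall), copied from the table
SOURCES = {
    "Zenodo": (514, 0.677, 0.537, 0.917),
    "DataCite": (1836, 0.866, 0.858, 0.874),
    "Figshare": (2000, 0.833, 0.788, 0.884),
    "Kaggle": (732, 0.739, 0.939, 0.610),
    "Dryad": (89, 0.764, 0.750, 0.778),
    "NEI": (1662, 0.814, 0.931, 0.724),
}
OVERALL = (6833, 0.822, 0.845, 0.800)
TOL = 0.001  # table values are rounded to three decimals


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def f1_matches(reported: float, precision: float, recall: float) -> bool:
    """True if the reported F1 agrees with P and R up to rounding."""
    return abs(f1(precision, recall) - reported) <= TOL


total_records = sum(records for records, _, _, _ in SOURCES.values())
print(f"records: {total_records} (table total {OVERALL[0]})")

for name, (_, reported, p, r) in {**SOURCES, "Overall": OVERALL}.items():
    status = "ok" if f1_matches(reported, p, r) else "MISMATCH"
    print(f"{name:<9} recomputed F1 {f1(p, r):.4f}, reported {reported:.3f} [{status}]")
```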

## Training

- **Base model**: `sentence-transformers/all-mpnet-base-v2` (768-dimensional embeddings)
- **Training data**: 994 examples (365 EYE_IMAGING, 629 NEGATIVE) drawn from multiple repositories (Zenodo, Figshare, Dryad, Kaggle, NEI)
- **Dataset**: [fairdataihub/envision-eye-imaging-training-data](https://huggingface.co/datasets/fairdataihub/envision-eye-imaging-training-data)
- **Epochs**: 10 (with early stopping, patience=3)
- **Batch size**: 16
- **Learning rate**: 2e-5 (default)
- **Scheduler**: linear with 10% warmup
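
The schedule and stopping rule listed above can be illustrated without SetFit itself. This is a generic sketch, not the training code used for this model. It assumes the common formulation used by `transformers`' linear warmup schedule: the learning rate ramps linearly from 0 to 2e-5 over the first 10% of steps (rounded down, an assumption) and then decays linearly to 0. It also assumes training halts once validation loss has failed to improve for 3 consecutive evaluations. The one-evaluation-per-epoch setup, the step count, and the loss values are hypothetical; the card does not say which metric or evaluation interval was used.

```python
"""Linear warmup/decay schedule and patience-based early stopping (illustrative sketch)."""

from typing import List, Tuple


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 2e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear ramp over the warmup steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)  # assumption: rounded down
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def early_stopping_epoch(val_losses: List[float], patience: int = 3,
                         max_epochs: int = 10) -> Tuple[int, int]:
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best_loss, best_epoch, stale, epochs_run = float("inf"), 0, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` consecutive evaluations
    return epochs_run, best_epoch


total = 100  # hypothetical number of optimizer steps
for step in (0, 5, 10, 55, 100):
    print(step, f"{linear_warmup_lr(step, total):.1e}")

# Hypothetical validation losses: improvement stalls after epoch 3
print(early_stopping_epoch([0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]))  # → (6, 3)
```

With patience 3, the later improvement at epoch 7 in the hypothetical losses is never seen. This is the trade-off early stopping makes in exchange for shorter runs.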

## Usage

```python
from setfit import SetFitModel

# Download the sentence-transformer body and the classification head from the Hugging Face Hub
model = SetFitModel.from_pretrained("fairdataihub/envision-eye-imaging-classifier")

# Classify dataset metadata text (e.g. a title or description); see Model Description for the labels
texts = ["Retinal OCT dataset for diabetic retinopathy"]
predictions = model.predict(texts)
print(predictions)
```

## Citation

- EyeACT Envision project
- FAIR Data Innovations Hub (fairdataihub.org)
- sentence-transformers/all-mpnet-base-v2

## Contact

EyeACT team: [eyeactstudy.org](https://eyeactstudy.org)

model.safetensors
CHANGED
@@ -1,3 +1,3 @@

Before (lines 1–3):

version https://git-lfs.github.com/spec/v1
oid sha256:
size 437967672

After (lines 1–3):

version https://git-lfs.github.com/spec/v1
oid sha256:0d8c84e85ce3f286c1abdecb4eef2bb7fc6e879dc4ca51db0c5d177a6d2f3f4f
size 437967672

model_head.pkl
CHANGED
@@ -1,3 +1,3 @@

Before (lines 1–3):

version https://git-lfs.github.com/spec/v1
oid sha256:
size 7007

After (lines 1–3):

version https://git-lfs.github.com/spec/v1
oid sha256:0949a5a2a839e79e6d2704fdf88ad6c86efc313060871789482f92abbfd8a7ee
size 7007