Specialized vision-language models for clinical ophthalmology (specifically AMD in retinal OCT).
See the [paper in npj Digital Medicine](https://www.nature.com/articles/s41746-025-01893-8).
These versions of the model are not suitable for clinical use, as they were developed for research purposes.
These models use the 8-bit version of meta-llama/Meta-Llama-3-8B-Instruct.
They were designed to accept a fovea-centered retinal OCT image of size 192×192 pixels, with physical pixel dimensions of 7.0×23.4 μm², from the Topcon scanner.
These models also accept an associated textual instruction and output a textual response.
The results in the paper relate to these specifications, and performance cannot be guaranteed for other image types, sizes, or anatomical locations in the retina.
To use RetinaVLM, first clone [https://github.com/RobbieHolland/SpecialistVLMs](https://github.com/RobbieHolland/SpecialistVLMs), then run: `models/retinavlm_wrapper.py model=minigpt4 dataset/task=all pretrained_models=specialist_v5_192px`
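Putting the two steps together, a minimal sketch of the setup (invoking the wrapper with `python` from the repository root is an assumption; check the SpecialistVLMs repository README for the exact entry point and environment requirements):

```shell
# Clone the SpecialistVLMs repository containing RetinaVLM
git clone https://github.com/RobbieHolland/SpecialistVLMs
cd SpecialistVLMs

# Run the wrapper with the 192px specialist checkpoint
# (the `python` invocation is an assumption; see the repository README)
python models/retinavlm_wrapper.py model=minigpt4 dataset/task=all pretrained_models=specialist_v5_192px
```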