---

license: cc-by-nc-sa-4.0
language:
- en
tags:
- pathology
- computational pathology
- LUAD
- EGFR
- computational biomarkers
---


# EAGLE

The use of artificial intelligence (AI) models to develop computational biomarkers from H&E-stained digital histopathology images has emerged as a promising diagnostic approach for improving the clinical management of cancer patients. Computational biomarkers offer several advantages: they 1) are deployed digitally, 2) are cost-effective, and 3) do not consume tissue. Despite numerous promising models in the literature, their clinical utility in real-world settings has yet to be established. Assessment of *EGFR* mutations in lung adenocarcinoma is challenged by the need for rapid, accurate results at low cost while preserving tissue for comprehensive genomic sequencing. Polymerase chain reaction (PCR)-based assays provide rapid results but are less accurate than genomic sequencing and deplete the tissue. Highly accurate and robust computational biomarkers, aided by modern foundation models, can fill this niche. We compiled the largest international, multi-institutional clinical cohort of digital histopathology images of lung adenocarcinomas (N=8461 cases/slides) to develop and validate a state-of-the-art computational *EGFR* biomarker. The model uses an open-source foundation model that is fine-tuned for the task of *EGFR* classification. We demonstrate that fine-tuning the foundation model improves task-specific performance and generalizes across institutions and scanning protocols with clinical-level performance (mean AUC: internal 0.847, external 0.870). To move toward clinical translation and to investigate the model's in-real-time (IRT) usability, we conducted a first-of-its-kind prospective silent trial of a computational biomarker on primary samples, achieving an AUC of 0.896. We demonstrate that an AI-assisted rapid *EGFR* screening workflow reduces the amount of rapid testing needed by up to 43% while maintaining clinical standard performance.
The retrospective and prospective results demonstrate for the first time the clinical utility and efficacy of an H&E-based computational biomarker in a real-world clinical setting.

## Model
The model consists of: 1) a 1.1-billion-parameter vision transformer (ViT-g) that encodes high-resolution (20x magnification, 0.5 microns per pixel) 224-pixel patches into a 1,536-dimensional feature vector; 2) a gated MIL attention (GMA) aggregator, where MIL stands for multiple-instance learning, that integrates all encoded patches from a slide into a global slide-level feature representation; and 3) a linear classifier that outputs the probability of an *EGFR* mutation based on the input slide data.
The tile encoder was initialized with [GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath). The model was trained end-to-end for the task of predicting *EGFR* mutational status from H&E slides.
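
To make the aggregation step concrete, here is a minimal NumPy sketch of gated attention pooling followed by a linear classifier. It is not the EAGLE implementation: the function name `gated_attention_pool`, the hidden size `k`, the number of tiles, and all weights are hypothetical and randomly initialized for illustration. Only the 1,536-dimensional tile embedding size comes from the description above.

```python
import numpy as np

def gated_attention_pool(H, V, U, w):
    """Pool tile embeddings H (n_tiles, d) into one slide vector (d,).

    V, U: (d, k) projection matrices; w: (k,) scoring vector.
    All three are illustrative stand-ins for learned parameters.
    """
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))        # sigmoid gate, (n_tiles, k)
    scores = (np.tanh(H @ V) * gate) @ w         # one raw score per tile
    scores = scores - scores.max()               # stabilize the softmax
    att = np.exp(scores) / np.exp(scores).sum()  # attention weights, sum to 1
    slide = att @ H                              # attention-weighted mean, (d,)
    return slide, att

rng = np.random.default_rng(0)
d, k, n_tiles = 1536, 128, 10  # d matches the card; k and n_tiles are assumptions
H = rng.normal(size=(n_tiles, d))        # stand-in for encoded tiles
V = rng.normal(scale=0.01, size=(d, k))
U = rng.normal(scale=0.01, size=(d, k))
w = rng.normal(size=k)

slide, att = gated_attention_pool(H, V, U, w)

# Linear classifier head: a logistic output on the slide vector.
w_clf = rng.normal(scale=0.01, size=d)
b_clf = 0.0
p = 1.0 / (1.0 + np.exp(-(slide @ w_clf + b_clf)))
print(slide.shape, att.shape, float(p))
```

The attention weights give each tile a share of the slide-level representation, so tiles the model considers informative contribute more to the final probability than uninformative ones.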

## Model Usage

To get started, first clone the repository with this command:
```bash
git clone --no-checkout https://huggingface.co/MCCPBR/EAGLE && cd EAGLE
```

Now you can use the following code:
```python
from PIL import Image
import numpy as np
import eagle
import torch
import torchvision.transforms as transforms

# Load model
model = eagle.EAGLE()

# Set up transform: convert to a float tensor, then normalize each RGB channel
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

# Image: a random 224x224 RGB tile used as a placeholder input
img = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
img = Image.fromarray(img)
img = transform(img).unsqueeze(0)  # add a leading batch dimension: (1, 3, 224, 224)

# Inference (gradients are not needed)
with torch.no_grad():
    h, att, p = model(img)
```
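
The example above feeds a single random tile. For a real image region, the tiles first have to be cut out and normalized. The sketch below shows one way to do that in NumPy, mirroring the `ToTensor` and `Normalize` steps of the transform. The helper names `tile_region` and `normalize` are hypothetical, and the sketch assumes the region has already been read at 20x magnification (0.5 microns per pixel). Whether `eagle.EAGLE` accepts several tiles stacked along the first dimension is also an assumption suggested by the batch dimension in the example, not something this card confirms.

```python
import numpy as np

def tile_region(region, tile=224):
    """Split an (H, W, 3) uint8 array into non-overlapping tile x tile patches.

    Edge pixels that do not fill a whole tile are dropped.
    Returns an array of shape (n_tiles, tile, tile, 3), in row-major order.
    """
    h, w, c = region.shape
    rows, cols = h // tile, w // tile
    cropped = region[: rows * tile, : cols * tile]
    tiles = cropped.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)

def normalize(tiles, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale to [0, 1], standardize per channel, and move channels first."""
    x = tiles.astype(np.float32) / 255.0
    x = (x - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    return x.transpose(0, 3, 1, 2)  # (n_tiles, 3, tile, tile)

# Placeholder region; in practice this would come from a slide reader.
region = np.random.randint(0, 256, size=(500, 700, 3), dtype=np.uint8)
tiles = tile_region(region)
batch = normalize(tiles)
print(tiles.shape, batch.shape)
```

If the stacked-batch assumption holds, `torch.from_numpy(batch)` could be passed to the model in place of `img`.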