---
datasets:
- arup-ri/kinyoun_afb_50k
language:
- en
tags:
- medical
---
This is an object detection model for finding acid-fast bacilli (AFB), trained on tiles pulled from Kinyoun-stained whole-slide images (WSIs).
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Applied AI & Bioinformatics group within the Research & Innovation unit of ARUP Labs
- **Funded by:** ARUP Laboratories
- **Model type:** Object Detection
- **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
- **Finetuned from model:** [ConvNeXt Base pretrained on ImageNet1k](https://docs.pytorch.org/vision/stable/models/generated/torchvision.models.convnext_base.html)
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/arup-ri/afb_detection
- **Paper:** [Use of a convolutional neural network for direct detection of acid-fast bacilli from clinical specimens, _Microbiology Spectrum_ (2025)](https://journals.asm.org/doi/10.1128/spectrum.00602-25)
## Uses
This model is intended to be used for research and academic purposes only.
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
The model was trained on, and is intended for, detection of AFB in tiles from Kinyoun-stained WSIs. It can be applied to tiles from new Kinyoun-stained WSIs without fine-tuning, subject to the caveats below.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model was trained exclusively on Kinyoun-stained WSIs and has not been tested on other AFB staining techniques such as Ziehl-Neelsen; it is unknown how well it might generalize to such data.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Since the model's training data was limited to a single AFB clinical laboratory at a single institution, it is likely to suffer from some domain shift on data from other laboratories and WSI scanners.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Fine-tuning on data from the target laboratory or scanner would likely help bridge this domain shift.
For research use only!
## How to Get Started with the Model
Training and inference code for this model [is available here](https://github.com/arup-ri/afb_detection).
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Training & testing datasets: https://huggingface.co/datasets/arup-ri/kinyoun_afb_50k
Training data consisted of approximately 50k 256x256 pixel tiles. Approximately 20% of these tiles have bounding-box-annotated AFB, while the remaining tiles were taken from AFB negative slides to increase the diversity of background debris and artifacts that the model should learn to recognize as _not_ AFB. More details can be found in the paper.
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing
Tiles were extracted from WSIs and resized to a consistent physical size (0.2878 microns per pixel).
Standard image augmentations such as flips, rotations, weak Gaussian blurring, and brightness adjustments were randomly applied to images during training.
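The resizing step amounts to scaling each tile by the ratio of the scanner's microns-per-pixel (MPP) to the training resolution. A minimal sketch, assuming the source MPP is read from the WSI metadata (the 0.25 MPP value below is a hypothetical example):

```python
# Rescale a tile so its physical resolution matches the 0.2878
# microns-per-pixel (MPP) used during training.
TARGET_MPP = 0.2878

def target_size(side_px: int, source_mpp: float, target_mpp: float = TARGET_MPP) -> int:
    """Pixel side length after resampling that preserves physical extent."""
    return round(side_px * source_mpp / target_mpp)

# A 256 px tile scanned at 0.25 MPP covers 64 microns,
# i.e. about 222 px at the training resolution.
print(target_size(256, 0.25))  # -> 222
```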
#### Training Hyperparameters
- **Training regime:** <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
See the YAML configs and source code in the repository for a complete listing and explanation of hyperparameters.
#### Speeds, Sizes, Times
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
On an L40S GPU, a training run to generate this model can be done in about 1-1.5 hours, depending heavily on choice of early stopping and to a lesser extent on hyperparameters such as batch size, etc.
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
Training & testing datasets: https://huggingface.co/datasets/arup-ri/kinyoun_afb_50k
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Object detection performance was evaluated with precision, recall, and F-score on a small held-out test set of tiles. Note that this test set was, like the training set, enriched for difficult tiles as described in more detail in the paper.
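For reference, these tile-level metrics reduce to counts of matched and unmatched boxes; the IoU-based matching rule mentioned in the comment is an assumption, so see the paper for the exact evaluation protocol:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F-score from detection counts.
    A predicted box is typically counted as a true positive when its IoU
    with an unmatched ground-truth box exceeds a threshold (assumed here)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts on a difficult, enriched test set.
print(precision_recall_f1(50, 50, 50))  # -> (0.5, 0.5, 0.5)
```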
Prediction of WSI labels (AFB positive or negative) was done by computing predicted AFB density, i.e., the number of predicted AFB object detections divided by the area of tiles seen by the model, and thresholding this density. See the paper for more details.
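The slide-level decision rule above can be sketched as follows; the threshold and the example numbers are hypothetical placeholders, not values from the paper:

```python
def predict_wsi_label(num_detections: int, tile_area_mm2: float,
                      density_threshold: float) -> str:
    """Slide-level call: predicted AFB density = detections divided by
    the area of tiles actually seen by the model, then thresholded."""
    density = num_detections / tile_area_mm2
    return "AFB positive" if density >= density_threshold else "AFB negative"

# Hypothetical numbers: 10 detections over 100 mm^2 of examined tiles.
print(predict_wsi_label(10, 100.0, 0.05))  # -> AFB positive
```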
### Results
The best object detection F-score achieved on our test set was 0.53, though this would be substantially higher on a test set of randomly sampled tiles instead of a set enriched for difficult tiles.
We quantified the WSI-level predictions with an ROC-like true _negative_ vs false _negative_ rate curve, and our model achieved an AUC of 0.78 on our validation set of WSIs.
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** On-premise NVIDIA L40S and H100 GPUs were used for various training runs
- **Hours used:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications
### Model Architecture and Objective
[ConvNeXt backbone](https://docs.pytorch.org/vision/stable/models/convnext.html)
with [FCOS object detection head](https://docs.pytorch.org/vision/stable/models/fcos.html). We also experimented with a [ResNet50 backbone](https://docs.pytorch.org/vision/stable/models/resnet.html) and
[Faster R-CNN object detection head](https://docs.pytorch.org/vision/stable/models/faster_rcnn.html), but we found moderately improved performance with the ConvNeXt + FCOS pair. [Faster R-CNN also suffered from a nasty bug](https://github.com/pytorch/vision/issues/8206) which FCOS allowed us to circumvent.
### Compute Infrastructure
#### Hardware
On-prem L40S and H100 GPUs.
#### Software
Model architecture used torchvision implementations. [See source code for further detail](https://github.com/arup-ri/afb_detection).
## Citation
**BibTeX:**
```
@article{english_use_2025,
  title = {Use of a convolutional neural network for direct detection of acid-fast bacilli from clinical specimens},
  volume = {0},
  url = {https://journals.asm.org/doi/10.1128/spectrum.00602-25},
  doi = {10.1128/spectrum.00602-25},
  number = {0},
  urldate = {2025-06-25},
  journal = {Microbiology Spectrum},
  author = {English, Paul and Morrison, Muir J. and Mathison, Blaine and Enrico, Elizabeth and Shean, Ryan and O'Fallon, Brendan and Rupp, Deven and Knight, Katie and Rangel, Alexandra and Gilivary, Jeffrey and Vance, Amanda and Hatch, Haleina and Lin, Leo and Ng, David P. and Shakir, Salika M.},
  month = jun,
  year = {2025},
  note = {Publisher: American Society for Microbiology},
  pages = {e00602--25},
}
```