This is an object detection model for detecting acid-fast bacilli (AFB) in tiles pulled from Kinyoun-stained whole-slide images (WSIs).
Model Details
Model Description
- Developed by: Applied AI & Bioinformatics group within the Research & Innovation unit of ARUP Labs
- Funded by: ARUP Laboratories
- Model type: Object Detection
- License: CC BY-NC-SA 4.0
- Finetuned from model: ConvNeXt Base pretrained on ImageNet1k
Model Sources
- Repository: https://github.com/arup-ri/afb_detection
- Paper: https://journals.asm.org/doi/10.1128/spectrum.00602-25 (Microbiology Spectrum, 2025)
Uses
This model is intended to be used for research and academic purposes only.
Direct Use
The model was trained for, and is intended for, detection of AFB on Kinyoun-stained WSIs. It can be applied to tiles from new Kinyoun-stained WSIs without fine-tuning, subject to the caveats below.
Out-of-Scope Use
The model was trained exclusively on Kinyoun-stained WSIs and has not been tested on other AFB staining techniques such as Ziehl-Neelsen; it is unknown how well it might generalize to such data.
Bias, Risks, and Limitations
Since the model's training data was limited to a single AFB clinical laboratory at a single institution, it is likely to suffer from some domain shift on data from other laboratories and WSI scanners.
Recommendations
Fine-tuning is likely to help bridge the domain shift to data from other laboratories or scanners.
For research use only!
How to Get Started with the Model
Training and inference code for this model is available at https://github.com/arup-ri/afb_detection.
Training Details
Training Data
Training & testing datasets: https://huggingface.co/datasets/arup-ri/kinyoun_afb_50k
Training data consisted of approximately 50k 256x256-pixel tiles. Approximately 20% of these tiles contain bounding-box-annotated AFB; the remaining tiles were taken from AFB-negative slides to increase the diversity of background debris and artifacts that the model should learn to recognize as not AFB. More details can be found in the paper.
Training Procedure
Preprocessing
Tiles were extracted from WSIs and resized to a consistent physical resolution (0.2878 microns per pixel). Standard image augmentations such as flips, rotations, weak Gaussian blurring, and brightness adjustments were randomly applied during training.
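As a concrete illustration of the resize step (a sketch, not the repository's actual code; the function name is hypothetical), the side length of the source-resolution patch that maps to a fixed-size tile at the target resolution follows directly from the scanner's microns-per-pixel (MPP) value:

```python
TARGET_MPP = 0.2878  # microns per pixel used during training


def source_patch_size(tile_px: int, source_mpp: float,
                      target_mpp: float = TARGET_MPP) -> int:
    """Side length (in source pixels) of the WSI region to read so that,
    after resizing to tile_px x tile_px, the tile is at target_mpp."""
    return round(tile_px * target_mpp / source_mpp)


# A scanner already at 0.2878 MPP needs no rescaling;
# a 0.25 MPP scan must read a slightly larger region and shrink it.
print(source_patch_size(256, 0.2878))  # 256
print(source_patch_size(256, 0.25))   # 295
```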
Training Hyperparameters
- Training regime: See yaml configs and source code for a complete listing and explanation of hyperparameters.
Speeds, Sizes, Times
On an L40S GPU, a training run to reproduce this model takes about 1-1.5 hours, depending heavily on the choice of early stopping and, to a lesser extent, on hyperparameters such as batch size.
Evaluation
Testing Data, Factors & Metrics
Testing Data
Training & testing datasets: https://huggingface.co/datasets/arup-ri/kinyoun_afb_50k
Metrics
Object detection performance was evaluated with precision, recall, and F-score on a small held-out test set of tiles. Note that this test set was, like the training set, enriched for difficult tiles as described in more detail in the paper.
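For reference, the tile-level metrics reduce to the standard definitions over matched detections. A minimal sketch with made-up counts (the numbers below are illustrative, not the paper's):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F-score from true/false positive and
    false negative counts of matched AFB detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Illustrative counts only:
p, r, f = detection_metrics(tp=6, fp=4, fn=6)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```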
Prediction of WSI labels (AFB positive or negative) was done by computing predicted AFB density, i.e., the number of predicted AFB detections divided by the area of tiles seen by the model, and thresholding this density. See the paper for more details.
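The density-thresholding rule can be sketched as below (the function name, units, and threshold value are hypothetical; consult the paper for the actual threshold):

```python
def classify_wsi(num_detections: int, tile_area_mm2: float,
                 density_threshold: float) -> tuple[str, float]:
    """Call a WSI AFB-positive when predicted AFB density (detections per
    unit area of tiles actually seen by the model) exceeds the threshold."""
    density = num_detections / tile_area_mm2
    label = "positive" if density > density_threshold else "negative"
    return label, density


# Hypothetical numbers: 50 detections over 100 mm^2 of examined tiles.
print(classify_wsi(50, 100.0, density_threshold=0.2))  # ('positive', 0.5)
```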
Results
The best object detection F-score achieved on our test set was 0.53, though this would be substantially higher on a test set of randomly sampled tiles instead of a set enriched for difficult tiles.
We quantified the WSI-level predictions with an ROC-like true-negative-rate vs. false-negative-rate curve, on which our model achieved an AUC of 0.78 on our validation set of WSIs.
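One way to construct such a curve (a sketch of the idea, not necessarily the paper's exact procedure): sweep the density threshold, record the fraction of negative WSIs correctly called negative (TNR) against the fraction of positive WSIs incorrectly called negative (FNR) at each value, and integrate:

```python
def tn_fn_auc(neg_densities: list[float], pos_densities: list[float]) -> float:
    """Area under a TNR-vs-FNR curve built by sweeping a density threshold.

    A WSI is called negative when its predicted AFB density is at or
    below the threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(neg_densities) | set(pos_densities)):
        fnr = sum(d <= t for d in pos_densities) / len(pos_densities)
        tnr = sum(d <= t for d in neg_densities) / len(neg_densities)
        points.append((fnr, tnr))
    points.append((1.0, 1.0))
    points.sort()
    # trapezoidal integration over the (FNR, TNR) points
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy densities: perfectly separated WSIs give AUC = 1.0.
print(tn_fn_auc([0.1, 0.2], [0.5, 0.9]))  # 1.0
```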
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: On-premise NVIDIA L40S and H100 GPUs for various training runs
- Hours used: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture and Objective
ConvNeXt backbone with an FCOS object detection head. We also experimented with a ResNet50 backbone and Faster R-CNN detection head, but found moderately better performance with the ConvNeXt + FCOS pair. Faster R-CNN also suffered from a bug that switching to FCOS allowed us to circumvent.
Compute Infrastructure
Hardware
On-prem L40S and H100 GPUs.
Software
Model architecture used torchvision implementations. See source code for further detail.
Citation
BibTeX:
@article{english_use_2025,
title = {Use of a convolutional neural network for direct detection of acid-fast bacilli from clinical specimens},
volume = {0},
url = {https://journals.asm.org/doi/10.1128/spectrum.00602-25},
doi = {10.1128/spectrum.00602-25},
number = {0},
urldate = {2025-06-25},
journal = {Microbiology Spectrum},
author = {English, Paul and Morrison, Muir J. and Mathison, Blaine and Enrico, Elizabeth and Shean, Ryan and O'Fallon, Brendan and Rupp, Deven and Knight, Katie and Rangel, Alexandra and Gilivary, Jeffrey and Vance, Amanda and Hatch, Haleina and Lin, Leo and Ng, David P. and Shakir, Salika M.},
month = jun,
year = {2025},
note = {Publisher: American Society for Microbiology},
pages = {e00602--25},
}