
This is an object detection model for finding Acid-fast bacteria (AFB) trained on tiles pulled from Kinyoun stained whole-slide images (WSIs).

Model Details

Model Description

Model Sources

Uses

This model is intended to be used for research and academic purposes only.

Direct Use

The model was trained on, and is intended for, detection of AFB in Kinyoun-stained WSIs. It can be applied to tiles from new Kinyoun-stained WSIs without fine-tuning, subject to the caveats below.

Out-of-Scope Use

The model was trained exclusively on Kinyoun-stained WSIs and has not been tested on other AFB staining techniques such as Ziehl-Neelsen; it is unknown how well it might generalize to such data.

Bias, Risks, and Limitations

Since the model's training data was limited to a single AFB clinical laboratory at a single institution, it is likely to suffer from some domain shift on data from other laboratories and WSI scanners.

Recommendations

Fine-tuning is likely to help bridge the domain shift to data from other laboratories or scanners.

For research use only!

How to Get Started with the Model

Training and inference code for this model is available here.

Training Details

Training Data

Training & testing datasets: https://huggingface.co/datasets/arup-ri/kinyoun_afb_50k

Training data consisted of approximately 50k 256x256-pixel tiles. Approximately 20% of these tiles contain bounding-box-annotated AFB; the remaining tiles were taken from AFB-negative slides to increase the diversity of background debris and artifacts that the model should learn to recognize as not AFB. More details can be found in the paper.

Training Procedure

Preprocessing

Tiles were extracted from WSIs and resized to a consistent physical resolution (0.2878 microns per pixel). Standard image augmentations such as flips, rotations, weak Gaussian blurring, and brightness adjustments were randomly applied to images during training.

Training Hyperparameters

  • Training regime: See yaml configs and source code for a complete listing and explanation of hyperparameters.

Speeds, Sizes, Times

On an L40S GPU, a training run to generate this model can be done in about 1-1.5 hours, depending heavily on choice of early stopping and to a lesser extent on hyperparameters such as batch size, etc.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Training & testing datasets: https://huggingface.co/datasets/arup-ri/kinyoun_afb_50k

Metrics

Object detection performance was evaluated with precision, recall, and F-score on a small held-out test set of tiles. Note that this test set was, like the training set, enriched for difficult tiles as described in more detail in the paper.
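For reference, the F-score here is the harmonic mean of precision and recall over detected versus annotated AFB. The values in this sketch are illustrative, not results from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Illustrative values only.
print(f_score(0.5, 0.5))  # -> 0.5
```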

Prediction of WSI labels (AFB positive or negative) was done by computing predicted AFB density, i.e., the number of predicted AFB object detections divided by the area of tiles seen by the model, and thresholding this density. See the paper for more details.
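The density rule described above can be sketched as follows; the threshold value and example counts are placeholders, not the values chosen in the paper.

```python
# Predicted AFB density = detections per unit area of tiles seen by the model.
def predict_wsi_label(num_detections: int, num_tiles: int,
                      tile_area_um2: float, threshold: float) -> bool:
    """Return True (AFB positive) if predicted AFB density exceeds threshold."""
    density = num_detections / (num_tiles * tile_area_um2)
    return density > threshold

# 256x256 px tiles at 0.2878 microns/pixel -> tile area in square microns
tile_area_um2 = (256 * 0.2878) ** 2
print(predict_wsi_label(40, 1000, tile_area_um2, threshold=1e-6))
```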

Results

The best object detection F-score achieved on our test set was 0.53, though this would be substantially higher on a test set of randomly sampled tiles instead of a set enriched for difficult tiles.

We quantified the WSI-level predictions with an ROC-like true negative vs. false negative rate curve; our model achieved an AUC of 0.78 on our validation set of WSIs.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: NVIDIA L40S and H100 GPUs (on-premise) for various training runs
  • Hours used: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective

ConvNeXt backbone with an FCOS object detection head. We also experimented with a ResNet-50 backbone and Faster R-CNN object detection head, but we found moderately improved performance with the ConvNeXt + FCOS pair. Faster R-CNN also suffered from a nasty bug which FCOS allowed us to circumvent.

Compute Infrastructure

Hardware

On-prem L40S and H100 GPUs.

Software

Model architecture used torchvision implementations. See source code for further detail.

Citation

BibTeX:

@article{english_use_2025,
    title = {Use of a convolutional neural network for direct detection of acid-fast bacilli from clinical specimens},
    volume = {0},
    url = {https://journals.asm.org/doi/10.1128/spectrum.00602-25},
    doi = {10.1128/spectrum.00602-25},
    number = {0},
    urldate = {2025-06-25},
    journal = {Microbiology Spectrum},
    author = {English, Paul and Morrison, Muir J. and Mathison, Blaine and Enrico, Elizabeth and Shean, Ryan and O'Fallon, Brendan and Rupp, Deven and Knight, Katie and Rangel, Alexandra and Gilivary, Jeffrey and Vance, Amanda and Hatch, Haleina and Lin, Leo and Ng, David P. and Shakir, Salika M.},
    month = jun,
    year = {2025},
    note = {Publisher: American Society for Microbiology},
    pages = {e00602--25},
}
