# Face Mask Detection

Trained three object detection models to detect face masks on the Kaggle face mask detection dataset, comparing YOLOv11, Faster R-CNN, and RF-DETR to see which one works best.
## Quick start
```bash
pip install -r requirements.txt

# run inference on an image
python inference.py --image your_image.jpg

# run on a folder of images
python inference.py --folder ./test_images/ --save
```
The default model is Faster R-CNN, since it had the best accuracy. To use YOLO instead:

```bash
python inference.py --image test.jpg --model yolo
```
## What I tried

The main goal was to find the best model for face mask detection, so I trained multiple architectures and ran experiments to see what actually helps.
I compared three detection architectures: YOLOv11 (all five sizes, from nano to xlarge), Faster R-CNN with a ResNet50-FPN backbone, and RF-DETR, a newer transformer-based detector. Each architecture was trained on the same augmented dataset to keep the comparison fair. The augmentation pipeline uses Albumentations with horizontal flip, color jitter, random scaling, and Gaussian blur. I augmented the training set 3x (the original plus two augmented versions) while keeping the validation and test sets clean.
I also ran two additional experiments. First was image preprocessing: I applied CLAHE contrast enhancement and gray-world white balance before augmentation, thinking it would help with lighting variations in the images. Second was test-time augmentation (TTA): I ran inference multiple times with different transforms (flips, scales, rotations) and merged the predictions, testing TTA levels from 1x up to 8x transforms. Both experiments gave surprising results, which I'll explain below.
## Results
| Model | mAP@0.5 | F1 | Inference time |
|---|---|---|---|
| Faster R-CNN | 87.2% | 0.894 | 26ms |
| YOLOv11m | 80.7% | 0.779 | 39ms |
| YOLOv11x | 76.6% | 0.771 | 45ms |
| RF-DETR | 71.1% | 0.770 | 673ms |
Faster R-CNN won with 87.2% mAP, and in this benchmark it was also the fastest per image. The YOLO variants are somewhat less accurate, and RF-DETR is way too slow for real-time use.
Per-class breakdown for Faster R-CNN:

- with_mask: 94% mAP (easiest, most common class)
- without_mask: 94% mAP (also good)
- mask_weared_incorrect: 74% mAP (hardest, only 3% of the data)
## Things that didn't work

Being honest here about what failed, because I think it shows the experimentation process.
### Preprocessing hurt performance
I tried applying CLAHE and white balance before augmentation, thinking it would normalize lighting conditions across images, and reran the full training pipeline with the preprocessed data (called v3). It turned out to make things worse: Faster R-CNN dropped from 87.2% to 84.6% mAP, and YOLO dropped about 1.2%. The only model barely affected was RF-DETR. My guess is that the preprocessing removed texture information that was actually useful for detection, or that the augmentation was already handling lighting variation well enough.
### TTA was counterproductive
Test-time augmentation is supposed to improve accuracy by running inference multiple times with different transforms and merging the predictions. I tested eight levels of TTA (adding horizontal flip, vertical flip, scaling, and rotation). Instead of helping, it degraded all models significantly: Faster R-CNN went from 87% down to 30% mAP at 8x TTA, and even at 3x TTA it dropped to 56%. This was unexpected; maybe face detection is sensitive to orientation, or the NMS merging wasn't tuned properly for this task.
## Limitations

- Class imbalance is a problem: mask_weared_incorrect is only 3% of the data, so the model struggles with it
- RF-DETR is too slow for real-time use (673ms per image)
- Didn't try focal loss or class weighting to handle the imbalance
- Only tested on this specific dataset
## Next steps

If I had more time, I'd try:

- Focal loss to handle the class imbalance
- Collecting more data for the rare class
- Knowledge distillation to get a smaller, faster model
- Testing on different datasets to check generalization
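As a pointer for the focal-loss idea above, a minimal classification-style sketch (hypothetical, not wired into any of these detectors) looks like:

```python
# Minimal focal-loss sketch (hypothetical, not part of this repo).
# gamma down-weights easy examples, so rare classes like
# mask_weared_incorrect contribute relatively more to the loss.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()
```

Since `(1 - pt) ** gamma` is at most 1, this never exceeds plain cross-entropy; confident correct predictions are the ones it suppresses.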
## Project structure

```
├── inference.py          # run inference on images
├── requirements.txt      # dependencies
├── data.ipynb            # dataset prep
├── augment_data.ipynb    # augmentation pipeline
├── training-*.ipynb      # training notebooks for each model
├── tta_experiment.ipynb  # TTA experiments
├── benchmark_v3.ipynb    # final comparison
├── dataset/              # train/val/test splits
└── runs/                 # trained model weights
```
## Model weights

Weights are available on Hugging Face:

https://huggingface.co/ZhafranR/face-mask-detection-verihub

Download them into the `runs/` folder, or point to them with the `--weights` flag.
## Environment

- Python 3.8+
- PyTorch 2.0+
- GPU optional but recommended (tested on CUDA)

If you're running on CPU, just know inference will be slower, especially for RF-DETR.