README: Wound infection classification models

Project summary
- Binary image classifier that labels wound images as clean (0) or infected (1).
- Multiple pretrained backbones (EfficientNetV2, MobileNetV3, ConvNeXt-Tiny) were trained and evaluated on the same curated dataset (70/15/15 split).
- This repo contains model weights, evaluation metrics, and example inference code.

Quick notes
- Labels are manual and non-clinical. Models are for proof-of-concept research only, not for clinical use.
- Outputs were padded to length 1000 to match the ExecuTorch runtime format used for mobile export. For normal PyTorch inference, the first two logits correspond to the two classes.

Requirements
- Python 3.12 (recommended for reproducing the environment)
- PyTorch (compatible with your hardware; tested with torch >=2.8,<3.0)
- torchvision
- numpy
- scikit-learn
- pillow
- (Optional) executorch / runtime exporters if you use the .pte mobile artifacts

Models & evaluation summary
- Label mapping: 0 = clean, 1 = infected
- Precision, recall, and F1 are computed with infected as the positive class.

Metrics reported on the test set (per training environment/run):

EfficientNetV2 (new dataset)
- env6: Precision 0.826 | Recall 0.856 | F1 0.840 | TP 237 | FP 50 | TN 89 | FN 40
- env8: Precision 0.836 | Recall 0.866 | F1 0.851 | TP 240 | FP 47 | TN 92 | FN 37
- env7: Precision 0.814 | Recall 0.884 | F1 0.848 | TP 245 | FP 56 | TN 83 | FN 32

MobileNetV3 (new dataset)
- env6: Precision 0.869 | Recall 0.791 | F1 0.828 | TP 219 | FP 33 | TN 106 | FN 58
- env8: Precision 0.898 | Recall 0.697 | F1 0.785 | TP 193 | FP 22 | TN 117 | FN 84
- env7: Precision 0.849 | Recall 0.874 | F1 0.861 | TP 242 | FP 43 | TN 96 | FN 35

ConvNeXt-Tiny (new dataset)
- env6: Precision 0.839 | Recall 0.863 | F1 0.851 | TP 239 | FP 46 | TN 93 | FN 38

Chosen / recommended checkpoints
- EfficientNetV2: env8 (balanced best F1 / ROC/PR behaviour)
- MobileNetV3: env7 (best tradeoff of F1 and latency for mobile)
- ConvNeXt-Tiny: env6 (best calibration + strong ROC/PR and CPU efficiency)

Why these were chosen
- EfficientNetV2 env8: solid ROC/PR performance and high recall for infected cases.
- MobileNetV3 env7: highest infected-class F1 in our runs and good inference latency for mobile deployment.
- ConvNeXt-Tiny env6: strongest calibration (reliability diagram), PR/ROC performance equivalent to the best runs, and best average CPU usage; selected as the most feasible single model for deployment.

Repository contents
- models/
  - efficientnetv2_env8.pte
  - mobilenetv3_env7.pte
  - convnext_tiny_env6.pte
  - (optional) .pte mobile artifacts if exported for ExecuTorch
- notebooks/ or examples/
  - eval_metrics.ipynb: code to reproduce ROC/PR/reliability diagrams and confusion matrices
  - inference_example.py: minimal inference script
- README.md (this file)

Notes on mobile export
- Models were exported to a mobile format (.pte) using torch.export → ExecuTorch lowering → XNNPACK partitioner (XnnpackPartitioner). If you rely on mobile runtime artifacts, use the matching .pte file and follow the ExecuTorch runtime integration instructions.
- Outputs were padded to 1000-length vectors for compatibility with the runtime. Padding uses a large negative logit (-100) so that the padded entries take essentially no probability mass after softmax.

Calibration & decision thresholds
- Reliability diagrams show calibration issues; raw probabilities are not fully trustworthy.
- Before using probabilities for clinical decisions, apply post-hoc calibration (temperature scaling or isotonic regression) and re-compute reliability diagrams and decision thresholds.
- For single-number model selection we reported per-class precision/recall/F1. For deployment, choose thresholds based on calibrated probabilities and the clinical tradeoff between false positives and false negatives.

Data & labeling
- Dataset: curated from public sources (Kaggle wound images) and manually labeled into clean vs infected.
- Labels are non-clinical and may contain noise. Use caution; validate with clinical experts for any real-world deployment.
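
Example: recomputing metrics from confusion counts

The precision, recall, and F1 values in the evaluation summary follow directly from the reported TP/FP/FN counts, with infected as the positive class. The snippet below is a minimal sketch of that arithmetic; the helper name `prf1` is hypothetical and is not part of this repo.

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 for the positive (infected) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the EfficientNetV2 env8 row of this README.
p, r, f = prf1(tp=240, fp=47, fn=37)
print(f"Precision {p:.3f} | Recall {r:.3f} | F1 {f:.3f}")
# prints: Precision 0.836 | Recall 0.866 | F1 0.851
```

The same function reproduces the other rows, which is a quick way to check a table entry after re-running an evaluation.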
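
Example: reading a padded output vector

The padding described in the mobile export notes can be handled as sketched below: keep only the first two logits and apply softmax. This is an illustration in plain Python, not the code in inference_example.py; the function names and the example logits are made up, while the padding value (-100) and length (1000) are the ones stated in this README.

```python
import math

PAD_VALUE = -100.0                   # padding logit stated in this README
PADDED_LEN = 1000                    # padded output length stated in this README
CLASS_NAMES = ("clean", "infected")  # label mapping: 0 = clean, 1 = infected


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(padded_logits):
    """Return (label, P(infected)) using only the first two logits."""
    probs = softmax(padded_logits[:2])
    idx = 0 if probs[0] >= probs[1] else 1
    return CLASS_NAMES[idx], probs[1]


# Made-up logits for illustration only; not real model output.
example = [0.3, 1.7] + [PAD_VALUE] * (PADDED_LEN - 2)
label, p_infected = classify(example)

# Probability mass the 998 padded entries would take if softmax
# were applied to the full 1000-length vector instead.
pad_mass = sum(softmax(example)[2:])
print(label, round(p_infected, 3), pad_mass)
```

With a padding logit of -100, `pad_mass` is vanishingly small, so slicing to the first two logits and running softmax over the full vector give practically the same class probabilities.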
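
Example: temperature scaling sketch

Temperature scaling, mentioned under calibration, divides the logits by a single scalar T that is fitted on held-out validation data by minimizing negative log-likelihood. The sketch below assumes two-class logits and uses a simple grid search in place of the optimizer-based fit that is more common in practice. The function names, the grid, and the validation logits are all hypothetical; the logits are made up to mimic an overconfident model and are not taken from our runs.

```python
import math


def nll(logits, labels, temperature):
    """Mean negative log-likelihood under softmax(logits / temperature)."""
    total = 0.0
    for (z_clean, z_infected), y in zip(logits, labels):
        a, b = z_clean / temperature, z_infected / temperature
        m = max(a, b)
        log_norm = m + math.log(math.exp(a - m) + math.exp(b - m))
        total -= (b if y == 1 else a) - log_norm
    return total / len(labels)


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature on a grid (0.05 to 10.0) with the lowest NLL."""
    grid = grid or [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda t: nll(logits, labels, t))


# Made-up validation logits as (clean, infected) pairs; the last two
# examples are confidently wrong, as an overconfident model would be.
val_logits = [(4.0, -4.0), (3.5, -3.0), (-4.0, 4.5), (-3.0, 3.0),
              (5.0, -4.0), (-4.0, 4.0), (3.0, -4.0), (-3.5, 3.5)]
val_labels = [0, 0, 1, 1, 0, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
print("fitted T:", T)
print("NLL before:", nll(val_logits, val_labels, 1.0))
print("NLL after: ", nll(val_logits, val_labels, T))
```

A fitted T above 1 softens the probabilities without changing the predicted class. Reliability diagrams and decision thresholds should be recomputed on the scaled outputs, as noted above.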