README — Wound infection classification models


Project summary


- Binary image classifier that labels wound images as clean (0) or infected (1).

- Multiple pretrained backbones (EfficientNetV2, MobileNetV3, ConvNeXt‑Tiny) were trained and evaluated on the same curated dataset, using a 70/15/15 train/validation/test split.

- This repo contains model weights, evaluation metrics, and example inference code.

Quick notes


- Labels were assigned manually and have not been clinically validated. The models are for proof‑of‑concept research only and are not intended for clinical use.

- Model outputs are padded to length 1000 to match the ExecuTorch runtime format used for mobile export. For normal PyTorch inference, only the first two logits are meaningful: index 0 is clean and index 1 is infected.
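
A minimal, dependency-free sketch of decoding a padded output vector is shown here. The function name `decode_padded_output` and the sample logits are hypothetical; in a PyTorch pipeline the same slicing would typically be done on the output tensor before applying softmax.

```python
import math

# Label mapping from this README: 0 = clean, 1 = infected.
CLASS_NAMES = ("clean", "infected")

def decode_padded_output(logits):
    """Return (label, class_name, probabilities) from a padded logit vector."""
    z = list(logits)[:2]                  # only the first two entries are class logits
    m = max(z)                            # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    probs = [e / total for e in exps]     # softmax over the two real classes
    label = probs.index(max(probs))
    return label, CLASS_NAMES[label], probs

# Hypothetical output: two class logits followed by 998 padding entries.
padded = [-1.0, 2.0] + [-100.0] * 998
label, name, probs = decode_padded_output(padded)
print(label, name, [round(p, 3) for p in probs])
```

Slicing before the softmax keeps the result independent of whatever value was used for padding.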

Requirements


- Python 3.12 (recommended for reproducing the environment)

- PyTorch (compatible with your hardware; tested with torch >=2.8,<3.0)

- torchvision

- numpy

- scikit-learn

- pillow

- (Optional) executorch / runtime exporters if you use the .pte mobile artifacts

Models & evaluation summary


- Label mapping: 0 = clean, 1 = infected

Metrics reported on the test set (per training environment/run). Precision, recall, and F1 are for the infected class (positive = infected):

EfficientNetV2 (new dataset)


- env6: Precision 0.826 | Recall 0.856 | F1 0.840 | TP 237 | FP 50 | TN 89 | FN 40

- env8: Precision 0.836 | Recall 0.866 | F1 0.851 | TP 240 | FP 47 | TN 92 | FN 37

- env7: Precision 0.814 | Recall 0.884 | F1 0.848 | TP 245 | FP 56 | TN 83 | FN 32

MobileNetV3 (new dataset)


- env6: Precision 0.869 | Recall 0.791 | F1 0.828 | TP 219 | FP 33 | TN 106 | FN 58

- env8: Precision 0.898 | Recall 0.697 | F1 0.785 | TP 193 | FP 22 | TN 117 | FN 84

- env7: Precision 0.849 | Recall 0.874 | F1 0.861 | TP 242 | FP 43 | TN 96 | FN 35

ConvNeXt‑Tiny (new dataset)


- env6: Precision 0.839 | Recall 0.863 | F1 0.851 | TP 239 | FP 46 | TN 93 | FN 38
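
The precision, recall, and F1 values above follow directly from the confusion-matrix counts. As a quick check, the sketch below recomputes them for two of the listed runs; the helper name `prf_from_counts` is hypothetical and is not part of the repository.

```python
def prf_from_counts(tp, fp, fn):
    """Precision, recall, and F1 for the positive (infected) class."""
    precision = tp / (tp + fp)            # share of predicted-infected that are infected
    recall = tp / (tp + fn)               # share of infected images that were found
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return precision, recall, f1

# Counts copied from the lists above.
p, r, f1 = prf_from_counts(tp=239, fp=46, fn=38)        # ConvNeXt-Tiny env6
p8, r8, f8 = prf_from_counts(tp=240, fp=47, fn=37)      # EfficientNetV2 env8
print(f"ConvNeXt-Tiny env6:  P={p:.3f} R={r:.3f} F1={f1:.3f}")
print(f"EfficientNetV2 env8: P={p8:.3f} R={r8:.3f} F1={f8:.3f}")
```

The helper assumes at least one predicted and one actual positive; a run with none would need a guard against division by zero.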

Chosen / recommended checkpoints


- EfficientNetV2: env8 (best F1 among the EfficientNetV2 runs, with balanced ROC/PR behaviour)

- MobileNetV3: env7 (best tradeoff of F1 and latency for mobile)

- ConvNeXt‑Tiny: env6 (best calibration, strong ROC/PR, and CPU efficiency)

Why these were chosen


- EfficientNetV2 env8: solid ROC/PR performance and high recall for infected cases (0.866).

- MobileNetV3 env7: highest infected-class F1 in our runs (0.861) and good inference latency for mobile deployment.

- ConvNeXt‑Tiny env6: strongest calibration (per the reliability diagram), PR/ROC performance on par with the best runs, and the lowest average CPU usage. Selected as the most feasible single model for deployment.

Repository contents


- models/
	- efficientnetv2_env8.pte

	- mobilenetv3_env7.pte

	- convnext_tiny_env6.pte

	- (optional) .pte mobile artifacts if exported for ExecuTorch


- notebooks/ or examples/
	- eval_metrics.ipynb — code to reproduce ROC/PR/reliability diagrams and confusion matrices

	- inference_example.py — minimal inference script


- README.md (this file)

Notes on mobile export


- Models were exported to the ExecuTorch mobile format (.pte) via torch.export, followed by ExecuTorch lowering with the XNNPACK partitioner (XnnpackPartitioner). If you rely on the mobile runtime artifacts, use the matching .pte file and follow the ExecuTorch runtime integration instructions.

- Outputs were padded to 1000-length vectors for compatibility with the runtime. Padding uses a large negative value (−100) so that padded positions receive effectively zero probability under softmax and do not take probability mass from the two real classes.
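
The effect of the padding value can be illustrated with a small sketch. It assumes the consumer applies softmax over the full 1000-length vector, which may not match how a given app reads the output. The helper names and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pad_logits(class_logits, length=1000, pad_value=-100.0):
    """Append pad_value until the vector has `length` entries."""
    return list(class_logits) + [pad_value] * (length - len(class_logits))

class_logits = [0.3, 1.2]  # made-up logits for (clean, infected)

probs_neg = softmax(pad_logits(class_logits))                  # padded with -100
probs_zero = softmax(pad_logits(class_logits, pad_value=0.0))  # padded with 0, for contrast

pad_mass_neg = sum(probs_neg[2:])    # probability assigned to padding positions
pad_mass_zero = sum(probs_zero[2:])
print(f"pad mass with -100: {pad_mass_neg:.3e}")
print(f"pad mass with 0:    {pad_mass_zero:.3f}")
```

With −100 the padding positions receive a negligible share of the probability, whereas zero padding would let the 998 extra entries dominate the distribution.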

Calibration & decision thresholds


- Reliability diagrams show calibration issues; raw probabilities are not fully trustworthy.

- Before using probabilities for clinical decisions, apply post‑hoc calibration (temperature scaling or isotonic regression) and re‑compute reliability diagrams and decision thresholds.

- For single-number model selection, we reported per-class precision/recall/F1. For deployment, choose thresholds based on calibrated probabilities and the clinical tradeoff between false positives and false negatives.
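
One possible shape for the calibration and threshold steps above is sketched below, using only the standard library. It is not the procedure used to produce the reported metrics. It fits a single temperature by grid search on validation-set logit margins (infected logit minus clean logit), then picks the highest threshold that still reaches a target recall. The function names, the grid bounds, and the toy data are all assumptions; isotonic regression is not shown.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def nll(margins, labels, temperature):
    """Mean negative log-likelihood; margin = logit_infected - logit_clean."""
    eps = 1e-12
    total = 0.0
    for m, y in zip(margins, labels):
        p = sigmoid(m / temperature)      # two-class softmax reduces to a sigmoid
        total -= math.log(p + eps) if y == 1 else math.log(1.0 - p + eps)
    return total / len(labels)

def fit_temperature(margins, labels, lo=0.05, hi=20.0, steps=400):
    """Grid-search the temperature (log-spaced) that minimises validation NLL."""
    best_t, best_loss = 1.0, nll(margins, labels, 1.0)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = nll(margins, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t

def threshold_for_recall(probs, labels, target_recall):
    """Highest threshold whose infected-class recall is at least target_recall."""
    positives = sum(labels)
    for thr in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= thr)
        if tp / positives >= target_recall:
            return thr
    return 0.0

# Made-up validation margins and labels, for illustration only.
margins = [6.0, 5.0, 4.0, -5.0, -6.0, 4.0, -4.0, 5.0, -5.0, 6.0]
labels = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

T = fit_temperature(margins, labels)
calibrated = [sigmoid(m / T) for m in margins]
thr = threshold_for_recall(calibrated, labels, target_recall=0.8)
print(f"T={T:.2f}, threshold={thr:.3f}")
```

The temperature and threshold should be fitted on the validation split, with the test split kept for the final check. A temperature above 1 softens overconfident probabilities without changing the ranking of predictions, so ROC and PR curves are unaffected.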

Data & labeling


- Dataset: curated from public sources (Kaggle wound images) and manually labeled into clean vs infected.

- Labels are non‑clinical and may contain noise. Use caution, and validate with clinical experts before any real-world deployment.