README — Wound infection classification models
Project summary
- Binary image classifier that labels wound images as clean (0) or infected (1).
- Trained and evaluated multiple pretrained backbones (EfficientNetV2, MobileNetV3, ConvNeXt‑Tiny) on the same curated dataset (70/15/15 split).
- This repo contains model weights, evaluation metrics, and example inference code.
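A 70/15/15 split like the one mentioned above could be produced along these lines. This is a minimal sketch under stated assumptions: the README does not say whether the split was stratified or which seed was used, so the per-class stratification, the `seed` value, and the helper name `stratified_split` are illustrative choices, not the project's actual procedure.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, split per class.

    Assumption: stratifying by label keeps the clean/infected ratio
    similar in every subset. The original split may have been done differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical label list: 100 clean (0) and 200 infected (1) images.
example_labels = [0] * 100 + [1] * 200
train_idx, val_idx, test_idx = stratified_split(example_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed means every backbone is trained and evaluated on the same subsets, which is what makes the per-model metrics comparable.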
Quick notes
- Labels are manual and non‑clinical. Models are for proof‑of‑concept research only — not for clinical use.
- Outputs are padded to length 1000 to match the ExecuTorch runtime format used for mobile export. For normal PyTorch inference, only the first two logits are meaningful: index 0 is clean and index 1 is infected.
Requirements
- Python 3.12 (recommended for reproducing the environment)
- PyTorch (compatible with your hardware; tested with torch >=2.8,<3.0)
- torchvision
- numpy
- scikit-learn
- pillow
- (Optional) executorch (the ExecuTorch runtime and export tooling), only if you use the .pte mobile artifacts
Models & evaluation summary
- Label mapping: 0 = clean, 1 = infected
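Metrics are reported on the test set for the infected class (positive = 1), one line per training environment/run. Each confusion matrix below sums to 416 test images (277 infected, 139 clean):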
EfficientNetV2 (new dataset)
- env6: Precision 0.826 | Recall 0.856 | F1 0.840 | TP 237 | FP 50 | TN 89 | FN 40
- env8: Precision 0.836 | Recall 0.866 | F1 0.851 | TP 240 | FP 47 | TN 92 | FN 37
- env7: Precision 0.814 | Recall 0.884 | F1 0.848 | TP 245 | FP 56 | TN 83 | FN 32
MobileNetV3 (new dataset)
- env6: Precision 0.869 | Recall 0.791 | F1 0.828 | TP 219 | FP 33 | TN 106 | FN 58
- env8: Precision 0.898 | Recall 0.697 | F1 0.785 | TP 193 | FP 22 | TN 117 | FN 84
- env7: Precision 0.849 | Recall 0.874 | F1 0.861 | TP 242 | FP 43 | TN 96 | FN 35
ConvNeXt‑Tiny (new dataset)
- env6: Precision 0.839 | Recall 0.863 | F1 0.851 | TP 239 | FP 46 | TN 93 | FN 38
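The precision, recall, and F1 values above follow directly from the confusion-matrix counts. The sketch below recomputes them for the three recommended checkpoints; the helper name `prf1` is illustrative and is not taken from the repository.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive (infected) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Counts copied from the list above, as (TP, FP, TN, FN).
runs = {
    "EfficientNetV2 env8": (240, 47, 92, 37),
    "MobileNetV3 env7": (242, 43, 96, 35),
    "ConvNeXt-Tiny env6": (239, 46, 93, 38),
}

for name, (tp, fp, tn, fn) in runs.items():
    p, r, f1 = prf1(tp, fp, fn)
    total = tp + fp + tn + fn
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f} n={total}")
```

True negatives do not enter any of the three metrics, which is why a model can look strong here while still misclassifying a sizeable share of clean wounds (FP / (FP + TN)).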
Chosen / recommended checkpoints
- EfficientNetV2: env8 (best F1 of the EfficientNetV2 runs, with balanced ROC/PR behaviour)
- MobileNetV3: env7 (best tradeoff between F1 and latency for mobile)
- ConvNeXt‑Tiny: env6 (best calibration, strong ROC/PR, and the lowest CPU usage)
Why these were chosen
- EfficientNetV2 env8: solid ROC/PR performance and high recall for infected cases (0.866).
- MobileNetV3 env7: highest infected-class F1 across all runs (0.861) and good inference latency for mobile deployment.
- ConvNeXt‑Tiny env6: best-calibrated model according to the reliability diagrams, PR/ROC performance on par with the other two, and the lowest average CPU usage. It was selected as the most feasible single model for deployment.
Repository contents
- models/
  - efficientnetv2_env8.pte
  - mobilenetv3_env7.pte
  - convnext_tiny_env6.pte
  - (optional) .pte mobile artifacts if exported for Executorch
- notebooks/ or examples/
  - eval_metrics.ipynb: code to reproduce ROC/PR/reliability diagrams and confusion matrices
  - inference_example.py: minimal inference script
- README.md (this file)
Notes on mobile export
- Models were exported to the ExecuTorch mobile format (.pte) by capturing the graph with torch.export and lowering it through ExecuTorch with the XNNPACK partitioner (XnnpackPartitioner). If you rely on the mobile runtime artifacts, use the matching .pte file and follow the ExecuTorch runtime integration instructions.
- Outputs were padded to 1000-length vectors for compatibility with the runtime. The padding value is a large negative logit (−100), so padded positions receive effectively zero probability after softmax and do not take probability mass from the two real classes.
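The effect of the −100 padding can be checked numerically. The following sketch uses made-up logits and plain Python instead of the real model output; it compares a softmax over the two real logits with a softmax over the full 1000-length vector. The names `class_probs`, `NUM_CLASSES`, `PADDED_LEN`, and `PAD_VALUE` are illustrative.

```python
import math

NUM_CLASSES = 2      # index 0 = clean, index 1 = infected
PADDED_LEN = 1000    # output length described above
PAD_VALUE = -100.0   # padding logit described above

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(padded_output):
    """Drop the padding and normalise only the two real logits."""
    return softmax(list(padded_output[:NUM_CLASSES]))

# Hypothetical logits, for illustration only.
logits = [0.4, 1.7]
padded = logits + [PAD_VALUE] * (PADDED_LEN - NUM_CLASSES)

p_sliced = class_probs(padded)           # softmax over 2 values
p_full = softmax(padded)[:NUM_CLASSES]   # softmax over all 1000 values
leaked = 1.0 - sum(p_full)               # mass assigned to padding

print("sliced:", p_sliced)
print("full:  ", p_full)
print("probability mass on padding:", leaked)
```

Slicing before the softmax is the safer habit: it gives the same answer here, and it stays correct even if a different padding value is ever used.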
Calibration & decision thresholds
- Reliability diagrams show calibration issues; raw probabilities are not fully trustworthy.
- Before using probabilities for clinical decisions, apply post‑hoc calibration (temperature scaling or isotonic regression) and re‑compute reliability diagrams and decision thresholds.
- For single-number model selection we report per-class precision, recall, and F1. For deployment, choose thresholds from calibrated probabilities according to the clinical tradeoff between false positives (clean wounds flagged as infected) and false negatives (missed infections).
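The calibration and thresholding steps above can be sketched with temperature scaling, one of the two post-hoc options mentioned. This is a minimal illustration, not the repository's code: the validation logits and labels are made up, the temperature is found by a simple grid search instead of gradient-based optimisation, and `fit_temperature` and `pick_threshold` are hypothetical helper names.

```python
import math

def infected_prob(logits, temperature=1.0):
    """P(infected) from two logits; a 2-class softmax is a sigmoid of the difference."""
    d = (logits[1] - logits[0]) / temperature
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def nll(val_logits, val_labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(val_logits, val_labels):
        p = infected_prob(logits, temperature)
        total -= math.log(max(p if y == 1 else 1.0 - p, 1e-12))
    return total / len(val_labels)

def fit_temperature(val_logits, val_labels, grid=None):
    """Grid search for the temperature that minimises validation NLL."""
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # 0.5 .. 5.0
    return min(grid, key=lambda t: nll(val_logits, val_labels, t))

def pick_threshold(probs, labels, min_recall):
    """Highest threshold whose infected-class recall is still >= min_recall."""
    positives = sum(labels)
    for thr in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 1)
        if positives and tp / positives >= min_recall:
            return thr
    return 0.0

# Hypothetical validation set: confident logits, with the last example misclassified.
val_logits = [(-4.0, 4.0), (3.5, -3.5), (-3.0, 3.0), (4.0, -4.0),
              (-2.5, 2.5), (2.0, -2.0), (-3.0, 3.0), (-1.5, 1.5)]
val_labels = [1, 0, 1, 0, 1, 0, 1, 0]

T = fit_temperature(val_logits, val_labels)
calibrated = [infected_prob(z, T) for z in val_logits]
threshold = pick_threshold(calibrated, val_labels, min_recall=1.0)
print(f"T={T:.2f}  NLL before={nll(val_logits, val_labels, 1.0):.3f}  "
      f"after={nll(val_logits, val_labels, T):.3f}  threshold={threshold:.3f}")
```

Temperature scaling rescales confidence without changing which class has the larger logit, so accuracy at the default threshold is unaffected. Fit the temperature and the threshold on the validation split only, then report results on the test split.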
Data & labeling
- Dataset: curated from public sources (Kaggle wound images) and manually labeled into clean vs infected.
- Labels are non‑clinical and may contain noise. Use caution; validate with clinical experts for any real-world deployment.