PlasmoSENet: Malaria Parasite Detection from Blood Smear Microscopy
PlasmoSENet is a custom CNN architecture designed from first principles for automated malaria parasite detection in thin blood smear microscopy images. It is part of the LocalMedScan project for offline, privacy-first medical image screening.
Model Details
| Property | Value |
|---|---|
| Architecture | PlasmoSENet (custom ResNet-style with SE attention) |
| Parameters | 10,632,306 (~10.6M) |
| Model Size | 40.7 MB |
| Input | RGB 224x224 blood smear cell images |
| Output | Binary classification: Parasitized vs Uninfected |
| Training | From scratch (no pretrained weights) |
| Framework | PyTorch |
| License | MIT |
Architecture Highlights
PlasmoSENet integrates four domain-specific design elements:
Multi-Scale Stem: Parallel 3x3, 5x5, and 7x7 convolutional branches whose receptive fields are calibrated to the physical dimensions of Plasmodium erythrocytic stages:
- Ring forms (~2-3 µm): captured by the 3x3 branch
- Trophozoites (~4-6 µm): captured by the 5x5 branch
- Schizonts (~6-8 µm): captured by the 7x7 branch
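The stem described above can be sketched in PyTorch as follows. This is a reconstruction under stated assumptions: the actual `MultiScaleStem` may use different branch widths or a different fusion step; here each branch produces the full channel count and a 1x1 convolution fuses the concatenation.

```python
import torch
import torch.nn as nn

class MultiScaleStem(nn.Module):
    # Hypothetical sketch: three parallel conv branches with kernel sizes
    # matched to ring/trophozoite/schizont scales, concatenated and fused
    # by a 1x1 conv. Stride 2 halves spatial resolution (224 -> 112).
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        def branch(k):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, stride=2, padding=k // 2, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
        self.b3, self.b5, self.b7 = branch(3), branch(5), branch(7)
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.fuse(torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1))

stem = MultiScaleStem()
out = stem(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 64, 112, 112])
```

With `padding = k // 2` all three branches produce the same 112x112 spatial size regardless of kernel size, so the concatenation is shape-compatible.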
Squeeze-and-Excitation (SE) Channel Attention: In every residual block, an SE module learns stain-aware channel weightings that amplify the diagnostically relevant purple/blue parasite signal over the pink/red erythrocyte background.
Stochastic Depth (DropPath): Linearly increasing drop rates (0.0 to 0.2) across the 11 residual blocks, providing implicit ensemble regularization over 2^11 = 2048 subnetworks.
Kaiming Initialization: Fan-out mode for stable from-scratch convergence without ImageNet pretrained weights.
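The two regularization mechanisms above can be sketched as standard building blocks (a minimal reconstruction, not the project's exact code; reduction ratio 16 and the in-block wiring are assumptions):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: pool each channel to a scalar, pass through
    # a bottleneck MLP, and rescale channels by the sigmoid output.
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # (B, C) channel descriptors
        return x * w[:, :, None, None]    # reweight feature maps

def drop_path(x, drop_prob, training):
    # Stochastic depth: zero an entire residual branch per sample with
    # probability drop_prob, rescaling survivors to keep the expectation.
    if drop_prob == 0.0 or not training:
        return x
    keep = 1.0 - drop_prob
    mask = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep)
    return x * mask / keep

# Inside a residual block (schematically):
#   out = relu(identity + drop_path(se(branch(x)), p_block, training))
# with p_block drawn from torch.linspace(0.0, 0.2, 11) per the text above.
se = SEBlock(64)
y = se(torch.randn(2, 64, 28, 28))
print(y.shape)  # torch.Size([2, 64, 28, 28])
```

At inference `drop_path` is the identity, which is why the usage example below loads the model with `drop_path_rate=0.0`.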
Architecture Table
```
Input:                              (B, 3, 224, 224)
MultiScaleStem(3 -> 64)          -> (B, 64, 112, 112)
Stage 1: 2x SEResBlock(64 -> 64)   -> (B, 64, 112, 112)
Stage 2: 3x SEResBlock(64 -> 128)  -> (B, 128, 56, 56)
Stage 3: 4x SEResBlock(128 -> 256) -> (B, 256, 28, 28)
Stage 4: 2x SEResBlock(256 -> 384) -> (B, 384, 14, 14)
Head Conv: Conv1x1 + BN + ReLU     -> (B, 384, 14, 14)
Classifier: GAP -> Dropout(0.3) -> Linear(384, 2)
```
Performance
Test Results (NIH Malaria Dataset, 2,757 test images)
| Metric | Without TTA | With 5-View TTA |
|---|---|---|
| Accuracy | 96.66% | 96.55% |
| Sensitivity | – | 96.08% |
| Specificity | – | 97.01% |
| Precision | – | 96.87% |
| F1 Score | – | 96.47% |
Confusion Matrix (with TTA)
| | Predicted Parasitized | Predicted Uninfected |
|---|---|---|
| Actual Parasitized | 1,298 | 53 |
| Actual Uninfected | 42 | 1,364 |
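The reported TTA metrics follow directly from these four counts; recomputing them is a quick consistency check:

```python
# Counts from the confusion matrix above (Parasitized = positive class).
tp, fn = 1298, 53    # Actual Parasitized row
fp, tn = 42, 1364    # Actual Uninfected row

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)    # recall on Parasitized
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
f1          = 2 * precision * sensitivity / (precision + sensitivity)

print(f"{accuracy:.2%} {sensitivity:.2%} {specificity:.2%} {precision:.2%} {f1:.2%}")
# 96.55% 96.08% 97.01% 96.87% 96.47%
```

All five values match the table, confirming the matrix and metrics are mutually consistent.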
Comparison with MobileNetV2 Baseline
| Model | Params | Size | Accuracy | Sensitivity | Specificity | F1 |
|---|---|---|---|---|---|---|
| MobileNetV2 (fine-tuned) | 3.4M | 14 MB | 97.97% | 97.41% | 98.51% | 97.95% |
| PlasmoSENet (from scratch) | 10.6M | 41 MB | 96.55% | 96.08% | 97.01% | 96.47% |
MobileNetV2 with transfer learning outperforms PlasmoSENet by 1.42 percentage points, consistent with the well-established advantage of pretrained features on moderately sized medical datasets. However, PlasmoSENet demonstrates that clinically useful accuracy (>96%) is achievable without any pretrained weights, which is significant for regulated medical device contexts.
Training Details
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW (weight_decay=0.05) |
| Learning Rate | Linear warmup 10 epochs (1e-5 to 1e-3) + cosine annealing |
| Batch Size | 64 |
| Label Smoothing | 0.1 |
| Mixup | alpha=0.2, 50% probability |
| CutMix | alpha=1.0, 50% probability |
| Stochastic Depth | 0.0 to 0.2 linearly |
| Dropout | 0.3 |
| Gradient Clipping | max_norm=1.0 |
| Mixed Precision | AMP (float16/float32) |
| Early Stopping | patience=20 (stopped at epoch 60) |
| Best Validation | 96.99% (epoch 40) |
| Hardware | NVIDIA RTX 3070 (8 GB), ~80s/epoch |
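The warmup-plus-cosine schedule can be expressed with `torch.optim.lr_scheduler` primitives. This is a sketch: the 100-epoch budget and per-epoch stepping are assumptions (training stopped early at epoch 60), and a plain `nn.Linear` stands in for PlasmoSENet.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(10, 2)  # stand-in for PlasmoSENet
opt = AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)

# 10-epoch linear warmup from 1e-5 (= 1e-3 * 0.01) to 1e-3, then cosine decay.
warmup = LinearLR(opt, start_factor=1e-2, end_factor=1.0, total_iters=10)
cosine = CosineAnnealingLR(opt, T_max=90)  # assumed 100-epoch budget
sched = SequentialLR(opt, schedulers=[warmup, cosine], milestones=[10])

for epoch in range(100):
    # train_one_epoch(model, opt, ...) would go here
    opt.step()
    sched.step()
```

`SequentialLR` hands over from the warmup schedule to the cosine schedule at the epoch-10 milestone, matching the table above.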
Data Augmentation
- RandomResizedCrop(224, scale=0.7-1.0)
- RandomHorizontalFlip + RandomVerticalFlip
- RandomRotation(90 degrees): blood cells are rotationally invariant
- RandomAffine(translate=0.1, scale=0.9-1.1)
- ColorJitter(0.4, 0.4, 0.3, 0.05)
- GaussianBlur(kernel=3, sigma=0.1-2.0)
- RandomErasing(p=0.25, scale=0.02-0.2)
Critical Bug Fix: TransformSubset
We identified and corrected a subtle bug in standard PyTorch training pipelines that use random_split. Because Subset objects share the parent dataset reference, setting a transform through one subset overwrites the transform for all subsets. We implement a TransformSubset wrapper that maintains an independent transform per partition; without this fix, training silently runs with no augmentation, potentially inflating published results.
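A minimal version of such a wrapper looks like this (a sketch of the pattern, not necessarily the project's exact implementation; the toy dataset is purely illustrative):

```python
import torch
from torch.utils.data import Dataset

class TransformSubset(Dataset):
    # Holds its OWN transform instead of mutating the shared parent
    # dataset, so train/val/test augmentations stay independent.
    def __init__(self, dataset, indices, transform=None):
        self.dataset = dataset
        self.indices = indices
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        x, y = self.dataset[self.indices[i]]
        if self.transform is not None:
            x = self.transform(x)
        return x, y

# Demonstration: the train transform does not leak into the val split.
class ToyDataset(Dataset):
    def __len__(self):
        return 10
    def __getitem__(self, i):
        return i, i % 2

base = ToyDataset()
idx = torch.randperm(10, generator=torch.Generator().manual_seed(42)).tolist()
train = TransformSubset(base, idx[:8], transform=lambda x: x * 100)
val = TransformSubset(base, idx[8:], transform=None)
print(train[0][0] == idx[0] * 100, val[0][0] == idx[8])  # True True
```

The key point is that each partition stores its transform on the wrapper, so assigning a training augmentation cannot silently change what the validation loader sees.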
Dataset
NIH Malaria Cell Images Dataset (Rajaraman et al., 2018)
- 27,558 annotated cell images from Giemsa-stained thin blood smears
- 13,779 Parasitized + 13,779 Uninfected (perfectly balanced)
- Split: 80% train (22,046) / 10% val (2,755) / 10% test (2,757)
- Fixed seed (42) for reproducibility
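The stated split sizes fall out of an 80/10/10 `random_split` with seed 42 (sketch below; `range(n)` stands in for the actual image dataset):

```python
import torch
from torch.utils.data import random_split

# Reproduce the 80/10/10 split sizes from the fixed seed.
n = 27558
n_train, n_val = int(0.8 * n), int(0.1 * n)
n_test = n - n_train - n_val
print(n_train, n_val, n_test)  # 22046 2755 2757

gen = torch.Generator().manual_seed(42)
train_ds, val_ds, test_ds = random_split(
    range(n), [n_train, n_val, n_test], generator=gen
)
print(len(train_ds), len(val_ds), len(test_ds))
```

Flooring the train and val fractions and giving the remainder to test reproduces the 22,046 / 2,755 / 2,757 counts exactly.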
Usage
```python
import torch
from core.plasmosenet import PlasmoSENet

# Load model
model = PlasmoSENet(num_classes=2, drop_path_rate=0.0)  # no DropPath at inference
state_dict = torch.load("model.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()

# Preprocessing
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Inference
from PIL import Image
img = Image.open("blood_smear_cell.png").convert("RGB")
tensor = transform(img).unsqueeze(0)
with torch.inference_mode():
    probs = torch.softmax(model(tensor), dim=1)
# probs[0][0] = Parasitized probability
# probs[0][1] = Uninfected probability
```
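For the "5-View TTA" numbers reported above, a view-averaging helper along these lines can be used. The card does not specify the five views, so this sketch assumes identity, both flips, and 90/270-degree rotations, averaging softmax probabilities:

```python
import torch
import torch.nn as nn

def tta_predict(model, x):
    # Hypothetical 5-view TTA: average class probabilities over
    # symmetry-preserving views (valid because blood cells are
    # rotation- and flip-invariant, per the augmentation section).
    views = [
        x,
        torch.flip(x, dims=[3]),         # horizontal flip
        torch.flip(x, dims=[2]),         # vertical flip
        torch.rot90(x, 1, dims=[2, 3]),  # rotate 90 degrees
        torch.rot90(x, 3, dims=[2, 3]),  # rotate 270 degrees
    ]
    with torch.inference_mode():
        probs = torch.stack([torch.softmax(model(v), dim=1) for v in views])
    return probs.mean(dim=0)

# Sanity check with a stand-in model (any (B, 3, H, W) -> (B, 2) module works).
m = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 2))
p = tta_predict(m, torch.randn(1, 3, 32, 32))
print(p.shape)  # torch.Size([1, 2])
```

Averaging probabilities (rather than logits) keeps the output a valid distribution, so each row of the result still sums to 1.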
Key Achievements
- Novel domain-specific architecture: First CNN combining multi-scale stem, per-block SE attention, and stochastic depth specifically for malaria detection
- 96.55% accuracy from scratch: Clinically useful without any pretrained weights
- TransformSubset bug fix: Identified and corrected a common PyTorch training pipeline bug
- Comprehensive regularization: Demonstrated that heavy regularization (Mixup, CutMix, DropPath, label smoothing) enables training a 10.6M parameter model on just 27K images without overfitting
- GradCAM compatible: Head conv layer serves as interpretability target for clinical validation
- Unique architecture name: "PlasmoSENet" verified unique via Google Scholar (zero prior results)
Limitations
- Trained on single dataset (NIH, one geographic site)
- Binary classification only (no species/stage identification)
- Requires pre-segmented cell images
- Did not surpass transfer learning baseline on this dataset size
Citation
If you use this model, please cite:
```bibtex
@misc{plasmosenet2026,
  title={PlasmoSENet: A Multi-Scale Squeeze-and-Excitation Residual Network for Malaria Parasite Detection},
  author={Svetozar Technologies},
  year={2026},
  url={https://huggingface.co/Svetozar1993/LocalMedScan-malaria-plasmosenet}
}
```
Links
- GitHub: Svetozar-Technologies/LocalMedScan
- MobileNetV2 Model: Svetozar1993/LocalMedScan-malaria-mobilenetv2
- Dataset: NIH Malaria Cell Images