VOC Semantic Segmentation: EfficientSegNet (MobileNetV3-Large + LR-ASPP)
Semantic segmentation model trained on Pascal VOC 2012.
Architecture: MobileNetV3-Large + LR-ASPP (pretrained COCO backbone).
Optimised with Optuna TPE-sampler HPO.
Metrics
| Metric | Value |
|---|---|
| Macro DICE | 0.7645 |
| FLOPs/image | 7.3561e+08 (≈0.736 GFLOPs) |
| DICE / GFLOPs | 1.0392 |
| Parameters | 3.22M |
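The efficiency figure in the table is the macro DICE divided by the per-image cost in GFLOPs. The sketch below shows one common way to compute both quantities in plain Python. The function names are hypothetical, and the handling of classes that appear in neither prediction nor ground truth (they are skipped here) is an assumption; the original evaluation code may average differently.

```python
def dice_per_class(pred, target, num_classes):
    """Per-class Dice from two flat sequences of integer labels.

    Returns None for a class absent from both pred and target.
    """
    tp = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            tp[p] += 1
    scores = []
    for c in range(num_classes):
        denom = pred_count[c] + target_count[c]
        scores.append(2 * tp[c] / denom if denom else None)
    return scores


def macro_dice(pred, target, num_classes):
    """Unweighted mean of the per-class Dice scores that are defined."""
    scores = [s for s in dice_per_class(pred, target, num_classes) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


def dice_per_gflop(dice, flops):
    """Efficiency ratio: Dice divided by the cost expressed in GFLOPs."""
    return dice / (flops / 1e9)


# Ratio from the values reported in the table above.
ratio = dice_per_gflop(0.7645, 7.3561e8)
```

With the rounded table values the ratio agrees with the reported 1.0392 to about three decimal places; the last digit can differ because the table entries are themselves rounded.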
Best Hyperparameters (from Optuna HPO)
| Hyperparameter | Value |
|---|---|
| - | N/A (direct training) |
Training Details
- Dataset: Pascal VOC 2012 train split, divided 80/20 into training and validation subsets
- Image size: 300×300
- Loss: Combined CrossEntropy + Dice Loss
- Optimizer: AdamW + CosineAnnealingLR
- AMP: Mixed precision (PyTorch 2.x)
- Augmentation: HFlip, Rotation, Gaussian noise / blur / salt-pepper / brightness
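As an illustration of the loss and optimizer entries above, here is a minimal PyTorch sketch of a combined cross-entropy + soft Dice loss with an AdamW / CosineAnnealingLR pair. It is not the repository's implementation: the names `CEDiceLoss` and `build_optimizer`, the Dice weight, the learning rate, the weight decay, and the use of 255 as the ignore label (the usual VOC convention for void pixels) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CEDiceLoss(nn.Module):
    """Cross-entropy plus soft Dice loss for multi-class segmentation (sketch)."""

    def __init__(self, num_classes=21, dice_weight=1.0, ignore_index=255, eps=1e-6):
        super().__init__()
        self.num_classes = num_classes
        self.dice_weight = dice_weight      # assumed weighting, not from the source
        self.ignore_index = ignore_index    # assumed VOC void label
        self.eps = eps
        self.ce = nn.CrossEntropyLoss(ignore_index=ignore_index)

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) int64 labels
        ce_loss = self.ce(logits, target)

        # Mask out ignored pixels before building the one-hot target.
        valid = target != self.ignore_index
        safe_target = target.clone()
        safe_target[~valid] = 0
        one_hot = F.one_hot(safe_target, self.num_classes).permute(0, 3, 1, 2).float()
        valid = valid.unsqueeze(1).float()
        probs = logits.softmax(dim=1) * valid
        one_hot = one_hot * valid

        # Soft Dice per class, accumulated over batch and spatial dimensions.
        dims = (0, 2, 3)
        intersection = (probs * one_hot).sum(dims)
        cardinality = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * intersection + self.eps) / (cardinality + self.eps)
        dice_loss = 1 - dice.mean()
        return ce_loss + self.dice_weight * dice_loss


def build_optimizer(model, epochs, lr=1e-3, weight_decay=1e-4):
    """AdamW with a cosine schedule stepped once per epoch (values are placeholders)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

In a training loop the loss would be called as `criterion(model_output, labels)`, with `scheduler.step()` invoked after each epoch's optimizer updates. Note that this Dice term averages over all 21 classes, including those absent from a batch, which is one of several reasonable choices.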
VOC Classes
background, aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
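For working with predicted masks it is often convenient to have the class list as a lookup table. The sketch below assumes the model's output channels follow the order listed above (the standard VOC indexing, with background at 0 and tvmonitor at 20); the helper `class_histogram` is a hypothetical name introduced for illustration. Indices outside 0..20, such as the 255 "void" label found in VOC ground-truth masks, are skipped.

```python
from collections import Counter

VOC_CLASSES = (
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)

# Reverse lookup: class name -> channel index.
CLASS_TO_INDEX = {name: index for index, name in enumerate(VOC_CLASSES)}


def class_histogram(mask_rows):
    """Count pixels per class name in a 2-D mask given as nested sequences."""
    counts = Counter(index for row in mask_rows for index in row)
    return {
        VOC_CLASSES[index]: n
        for index, n in sorted(counts.items())
        if 0 <= index < len(VOC_CLASSES)
    }
```

A mask tensor can be passed in after conversion with `mask.tolist()`.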
Usage
import torch
from model import EfficientSegNet
from config import Config
model = EfficientSegNet(pretrained=False).to(Config.DEVICE)
ckpt = torch.load("best_model.pth", map_location=Config.DEVICE, weights_only=True)
model.load_state_dict(ckpt["model_state"])
model.eval()
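Once the weights are loaded, a prediction can be turned into a per-pixel class map with an argmax over the channel dimension. The helper below is a sketch, not part of the repository: `predict_mask` is a hypothetical name, and it assumes the model returns either a logits tensor of shape (N, 21, H, W) or, as torchvision segmentation models do, a dict holding the logits under the key "out". Input resizing and normalisation must match what was used in training, which this document does not specify.

```python
import torch


@torch.no_grad()
def predict_mask(model, image):
    """Return an (H, W) int64 tensor of class indices for one (3, H, W) image."""
    model.eval()
    device = next(model.parameters()).device
    batch = image.unsqueeze(0).to(device)   # add the batch dimension
    output = model(batch)
    # Assumption: plain tensor output, or a torchvision-style dict with "out".
    logits = output["out"] if isinstance(output, dict) else output
    return logits.argmax(dim=1).squeeze(0).cpu()
```

The returned mask holds indices in the range 0..20, matching the class order listed in the VOC Classes section.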
Trained: 2026-03-07 19:25