---
license: openrail++
language:
- en
base_model:
- RomDev2/yolox-tiny
tags:
- shapes
- trashbins
pipeline_tag: image-classification
---
# TSAP: Trash/Sustainability Assessment Platform

A deep learning computer vision system for automatic detection and classification of waste bins in images. The system localizes bins with YOLOv4-tiny and classifies their properties with a multi-task ResNet34 model.
## Overview
TSAP runs two ML pipelines:
| Pipeline | Architecture | Input Size | Output |
|---|---|---|---|
| Detection | YOLOv4-tiny | 608×608 | Bounding boxes |
| Classification | ResNet34 (multi-task) | 200×200 | Fullness + Shape |
Classification labels:

- Fullness (5 classes): Closed, Empty, Half, Full, Overflow, Open
- Shape (8 classes): Star, Circle, Arrow, Centric, Triangle, Square, Chevron, Lightning Bolt
## Directory Structure

```
tsap/
├── classification/
│   ├── model.py                      # TSAPMultiClassification (ResNet34 dual-head)
│   ├── dataset.py                    # Dataset loader with Albumentations augmentation
│   └── train.py                      # Training engine (mixed precision, wandb)
├── detection/
│   ├── config.py                     # YOLOv4-tiny hyperparameters and anchors
│   ├── dataset.py                    # YOLO dataset loader with mosaic augmentation
│   └── train_yolo.py                 # YOLO training engine with COCO evaluation
├── common/
│   ├── loss.py                       # FocalLoss, Yolo_loss, Yolo_loss_general
│   └── meters.py                     # AverageMeter, ProgressMeter
├── utils/
│   ├── utils.py                      # Train/test splits, confusion matrices, visualization
│   ├── detection_utils.py            # IoU / GIoU / DIoU / CIoU calculations
│   ├── annot_classification_map.py   # Image-to-label mapping
│   ├── annot_darknet2torch.py        # Darknet annotation format converter
│   └── cvat_annotation_converter.py  # CVAT XML → YOLO format converter
├── models/
│   ├── classification_model/         # Trained classification weights + ONNX
│   ├── detection_model/              # YOLOv4-tiny weights + ONNX
│   └── training/                     # Standalone training scripts for individual heads
├── main.py                           # Usage examples
├── test.py                           # Inference script
└── environment.yml                   # Conda environment
```
## Setup

```bash
conda env create -f environment.yml
conda activate <env-name>
```
Key dependencies: PyTorch 1.7.1 (CUDA 10.2), TorchVision 0.8.2, OpenCV 3.4.2, Albumentations 0.5.2, ONNX 1.8.0 + ONNXRuntime 1.5.2, Weights & Biases.
## Data Preparation

Annotations are created in CVAT and exported as XML. The converter handles bounding-box extraction and label generation:

```bash
python utils/cvat_annotation_converter.py
```

This converts CVAT XML annotations to YOLO format and crops classification samples with their labels.
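The geometric core of that conversion is straightforward: CVAT XML stores absolute corner coordinates, while YOLO expects a normalized center/size tuple. A minimal sketch of that arithmetic (the function name is illustrative; the actual script additionally walks the XML tree and writes one label file per image):

```python
def cvat_box_to_yolo(xtl, ytl, xbr, ybr, img_w, img_h):
    """Convert CVAT's absolute (xtl, ytl, xbr, ybr) corners into
    YOLO's normalized (cx, cy, w, h) in the range [0, 1]."""
    cx = (xtl + xbr) / 2.0 / img_w   # box center x, normalized by image width
    cy = (ytl + ybr) / 2.0 / img_h   # box center y, normalized by image height
    w = (xbr - xtl) / img_w          # box width, normalized
    h = (ybr - ytl) / img_h          # box height, normalized
    return cx, cy, w, h
```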
## Training

### Classification

```python
from classification.train import Engine
from classification.model import TSAPMultiClassification

model = TSAPMultiClassification(pretrained=True)
engine = Engine(model, train_loader, val_loader, device_ids=[0, 1])
engine.train(epochs=200)
```

Hyperparameters: batch size 128, SGD with momentum 0.9, OneCycleLR (lr=3e-3), image size 200×200.
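For intuition on the OneCycleLR schedule mentioned above, here is a minimal sketch of its shape: a cosine ramp up to the peak learning rate followed by a cosine decay. The peak of 3e-3 comes from the README; `pct_start` and the two div factors are PyTorch's defaults, assumed rather than confirmed for this repo:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=3e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Shape of torch.optim.lr_scheduler.OneCycleLR with
    anneal_strategy='cos': ramp up to max_lr, then anneal down."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = int(pct_start * total_steps)
    if step < warmup_steps:
        # cosine ramp from initial_lr up to max_lr
        t = step / max(1, warmup_steps)
        return max_lr + (initial_lr - max_lr) * (1 + math.cos(math.pi * t)) / 2
    # cosine decay from max_lr down to min_lr
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (max_lr - min_lr) * (1 + math.cos(math.pi * t)) / 2
```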
### Detection

```python
from detection.train_yolo import Engine_YOLO

engine = Engine_YOLO(cfg_path, weights_path, train_loader, val_loader)
engine.train(max_batches=30000)
```

Hyperparameters: batch size 64 (16 subdivisions), learning rate 0.001 with burn-in, image size 608×608.
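"Burn-in" here is Darknet's warmup scheme: the learning rate is ramped polynomially from near zero up to the base rate over the first batches. A sketch of that rule, with `burn_in=1000` and `power=4` being Darknet's usual defaults rather than values confirmed from this repo's cfg:

```python
def darknet_burn_in_lr(batch_idx, base_lr=1e-3, burn_in=1000, power=4):
    """Darknet-style burn-in: scale base_lr by (batch/burn_in)^power
    during the first `burn_in` batches, then hold it constant."""
    if batch_idx < burn_in:
        return base_lr * (batch_idx / burn_in) ** power
    return base_lr
```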
## Inference

```bash
python test.py
```

The inference pipeline:

- Resize to 200×200 (classification) or 608×608 (detection)
- Normalize using dataset mean/std
- Run forward pass
- Apply sigmoid + argmax; filter predictions below threshold 0.3
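The final post-processing step can be sketched as follows; the function name and the return convention (class index, or `None` when the best probability misses the threshold) are illustrative, not the repo's actual API:

```python
import math

def decode_head(logits, threshold=0.3):
    """Sigmoid each logit, take the argmax, and reject the prediction
    when its probability falls below the 0.3 threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```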
## Models

Pre-trained weights are in `models/`:

| File | Description |
|---|---|
| `classification_model/tsap_bin_classifer.pt` | Multi-task fullness + shape classifier |
| `classification_model/shape_classifier_resnet34.pt` | Shape-only classifier |
| `classification_model/TSAP_classifier_dynamic.onnx` | ONNX (dynamic batch) |
| `detection_model/tsap-detection.weights` | Darknet-format detection weights |
| `detection_model/tsap-detection.cfg` | Darknet config |
| `detection_model/TSAP_detection_dynamic.onnx` | ONNX (dynamic batch) |
## Architecture

### Classification: TSAPMultiClassification

```
Input (200×200)
└── ResNet34 backbone (shared)
    ├── Conv head → Linear(128) → Fullness logits (5)
    └── Conv head → Linear(128) → Shape logits (8)
```

Training uses BCEWithLogitsLoss for both heads simultaneously.
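The dual-head objective can be written out in a few lines. Below is a minimal, dependency-free sketch using the numerically stable form of BCE-with-logits; equal weighting of the two heads is an assumption, and the function names are illustrative:

```python
import math

def bce_with_logits(logits, targets):
    """Stable per-element BCE with logits, averaged:
    max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return sum(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
               for z, y in zip(logits, targets)) / len(logits)

def multitask_loss(fullness_logits, fullness_onehot,
                   shape_logits, shape_onehot):
    """One BCE term per head, summed with equal weight (assumed)."""
    return (bce_with_logits(fullness_logits, fullness_onehot)
            + bce_with_logits(shape_logits, shape_onehot))
```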
### Detection: YOLOv4-tiny

- Input: 608×608, 6 object classes
- 2 detection scales (32× and 16× stride), 3 anchors each
- Losses: XY, WH, objectness, class (IoU-based anchor assignment)
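The IoU-based anchor assignment mentioned above maximizes plain IoU between each ground-truth box and the anchors (the repo's `detection_utils.py` also provides GIoU/DIoU/CIoU variants). A self-contained sketch with illustrative function names:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def best_anchor(gt_box, anchors):
    """Assign a ground-truth box to the anchor with the highest IoU."""
    return max(range(len(anchors)), key=lambda i: iou(gt_box, anchors[i]))
```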
## Experiment Tracking

Training integrates with Weights & Biases for metric logging and visualization. Authenticate with your API key before training:

```bash
wandb login
```