---
license: openrail++
language:
  - en
base_model:
  - RomDev2/yolox-tiny
tags:
  - shapes
  - trashbins
pipeline_tag: image-classification
---

# TSAP: Trash/Sustainability Assessment Platform

A deep learning computer vision system for automatic detection and classification of waste bins in images. The system localizes bins using YOLOv4-tiny and classifies their properties using a multi-task ResNet34 model.

## Overview

TSAP runs two ML pipelines:

| Pipeline | Architecture | Input Size | Output |
|---|---|---|---|
| Detection | YOLOv4-tiny | 608×608 | Bounding boxes |
| Classification | ResNet34 (multi-task) | 200×200 | Fullness + Shape |

Classification labels:

- Fullness (6 classes): Closed, Empty, Half, Full, Overflow, Open
- Shape (8 classes): Star, Circle, Arrow, Centric, Triangle, Square, Chevron, Lightning Bolt

## Directory Structure

```
tsap/
├── classification/
│   ├── model.py                        # TSAPMultiClassification (ResNet34 dual-head)
│   ├── dataset.py                      # Dataset loader with Albumentations augmentation
│   └── train.py                        # Training engine (mixed precision, wandb)
├── detection/
│   ├── config.py                       # YOLOv4-tiny hyperparameters and anchors
│   ├── dataset.py                      # YOLO dataset loader with mosaic augmentation
│   └── train_yolo.py                   # YOLO training engine with COCO evaluation
├── common/
│   ├── loss.py                         # FocalLoss, Yolo_loss, Yolo_loss_general
│   └── meters.py                       # AverageMeter, ProgressMeter
├── utils/
│   ├── utils.py                        # Train/test splits, confusion matrices, visualization
│   ├── detection_utils.py              # IoU / GIoU / DIoU / CIoU calculations
│   ├── annot_classification_map.py     # Image-to-label mapping
│   ├── annot_darknet2torch.py          # Darknet annotation format converter
│   └── cvat_annotation_converter.py    # CVAT XML → YOLO format converter
├── models/
│   ├── classification_model/           # Trained classification weights + ONNX
│   ├── detection_model/                # YOLOv4-tiny weights + ONNX
│   └── training/                       # Standalone training scripts for individual heads
├── main.py                             # Usage examples
├── test.py                             # Inference script
└── environment.yml                     # Conda environment
```

## Setup

```bash
conda env create -f environment.yml
conda activate <env-name>
```

Key dependencies: PyTorch 1.7.1 (CUDA 10.2), TorchVision 0.8.2, OpenCV 3.4.2, Albumentations 0.5.2, ONNX 1.8.0 + ONNXRuntime 1.5.2, Weights & Biases.

## Data Preparation

Annotations are created in CVAT and exported as XML. The converter handles bounding box extraction and label generation:

```bash
python utils/cvat_annotation_converter.py
```

This converts CVAT XML annotations to YOLO format and crops classification samples with their labels.
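The core of any such conversion is turning CVAT's absolute pixel corners into YOLO's normalized center format. A minimal sketch of that step (the function name and signature below are illustrative, not the converter's actual API):

```python
def cvat_box_to_yolo(xtl, ytl, xbr, ybr, img_w, img_h):
    """Convert absolute corner coords (CVAT) to normalized YOLO (cx, cy, w, h)."""
    cx = (xtl + xbr) / 2.0 / img_w   # box center x, relative to image width
    cy = (ytl + ybr) / 2.0 / img_h   # box center y, relative to image height
    w = (xbr - xtl) / img_w          # box width, relative
    h = (ybr - ytl) / img_h          # box height, relative
    return cx, cy, w, h

# Example: a 100x200 box with its top-left at (50, 100) in a 608x608 image
cx, cy, w, h = cvat_box_to_yolo(50, 100, 150, 300, 608, 608)
```

YOLO label files then store one `class cx cy w h` line per box, all values in [0, 1].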

## Training

### Classification

```python
from classification.train import Engine
from classification.model import TSAPMultiClassification

model = TSAPMultiClassification(pretrained=True)
engine = Engine(model, train_loader, val_loader, device_ids=[0, 1])
engine.train(epochs=200)
```

Hyperparameters: batch size 128, SGD with momentum 0.9, OneCycleLR (lr=3e-3), image size 200Γ—200.
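As a sketch of how these hyperparameters fit together in PyTorch (the Engine's real internals may differ; the tiny stand-in model and `steps` value are illustrative only):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(8, 4)  # stand-in for TSAPMultiClassification
optimizer = SGD(model.parameters(), lr=3e-3, momentum=0.9)
steps = 100  # illustrative: epochs * batches_per_epoch in a real run
scheduler = OneCycleLR(optimizer, max_lr=3e-3, total_steps=steps)

lrs = []
for _ in range(steps - 1):
    optimizer.step()       # would follow loss.backward() in a real loop
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
# The lr warms up to max_lr (3e-3) around 30% of training, then anneals toward zero.
```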

### Detection

```python
from detection.train_yolo import Engine_YOLO

engine = Engine_YOLO(cfg_path, weights_path, train_loader, val_loader)
engine.train(max_batches=30000)
```

Hyperparameters: batch size 64 (16 subdivisions), learning rate 0.001 with burn-in, image size 608Γ—608.
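Darknet-style burn-in ramps the learning rate polynomially from zero over the first `burn_in` iterations before holding it at the base rate (power 4 is Darknet's default; this sketch is illustrative, not the engine's actual code):

```python
def burn_in_lr(iteration, base_lr=0.001, burn_in=1000, power=4):
    """Darknet-style warm-up: lr rises as (i / burn_in)^power, then stays at base_lr."""
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    return base_lr

# lr is near zero at first and reaches base_lr exactly at iteration == burn_in
```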

## Inference

```bash
python test.py
```

The inference pipeline:

  1. Resize to 200×200 (classification) or 608×608 (detection)
  2. Normalize using dataset mean/std
  3. Run forward pass
  4. Apply sigmoid + argmax; filter predictions below threshold 0.3
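Steps 3–4 for a classification head can be sketched as follows (pure Python for clarity; the real pipeline operates on PyTorch/ONNX tensors):

```python
import math

def postprocess(logits, threshold=0.3):
    """Sigmoid each logit, take the argmax class, drop low-confidence predictions."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]       # prediction filtered out
    return best, probs[best]

cls, conf = postprocess([-2.0, 0.5, 3.0])   # class 2 wins with sigmoid(3.0) ~ 0.95
```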

## Models

Pre-trained weights are in `models/`:

| File | Description |
|---|---|
| classification_model/tsap_bin_classifer.pt | Multi-task fullness + shape classifier |
| classification_model/shape_classifier_resnet34.pt | Shape-only classifier |
| classification_model/TSAP_classifier_dynamic.onnx | ONNX (dynamic batch) |
| detection_model/tsap-detection.weights | Darknet-format detection weights |
| detection_model/tsap-detection.cfg | Darknet config |
| detection_model/TSAP_detection_dynamic.onnx | ONNX (dynamic batch) |

## Architecture

### Classification: TSAPMultiClassification

```
Input (200×200)
    └── ResNet34 backbone (shared)
         ├── Conv head → Linear(128) → Fullness logits (6)
         └── Conv head → Linear(128) → Shape logits (8)
```

Training uses BCEWithLogitsLoss for both heads simultaneously.
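For reference, BCEWithLogitsLoss fuses the sigmoid into the binary cross-entropy using a numerically stable log-sum-exp form; a scalar sketch of that expression:

```python
import math

def bce_with_logits(x, z):
    """Stable binary cross-entropy on a raw logit x against a target z in [0, 1].

    Equivalent to -(z*log(sigmoid(x)) + (1-z)*log(1-sigmoid(x))),
    rewritten as max(x, 0) - x*z + log(1 + exp(-|x|)) to avoid overflow.
    """
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
```

The multi-task loss is then the sum of this term over both heads' logits.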

### Detection: YOLOv4-tiny

- Input: 608×608, 6 object classes
- 2 detection scales (32× and 16× stride), 3 anchors each
- Losses: XY, WH, objectness, class (IoU-based anchor assignment)
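The IoU/GIoU variants in `utils/detection_utils.py` follow the standard definitions; an illustrative pure-Python version for corner-format boxes (x1, y1, x2, y2):

```python
def iou(a, b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) corner format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def giou(a, b):
    """Generalized IoU: subtracts the fraction of the smallest enclosing box
    not covered by the union, so disjoint boxes still get a useful gradient."""
    cw = max(a[2], b[2]) - min(a[0], b[0])   # enclosing box width
    ch = max(a[3], b[3]) - min(a[1], b[1])   # enclosing box height
    c = cw * ch
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union - (c - union) / c
```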

## Experiment Tracking

Training integrates with Weights & Biases for metric logging and visualization. Set your API key before training:

```bash
wandb login
```