---
license: openrail++
language:
- en
base_model:
- RomDev2/yolox-tiny
tags:
- shapes
- trashbins
pipeline_tag: image-classification
---
# TSAP – Trash/Sustainability Assessment Platform

A deep learning computer vision system for automatic detection and classification of waste bins in images. The system localizes bins with YOLOv4-tiny and classifies their properties with a multi-task ResNet34 model.

## Overview

TSAP runs two ML pipelines:

| Pipeline | Architecture | Input Size | Output |
|---|---|---|---|
| Detection | YOLOv4-tiny | 608×608 | Bounding boxes |
| Classification | ResNet34 (multi-task) | 200×200 | Fullness + Shape |
**Classification labels:**

- **Fullness** (6 classes): `Closed`, `Empty`, `Half`, `Full`, `Overflow`, `Open`
- **Shape** (8 classes): `Star`, `Circle`, `Arrow`, `Centric`, `Triangle`, `Square`, `Chevron`, `Lightning Bolt`
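For downstream decoding it is convenient to keep the two label sets as ordered tuples, so an argmax index maps straight to a name. A small sketch; the ordering below mirrors the lists above, and the actual training order is an assumption:

```python
# Ordering mirrors the lists above; the true training order may differ.
FULLNESS_LABELS = ("Closed", "Empty", "Half", "Full", "Overflow", "Open")
SHAPE_LABELS = ("Star", "Circle", "Arrow", "Centric", "Triangle",
                "Square", "Chevron", "Lightning Bolt")

def label_for(head, index):
    """Map an argmax index from one head to its human-readable label."""
    table = FULLNESS_LABELS if head == "fullness" else SHAPE_LABELS
    return table[index]
```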

## Directory Structure

```
tsap/
├── classification/
│   ├── model.py                      # TSAPMultiClassification (ResNet34 dual-head)
│   ├── dataset.py                    # Dataset loader with Albumentations augmentation
│   └── train.py                      # Training engine (mixed precision, wandb)
├── detection/
│   ├── config.py                     # YOLOv4-tiny hyperparameters and anchors
│   ├── dataset.py                    # YOLO dataset loader with mosaic augmentation
│   └── train_yolo.py                 # YOLO training engine with COCO evaluation
├── common/
│   ├── loss.py                       # FocalLoss, Yolo_loss, Yolo_loss_general
│   └── meters.py                     # AverageMeter, ProgressMeter
├── utils/
│   ├── utils.py                      # Train/test splits, confusion matrices, visualization
│   ├── detection_utils.py            # IoU / GIoU / DIoU / CIoU calculations
│   ├── annot_classification_map.py   # Image-to-label mapping
│   ├── annot_darknet2torch.py        # Darknet annotation format converter
│   └── cvat_annotation_converter.py  # CVAT XML → YOLO format converter
├── models/
│   ├── classification_model/         # Trained classification weights + ONNX
│   ├── detection_model/              # YOLOv4-tiny weights + ONNX
│   └── training/                     # Standalone training scripts for individual heads
├── main.py                           # Usage examples
├── test.py                           # Inference script
└── environment.yml                   # Conda environment
```

## Setup

```bash
conda env create -f environment.yml
conda activate <env-name>
```

Key dependencies: PyTorch 1.7.1 (CUDA 10.2), TorchVision 0.8.2, OpenCV 3.4.2, Albumentations 0.5.2, ONNX 1.8.0 + ONNXRuntime 1.5.2, Weights & Biases.

## Data Preparation

Annotations are created in CVAT and exported as XML. The converter handles bounding box extraction and label generation:

```bash
python utils/cvat_annotation_converter.py
```

This converts CVAT XML annotations to YOLO format and crops classification samples with their labels.
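YOLO annotations store each box as a class index plus center/size coordinates normalized by the image dimensions. The core of the conversion from CVAT's absolute corner coordinates can be sketched as follows (the `xtl`/`ytl`/`xbr`/`ybr` attribute names follow CVAT's XML box format; the repository's converter additionally handles label extraction and cropping):

```python
def cvat_box_to_yolo(xtl, ytl, xbr, ybr, img_w, img_h):
    """Convert CVAT corner coordinates (pixels) to YOLO's normalized
    (x_center, y_center, width, height) format."""
    x_center = (xtl + xbr) / 2.0 / img_w
    y_center = (ytl + ybr) / 2.0 / img_h
    width = (xbr - xtl) / img_w
    height = (ybr - ytl) / img_h
    return x_center, y_center, width, height
```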

## Training

### Classification

```python
from classification.train import Engine
from classification.model import TSAPMultiClassification

model = TSAPMultiClassification(pretrained=True)
engine = Engine(model, train_loader, val_loader, device_ids=[0, 1])
engine.train(epochs=200)
```

Hyperparameters: batch size 128, SGD with momentum 0.9, OneCycleLR (lr=3e-3), image size 200×200.
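The OneCycleLR policy ramps the learning rate from a small initial value up to the 3e-3 peak, then anneals it far below the starting point. A plain-Python sketch of the cosine-annealed curve, assuming PyTorch's defaults (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`); the repository may use different values:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=3e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Cosine-annealed one-cycle schedule: warm up to max_lr over the
    first pct_start of training, then anneal down to a tiny final lr."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    up_steps = int(pct_start * total_steps)
    if step < up_steps:  # warm-up phase
        t = step / max(1, up_steps)
        return initial_lr + (max_lr - initial_lr) * (1 - math.cos(math.pi * t)) / 2
    t = (step - up_steps) / max(1, total_steps - up_steps)  # anneal phase
    return max_lr + (min_lr - max_lr) * (1 - math.cos(math.pi * t)) / 2
```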

### Detection

```python
from detection.train_yolo import Engine_YOLO

engine = Engine_YOLO(cfg_path, weights_path, train_loader, val_loader)
engine.train(max_batches=30000)
```

Hyperparameters: batch size 64 (16 subdivisions), learning rate 0.001 with burn-in, image size 608×608.
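Burn-in is Darknet's warm-up: the learning rate grows polynomially over the first iterations before the base rate takes over. A sketch assuming Darknet's customary power of 4 and a hypothetical burn-in length of 1000 iterations (the actual values live in the `.cfg` file):

```python
def burn_in_lr(iteration, base_lr=0.001, burn_in=1000, power=4):
    """Darknet-style warm-up: lr scales with (iteration / burn_in)^power
    during burn-in, then holds at base_lr."""
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    return base_lr
```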

## Inference

```bash
python test.py
```

The inference pipeline:
1. Resize to 200×200 (classification) or 608×608 (detection)
2. Normalize using the dataset mean/std
3. Run the forward pass
4. Apply sigmoid + argmax; filter predictions below the 0.3 confidence threshold
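Step 4 for one classification head can be sketched as follows (pure Python for clarity; the real pipeline operates on tensors, and the label list here is only illustrative):

```python
import math

def decode_head(logits, labels, threshold=0.3):
    """Sigmoid each logit, take the argmax, and drop the prediction
    if its confidence falls below the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return labels[best], probs[best]
```

For example, `decode_head([2.0, -1.0, 0.5], ["Full", "Empty", "Half"])` picks `"Full"`, while a head whose best sigmoid score is under 0.3 yields no label.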

## Models

Pre-trained weights are in `models/`:

| File | Description |
|---|---|
| `classification_model/tsap_bin_classifer.pt` | Multi-task fullness + shape classifier |
| `classification_model/shape_classifier_resnet34.pt` | Shape-only classifier |
| `classification_model/TSAP_classifier_dynamic.onnx` | ONNX (dynamic batch) |
| `detection_model/tsap-detection.weights` | Darknet-format detection weights |
| `detection_model/tsap-detection.cfg` | Darknet config |
| `detection_model/TSAP_detection_dynamic.onnx` | ONNX (dynamic batch) |

## Architecture

### Classification – `TSAPMultiClassification`

```
Input (200×200)
├── ResNet34 backbone (shared)
├── Conv head → Linear(128) → Fullness logits (6)
└── Conv head → Linear(128) → Shape logits (8)
```

Training uses `BCEWithLogitsLoss` for both heads simultaneously.
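The combined objective can be pictured as a per-head binary cross-entropy on one-hot targets, summed over the two heads (the unweighted sum is an assumption; the repository may weight the heads). The numerically stable form that `BCEWithLogitsLoss` computes, in plain Python:

```python
import math

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits, averaged
    over entries (the quantity BCEWithLogitsLoss computes)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # max(z, 0) - z*y + log(1 + e^-|z|) avoids overflow for large |z|
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

def multitask_loss(fullness_logits, fullness_target, shape_logits, shape_target):
    """Unweighted sum over the two heads (the weighting is an assumption)."""
    return (bce_with_logits(fullness_logits, fullness_target)
            + bce_with_logits(shape_logits, shape_target))
```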

### Detection – YOLOv4-tiny

- Input: 608×608, 6 object classes
- 2 detection scales (32× and 16× stride), 3 anchors each
- Losses: XY, WH, objectness, class (IoU-based anchor assignment)
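The anchor assignment, and the GIoU/DIoU/CIoU variants in `utils/detection_utils.py`, all build on plain intersection over union. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` corner format (the repository's exact box layout is an assumption):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```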

## Experiment Tracking

Training integrates with [Weights & Biases](https://wandb.ai) for metric logging and visualization. Set your API key before training:

```bash
wandb login
```