A comprehensive object detection system built with TensorFlow/Keras supporting custom model training and inference on images, videos, and real-time webcam feeds.
The-void/
├── main.py              # Main entry point
├── requirements.txt     # Python dependencies
├── config.yaml          # Configuration file
├── README.md            # This file
├── src/                 # Source code
│   ├── config.py        # Configuration management
│   ├── dataset.py       # Dataset loading and preprocessing
│   ├── model.py         # Model definition
│   ├── train.py         # Training script
│   ├── detect.py        # Inference module
│   ├── evaluate.py      # Evaluation metrics
│   └── utils.py         # Utility functions
├── data/                # Training data
│   ├── train/           # Training images and annotations
│   ├── val/             # Validation images and annotations
│   └── test/            # Test images
├── models/              # Trained models
└── outputs/             # Detection results and logs
Install the dependencies:
pip install -r requirements.txt
Edit config.yaml to customize training and inference parameters:
# Model settings
model:
  backbone: "mobilenetv2"  # Options: mobilenetv2, resnet50, efficientnet
  input_shape: [416, 416, 3]
  num_classes: 80
# Training settings
training:
  epochs: 100
  batch_size: 32
  learning_rate: 0.001
# Inference settings
inference:
  confidence_threshold: 0.5
  iou_threshold: 0.4
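To illustrate how the two inference settings usually interact, here is a minimal sketch. It assumes the common pipeline of dropping low-confidence detections and then applying greedy non-maximum suppression; the helper names `iou` and `filter_detections` are hypothetical and are not taken from `src/detect.py`, whose implementation is not shown here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    confidence_threshold: float = 0.5,  # mirrors inference.confidence_threshold
    iou_threshold: float = 0.4,         # mirrors inference.iou_threshold
) -> List[int]:
    """Return indices of boxes kept after score filtering and greedy NMS."""
    # Step 1: discard detections below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= confidence_threshold]
    # Step 2: visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(100, 150, 300, 450), (110, 160, 310, 460), (400, 100, 500, 200), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_detections(boxes, scores)
```

Under these assumptions, raising `confidence_threshold` removes more weak detections, while lowering `iou_threshold` suppresses overlapping boxes more aggressively.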
Train with dummy data (for testing):
python main.py train --dummy-data
Train with custom data:
python main.py train --config config.yaml
Detect objects in an image:
python main.py detect --model models/final_model.h5 --input-image test.jpg
Detect objects in a video:
python main.py detect --model models/final_model.h5 --input-video test.mp4 --output-video output.mp4
Real-time detection from webcam:
python main.py detect --model models/final_model.h5 --webcam --duration 30
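The commands above imply a two-subcommand interface. A minimal `argparse` sketch of such a layout follows; it is not the project's actual `main.py`. The default for `--config`, the mutual exclusivity of the three input sources, and the integer type of `--duration` (its unit is not stated here) are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the train/detect commands shown above (hypothetical)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train a model")
    # Assumption: config.yaml is used when --config is omitted.
    train.add_argument("--config", default="config.yaml")
    train.add_argument("--dummy-data", action="store_true")

    detect = sub.add_parser("detect", help="run inference")
    detect.add_argument("--model", required=True)
    # Assumption: exactly one input source may be given per run.
    source = detect.add_mutually_exclusive_group(required=True)
    source.add_argument("--input-image")
    source.add_argument("--input-video")
    source.add_argument("--webcam", action="store_true")
    detect.add_argument("--output-video")
    detect.add_argument("--duration", type=int)
    return parser


args = build_parser().parse_args(
    ["detect", "--model", "models/final_model.h5", "--webcam", "--duration", "30"]
)
```

Note that `argparse` exposes `--dummy-data` as `args.dummy_data` and `--input-image` as `args.input_image`.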
The system supports Pascal VOC XML format for annotations:
<?xml version="1.0"?>
<annotation>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>450</ymax>
    </bndbox>
  </object>
</annotation>
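An annotation in this format can be read with the standard library alone. The sketch below is illustrative; `parse_voc_annotation` is a hypothetical helper, not a function from `src/dataset.py`. It returns each object's class name with its box as `[xmin, ymin, xmax, ymax]`.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_voc_annotation(xml_text: str) -> List[Tuple[str, List[int]]]:
    """Extract (class_name, [xmin, ymin, xmax, ymax]) pairs from a VOC XML string."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # int(float(...)) tolerates coordinates that were written as floats.
        box = [
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ]
        objects.append((name, box))
    return objects


sample = """<?xml version="1.0"?>
<annotation>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>450</ymax>
    </bndbox>
  </object>
</annotation>"""

print(parse_voc_annotation(sample))
```

To parse a file on disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.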
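Bounding boxes use the [x1, y1, x2, y2] format, corresponding to xmin, ymin, xmax, ymax.
The training batch size is set via batch_size in config.yaml.
from src.train import train_model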
from src.config import Config
config = Config('config.yaml')
model, history = train_model(config, use_custom_data=True)
from src.dataset import ObjectDetectionDataset
dataset = ObjectDetectionDataset(
    image_dir='data/train/images',
    annotation_dir='data/train/annotations',
    image_size=(416, 416),
    class_names=['person', 'car', 'dog']
)
train_gen = dataset.get_data_generator(batch_size=32, shuffle=True)
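When images are resized to a fixed `image_size`, box coordinates generally have to be rescaled by the same factors. The sketch below shows that arithmetic, assuming a plain stretch to the target size with no letterboxing or padding; how `ObjectDetectionDataset` actually handles this is not shown here, and `scale_box` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def scale_box(
    box: Sequence[float],
    original_size: Tuple[int, int],
    target_size: Tuple[int, int] = (416, 416),
) -> List[float]:
    """Rescale [x1, y1, x2, y2] from an original (width, height) to a target (width, height)."""
    orig_w, orig_h = original_size
    target_w, target_h = target_size
    sx, sy = target_w / orig_w, target_h / orig_h
    x1, y1, x2, y2 = box
    # x coordinates scale with width, y coordinates with height.
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]


scaled = scale_box([100, 150, 300, 450], original_size=(832, 832))
```

If the pipeline preserved aspect ratio by padding instead, an offset term would also be needed; that case is not covered here.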
Base model: google/efficientnet-b0