title: Computer Vision | Traffic Tracker
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

TrafficSense - Road Traffic Detection, Tracking, and Analytics

AIMS Senegal - Computer Vision Project 2 - April 2026

TrafficSense is a road-traffic analysis application that detects, tracks, counts, and summarizes moving traffic objects from video files, remote video URLs, or a webcam feed. The system combines YOLO object detection, ByteTrack object tracking, a FastAPI backend, and a browser dashboard for live monitoring and post-processing analytics.

The project focuses on six traffic classes: person, bicycle, car, motorbike, bus, and truck.


Main Features

| Area | Description |
|---|---|
| Object detection | YOLOv8-compatible models through Ultralytics. The default model path is best.pt. |
| Multi-object tracking | ByteTrack assigns persistent IDs to visible objects across frames. |
| Unique counting | Each tracked object is counted once when its track_id appears for the first time. |
| Supported classes | person, bicycle, car, motorbike, bus, truck. |
| Live processing | The backend streams annotated frames to the browser with Server-Sent Events. |
| Video inputs | Local upload, remote video URL, and webcam frame analysis. |
| Visual output | Bounding boxes, class labels, tracking IDs, object trails, and live counters. |
| Dashboard | Scene filtering, global statistics, class distribution, timeline chart, scene comparison, and object-position heatmap. |
| Logs | Detection CSV, raw JSONL detections, summary JSON, frame-level CSV statistics, and annotated MP4 output. |
| Export | Download logs and annotated videos directly from the interface. |
| Training support | Frame extraction and fine-tuning scripts are included for custom datasets. |

Architecture

traffic-tracker/
├── backend/
│   ├── main.py            # FastAPI application, routes, sessions, streaming, dashboard aggregation
│   ├── tracker.py         # YOLO + ByteTrack processing engine and log generation
│   ├── run_tracker.py     # Command-line processing entry point
│   ├── finetune.py        # YOLO fine-tuning script
│   ├── extract_frames.py  # Utility to extract video frames for labeling
│   ├── dataset.yaml       # Dataset configuration for training
│   ├── best.pt            # Default model weights used by the app
│   └── requirements.txt   # Python dependencies
├── frontend/
│   └── index.html         # Single-page dashboard and control interface
├── data/
│   ├── Traffic_detection.mp4
│   └── Group_05_Africa_countries_001_detections.csv
├── logs/                  # Created at runtime: summaries, detections, annotated videos
├── uploads/               # Created at runtime: uploaded source videos
├── output/                # Created at runtime when needed
├── Dockerfile             # Docker/Hugging Face Spaces deployment
├── LICENSE
└── README.md

Backend flow

  1. A video file, video URL, or webcam session is submitted to FastAPI.
  2. TrafficTracker loads the selected YOLO model and filters detections by selected classes.
  3. YOLO detects objects frame by frame.
  4. ByteTrack assigns stable object IDs.
  5. The tracker writes annotated frames, detection rows, frame statistics, and summary metrics.
  6. The dashboard endpoint aggregates completed sessions and saved log files.
  7. The frontend renders live video feedback and analytics.

Frontend flow

The frontend is contained in frontend/index.html. It provides:

  • Source selection: file upload, remote URL, or webcam.
  • Scene and group metadata inputs.
  • Model, confidence, and class controls.
  • Live frame canvas with counters and progress state.
  • Analytics dashboard with charts and heatmap.
  • Log list and download controls.

Installation

Local Python setup

cd backend
pip install -r requirements.txt

For CUDA-enabled GPU environments, install the matching PyTorch build before running the app. For example:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Start the application

From the backend directory:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Then open:

http://localhost:8000

The FastAPI server serves the frontend automatically from frontend/index.html.

Docker

docker build -t trafficsense .
docker run --rm -p 7860:7860 trafficsense

Then open:

http://localhost:7860

The Docker configuration is also compatible with the Hugging Face Spaces metadata at the top of this README.


Using the Web Interface

Analyze an uploaded video

  1. Open the web interface.
  2. Drop a video into the upload area or choose a file manually.
  3. Enter a scene name, such as intersection_01 or Africa_countries.
  4. Keep the default group ID or enter another group name.
  5. Select the traffic classes to track.
  6. Choose the model path and confidence threshold.
  7. Click START ANALYSIS.
  8. Watch the annotated video stream and live object counters.
  9. Open the Analytics tab to inspect summary charts and the position heatmap.
  10. Open the Log Files tab to download generated outputs.

Analyze a remote video URL

Paste a direct http:// or https:// video URL into the URL field. The backend downloads the video into uploads/ and processes it like a normal uploaded file.

Analyze webcam frames

Use the webcam option in the interface. The browser captures frames and sends them to the backend session. When stopped, the backend saves the same summary and detection files used for video processing.


Dashboard and Metrics

The dashboard combines all completed in-memory sessions and saved *_summary.json files in logs/.

Summary cards

| Metric | Meaning |
|---|---|
| Scenes | Number of completed scenes included in the current dashboard filter. |
| Total objects | Sum of unique tracked objects across selected scenes. |
| Total duration | Total processed video duration in seconds. |
| Average per scene | total_objects / number_of_scenes, rounded to the nearest integer. |
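As a rough illustration of how the four cards relate to the per-scene summaries, the sketch below aggregates a list of summary dictionaries. The function name summary_cards and the output keys are hypothetical; only total_unique_objects and duration_sec are taken from the summary JSON format, and the actual aggregation in main.py may differ.

```python
def summary_cards(summaries):
    """Compute the four summary-card values from a list of summary dicts."""
    scenes = len(summaries)
    total_objects = sum(s["total_unique_objects"] for s in summaries)
    total_duration = sum(s["duration_sec"] for s in summaries)
    # round() resolves exact .5 ties to the nearest even integer in Python.
    average = round(total_objects / scenes) if scenes else 0
    return {
        "scenes": scenes,
        "total_objects": total_objects,
        "total_duration_sec": total_duration,
        "average_per_scene": average,
    }


# Illustrative values only, not real measurements.
example = [
    {"total_unique_objects": 142, "duration_sec": 60.0},
    {"total_unique_objects": 58, "duration_sec": 30.0},
]
print(summary_cards(example))
```

An empty scene list is mapped to an average of 0 here to avoid a division by zero; that choice is an assumption of the sketch.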

Charts

| Component | Description |
|---|---|
| Objects by class | Bar chart of unique object counts per class. |
| Traffic intensity timeline | Number of detections grouped into 10-second buckets. |
| Scene comparison | Per-scene duration, total object count, cars, pedestrians, and trucks/buses. |
| Position heatmap | A normalized grid built from object center coordinates (cx, cy) in the detection CSV files. |
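The 10-second bucketing behind the timeline can be sketched in a few lines. This is a minimal example, assuming each detection carries a timestamp in seconds; the function name temporal_distribution is hypothetical, and the output keys mirror the temporal_distribution entries of the summary JSON.

```python
from collections import Counter


def temporal_distribution(timestamps_sec, bucket_size=10):
    """Group detection timestamps into fixed-width buckets."""
    # Bucket 0 covers [0, 10) seconds, bucket 1 covers [10, 20), and so on.
    buckets = Counter(int(t // bucket_size) for t in timestamps_sec)
    return [
        {"bucket_10s": index, "detections": count}
        for index, count in sorted(buckets.items())
    ]


# Illustrative timestamps only.
timeline = temporal_distribution([0.5, 3.2, 9.99, 10.0, 19.5, 25.0])
print(timeline)
```

Buckets without detections are simply absent from the result in this sketch; a chart may want to fill them with zeros.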

Position heatmap

The heatmap uses each detection center and normalizes it by the frame size:

  • x = cx / frame_width
  • y = cy / frame_height

The normalized positions are assigned to a 24 by 24 grid. Each cell stores:

  • total detections in that region
  • per-class counts in that region
  • dominant class for color display
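The normalization and grid assignment described above can be sketched as follows. This is a simplified illustration, not the backend code: the function name build_heatmap and the output keys are hypothetical, and clamping edge values into the last cell is an assumption. The input keys follow the detection CSV columns.

```python
from collections import Counter, defaultdict

GRID_SIZE = 24


def build_heatmap(rows, grid=GRID_SIZE):
    """Aggregate detection centers into a grid of per-class counts."""
    cells = defaultdict(Counter)
    for row in rows:
        x = row["cx"] / row["frame_width"]
        y = row["cy"] / row["frame_height"]
        # Clamp so that centers on the right or bottom edge stay in the grid.
        col = min(max(int(x * grid), 0), grid - 1)
        line = min(max(int(y * grid), 0), grid - 1)
        cells[(line, col)][row["class_name"]] += 1
    result = []
    for (line, col), counts in sorted(cells.items()):
        dominant, _ = counts.most_common(1)[0]
        result.append({
            "row": line,
            "col": col,
            "total": sum(counts.values()),
            "per_class": dict(counts),
            "dominant": dominant,
        })
    return result


# Illustrative detections only, in a hypothetical 1280x720 frame.
frame = {"frame_width": 1280, "frame_height": 720}
detections = [
    {"cx": 640, "cy": 360, "class_name": "car", **frame},
    {"cx": 650, "cy": 365, "class_name": "car", **frame},
    {"cx": 645, "cy": 370, "class_name": "bus", **frame},
    {"cx": 1280, "cy": 720, "class_name": "person", **frame},
]
heatmap_cells = build_heatmap(detections)
```

Only non-empty cells are returned, which keeps the payload small when most of the frame has no traffic.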

The map includes percentage coordinates around the plot. Cell colors match the class colors used in the Track Classes controls:

| Class | Color role |
|---|---|
| person | Red |
| bicycle | Green |
| car | Amber |
| motorbike | Pink |
| bus | Blue |
| truck | Purple |

Tracking and Counting Method

The tracker uses YOLO detections followed by ByteTrack tracking. Each detection includes a track_id when the tracker can associate it with an object trajectory.

Counting is based on first appearance:

if track_id has not been counted before:
    add track_id to counted_ids
    increment count_per_class[class_name]

This avoids counting the same visible object again on every frame. The CSV schema still includes crossed_line and direction fields for compatibility with shared traffic-analysis formats, but the current implementation stores false and an empty direction by default.
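A runnable version of the first-appearance rule might look like the sketch below. The class name UniqueCounter is hypothetical, and skipping detections with a negative track_id (the value used when no ID is assigned) is an assumption about how untracked detections are handled.

```python
from collections import Counter


class UniqueCounter:
    """Count each tracked object once, on its first appearance."""

    def __init__(self):
        self.counted_ids = set()
        self.count_per_class = Counter()

    def update(self, track_id, class_name):
        """Register one detection; return True if it was newly counted."""
        # Assumption: detections without a track ID (-1) are never counted.
        if track_id < 0 or track_id in self.counted_ids:
            return False
        self.counted_ids.add(track_id)
        self.count_per_class[class_name] += 1
        return True


counter = UniqueCounter()
# Illustrative (track_id, class_name) pairs from three consecutive frames.
for track_id, class_name in [(1, "car"), (2, "bus"), (1, "car"), (-1, "car"), (3, "car")]:
    counter.update(track_id, class_name)
print(dict(counter.count_per_class))
```

The class is recorded at first sight, so a later change of the predicted class for the same track does not alter the count in this sketch.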

The tracker also computes approximate pixel speed:

speed_px_s = distance_between_current_and_previous_center * fps

This value is useful for relative movement analysis inside the same video, but it is not a calibrated real-world speed in km/h.
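The formula can be written as a small helper. This sketch assumes the two centers come from consecutive frames, so the time step is 1 / fps; the function name mirrors the CSV column but is otherwise hypothetical.

```python
import math


def speed_px_s(prev_center, curr_center, fps):
    """Approximate speed in pixels per second between two consecutive frames."""
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    # Euclidean distance in pixels, scaled by frames per second.
    return math.hypot(dx, dy) * fps


# A center that moves 3 px right and 4 px down in one frame at 30 fps.
print(speed_px_s((100, 100), (103, 104), fps=30))
```

If a track skips frames, the distance would need to be divided by the actual elapsed time instead of being multiplied by fps.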


API Reference

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serves the web interface. |
| GET | /health | Basic server status and active session count. |
| GET | /classes | Returns supported traffic classes. |
| POST | /upload | Uploads a file or downloads a video URL and starts processing. |
| POST | /webcam/start | Starts a webcam tracking session. |
| POST | /webcam/frame/{sid} | Sends one webcam frame for detection and tracking. |
| POST | /webcam/stop/{sid} | Stops a webcam session and writes logs. |
| GET | /stream/{sid} | Streams annotated frames for an uploaded video session using Server-Sent Events. |
| GET | /status/{sid} | Returns processing status, progress, FPS, and latest counters. |
| GET | /summary/{sid} | Returns final summary for a completed session. |
| GET | /dashboard | Returns aggregated dashboard data and heatmap cells. |
| GET | /logs | Lists generated files in logs/. |
| GET | /videos | Lists annotated MP4 files. |
| GET | /log/{filename} | Downloads one log file. |
| GET | /download/video/{sid} | Downloads annotated video for a completed session. |
| GET | /download/video-file/{filename} | Downloads an annotated video by filename. |
| GET | /stream/video/{sid} | Streams an annotated video for browser playback. |
| GET | /stream/video-file/{filename} | Streams an annotated video by filename. |
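For clients other than the bundled dashboard, the /stream/{sid} endpoint can be read with any Server-Sent Events parser. The sketch below is a minimal parser for the standard SSE wire format; it treats each event payload as an opaque string because the payload structure sent by this backend is not documented here. The function name iter_sse_data is hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of SSE lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            buffer.append(value)


# Illustrative stream content only.
sample_stream = [
    "data: first\n",
    "\n",
    ": keep-alive\n",
    "event: progress\n",
    "data: a\n",
    "data: b\n",
    "\n",
]
events = list(iter_sse_data(sample_stream))
```

Fields other than data, such as event or id, are ignored in this sketch.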

Upload form fields

| Field | Type | Default | Description |
|---|---|---|---|
| file | file | empty | Local video file. |
| video_url | string | empty | Remote video URL. Used only if no file is uploaded. |
| scene_name | string | scene_01 | Scene label used in logs and dashboard filters. |
| group_id | string | Group_05 | Group label used in log filenames. |
| classes | comma-separated string | all classes | Example: car,bus,truck. |
| conf | float | 0.5 | YOLO confidence threshold. |
| model | string | best.pt | Path or name of the model weights. |
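When calling /upload from a script, the text fields can be prepared and validated up front. The helper below is a hypothetical sketch: build_upload_fields is not part of the repository, the validation rules are assumptions, and sending all six class names when none are selected is only one plausible way to express the "all classes" default.

```python
SUPPORTED_CLASSES = ("person", "bicycle", "car", "motorbike", "bus", "truck")


def build_upload_fields(scene_name="scene_01", group_id="Group_05",
                        classes=None, conf=0.5, model="best.pt", video_url=""):
    """Return the text fields of the /upload form as a dict of strings."""
    selected = list(classes) if classes else list(SUPPORTED_CLASSES)
    unknown = [name for name in selected if name not in SUPPORTED_CLASSES]
    if unknown:
        raise ValueError(f"unsupported classes: {unknown}")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("conf must be between 0 and 1")
    fields = {
        "scene_name": scene_name,
        "group_id": group_id,
        "classes": ",".join(selected),  # Comma-separated, as in the table.
        "conf": str(conf),
        "model": model,
    }
    if video_url:
        fields["video_url"] = video_url
    return fields


fields = build_upload_fields(classes=["car", "bus", "truck"], conf=0.4)
print(fields)
```

The resulting dictionary can be passed as the form data of a multipart POST request, with the video attached under the file field when a local file is used.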

Output Files

Each completed session writes files into logs/ using this pattern:

{group_id}_{scene_name}_{order}_detections.csv
{group_id}_{scene_name}_{order}_detections.jsonl
{group_id}_{scene_name}_{order}_summary.json
{group_id}_{scene_name}_{order}_frame_stats.csv
{group_id}_{scene_name}_{order}_annotated.mp4

The order number is automatically incremented per group and scene.
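The naming scheme can be reproduced as shown below. This is a sketch under two assumptions: the order number is zero-padded to three digits (inferred from data/Group_05_Africa_countries_001_detections.csv), and the next order is one more than the highest number already present for the same group and scene. Both function names are hypothetical.

```python
import re

SUFFIXES = (
    "detections.csv",
    "detections.jsonl",
    "summary.json",
    "frame_stats.csv",
    "annotated.mp4",
)


def next_order(existing_names, group_id, scene_name):
    """Return the next free order number for a group and scene."""
    pattern = re.compile(
        rf"^{re.escape(group_id)}_{re.escape(scene_name)}_(\d+)_"
    )
    used = [int(m.group(1)) for name in existing_names if (m := pattern.match(name))]
    return max(used, default=0) + 1


def log_filenames(group_id, scene_name, order):
    """Build the five output filenames for one session."""
    stem = f"{group_id}_{scene_name}_{order:03d}"
    return [f"{stem}_{suffix}" for suffix in SUFFIXES]


# Illustrative directory listing.
existing = [
    "Group_05_Africa_countries_001_detections.csv",
    "Group_05_Africa_countries_001_summary.json",
]
order = next_order(existing, "Group_05", "Africa_countries")
print(log_filenames("Group_05", "Africa_countries", order))
```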

Detection CSV

The main detection table contains one row per detected object per frame:

| Column | Description |
|---|---|
| frame | Frame index starting at 1. |
| timestamp_sec | Timestamp in seconds. |
| scene_name | Scene label. |
| group_id | Group label. |
| video_name | Original video name or webcam. |
| track_id | ByteTrack object ID, or -1 if no ID is assigned. |
| class_name | Detected traffic class. |
| confidence | YOLO detection confidence. |
| bbox_x1, bbox_y1, bbox_x2, bbox_y2 | Bounding box coordinates in pixels. |
| cx, cy | Bounding box center in pixels. |
| frame_width, frame_height | Source frame dimensions. |
| crossed_line | Compatibility field, currently false by default. |
| direction | Compatibility field, currently empty by default. |
| speed_px_s | Approximate speed in pixels per second. |
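For offline analysis, the detection CSV can be loaded with the standard library. The sketch below assumes the header uses the column names listed above, in that order, and that booleans are written as text; the sample rows are made up for illustration, and the function name load_detections is hypothetical.

```python
import csv
import io

INT_COLUMNS = ("frame", "track_id", "frame_width", "frame_height")
FLOAT_COLUMNS = (
    "timestamp_sec", "confidence",
    "bbox_x1", "bbox_y1", "bbox_x2", "bbox_y2",
    "cx", "cy", "speed_px_s",
)


def load_detections(fileobj):
    """Read detection rows and convert numeric columns to numbers."""
    rows = []
    for raw in csv.DictReader(fileobj):
        row = dict(raw)
        for name in INT_COLUMNS:
            row[name] = int(float(raw[name]))
        for name in FLOAT_COLUMNS:
            row[name] = float(raw[name])
        # Accepts "false"/"False"; anything other than "true" becomes False.
        row["crossed_line"] = raw["crossed_line"].strip().lower() == "true"
        rows.append(row)
    return rows


# Made-up sample rows, not real output.
SAMPLE = """frame,timestamp_sec,scene_name,group_id,video_name,track_id,class_name,confidence,bbox_x1,bbox_y1,bbox_x2,bbox_y2,cx,cy,frame_width,frame_height,crossed_line,direction,speed_px_s
1,0.033,Africa_countries,Group_05,Traffic_detection.mp4,7,car,0.91,100,200,300,400,200,300,1280,720,false,,0.0
2,0.067,Africa_countries,Group_05,Traffic_detection.mp4,-1,bus,0.55,500,100,900,500,700,300,1280,720,false,,0.0
"""

rows = load_detections(io.StringIO(SAMPLE))
tracked = [row for row in rows if row["track_id"] >= 0]
print(len(rows), len(tracked))
```

For a real file, replace the io.StringIO object with open(path, newline="").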

JSONL detections

The JSONL file stores the same detection rows in line-delimited JSON format.

Summary JSON

{
  "scene": "Africa_countries",
  "group_id": "Group_05",
  "video_name": "Traffic_detection.mp4",
  "session_id": "abc123",
  "processed_at": "2026-04-29T12:00:00",
  "total_frames": 1800,
  "duration_sec": 60.0,
  "fps": 30.0,
  "resolution": [1080, 1440],
  "selected_classes": ["person", "bicycle", "car", "motorbike", "bus", "truck"],
  "total_unique_objects": 142,
  "count_per_class": {
    "car": 98,
    "bus": 12,
    "truck": 17,
    "person": 15
  },
  "annotated_video": "logs/Group_05_Africa_countries_001_annotated.mp4",
  "temporal_distribution": [
    {"bucket_10s": 0, "detections": 34},
    {"bucket_10s": 1, "detections": 51}
  ]
}
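In the example above, the per-class counts add up to total_unique_objects (98 + 12 + 17 + 15 = 142) and duration_sec equals total_frames / fps (1800 / 30 = 60). A quick sanity check on a summary file could use those two relations, as in the sketch below. Whether the backend guarantees them for every session is an assumption, and check_summary is a hypothetical helper.

```python
import math


def check_summary(summary, tolerance_sec=0.5):
    """Return a list of consistency problems found in a summary dict."""
    problems = []
    class_total = sum(summary["count_per_class"].values())
    if class_total != summary["total_unique_objects"]:
        problems.append(
            f"class counts sum to {class_total}, "
            f"expected {summary['total_unique_objects']}"
        )
    if summary["fps"] > 0:
        expected = summary["total_frames"] / summary["fps"]
        if not math.isclose(expected, summary["duration_sec"], abs_tol=tolerance_sec):
            problems.append(
                f"duration_sec is {summary['duration_sec']}, "
                f"expected about {expected:.2f}"
            )
    return problems


# Subset of the fields from the example summary.
example_summary = {
    "total_frames": 1800,
    "duration_sec": 60.0,
    "fps": 30.0,
    "total_unique_objects": 142,
    "count_per_class": {"car": 98, "bus": 12, "truck": 17, "person": 15},
}
print(check_summary(example_summary))
```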

Frame statistics CSV

The frame statistics file summarizes each processed frame, including frame index, timestamp, number of detections in the frame, visibility state, and cumulative counts.


Command-Line Processing

The CLI is useful for batch processing videos without the web interface.

cd backend

# Process a video and show the annotated window
python run_tracker.py --video ../data/Traffic_detection.mp4 --scene Africa_countries --show

# Track only selected classes
python run_tracker.py --video ../data/Traffic_detection.mp4 --classes car bus truck --conf 0.4

# Use a custom model path
python run_tracker.py --video ../data/Traffic_detection.mp4 --model best.pt --conf 0.5

Generated logs are saved to the directory passed with --logs or to logs/ by default.


Fine-Tuning Workflow

The repository includes utilities for preparing and training a custom detector.

1. Extract frames

cd backend
python extract_frames.py --video ../data/Traffic_detection.mp4 --out frames/ --every 10

2. Label the frames

Label extracted frames with a tool that can export YOLO-format annotations. The dataset configuration should follow backend/dataset.yaml.

3. Train or fine-tune

python finetune.py --data dataset.yaml --model yolov8s.pt --epochs 50 --device 0

4. Use the trained weights

python run_tracker.py --video ../data/Traffic_detection.mp4 --model runs/traffic/finetune/weights/best.pt

The web interface can also use a custom model by entering the model path in the model field.


Model and Class Notes

The tracker maps the following COCO class IDs:

| COCO ID | Class |
|---|---|
| 0 | person |
| 1 | bicycle |
| 2 | car |
| 3 | motorbike |
| 5 | bus |
| 7 | truck |
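The mapping can be kept as a small lookup table that turns selected class names into the numeric IDs a detector expects. This is a sketch: the names TRAFFIC_CLASSES and class_ids_for are hypothetical, and the IDs apply to COCO-pretrained weights, where ID 3 is labeled motorcycle in the standard COCO names. A fine-tuned best.pt may use different indices.

```python
# COCO class IDs for the six traffic classes, as listed in the table above.
TRAFFIC_CLASSES = {
    0: "person",
    1: "bicycle",
    2: "car",
    3: "motorbike",
    5: "bus",
    7: "truck",
}


def class_ids_for(names):
    """Translate class names into a sorted list of COCO class IDs."""
    name_to_id = {name: class_id for class_id, name in TRAFFIC_CLASSES.items()}
    unknown = [name for name in names if name not in name_to_id]
    if unknown:
        raise ValueError(f"unsupported classes: {unknown}")
    return sorted(name_to_id[name] for name in names)


print(class_ids_for(["truck", "car", "bus"]))
```

A list like this is the kind of value that the classes argument of an Ultralytics prediction or tracking call accepts to restrict detections.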

The default confidence threshold in the web API is 0.5. Lower values may detect more objects but can increase false positives. Higher values reduce weak detections but may miss smaller or partially occluded objects.


Practical Notes

  • best.pt should be available from the backend working directory unless another model path is provided.
  • logs/, uploads/, and output/ are created automatically.
  • Annotated MP4 files are written with OpenCV. When ffmpeg is available, the backend can produce a browser-compatible H.264 copy for playback.
  • Heatmap data depends on detection CSV files. If a summary exists without its matching detection CSV, the heatmap for that scene will be empty.
  • Unique counts depend on tracking stability. Heavy occlusion, camera cuts, or very crowded scenes can create new IDs for the same physical object.

License

MIT - see LICENSE.


Authors

AIMS Senegal - Computer Vision 2026