metadata

title: Computer Vision | Traffic Tracker
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

TrafficSense - Road Traffic Detection, Tracking, and Analytics

AIMS Senegal - Computer Vision Project 2 - April 2026

TrafficSense is a road-traffic analysis application that detects, tracks, counts, and summarizes moving traffic objects from video files, remote video URLs, or a webcam feed. The system combines YOLO object detection, ByteTrack object tracking, a FastAPI backend, and a browser dashboard for live monitoring and post-processing analytics.

The project focuses on six traffic classes: person, bicycle, car, motorbike, bus, and truck.

Main Features

Area	Description
Object detection	YOLOv8-compatible models through Ultralytics. The default model path is `best.pt`.
Multi-object tracking	ByteTrack assigns persistent IDs to visible objects across frames.
Unique counting	Each tracked object is counted once when its `track_id` appears for the first time.
Supported classes	`person`, `bicycle`, `car`, `motorbike`, `bus`, `truck`.
Live processing	The backend streams annotated frames to the browser with Server-Sent Events.
Video inputs	Local upload, remote video URL, and webcam frame analysis.
Visual output	Bounding boxes, class labels, tracking IDs, object trails, and live counters.
Dashboard	Scene filtering, global statistics, class distribution, timeline chart, scene comparison, and object-position heatmap.
Logs	Detection CSV, raw JSONL detections, summary JSON, frame-level CSV statistics, and annotated MP4 output.
Export	Download logs and annotated videos directly from the interface.
Training support	Frame extraction and fine-tuning scripts are included for custom datasets.

Architecture

traffic-tracker/
├── backend/
│   ├── main.py            # FastAPI application, routes, sessions, streaming, dashboard aggregation
│   ├── tracker.py         # YOLO + ByteTrack processing engine and log generation
│   ├── run_tracker.py     # Command-line processing entry point
│   ├── finetune.py        # YOLO fine-tuning script
│   ├── extract_frames.py  # Utility to extract video frames for labeling
│   ├── dataset.yaml       # Dataset configuration for training
│   ├── best.pt            # Default model weights used by the app
│   └── requirements.txt   # Python dependencies
├── frontend/
│   └── index.html         # Single-page dashboard and control interface
├── data/
│   ├── Traffic_detection.mp4
│   └── Group_05_Africa_countries_001_detections.csv
├── logs/                  # Created at runtime: summaries, detections, annotated videos
├── uploads/               # Created at runtime: uploaded source videos
├── output/                # Created at runtime when needed
├── Dockerfile             # Docker/Hugging Face Spaces deployment
├── LICENSE
└── README.md

Backend flow

1. A video file, video URL, or webcam session is submitted to FastAPI.
2. TrafficTracker loads the selected YOLO model and filters detections by the selected classes.
3. YOLO detects objects frame by frame.
4. ByteTrack assigns stable object IDs.
5. The tracker writes annotated frames, detection rows, frame statistics, and summary metrics.
6. The dashboard endpoint aggregates completed sessions and saved log files.
7. The frontend renders live video feedback and analytics.

Frontend flow

The frontend is contained in frontend/index.html. It provides:

Source selection: file upload, remote URL, or webcam.
Scene and group metadata inputs.
Model, confidence, and class controls.
Live frame canvas with counters and progress state.
Analytics dashboard with charts and heatmap.
Log list and download controls.

Installation

Local Python setup

cd backend
pip install -r requirements.txt

For CUDA-enabled GPU environments, install the matching PyTorch build before running the app. For example:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Start the application

From the backend directory:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Then open:

http://localhost:8000

The FastAPI server serves the frontend automatically from frontend/index.html.

Docker

docker build -t trafficsense .
docker run --rm -p 7860:7860 trafficsense

Then open:

http://localhost:7860

The container listens on port 7860, which matches the app_port value in the Hugging Face Spaces metadata at the top of this README, so the same Dockerfile can be used for a Docker Space.

Using the Web Interface

Analyze an uploaded video

1. Open the web interface.
2. Drop a video into the upload area or choose a file manually.
3. Enter a scene name, such as intersection_01 or Africa_countries.
4. Keep the default group ID or enter another group name.
5. Select the traffic classes to track.
6. Choose the model path and confidence threshold.
7. Click START ANALYSIS.
8. Watch the annotated video stream and live object counters.
9. Open the Analytics tab to inspect summary charts and the position heatmap.
10. Open the Log Files tab to download generated outputs.

Analyze a remote video URL

Paste a direct http:// or https:// video URL into the URL field. The backend downloads the video into uploads/ and processes it like a normal uploaded file.

Analyze webcam frames

Use the webcam option in the interface. The browser captures frames and sends them to the backend session. When stopped, the backend saves the same summary and detection files used for video processing.

Dashboard and Metrics

The dashboard combines all completed in-memory sessions and saved *_summary.json files in logs/.

Summary cards

Metric	Meaning
Scenes	Number of completed scenes included in the current dashboard filter.
Total objects	Sum of unique tracked objects across selected scenes.
Total duration	Total processed video duration in seconds.
Average per scene	`total_objects / number_of_scenes`, rounded to the nearest integer.
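
The summary-card values can be reproduced offline from the saved summary files. The sketch below is only an illustration: it assumes each `*_summary.json` file carries the `total_unique_objects` and `duration_sec` keys shown in the Summary JSON example, it ignores in-memory sessions and scene filtering, and the function name `summarize_scenes` is hypothetical rather than part of the backend.

```python
import json
from pathlib import Path


def summarize_scenes(logs_dir):
    """Aggregate *_summary.json files into the four summary-card values."""
    summaries = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(logs_dir).glob("*_summary.json"))
    ]
    scenes = len(summaries)
    total_objects = sum(s.get("total_unique_objects", 0) for s in summaries)
    total_duration = sum(s.get("duration_sec", 0.0) for s in summaries)
    # Guard against an empty logs directory to avoid dividing by zero.
    average = round(total_objects / scenes) if scenes else 0
    return {
        "scenes": scenes,
        "total_objects": total_objects,
        "total_duration_sec": total_duration,
        "average_per_scene": average,
    }
```

Calling `summarize_scenes("logs")` from the project root would return a dictionary with one entry per card.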

Charts

Component	Description
Objects by class	Bar chart of unique object counts per class.
Traffic intensity timeline	Number of detections grouped into 10-second buckets.
Scene comparison	Per-scene duration, total object count, cars, pedestrians, and trucks/buses.
Position heatmap	A normalized grid built from object center coordinates (`cx`, `cy`) in the detection CSV files.
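
The 10-second bucketing behind the traffic intensity timeline can be sketched as follows. This is a minimal illustration that assumes the bucket index is the integer part of `timestamp_sec / 10` and that empty buckets are simply omitted; the output keys mirror the `temporal_distribution` entries in the Summary JSON example, and the function name is hypothetical.

```python
from collections import Counter

BUCKET_SECONDS = 10  # bucket width described for the timeline chart


def temporal_distribution(timestamps_sec):
    """Group detection timestamps (in seconds) into 10-second buckets."""
    buckets = Counter(int(t // BUCKET_SECONDS) for t in timestamps_sec)
    return [
        {"bucket_10s": index, "detections": buckets[index]}
        for index in sorted(buckets)
    ]
```

For example, timestamps 0.0, 3.2, and 9.9 fall into bucket 0, while 10.0 starts bucket 1.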

Position heatmap

The heatmap uses each detection center and normalizes it by the frame size:

x = cx / frame_width
y = cy / frame_height

The normalized positions are assigned to a 24 by 24 grid. Each cell stores:

total detections in that region
per-class counts in that region
dominant class for color display

The plot axes are labeled with percentage coordinates. Each cell is colored by its dominant class, using the same class colors as the Track Classes controls:

Class	Color role
`person`	Red
`bicycle`	Green
`car`	Amber
`motorbike`	Pink
`bus`	Blue
`truck`	Purple
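
The grid construction described in this section can be approximated with the sketch below. It assumes detection rows are dictionaries with the `cx`, `cy`, `frame_width`, `frame_height`, and `class_name` columns of the detection CSV; the clamping of edge values, the output keys, and the function name `build_heatmap` are illustrative choices, not the backend's documented behavior.

```python
from collections import Counter, defaultdict

GRID_SIZE = 24  # the heatmap is described as a 24 by 24 grid


def build_heatmap(detections, grid_size=GRID_SIZE):
    """Bin detection centers into grid cells with per-class counts."""
    cells = defaultdict(Counter)
    for det in detections:
        # float() lets the function accept raw CSV strings as well as numbers.
        x = float(det["cx"]) / float(det["frame_width"])
        y = float(det["cy"]) / float(det["frame_height"])
        # Clamp so centers on or beyond the frame edge stay inside the grid.
        col = max(0, min(int(x * grid_size), grid_size - 1))
        row = max(0, min(int(y * grid_size), grid_size - 1))
        cells[(row, col)][det["class_name"]] += 1
    return [
        {
            "row": row,
            "col": col,
            "total": sum(counts.values()),
            "per_class": dict(counts),
            "dominant": counts.most_common(1)[0][0],
        }
        for (row, col), counts in sorted(cells.items())
    ]
```

Only non-empty cells are returned, which keeps the payload small for sparse scenes.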

Tracking and Counting Method

The tracker uses YOLO detections followed by ByteTrack tracking. Each detection includes a track_id when the tracker can associate it with an object trajectory.

Counting is based on first appearance:

if track_id not in counted_ids:
    counted_ids.add(track_id)
    count_per_class[class_name] += 1

This avoids counting the same visible object again on every frame. The CSV schema still includes crossed_line and direction fields for compatibility with shared traffic-analysis formats, but the current implementation stores false and an empty direction by default.

The tracker also computes approximate pixel speed:

speed_px_s = distance_between_current_and_previous_center * fps

This value is useful for relative movement analysis inside the same video, but it is not a calibrated real-world speed in km/h.
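
A minimal sketch of the pixel-speed formula, assuming the tracker keeps the previous center of each `track_id` and that the object was also seen in the immediately preceding frame. The class name `SpeedEstimator` and the choice to report 0.0 on first sighting are assumptions made for this illustration.

```python
import math


class SpeedEstimator:
    """Remember the previous center of each track and report pixel speed."""

    def __init__(self, fps):
        self.fps = fps
        self._last_center = {}

    def update(self, track_id, cx, cy):
        """Return the speed in pixels per second for this observation."""
        previous = self._last_center.get(track_id)
        self._last_center[track_id] = (cx, cy)
        if previous is None:
            return 0.0  # assumption: no speed is available on first sighting
        distance = math.hypot(cx - previous[0], cy - previous[1])
        return distance * self.fps
```

If a track is missed for several frames, multiplying by `fps` alone would overstate the speed, so a more careful version would divide by the actual time elapsed between the two observations.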

API Reference

Method	Endpoint	Description
`GET`	`/`	Serves the web interface.
`GET`	`/health`	Basic server status and active session count.
`GET`	`/classes`	Returns supported traffic classes.
`POST`	`/upload`	Uploads a file or downloads a video URL and starts processing.
`POST`	`/webcam/start`	Starts a webcam tracking session.
`POST`	`/webcam/frame/{sid}`	Sends one webcam frame for detection and tracking.
`POST`	`/webcam/stop/{sid}`	Stops a webcam session and writes logs.
`GET`	`/stream/{sid}`	Streams annotated frames for an uploaded video session using Server-Sent Events.
`GET`	`/status/{sid}`	Returns processing status, progress, FPS, and latest counters.
`GET`	`/summary/{sid}`	Returns final summary for a completed session.
`GET`	`/dashboard`	Returns aggregated dashboard data and heatmap cells.
`GET`	`/logs`	Lists generated files in `logs/`.
`GET`	`/videos`	Lists annotated MP4 files.
`GET`	`/log/{filename}`	Downloads one log file.
`GET`	`/download/video/{sid}`	Downloads annotated video for a completed session.
`GET`	`/download/video-file/{filename}`	Downloads an annotated video by filename.
`GET`	`/stream/video/{sid}`	Streams an annotated video for browser playback.
`GET`	`/stream/video-file/{filename}`	Streams an annotated video by filename.

Upload form fields

Field	Type	Default	Description
`file`	file	empty	Local video file.
`video_url`	string	empty	Remote video URL. Used only if no file is uploaded.
`scene_name`	string	`scene_01`	Scene label used in logs and dashboard filters.
`group_id`	string	`Group_05`	Group label used in log filenames.
`classes`	comma-separated string	all classes	Example: `car,bus,truck`.
`conf`	float	`0.5`	YOLO confidence threshold.
`model`	string	`best.pt`	Path or name of the model weights.
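
A client-side sketch of an upload request built from the fields above. The helper names are hypothetical, the 0 to 1 range check on `conf` is an assumption, and the JSON response shape is not documented in this README, so the caller should inspect what the server actually returns. The second function relies on the third-party `requests` package.

```python
def build_upload_fields(scene_name="scene_01", group_id="Group_05",
                        classes=None, conf=0.5, model="best.pt",
                        video_url=None):
    """Return the text form fields for POST /upload."""
    if not 0.0 <= conf <= 1.0:
        raise ValueError("conf must be between 0 and 1")
    fields = {
        "scene_name": scene_name,
        "group_id": group_id,
        "conf": str(conf),
        "model": model,
    }
    if classes:  # omitting the field selects all classes by default
        fields["classes"] = ",".join(classes)
    if video_url:
        fields["video_url"] = video_url
    return fields


def upload_video(base_url, video_path, **options):
    """Send a local video file to the backend and return the parsed reply."""
    import requests  # imported lazily so the helper above works without it

    with open(video_path, "rb") as handle:
        response = requests.post(
            f"{base_url}/upload",
            data=build_upload_fields(**options),
            files={"file": handle},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()  # assumption: the endpoint replies with JSON
```

A call such as `upload_video("http://localhost:8000", "../data/Traffic_detection.mp4", scene_name="Africa_countries", classes=["car", "bus", "truck"])` would start processing on a locally running server.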

Output Files

Each completed session writes files into logs/ using this pattern:

{group_id}_{scene_name}_{order}_detections.csv
{group_id}_{scene_name}_{order}_detections.jsonl
{group_id}_{scene_name}_{order}_summary.json
{group_id}_{scene_name}_{order}_frame_stats.csv
{group_id}_{scene_name}_{order}_annotated.mp4

The order number is automatically incremented per group and scene.
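
The naming scheme can be sketched as below. The three-digit zero padding is inferred from the sample file `Group_05_Africa_countries_001_detections.csv`; deriving the next order number by scanning existing summary files is an assumption, and both function names are hypothetical.

```python
import re
from pathlib import Path

SUFFIXES = (
    "detections.csv",
    "detections.jsonl",
    "summary.json",
    "frame_stats.csv",
    "annotated.mp4",
)


def next_order(logs_dir, group_id, scene_name):
    """Return one more than the highest order used for this group and scene."""
    pattern = re.compile(
        rf"^{re.escape(group_id)}_{re.escape(scene_name)}_(\d+)_summary\.json$"
    )
    used = [
        int(match.group(1))
        for path in Path(logs_dir).glob("*_summary.json")
        if (match := pattern.match(path.name))
    ]
    return max(used, default=0) + 1


def output_paths(logs_dir, group_id, scene_name):
    """Build the five output paths for the next session."""
    order = next_order(logs_dir, group_id, scene_name)
    stem = f"{group_id}_{scene_name}_{order:03d}"
    return [Path(logs_dir) / f"{stem}_{suffix}" for suffix in SUFFIXES]
```

Anchoring the pattern on the full group and scene prefix keeps scene names that contain underscores, such as `Africa_countries`, from being confused with shorter names.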

Detection CSV

The main detection table contains one row per detected object per frame:

Column	Description
`frame`	Frame index starting at 1.
`timestamp_sec`	Timestamp in seconds.
`scene_name`	Scene label.
`group_id`	Group label.
`video_name`	Original video name or `webcam`.
`track_id`	ByteTrack object ID, or `-1` if no ID is assigned.
`class_name`	Detected traffic class.
`confidence`	YOLO detection confidence.
`bbox_x1`, `bbox_y1`, `bbox_x2`, `bbox_y2`	Bounding box coordinates in pixels.
`cx`, `cy`	Bounding box center in pixels.
`frame_width`, `frame_height`	Source frame dimensions.
`crossed_line`	Compatibility field, currently `false` by default.
`direction`	Compatibility field, currently empty by default.
`speed_px_s`	Approximate speed in pixels per second.
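
Because the CSV holds one row per object per frame, unique counts have to be recovered by de-duplicating on `track_id`. The sketch below shows one way to do that; skipping rows with `track_id` equal to `-1` is an assumption, and the function name is hypothetical.

```python
import csv
from collections import defaultdict


def unique_counts_from_csv(csv_path):
    """Count distinct track IDs per class in a detection CSV."""
    ids_per_class = defaultdict(set)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            track_id = int(float(row["track_id"]))
            if track_id == -1:
                continue  # assumption: untracked detections are not counted
            ids_per_class[row["class_name"]].add(track_id)
    return {name: len(ids) for name, ids in ids_per_class.items()}
```

If a track changes class label between frames, this version counts it under both classes, so its totals may differ slightly from the first-appearance counts in the summary file.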

JSONL detections

The JSONL file stores the same detection rows in line-delimited JSON format.

Summary JSON

{
  "scene": "Africa_countries",
  "group_id": "Group_05",
  "video_name": "Traffic_detection.mp4",
  "session_id": "abc123",
  "processed_at": "2026-04-29T12:00:00",
  "total_frames": 1800,
  "duration_sec": 60.0,
  "fps": 30.0,
  "resolution": [1080, 1440],
  "selected_classes": ["person", "bicycle", "car", "motorbike", "bus", "truck"],
  "total_unique_objects": 142,
  "count_per_class": {
    "car": 98,
    "bus": 12,
    "truck": 17,
    "person": 15
  },
  "annotated_video": "logs/Group_05_Africa_countries_001_annotated.mp4",
  "temporal_distribution": [
    {"bucket_10s": 0, "detections": 34},
    {"bucket_10s": 1, "detections": 51}
  ]
}
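
The example above is internally consistent: the per-class counts add up to `total_unique_objects`, and `total_frames / fps` equals `duration_sec`. Whether the backend guarantees these relationships for every session is an assumption, but a small check like the hypothetical `check_summary` below can flag files that deviate.

```python
def check_summary(summary, tolerance=0.05):
    """Return a list of inconsistencies found in a summary dictionary."""
    problems = []
    per_class_total = sum(summary["count_per_class"].values())
    if per_class_total != summary["total_unique_objects"]:
        problems.append(
            f"count_per_class sums to {per_class_total}, "
            f"expected {summary['total_unique_objects']}"
        )
    if summary["fps"]:
        expected = summary["total_frames"] / summary["fps"]
        if abs(expected - summary["duration_sec"]) > tolerance:
            problems.append(
                f"duration_sec is {summary['duration_sec']}, "
                f"expected about {expected:.2f}"
            )
    unknown = set(summary["count_per_class"]) - set(summary["selected_classes"])
    if unknown:
        problems.append(f"counted classes not selected: {sorted(unknown)}")
    return problems
```

An empty list means the summary passed all three checks.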

Frame statistics CSV

The frame statistics file summarizes each processed frame, including frame index, timestamp, number of detections in the frame, visibility state, and cumulative counts.

Command-Line Processing

The CLI is useful for batch processing videos without the web interface.

cd backend

# Process a video and show the annotated window
python run_tracker.py --video ../data/Traffic_detection.mp4 --scene Africa_countries --show

# Track only selected classes
python run_tracker.py --video ../data/Traffic_detection.mp4 --classes car bus truck --conf 0.4

# Use a custom model path
python run_tracker.py --video ../data/Traffic_detection.mp4 --model best.pt --conf 0.5

Generated logs are saved to the directory passed with --logs or to logs/ by default.

Fine-Tuning Workflow

The repository includes utilities for preparing and training a custom detector.

1. Extract frames

cd backend
python extract_frames.py --video ../data/Traffic_detection.mp4 --out frames/ --every 10

2. Label the frames

Label extracted frames with a tool that can export YOLO-format annotations. The dataset configuration should follow backend/dataset.yaml.

3. Train or fine-tune

python finetune.py --data dataset.yaml --model yolov8s.pt --epochs 50 --device 0

4. Use the trained weights

python run_tracker.py --video ../data/Traffic_detection.mp4 --model runs/traffic/finetune/weights/best.pt

The web interface can also use a custom model by entering the model path in the model field.

Model and Class Notes

The tracker maps the following COCO class IDs:

COCO ID	Class
0	person
1	bicycle
2	car
3	motorbike
5	bus
7	truck

The default confidence threshold in the web API is 0.5. Lower values may detect more objects but can increase false positives. Higher values reduce weak detections but may miss smaller or partially occluded objects.
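
The class table can be expressed as a lookup that turns the comma-separated `classes` field into class IDs. This is a sketch with hypothetical names; note that COCO itself labels ID 3 `motorcycle`, while this project uses `motorbike`, and that a fine-tuned model may use different indices than the COCO ones listed here.

```python
COCO_TO_CLASS = {
    0: "person",
    1: "bicycle",
    2: "car",
    3: "motorbike",
    5: "bus",
    7: "truck",
}
CLASS_TO_COCO = {name: coco_id for coco_id, name in COCO_TO_CLASS.items()}


def parse_classes(value):
    """Turn 'car,bus,truck' into sorted class IDs; empty input selects all."""
    names = [part.strip().lower() for part in (value or "").split(",") if part.strip()]
    if not names:
        return sorted(COCO_TO_CLASS)
    unknown = [name for name in names if name not in CLASS_TO_COCO]
    if unknown:
        raise ValueError(f"Unsupported classes: {', '.join(unknown)}")
    return sorted({CLASS_TO_COCO[name] for name in names})
```

The resulting list has the form Ultralytics expects for its `classes` filter argument, for example `[2, 5, 7]` for `car,bus,truck`.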

Practical Notes

best.pt should be available from the backend working directory unless another model path is provided.
logs/, uploads/, and output/ are created automatically.
Annotated MP4 files are written with OpenCV. When ffmpeg is available, the backend can produce a browser-compatible H.264 copy for playback.
Heatmap data depends on detection CSV files. If a summary exists without its matching detection CSV, the heatmap for that scene will be empty.
Unique counts depend on tracking stability. Heavy occlusion, camera cuts, or very crowded scenes can create new IDs for the same physical object.

License

MIT - see LICENSE.

Authors

AIMS Senegal - Computer Vision 2026