title: Computer Vision | Traffic Tracker
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
TrafficSense - Road Traffic Detection, Tracking, and Analytics
AIMS Senegal - Computer Vision Project 2 - April 2026
TrafficSense is a road-traffic analysis application that detects, tracks, counts, and summarizes moving traffic objects from video files, remote video URLs, or a webcam feed. The system combines YOLO object detection, ByteTrack object tracking, a FastAPI backend, and a browser dashboard for live monitoring and post-processing analytics.
The project focuses on six traffic classes: person, bicycle, car, motorbike, bus, and truck.
Main Features
| Area | Description |
|---|---|
| Object detection | YOLOv8-compatible models through Ultralytics. The default model path is best.pt. |
| Multi-object tracking | ByteTrack assigns persistent IDs to visible objects across frames. |
| Unique counting | Each tracked object is counted once when its track_id appears for the first time. |
| Supported classes | person, bicycle, car, motorbike, bus, truck. |
| Live processing | The backend streams annotated frames to the browser with Server-Sent Events. |
| Video inputs | Local upload, remote video URL, and webcam frame analysis. |
| Visual output | Bounding boxes, class labels, tracking IDs, object trails, and live counters. |
| Dashboard | Scene filtering, global statistics, class distribution, timeline chart, scene comparison, and object-position heatmap. |
| Logs | Detection CSV, raw JSONL detections, summary JSON, frame-level CSV statistics, and annotated MP4 output. |
| Export | Download logs and annotated videos directly from the interface. |
| Training support | Frame extraction and fine-tuning scripts are included for custom datasets. |
Architecture
traffic-tracker/
├── backend/
│   ├── main.py              # FastAPI application, routes, sessions, streaming, dashboard aggregation
│   ├── tracker.py           # YOLO + ByteTrack processing engine and log generation
│   ├── run_tracker.py       # Command-line processing entry point
│   ├── finetune.py          # YOLO fine-tuning script
│   ├── extract_frames.py    # Utility to extract video frames for labeling
│   ├── dataset.yaml         # Dataset configuration for training
│   ├── best.pt              # Default model weights used by the app
│   └── requirements.txt     # Python dependencies
├── frontend/
│   └── index.html           # Single-page dashboard and control interface
├── data/
│   ├── Traffic_detection.mp4
│   └── Group_05_Africa_countries_001_detections.csv
├── logs/                    # Created at runtime: summaries, detections, annotated videos
├── uploads/                 # Created at runtime: uploaded source videos
├── output/                  # Created at runtime when needed
├── Dockerfile               # Docker/Hugging Face Spaces deployment
├── LICENSE
└── README.md
Backend flow
- A video file, video URL, or webcam session is submitted to FastAPI.
- TrafficTracker loads the selected YOLO model and filters detections by the selected classes.
- YOLO detects objects frame by frame.
- ByteTrack assigns stable object IDs.
- The tracker writes annotated frames, detection rows, frame statistics, and summary metrics.
- The dashboard endpoint aggregates completed sessions and saved log files.
- The frontend renders live video feedback and analytics.
Frontend flow
The frontend is contained in frontend/index.html. It provides:
- Source selection: file upload, remote URL, or webcam.
- Scene and group metadata inputs.
- Model, confidence, and class controls.
- Live frame canvas with counters and progress state.
- Analytics dashboard with charts and heatmap.
- Log list and download controls.
Installation
Local Python setup
cd backend
pip install -r requirements.txt
For CUDA-enabled GPU environments, install the matching PyTorch build before running the app. For example:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
Start the application
From the backend directory:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Then open:
http://localhost:8000
The FastAPI server serves the frontend automatically from frontend/index.html.
Docker
docker build -t trafficsense .
docker run --rm -p 7860:7860 trafficsense
Then open:
http://localhost:7860
The Docker configuration is also compatible with the Hugging Face Spaces metadata at the top of this README.
Using the Web Interface
Analyze an uploaded video
- Open the web interface.
- Drop a video into the upload area or choose a file manually.
- Enter a scene name, such as intersection_01 or Africa_countries.
- Keep the default group ID or enter another group name.
- Select the traffic classes to track.
- Choose the model path and confidence threshold.
- Click START ANALYSIS.
- Watch the annotated video stream and live object counters.
- Open the Analytics tab to inspect summary charts and the position heatmap.
- Open the Log Files tab to download generated outputs.
Analyze a remote video URL
Paste a direct http:// or https:// video URL into the URL field. The backend downloads the video into uploads/ and processes it like a normal uploaded file.
Analyze webcam frames
Use the webcam option in the interface. The browser captures frames and sends them to the backend session. When stopped, the backend saves the same summary and detection files used for video processing.
Dashboard and Metrics
The dashboard combines all completed in-memory sessions and saved *_summary.json files in logs/.
Summary cards
| Metric | Meaning |
|---|---|
| Scenes | Number of completed scenes included in the current dashboard filter. |
| Total objects | Sum of unique tracked objects across selected scenes. |
| Total duration | Total processed video duration in seconds. |
| Average per scene | total_objects / number_of_scenes, rounded to the nearest integer. |
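The summary cards can be derived from the fields of the summary JSON files described under Output Files. The following is a minimal sketch, not the code in backend/main.py: the function name `summary_cards` and the returned key names are hypothetical, and it assumes each summary is a dict with `total_unique_objects` and `duration_sec`.

```python
def summary_cards(summaries):
    """Compute the four dashboard summary cards from a list of summary dicts."""
    scenes = len(summaries)
    total_objects = sum(s.get("total_unique_objects", 0) for s in summaries)
    total_duration = sum(s.get("duration_sec", 0.0) for s in summaries)
    # Guard against an empty filter to avoid division by zero.
    # Note: Python's round() rounds exact halves to the nearest even integer;
    # the backend's rounding rule may differ.
    average = round(total_objects / scenes) if scenes else 0
    return {
        "scenes": scenes,
        "total_objects": total_objects,
        "total_duration_sec": total_duration,
        "average_per_scene": average,
    }
```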
Charts
| Component | Description |
|---|---|
| Objects by class | Bar chart of unique object counts per class. |
| Traffic intensity timeline | Number of detections grouped into 10-second buckets. |
| Scene comparison | Per-scene duration, total object count, cars, pedestrians, and trucks/buses. |
| Position heatmap | A normalized grid built from object center coordinates (cx, cy) in the detection CSV files. |
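The traffic intensity timeline groups detections into 10-second buckets. A minimal sketch of that grouping, assuming the bucket index is the detection timestamp divided by 10 and rounded down (which matches the `bucket_10s` values in the summary JSON example); the function name `bucket_detections` is hypothetical:

```python
from collections import Counter


def bucket_detections(timestamps_sec, bucket_size=10):
    """Count detections per time bucket from a list of timestamps in seconds."""
    counts = Counter(int(t // bucket_size) for t in timestamps_sec)
    # Emit buckets in chronological order, using the summary JSON key names.
    return [{"bucket_10s": b, "detections": counts[b]} for b in sorted(counts)]
```

Buckets with no detections are omitted in this sketch; a chart that needs a continuous axis would have to fill them with zeros.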
Position heatmap
The heatmap uses each detection center and normalizes it by the frame size:
x = cx / frame_width
y = cy / frame_height
The normalized positions are assigned to a 24 by 24 grid. Each cell stores:
- total detections in that region
- per-class counts in that region
- dominant class for color display
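The binning described above can be sketched as follows. This is an illustration under stated assumptions, not the backend implementation: the function name `build_heatmap` and the output keys are hypothetical, rows are assumed to be dicts with the detection CSV columns `cx`, `cy`, `frame_width`, `frame_height`, and `class_name`, and centers on the frame edge are clamped into the outermost cell.

```python
from collections import Counter, defaultdict

GRID = 24  # the heatmap is a 24 by 24 grid


def build_heatmap(rows, grid=GRID):
    """Aggregate detection centers into grid cells with per-class counts."""
    cells = defaultdict(Counter)
    for r in rows:
        # Normalize the center by the frame size, as in the formulas above.
        x = float(r["cx"]) / float(r["frame_width"])
        y = float(r["cy"]) / float(r["frame_height"])
        # Clamp so centers on (or slightly outside) the frame border
        # still land in a valid cell.
        col = max(0, min(int(x * grid), grid - 1))
        row = max(0, min(int(y * grid), grid - 1))
        cells[(row, col)][r["class_name"]] += 1
    return [
        {
            "row": row,
            "col": col,
            "total": sum(counts.values()),
            "per_class": dict(counts),
            "dominant": counts.most_common(1)[0][0],
        }
        for (row, col), counts in sorted(cells.items())
    ]
```

Rows can come straight from `csv.DictReader` on a detection CSV, since the sketch converts the coordinate fields with `float()`.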
The map includes percentage coordinates around the plot. Cell colors match the class colors used in the Track Classes controls:
| Class | Color role |
|---|---|
| person | Red |
| bicycle | Green |
| car | Amber |
| motorbike | Pink |
| bus | Blue |
| truck | Purple |
Tracking and Counting Method
The tracker uses YOLO detections followed by ByteTrack tracking. Each detection includes a track_id when the tracker can associate it with an object trajectory.
Counting is based on first appearance:
if track_id has not been counted before:
    add track_id to counted_ids
    increment count_per_class[class_name]
This avoids counting the same visible object again on every frame. The CSV schema still includes crossed_line and direction fields for compatibility with shared traffic-analysis formats, but the current implementation stores false and an empty direction by default.
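The first-appearance rule can be written in a few lines of Python. This is a sketch rather than the code in backend/tracker.py: the class name `UniqueCounter` is hypothetical, and skipping detections whose `track_id` is -1 (no ID assigned) is an assumption.

```python
from collections import defaultdict


class UniqueCounter:
    """Count each tracked object once, the first time its track_id appears."""

    def __init__(self):
        self.counted_ids = set()
        self.count_per_class = defaultdict(int)

    def update(self, track_id, class_name):
        """Register one detection; return True if it was newly counted."""
        # Assumption: detections without a tracker ID (-1) are never counted.
        if track_id < 0 or track_id in self.counted_ids:
            return False
        self.counted_ids.add(track_id)
        self.count_per_class[class_name] += 1
        return True
```

Calling `update()` for every detection in every frame keeps the totals stable as long as the tracker keeps the same ID for the same object.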
The tracker also computes approximate pixel speed:
speed_px_s = distance_between_current_and_previous_center * fps
This value is useful for relative movement analysis inside the same video, but it is not a calibrated real-world speed in km/h.
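A minimal sketch of this formula, assuming the distance is the Euclidean distance between the bounding box centers of the same track in two consecutive frames (the function signature is hypothetical):

```python
import math


def speed_px_s(prev_center, curr_center, fps):
    """Approximate speed in pixels per second between two consecutive frames."""
    distance = math.hypot(
        curr_center[0] - prev_center[0],
        curr_center[1] - prev_center[1],
    )
    # One frame lasts 1 / fps seconds, so distance per frame times fps
    # gives distance per second.
    return distance * fps
```

For example, a center that moves 5 pixels between frames at 30 FPS gives 150 px/s.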
API Reference
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serves the web interface. |
| GET | /health | Basic server status and active session count. |
| GET | /classes | Returns supported traffic classes. |
| POST | /upload | Uploads a file or downloads a video URL and starts processing. |
| POST | /webcam/start | Starts a webcam tracking session. |
| POST | /webcam/frame/{sid} | Sends one webcam frame for detection and tracking. |
| POST | /webcam/stop/{sid} | Stops a webcam session and writes logs. |
| GET | /stream/{sid} | Streams annotated frames for an uploaded video session using Server-Sent Events. |
| GET | /status/{sid} | Returns processing status, progress, FPS, and latest counters. |
| GET | /summary/{sid} | Returns final summary for a completed session. |
| GET | /dashboard | Returns aggregated dashboard data and heatmap cells. |
| GET | /logs | Lists generated files in logs/. |
| GET | /videos | Lists annotated MP4 files. |
| GET | /log/{filename} | Downloads one log file. |
| GET | /download/video/{sid} | Downloads annotated video for a completed session. |
| GET | /download/video-file/{filename} | Downloads an annotated video by filename. |
| GET | /stream/video/{sid} | Streams an annotated video for browser playback. |
| GET | /stream/video-file/{filename} | Streams an annotated video by filename. |
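The read-only endpoints can be queried from a script with the Python standard library. The helpers below are a sketch: the names `endpoint` and `get_json` are hypothetical, the base URL assumes the local uvicorn setup from the Installation section, and it is an assumption that endpoints such as /health, /classes, and /status/{sid} respond with JSON. The response fields are not documented here, so the sketch returns the decoded body as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local uvicorn server


def endpoint(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base_url=BASE_URL, timeout=10):
    """GET an endpoint and decode its body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint(path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `get_json("/health")` fetches the server status, and `get_json("/status/" + sid)` can be polled for a session ID obtained from the upload step. For the Docker setup, pass `base_url="http://localhost:7860"`.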
Upload form fields
| Field | Type | Default | Description |
|---|---|---|---|
| file | file | empty | Local video file. |
| video_url | string | empty | Remote video URL. Used only if no file is uploaded. |
| scene_name | string | scene_01 | Scene label used in logs and dashboard filters. |
| group_id | string | Group_05 | Group label used in log filenames. |
| classes | comma-separated string | all classes | Example: car,bus,truck. |
| conf | float | 0.5 | YOLO confidence threshold. |
| model | string | best.pt | Path or name of the model weights. |
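When calling /upload from a script, the text fields can be prepared with a small helper such as the one below. This is a sketch: the function name `build_upload_fields` is hypothetical, and the client-side validation of class names and of the confidence range is an addition for illustration, not documented backend behavior.

```python
SUPPORTED_CLASSES = ["person", "bicycle", "car", "motorbike", "bus", "truck"]


def build_upload_fields(scene_name="scene_01", group_id="Group_05",
                        classes=None, conf=0.5, model="best.pt", video_url=""):
    """Build the text fields of the /upload form using the documented defaults."""
    classes = list(classes) if classes else list(SUPPORTED_CLASSES)
    unknown = [c for c in classes if c not in SUPPORTED_CLASSES]
    if unknown:
        raise ValueError(f"unsupported classes: {unknown}")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("conf must be between 0 and 1")
    fields = {
        "scene_name": scene_name,
        "group_id": group_id,
        "classes": ",".join(classes),  # comma-separated string, e.g. car,bus,truck
        "conf": str(conf),
        "model": model,
    }
    # video_url is only used by the backend when no file is uploaded.
    if video_url:
        fields["video_url"] = video_url
    return fields
```

The resulting dict can be sent as form data with any HTTP client that supports multipart uploads, together with the video under the `file` field when a local file is used.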
Output Files
Each completed session writes files into logs/ using this pattern:
{group_id}_{scene_name}_{order}_detections.csv
{group_id}_{scene_name}_{order}_detections.jsonl
{group_id}_{scene_name}_{order}_summary.json
{group_id}_{scene_name}_{order}_frame_stats.csv
{group_id}_{scene_name}_{order}_annotated.mp4
The order number is automatically incremented per group and scene.
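The naming pattern can be reproduced as follows. This is a sketch under assumptions: the three-digit zero padding is inferred from file names such as Group_05_Africa_countries_001_detections.csv, the next order number is derived here from existing summary files (the backend may determine it differently), and the function names are hypothetical.

```python
import re
from pathlib import Path

SUFFIXES = ("detections.csv", "detections.jsonl", "summary.json",
            "frame_stats.csv", "annotated.mp4")


def next_order(logs_dir, group_id, scene_name):
    """Return the next order number for a group and scene (starting at 1)."""
    pattern = re.compile(
        rf"^{re.escape(group_id)}_{re.escape(scene_name)}_(\d+)_summary\.json$"
    )
    orders = []
    for path in Path(logs_dir).glob("*_summary.json"):
        match = pattern.match(path.name)
        if match:
            orders.append(int(match.group(1)))
    return max(orders, default=0) + 1


def output_names(group_id, scene_name, order):
    """List the five output file names for one session."""
    stem = f"{group_id}_{scene_name}_{order:03d}"  # assumed 3-digit padding
    return [f"{stem}_{suffix}" for suffix in SUFFIXES]
```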
Detection CSV
The main detection table contains one row per detected object per frame:
| Column | Description |
|---|---|
| frame | Frame index starting at 1. |
| timestamp_sec | Timestamp in seconds. |
| scene_name | Scene label. |
| group_id | Group label. |
| video_name | Original video name or webcam. |
| track_id | ByteTrack object ID, or -1 if no ID is assigned. |
| class_name | Detected traffic class. |
| confidence | YOLO detection confidence. |
| bbox_x1, bbox_y1, bbox_x2, bbox_y2 | Bounding box coordinates in pixels. |
| cx, cy | Bounding box center in pixels. |
| frame_width, frame_height | Source frame dimensions. |
| crossed_line | Compatibility field, currently false by default. |
| direction | Compatibility field, currently empty by default. |
| speed_px_s | Approximate speed in pixels per second. |
JSONL detections
The JSONL file stores the same detection rows in line-delimited JSON format.
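Because each line is an independent JSON object, the file can be processed line by line without loading it fully into memory. A minimal sketch, assuming each object uses the detection CSV column names as keys; the function names are hypothetical:

```python
import json
from collections import Counter


def iter_jsonl(lines):
    """Yield one dict per non-empty line of a JSONL source."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline
            yield json.loads(line)


def detections_per_class(lines):
    """Count detection rows (not unique objects) per class_name."""
    return Counter(row["class_name"] for row in iter_jsonl(lines))
```

An open file object can be passed directly, for example `with open(path, encoding="utf-8") as f: detections_per_class(f)`. These are per-frame detection counts, so they will be larger than the unique counts in the summary JSON.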
Summary JSON
{
"scene": "Africa_countries",
"group_id": "Group_05",
"video_name": "Traffic_detection.mp4",
"session_id": "abc123",
"processed_at": "2026-04-29T12:00:00",
"total_frames": 1800,
"duration_sec": 60.0,
"fps": 30.0,
"resolution": [1080, 1440],
"selected_classes": ["person", "bicycle", "car", "motorbike", "bus", "truck"],
"total_unique_objects": 142,
"count_per_class": {
"car": 98,
"bus": 12,
"truck": 17,
"person": 15
},
"annotated_video": "logs/Group_05_Africa_countries_001_annotated.mp4",
"temporal_distribution": [
{"bucket_10s": 0, "detections": 34},
{"bucket_10s": 1, "detections": 51}
]
}
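In the example above, the per-class counts add up to the total (98 + 12 + 17 + 15 = 142) and the duration equals total_frames / fps (1800 / 30 = 60.0). The sketch below checks those relationships on a loaded summary. Treating them as invariants is an assumption based on this example only, and `check_summary` is a hypothetical helper, not part of the backend.

```python
def check_summary(summary, duration_tolerance_sec=0.5):
    """Return a list of consistency problems found in one summary dict."""
    problems = []
    per_class = summary.get("count_per_class", {})
    if sum(per_class.values()) != summary.get("total_unique_objects"):
        problems.append("count_per_class does not sum to total_unique_objects")
    unexpected = sorted(set(per_class) - set(summary.get("selected_classes", [])))
    if unexpected:
        problems.append(f"classes counted but not selected: {unexpected}")
    fps = summary.get("fps") or 0
    if fps > 0:
        expected = summary.get("total_frames", 0) / fps
        if abs(expected - summary.get("duration_sec", 0.0)) > duration_tolerance_sec:
            problems.append("duration_sec does not match total_frames / fps")
    return problems
```

A summary file can be loaded with `json.load()` and passed to the function; an empty list means none of the checks failed.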
Frame statistics CSV
The frame statistics file summarizes each processed frame, including frame index, timestamp, number of detections in the frame, visibility state, and cumulative counts.
Command-Line Processing
The CLI is useful for batch processing videos without the web interface.
cd backend
# Process a video and show the annotated window
python run_tracker.py --video ../data/Traffic_detection.mp4 --scene Africa_countries --show
# Track only selected classes
python run_tracker.py --video ../data/Traffic_detection.mp4 --classes car bus truck --conf 0.4
# Use a custom model path
python run_tracker.py --video ../data/Traffic_detection.mp4 --model best.pt --conf 0.5
Generated logs are saved to the directory passed with --logs or to logs/ by default.
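To process a whole folder of videos, the CLI can be driven from a short Python script. The sketch below only uses flags that appear in this README (--video, --scene, --classes, --conf, --model, --logs); the helper names and the choice of using each file's stem as the scene name are assumptions. It is meant to be run from the backend directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(video, scene=None, classes=None, conf=None, model=None, logs=None):
    """Build the run_tracker.py command line for one video."""
    cmd = [sys.executable, "run_tracker.py", "--video", str(video)]
    if scene:
        cmd += ["--scene", scene]
    if classes:
        cmd += ["--classes", *classes]  # space-separated, as in the examples above
    if conf is not None:
        cmd += ["--conf", str(conf)]
    if model:
        cmd += ["--model", str(model)]
    if logs:
        cmd += ["--logs", str(logs)]
    return cmd


def process_folder(folder, **options):
    """Run the tracker on every MP4 in a folder, one scene per file."""
    for video in sorted(Path(folder).glob("*.mp4")):
        # check=True stops the batch at the first failing video.
        subprocess.run(build_command(video, scene=video.stem, **options), check=True)
```

For example, `process_folder("../data", classes=["car", "bus", "truck"], conf=0.4)` would process each MP4 in data/ with the same class filter and threshold.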
Fine-Tuning Workflow
The repository includes utilities for preparing and training a custom detector.
1. Extract frames
cd backend
python extract_frames.py --video ../data/Traffic_detection.mp4 --out frames/ --every 10
2. Label the frames
Label extracted frames with a tool that can export YOLO-format annotations. The dataset configuration should follow backend/dataset.yaml.
3. Train or fine-tune
python finetune.py --data dataset.yaml --model yolov8s.pt --epochs 50 --device 0
4. Use the trained weights
python run_tracker.py --video ../data/Traffic_detection.mp4 --model runs/traffic/finetune/weights/best.pt
The web interface can also use a custom model by entering the model path in the model field.
Model and Class Notes
The tracker maps the following COCO class IDs:
| COCO ID | Class |
|---|---|
| 0 | person |
| 1 | bicycle |
| 2 | car |
| 3 | motorbike |
| 5 | bus |
| 7 | truck |
The default confidence threshold in the web API is 0.5. Lower values may detect more objects but can increase false positives. Higher values reduce weak detections but may miss smaller or partially occluded objects.
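The effect of the class mapping and the threshold can be illustrated with plain Python. This is a sketch only: in practice Ultralytics applies this filtering internally through its `classes` and `conf` arguments, the function name `filter_detections` is hypothetical, and whether a score exactly equal to the threshold is kept is an assumption. Note that the standard COCO label for ID 3 is "motorcycle"; this project uses "motorbike".

```python
COCO_TO_CLASS = {0: "person", 1: "bicycle", 2: "car",
                 3: "motorbike", 5: "bus", 7: "truck"}


def filter_detections(detections, selected_classes, conf=0.5):
    """Keep (class_name, score) pairs for selected classes above the threshold.

    detections: iterable of (coco_id, confidence) pairs.
    """
    selected = set(selected_classes)
    kept = []
    for coco_id, score in detections:
        name = COCO_TO_CLASS.get(coco_id)  # None for unsupported COCO IDs
        if name in selected and score >= conf:
            kept.append((name, score))
    return kept
```

Lowering `conf` from 0.5 to 0.4 in this sketch lets weaker detections through, which mirrors the trade-off between recall and false positives described above.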
Practical Notes
- best.pt should be available from the backend working directory unless another model path is provided.
- logs/, uploads/, and output/ are created automatically.
- Annotated MP4 files are written with OpenCV. When ffmpeg is available, the backend can produce a browser-compatible H.264 copy for playback.
- Heatmap data depends on detection CSV files. If a summary exists without its matching detection CSV, the heatmap for that scene will be empty.
- Unique counts depend on tracking stability. Heavy occlusion, camera cuts, or very crowded scenes can create new IDs for the same physical object.
License
MIT - see LICENSE.
Authors
AIMS Senegal - Computer Vision 2026