Geo-trax: YOLOv8s Vehicle Detector for Drone BEV Imagery
This is the default detection model for Geo-trax, a comprehensive pipeline for extracting georeferenced vehicle trajectories from high-altitude drone (bird's-eye view) video footage. The model detects vehicles in aerial imagery and underpins the results reported in the associated publication.
🎬 This accelerated animation previews some of the capabilities of Geo-trax. Watch the full demonstration (~4 min) on YouTube.
Model Details
| Property | Value |
|---|---|
| Architecture | YOLOv8s (HBB, horizontal bounding boxes) |
| Input resolution | 1920 × 1920 px |
| Classes | 6 trained (4 primary + 2 auxiliary; see below) |
| Parameters | ~11 M |
| Framework | Ultralytics ≥ 8.4.64 |
| Trained on | 19,339 annotated aerial images (679,306 labeled instances); multi-stage, see publication |
| Validated on | Songdo Vision test set (1,084 images, 55,124 vehicle instances) |
Classes and Detection Performance
Metrics reported on the Songdo Vision test split
(1,084 images, 55,124 labeled vehicle instances). The Instances column is the per-class
support in the test set. See Table 3 of the
publication for full results.
| ID | Label | Notes | Instances | Precision | Recall | mAP@50 | mAP@50-95 |
|---|---|---|---|---|---|---|---|
| 0 | Car | incl. vans | 49,508 | 0.979 | 0.981 | 0.992 | 0.835 |
| 1 | Bus | | 1,759 | 0.952 | 0.977 | 0.988 | 0.826 |
| 2 | Truck | | 3,052 | 0.887 | 0.916 | 0.935 | 0.722 |
| 3 | Motorcycle | | 805 | 0.827 | 0.866 | 0.888 | 0.463 |
| 4 | Pedestrian | not evaluated | n/a | n/a | n/a | n/a | n/a |
| 5 | Bicycle | not evaluated | n/a | n/a | n/a | n/a | n/a |
| All | | | 55,124 | 0.911 | 0.935 | 0.951 | 0.711 |
The model reaches 0.951 mAP@50 and 0.711 mAP@50-95 overall, with near-saturated accuracy on cars and buses (mAP@50 ≥ 0.988). Trucks and especially motorcycles are harder: motorcycles are small, sparse in the test set (805 instances), and the main driver of the lower mAP@50-95.
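The "All" row is consistent with an unweighted (macro) mean over the four evaluated classes, which is how Ultralytics typically reports its overall validation row. The following is a minimal check using the rounded values from the table above; because the inputs are rounded to three decimals, agreement is only expected to within about 0.001.

```python
# Per-class metrics copied from the table above (rounded to 3 decimals).
per_class = {
    "Car":        {"precision": 0.979, "recall": 0.981, "mAP50": 0.992, "mAP50-95": 0.835},
    "Bus":        {"precision": 0.952, "recall": 0.977, "mAP50": 0.988, "mAP50-95": 0.826},
    "Truck":      {"precision": 0.887, "recall": 0.916, "mAP50": 0.935, "mAP50-95": 0.722},
    "Motorcycle": {"precision": 0.827, "recall": 0.866, "mAP50": 0.888, "mAP50-95": 0.463},
}
# Values from the "All" row of the same table.
reported_all = {"precision": 0.911, "recall": 0.935, "mAP50": 0.951, "mAP50-95": 0.711}


def macro_mean(metric):
    """Unweighted mean of one metric across the evaluated classes."""
    values = [row[metric] for row in per_class.values()]
    return sum(values) / len(values)


for metric, reported in reported_all.items():
    print(f"{metric:9s} macro mean = {macro_mean(metric):.4f}  reported = {reported:.3f}")
```

Because the mean is unweighted, the 805 motorcycle instances count as much as the 49,508 cars, which is why the overall mAP@50-95 sits well below the car and bus values.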
Evaluation Plots
Precision-recall curves and the normalized confusion matrix on the Songdo Vision test set:
[Figure: Precision-Recall Curve]
[Figure: Normalized Confusion Matrix]
Note on pedestrian and bicycle classes: The model was trained on pedestrian and bicycle instances; however, these classes are not evaluated and not recommended for use. They were underrepresented in the training data, are not annotated in the Songdo Vision dataset (making reliable evaluation impossible), and achieve poor detection performance in practice.
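For detections that were produced without a class filter (for example, previously saved predictions), the recommendation above can be applied after the fact. This is a minimal sketch, assuming each detection is a plain (class_id, confidence) tuple; that layout and the keep_evaluated helper are hypothetical and not part of Geo-trax or Ultralytics.

```python
# Class IDs and labels as listed in the class table above.
CLASS_NAMES = {0: "Car", 1: "Bus", 2: "Truck", 3: "Motorcycle", 4: "Pedestrian", 5: "Bicycle"}
EVALUATED_IDS = {0, 1, 2, 3}  # pedestrian (4) and bicycle (5) are not evaluated


def keep_evaluated(detections):
    """Drop detections whose class was not evaluated (hypothetical helper)."""
    return [(cid, conf) for cid, conf in detections if cid in EVALUATED_IDS]


# Made-up detections, for illustration only.
detections = [(0, 0.91), (4, 0.40), (3, 0.55), (5, 0.30)]
for cid, conf in keep_evaluated(detections):
    print(CLASS_NAMES[cid], conf)
```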
How to Use
With Geo-trax (recommended)
This model is the default in Geo-trax and downloads automatically on first use:
pip install geo-trax
geotrax extract video.mp4 # detect, track, and stabilize; auto-downloads the model
geotrax batch video.mp4 --no-geo # detect, track, and stabilize; skip georeferencing
geotrax batch video.mp4 # full pipeline including georeferencing (requires orthophotos)
See the Geo-trax GitHub README for the full pipeline, configuration options, and georeferencing.
Direct Ultralytics inference
from ultralytics import YOLO
from huggingface_hub import hf_hub_download
weights = hf_hub_download(repo_id="rfonod/geo-trax", filename="geotrax_hbb_yolov8s_1920_v1.pt")
model = YOLO(weights)
results = model("drone_frame.jpg", imgsz=1920, conf=0.25, iou=0.45, classes=[0, 1, 2, 3])
results[0].show()
Tip: The model was trained and validated at 1920 px input resolution. Downscaling to 1280 px is possible with a small accuracy trade-off; going below 960 px significantly degrades detection of small vehicles (motorcycles, distant cars). Pass classes=[0, 1, 2, 3] to restrict inference to the four evaluated classes and suppress unreliable predictions.
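The effect of downscaling can be estimated from the ground sampling distance. The sketch below is rough and rests on three assumptions made only for illustration: that the ~0.027 m/px GSD quoted under Intended Use and Limitations applies to a 3840 px wide frame, that the inference size sets the longer image side, and that a motorcycle is about 2 m long.

```python
NATIVE_WIDTH_PX = 3840      # assumed 4K frame width
NATIVE_GSD_M = 0.027        # metres per pixel at native resolution (value quoted in this card)
MOTORCYCLE_LENGTH_M = 2.0   # assumed typical motorcycle length


def object_length_px(length_m, imgsz):
    """Approximate object length in pixels after resizing the long side to imgsz."""
    effective_gsd = NATIVE_GSD_M * NATIVE_WIDTH_PX / imgsz  # coarser GSD after downscaling
    return length_m / effective_gsd


for imgsz in (1920, 1280, 960):
    print(imgsz, round(object_length_px(MOTORCYCLE_LENGTH_M, imgsz), 1))
```

Under these assumptions a motorcycle spans roughly 37 px at 1920 px input and fewer than 20 px at 960 px, which is consistent with the degradation described in the tip.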
Training Data
Training followed a multi-stage strategy, starting from YOLOv8s weights pretrained on COCO. Two successive stages were applied:
Stage 1 (BASE): The model was trained on a large, diverse collection drawn from eight public aerial and drone datasets (CARPK, PUCPR+, CyCAR, UAVDT, HARPY, RAI4VD, UIT-ADrone, and VisDrone) combined with the Songdo Vision dataset, totalling 19,339 training images with 679,306 annotations across 6 classes (car, bus, truck, motorcycle, pedestrian, bicycle).
Stage 2 (FINE): The BASE-trained model was subsequently fine-tuned on a curated, high-quality subset of 9,004 images with 321,368 annotations, emphasising accurate annotations and higher-resolution images, again combined with Songdo Vision, to yield the final weights released here.
Training set composition (annotations per class):
| Stage | Images | Annotations | Car | Bus | Truck | Motorcycle | Pedestrian | Bicycle |
|---|---|---|---|---|---|---|---|---|
| BASE | 19,339 | 679,306 | 561,666 | 15,587 | 28,830 | 44,512 | 24,239 | 4,472 |
| FINE | 9,004 | 321,368 | 266,745 | 8,047 | 14,305 | 30,925 | 1,260 | 86 |
Songdo Vision comprises 5,419 annotated drone frames (4,335 training / 1,084 test; 80/20 split) collected during a large-scale urban traffic monitoring experiment in Songdo, South Korea. It covers four primary vehicle classes captured at 140-150 m altitude by DJI Mavic 3 drones. Its training split accounts for 217,311 of the training annotations above, and its test split for the 55,124 instances used in evaluation.
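The class imbalance is easier to see as shares of each stage's annotations. The following small sketch uses only the counts from the composition table above.

```python
CLASSES = ["Car", "Bus", "Truck", "Motorcycle", "Pedestrian", "Bicycle"]
# Annotation counts per class, copied from the training set composition table.
COUNTS = {
    "BASE": [561_666, 15_587, 28_830, 44_512, 24_239, 4_472],
    "FINE": [266_745, 8_047, 14_305, 30_925, 1_260, 86],
}


def class_shares(stage):
    """Fraction of a stage's annotations that belongs to each class."""
    counts = COUNTS[stage]
    total = sum(counts)
    return {name: count / total for name, count in zip(CLASSES, counts)}


for stage in COUNTS:
    shares = class_shares(stage)
    print(stage, ", ".join(f"{name} {share:.2%}" for name, share in shares.items()))
```

Pedestrians and bicycles together make up about 4% of the BASE annotations and less than 0.5% of the FINE annotations, in line with the note that these classes are underrepresented.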
Training configuration:
| Setting | Value |
|---|---|
| Initialization | YOLOv8s pretrained on COCO |
| Optimizer | SGD |
| Learning rate (initial / final factor) | 0.01 / 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Batch size | 8 |
| Early stopping | 50-epoch patience |
| Input resolution | 1920 × 1920 px (letterbox padding) |
| Mixed precision | AMP enabled |
| Augmentation | random scaling, translation, horizontal flip, mosaic, colour jitter, Gaussian/median blur, grayscale, CLAHE |
See the publication for complete dataset statistics, training details, and ablation results.
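For readers setting up a comparable run, the configuration table maps onto Ultralytics training arguments roughly as follows. This is a sketch, not the authors' training script: the dataset YAML name is a placeholder, and the augmentation settings and the two-stage schedule are left out.

```python
# Settings from the training configuration table, using Ultralytics argument names.
train_args = {
    "imgsz": 1920,           # input resolution
    "batch": 8,              # batch size
    "optimizer": "SGD",
    "lr0": 0.01,             # initial learning rate
    "lrf": 0.01,             # final learning rate as a fraction of lr0
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "patience": 50,          # early-stopping patience in epochs
    "amp": True,             # mixed precision
}

# Because lrf is a factor applied to lr0, the schedule ends at lr0 * lrf.
final_lr = train_args["lr0"] * train_args["lrf"]
print(f"final learning rate: {final_lr:.0e}")

# Hypothetical usage (requires ultralytics and a dataset YAML of your own):
# from ultralytics import YOLO
# YOLO("yolov8s.pt").train(data="my_dataset.yaml", **train_args)
```

The "0.01 / 0.01" entry in the table therefore corresponds to a learning rate that decays from 0.01 to 0.0001 over training.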
Intended Use and Limitations
- GSD assumption: The bundled Geo-trax config assumes a ground sampling distance (GSD) of ~0.027 m/px (DJI Mavic 3, 4K, 140-150 m altitude). Adjust this value in the config for different hardware or flight altitudes.
- Supported classes: Car, bus, truck, and motorcycle (class IDs 0-3). The model was also trained on pedestrian and bicycle instances; however, these classes achieve poor detection performance and are not recommended for use (see the class table above). Geo-trax filters to the four primary classes by default; when using Ultralytics directly, pass classes=[0, 1, 2, 3] to suppress unreliable predictions.
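For a fixed camera and video resolution, GSD scales approximately linearly with flight altitude, so a first estimate for a different altitude can be derived from the bundled value. The sketch below assumes a 145 m reference altitude (the midpoint of the stated range) and a nadir-pointing camera; the function names are illustrative and not part of Geo-trax.

```python
REF_GSD_M = 0.027        # m/px, value assumed by the bundled config
REF_ALTITUDE_M = 145.0   # assumed midpoint of the 140-150 m range


def scaled_gsd(altitude_m):
    """First-order GSD estimate at another altitude (same camera and resolution)."""
    return REF_GSD_M * altitude_m / REF_ALTITUDE_M


def pixels_to_metres(distance_px, altitude_m=REF_ALTITUDE_M):
    """Convert an image-plane distance in pixels to metres on the ground."""
    return distance_px * scaled_gsd(altitude_m)


print(f"GSD at 100 m: {scaled_gsd(100.0):.4f} m/px")
print(f"500 px at 145 m: {pixels_to_metres(500):.1f} m")
```

A measured or calibrated GSD should replace this estimate whenever one is available.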
Related datasets and resources
- Songdo Traffic: the georeferenced vehicle-trajectory dataset this model helps produce via the Geo-trax pipeline: 10.5281/zenodo.13828384 · HF: rfonod/songdo-traffic
- Songdo Vision: the vehicle-detection (annotated image) dataset used to train and validate this model: 10.5281/zenodo.13828407 · HF: rfonod/songdo-vision
- Source video recordings (not open access): 10.5075/EPFL.20.500.14299/253923
- Publication: Transportation Research Part C (2025): 10.1016/j.trc.2025.105205 · arXiv:2411.02136
- Software: Geo-trax: github.com/rfonod/geo-trax · Zenodo: 10.5281/zenodo.12119542 · demo video
Citation
If you use this model, please cite the associated publication:
@article{fonod2025advanced,
title = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
author = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
journal = {Transportation Research Part C: Emerging Technologies},
volume = {178},
pages = {105205},
year = {2025},
doi = {10.1016/j.trc.2025.105205}
}
If you additionally use the Geo-trax software, please also cite the specific version you used via its Zenodo record. For example, for version 1.0.0:
@software{fonod2026geo-trax,
author = {Fonod, Robert},
title = {Geo-trax: A Comprehensive Framework for Georeferenced Vehicle Trajectory Extraction from Drone Imagery},
url = {https://github.com/rfonod/geo-trax},
doi = {10.5281/zenodo.12119542},
version = {1.0.0},
year = {2026}
}
License
This model is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license; see the LICENSE file for the full terms. The Geo-trax codebase is distributed separately under the MIT License.


