autonomous_vehicle_trajectory_transformer

Overview

This model is a sequence-to-sequence Transformer designed to predict the future paths of road agents (vehicles, cyclists, and pedestrians). It processes historical multi-sensor fusion logs to generate multi-modal trajectory predictions, allowing autonomous systems to make safer motion planning decisions in complex urban environments.

Model Architecture

The architecture consists of a Temporal-Spatial Transformer encoder-decoder.

  • Input: A 2-second history of coordinates, velocities, and headings ($x, y, v, \theta$).
  • Encoder: Captures temporal dependencies and social interactions between multiple agents using self-attention.
  • Decoder: Employs query-based decoding to produce the next 5 seconds of coordinates at 10 Hz (50 future waypoints).
  • Optimization: The model is trained to minimize the Average Displacement Error (ADE) and Final Displacement Error (FDE):

$$ADE = \frac{1}{T} \sum_{t=1}^{T} \sqrt{(\hat{x}_t - x_t)^2 + (\hat{y}_t - y_t)^2}, \qquad FDE = \sqrt{(\hat{x}_T - x_T)^2 + (\hat{y}_T - y_T)^2}$$
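The ADE/FDE metrics above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the model's training code; the `(T, 2)` array layout is an assumption.

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance over all T timesteps.
    pred, gt: arrays of shape (T, 2) holding (x, y) coordinates in meters."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def fde(pred, gt):
    """Final Displacement Error: Euclidean distance at the last timestep only."""
    return np.linalg.norm(pred[-1] - gt[-1])

# A 5-second horizon at 10 Hz gives T = 50 predicted waypoints.
T = 50
gt = np.stack([np.linspace(0, 10, T), np.zeros(T)], axis=-1)  # straight path
pred = gt + np.array([0.0, 1.0])  # constant 1 m lateral offset
print(ade(pred, gt))  # 1.0
print(fde(pred, gt))  # 1.0
```

For multi-modal output, these metrics are typically computed per predicted mode and the minimum over modes is reported (minADE/minFDE).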

Intended Use

  • Collision Avoidance: Providing high-fidelity "what-if" scenarios for reactive braking and steering systems.
  • Behavior Prediction: Classifying agent intent (e.g., lane change vs. straight driving) based on trajectory curvature.
  • Simulation: Generating realistic agent movements for virtual testing of self-driving software stacks.
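The curvature-based intent classification mentioned above could be sketched as follows. The discrete curvature formula is standard; the function names and the `threshold` value are illustrative assumptions, not part of this model.

```python
import numpy as np

def mean_curvature(traj):
    """Mean absolute discrete curvature of a (T, 2) trajectory in meters.
    Uses kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2), which is invariant
    to how the curve is parameterized."""
    dx, dy = np.gradient(traj[:, 0]), np.gradient(traj[:, 1])
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    denom = (dx**2 + dy**2) ** 1.5 + 1e-9  # avoid division by zero
    return np.mean(np.abs(dx * ddy - dy * ddx) / denom)

def classify_intent(traj, threshold=0.02):
    # threshold (1/m) is an assumed tuning parameter, not from the model card
    return "lane_change" if mean_curvature(traj) > threshold else "straight"

straight = np.stack([np.linspace(0, 10, 50), np.zeros(50)], axis=-1)
theta = np.linspace(0, np.pi / 2, 50)
arc = np.stack([5 * np.sin(theta), 5 * (1 - np.cos(theta))], axis=-1)  # radius 5 m
print(classify_intent(straight))  # straight
print(classify_intent(arc))       # lane_change
```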

Limitations

  • Occlusions: If an agent is hidden from sensors for more than 500ms, the model's uncertainty increases drastically.
  • Unseen Environments: Performance may degrade in regions with traffic rules significantly different from the training data (e.g., changing from right-hand to left-hand traffic).
  • Physical Constraints: Without a post-processing kinematic filter, the model might occasionally predict physically impossible movements (e.g., instantaneous 90-degree turns at high speed).
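A post-processing kinematic filter of the kind described above could be sketched like this: re-integrate the predicted waypoints while clamping per-step speed and heading changes. The limits (`max_yaw_rate`, `max_accel`) are illustrative assumptions, not values shipped with this model.

```python
import numpy as np

def kinematic_filter(traj, dt=0.1, max_yaw_rate=0.6, max_accel=4.0):
    """Project a (T, 2) trajectory onto kinematically feasible motion.
    Assumed limits: max_yaw_rate in rad/s, max_accel in m/s^2."""
    out = [traj[0]]
    first = traj[1] - traj[0]
    v_prev = np.linalg.norm(first) / dt
    theta_prev = np.arctan2(first[1], first[0])
    for t in range(1, len(traj)):
        step = traj[t] - out[-1]
        v = np.linalg.norm(step) / dt
        theta = np.arctan2(step[1], step[0])
        # clamp speed change to the acceleration limit
        v = np.clip(v, v_prev - max_accel * dt, v_prev + max_accel * dt)
        # clamp heading change to the yaw-rate limit (wrap to [-pi, pi] first)
        dtheta = (theta - theta_prev + np.pi) % (2 * np.pi) - np.pi
        dtheta = np.clip(dtheta, -max_yaw_rate * dt, max_yaw_rate * dt)
        theta = theta_prev + dtheta
        out.append(out[-1] + v * dt * np.array([np.cos(theta), np.sin(theta)]))
        v_prev, theta_prev = v, theta
    return np.array(out)

# An instantaneous 90-degree turn at 10 m/s is smoothed into a feasible arc.
raw = np.array([[i, 0.0] for i in range(6)] + [[5.0, j] for j in range(1, 6)])
smooth = kinematic_filter(raw)
```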