autonomous_vehicle_trajectory_transformer

Overview

This model is a sequence-to-sequence Transformer designed to predict the future paths of road agents (vehicles, cyclists, and pedestrians). It processes historical multi-sensor fusion logs to generate multi-modal trajectory predictions, allowing autonomous systems to make safer motion planning decisions in complex urban environments.

Model Architecture

The architecture consists of a Temporal-Spatial Transformer encoder-decoder.

  • Input: A 2-second history of coordinates, velocities, and headings ($x, y, v, \theta$).
  • Encoder: Captures temporal dependencies and social interactions between multiple agents using self-attention.
  • Decoder: Employs query-based decoding to produce the next 5 seconds of coordinates at 10 Hz (50 future waypoints).
  • Optimization: The model is trained to minimize the Average Displacement Error (ADE) and Final Displacement Error (FDE):

$$ADE = \frac{1}{T} \sum_{t=1}^{T} \sqrt{(\hat{x}_t - x_t)^2 + (\hat{y}_t - y_t)^2}, \qquad FDE = \sqrt{(\hat{x}_T - x_T)^2 + (\hat{y}_T - y_T)^2}$$
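The ADE/FDE metrics above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the model's training code; the `(T, 2)` array layout is an assumption.

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance over all T timesteps.
    pred, gt: arrays of shape (T, 2) holding (x, y) coordinates in meters."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def fde(pred, gt):
    """Final Displacement Error: Euclidean distance at the last timestep only."""
    return np.linalg.norm(pred[-1] - gt[-1])

# A 5-second horizon at 10 Hz gives T = 50 predicted waypoints.
T = 50
gt = np.stack([np.linspace(0, 10, T), np.zeros(T)], axis=-1)  # straight path
pred = gt + np.array([0.0, 1.0])  # constant 1 m lateral offset
print(ade(pred, gt))  # 1.0
print(fde(pred, gt))  # 1.0
```

For multi-modal output, these metrics are typically computed per predicted mode and the minimum over modes is reported (minADE/minFDE).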

Intended Use

  • Collision Avoidance: Providing high-fidelity "what-if" scenarios for reactive braking and steering systems.
  • Behavior Prediction: Classifying agent intent (e.g., lane change vs. straight driving) based on trajectory curvature.
  • Simulation: Generating realistic agent movements for virtual testing of self-driving software stacks.
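The curvature-based intent classification mentioned above could be sketched as follows. The discrete curvature formula is standard; the function names and the `threshold` value are illustrative assumptions, not part of this model.

```python
import numpy as np

def mean_curvature(traj):
    """Mean absolute discrete curvature of a (T, 2) trajectory in meters.
    Uses kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2), which is invariant
    to how the curve is parameterized."""
    dx, dy = np.gradient(traj[:, 0]), np.gradient(traj[:, 1])
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    denom = (dx**2 + dy**2) ** 1.5 + 1e-9  # avoid division by zero
    return np.mean(np.abs(dx * ddy - dy * ddx) / denom)

def classify_intent(traj, threshold=0.02):
    # threshold (1/m) is an assumed tuning parameter, not from the model card
    return "lane_change" if mean_curvature(traj) > threshold else "straight"

straight = np.stack([np.linspace(0, 10, 50), np.zeros(50)], axis=-1)
theta = np.linspace(0, np.pi / 2, 50)
arc = np.stack([5 * np.sin(theta), 5 * (1 - np.cos(theta))], axis=-1)  # radius 5 m
print(classify_intent(straight))  # straight
print(classify_intent(arc))       # lane_change
```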

Limitations

  • Occlusions: If an agent is hidden from sensors for more than 500ms, the model's uncertainty increases drastically.
  • Unseen Environments: Performance may degrade in regions with traffic rules significantly different from the training data (e.g., changing from right-hand to left-hand traffic).
  • Physical Constraints: Without a post-processing kinematic filter, the model might occasionally predict physically impossible movements (e.g., instantaneous 90-degree turns at high speed).
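A post-processing kinematic filter of the kind described above could be sketched like this: re-integrate the predicted waypoints while clamping per-step speed and heading changes. The limits (`max_yaw_rate`, `max_accel`) are illustrative assumptions, not values shipped with this model.

```python
import numpy as np

def kinematic_filter(traj, dt=0.1, max_yaw_rate=0.6, max_accel=4.0):
    """Project a (T, 2) trajectory onto kinematically feasible motion.
    Assumed limits: max_yaw_rate in rad/s, max_accel in m/s^2."""
    out = [traj[0]]
    first = traj[1] - traj[0]
    v_prev = np.linalg.norm(first) / dt
    theta_prev = np.arctan2(first[1], first[0])
    for t in range(1, len(traj)):
        step = traj[t] - out[-1]
        v = np.linalg.norm(step) / dt
        theta = np.arctan2(step[1], step[0])
        # clamp speed change to the acceleration limit
        v = np.clip(v, v_prev - max_accel * dt, v_prev + max_accel * dt)
        # clamp heading change to the yaw-rate limit (wrap to [-pi, pi] first)
        dtheta = (theta - theta_prev + np.pi) % (2 * np.pi) - np.pi
        dtheta = np.clip(dtheta, -max_yaw_rate * dt, max_yaw_rate * dt)
        theta = theta_prev + dtheta
        out.append(out[-1] + v * dt * np.array([np.cos(theta), np.sin(theta)]))
        v_prev, theta_prev = v, theta
    return np.array(out)

# An instantaneous 90-degree turn at 10 m/s is smoothed into a feasible arc.
raw = np.array([[i, 0.0] for i in range(6)] + [[5.0, j] for j in range(1, 6)])
smooth = kinematic_filter(raw)
```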