autonomous_vehicle_trajectory_transformer
Overview
This model is a sequence-to-sequence Transformer designed to predict the future paths of road agents (vehicles, cyclists, and pedestrians). It processes historical multi-sensor fusion logs to generate multi-modal trajectory predictions, allowing autonomous systems to make safer motion planning decisions in complex urban environments.
Model Architecture
The architecture consists of a Temporal-Spatial Transformer encoder-decoder.
- Input: A 2-second history of coordinates, velocities, and headings ($x, y, v, \theta$).
- Encoder: Captures temporal dependencies and social interactions between multiple agents using self-attention.
- Decoder: Employs query-based decoding to produce the next 5 seconds of coordinates at 10 Hz (50 timesteps).
- Optimization: Trained to minimize displacement error, and evaluated with the standard Average Displacement Error (ADE) and Final Displacement Error (FDE):

$$\mathrm{ADE} = \frac{1}{T}\sum_{t=1}^{T}\left\lVert \hat{p}_t - p_t \right\rVert_2, \qquad \mathrm{FDE} = \left\lVert \hat{p}_T - p_T \right\rVert_2$$

where $p_t$ and $\hat{p}_t$ are the ground-truth and predicted positions at timestep $t$, and $T = 50$ is the prediction horizon.
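As a concrete reference, both metrics can be computed in a few lines of NumPy. The helper name `ade_fde` and the array shapes are illustrative, not part of a released API:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one trajectory.

    pred, gt: arrays of shape (T, 2) holding (x, y) positions per timestep.
    """
    # Per-timestep Euclidean distance between prediction and ground truth.
    dists = np.linalg.norm(pred - gt, axis=-1)
    ade = dists.mean()  # average over the full horizon
    fde = dists[-1]     # error at the final timestep only
    return ade, fde

# Example: a 50-step (5 s at 10 Hz) straight-line ground truth vs. a
# prediction offset by a constant 1 m laterally.
T = 50
gt = np.stack([np.linspace(0.0, 49.0, T), np.zeros(T)], axis=-1)
pred = gt + np.array([0.0, 1.0])
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # both 1.0, since the offset is constant at every step
```

For multi-modal output, the same helper is typically applied per mode and the minimum taken (minADE/minFDE).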
Intended Use
- Collision Avoidance: Providing high-fidelity "what-if" scenarios for reactive braking and steering systems.
- Behavior Prediction: Classifying agent intent (e.g., lane change vs. straight driving) based on trajectory curvature.
- Simulation: Generating realistic agent movements for virtual testing of self-driving software stacks.
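For the behavior-prediction use case, a toy geometric baseline can illustrate the idea. This sketch substitutes net lateral displacement in an ego-aligned frame for a full curvature analysis; the function name and the 1.5 m threshold are assumptions for illustration, not the model's actual rule:

```python
import numpy as np

def classify_intent(traj, lat_thresh=1.5):
    """Toy intent classifier from trajectory geometry.

    traj: (T, 2) array of (x, y) positions in an ego-aligned frame
    where +x is the direction of travel. lat_thresh (m) is a
    hypothetical decision boundary.
    """
    # Net lateral drift over the horizon distinguishes a lane change
    # from continued straight driving.
    lateral = traj[-1, 1] - traj[0, 1]
    if abs(lateral) >= lat_thresh:
        return "lane_change"
    return "straight"

straight = np.stack([np.arange(20.0), np.zeros(20)], axis=-1)
lane_change = np.stack([np.arange(20.0), np.linspace(0.0, 3.0, 20)], axis=-1)
print(classify_intent(straight))     # straight
print(classify_intent(lane_change))  # lane_change
```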
Limitations
- Occlusions: If an agent is hidden from sensors for more than 500 ms, the model's uncertainty increases drastically.
- Unseen Environments: Performance may degrade in regions with traffic rules significantly different from the training data (e.g., changing from right-hand to left-hand traffic).
- Physical Constraints: Without a post-processing kinematic filter, the model might occasionally predict physically impossible movements (e.g., instantaneous 90-degree turns at high speed).
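A post-processing kinematic filter of the kind mentioned above can be sketched as follows: it re-integrates the predicted path while clamping per-step acceleration and yaw rate, so an instantaneous 90-degree turn is smoothed into a feasible arc. The limit values (4 m/s², 0.6 rad/s) and the function name are illustrative assumptions:

```python
import numpy as np

def kinematic_filter(traj, v0, heading0, dt=0.1,
                     max_accel=4.0, max_yaw_rate=0.6):
    """Project a raw predicted trajectory onto kinematically feasible motion.

    traj: (T, 2) predicted (x, y) positions; v0 (m/s) and heading0 (rad)
    are the agent's current state. Limits are illustrative, not tuned.
    """
    out = [traj[0].copy()]
    pos = traj[0].copy()
    v, heading = v0, heading0
    for target in traj[1:]:
        # Heading and speed implied by the raw prediction for this step.
        delta = target - pos
        des_heading = np.arctan2(delta[1], delta[0])
        des_v = np.linalg.norm(delta) / dt
        # Clamp the heading change to the yaw-rate limit (wrap to [-pi, pi]).
        dh = (des_heading - heading + np.pi) % (2 * np.pi) - np.pi
        heading += np.clip(dh, -max_yaw_rate * dt, max_yaw_rate * dt)
        # Clamp the speed change to the acceleration limit.
        v += np.clip(des_v - v, -max_accel * dt, max_accel * dt)
        pos = pos + v * dt * np.array([np.cos(heading), np.sin(heading)])
        out.append(pos.copy())
    return np.array(out)

# A raw prediction with an instantaneous 90-degree turn at 100 m/s.
raw = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
filt = kinematic_filter(raw, v0=100.0, heading0=0.0)
```

The filtered path trades geometric fidelity to the raw prediction for guaranteed dynamic feasibility, which is usually the right trade for downstream planning.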