---
license: mit
tags:
- robotics
- autonomous-vehicles
- trajectory-prediction
- time-series
---

# autonomous_vehicle_trajectory_transformer

## Overview
This model is a sequence-to-sequence Transformer designed to predict the future paths of road agents (vehicles, cyclists, and pedestrians). It processes historical multi-sensor fusion logs to generate multi-modal trajectory predictions, allowing autonomous systems to make safer motion-planning decisions in complex urban environments.

## Model Architecture
The architecture is a **Temporal-Spatial Transformer** encoder-decoder.
- **Input**: A 2-second history of coordinates, velocities, and headings ($x, y, v, \theta$).
- **Encoder**: Captures temporal dependencies and social interactions between multiple agents using self-attention.
- **Decoder**: Employs query-based decoding to produce the next 5 seconds of coordinates at 10 Hz.
- **Optimization**: Trained to minimize the Average Displacement Error (ADE) and Final Displacement Error (FDE), i.e., the mean and final-timestep Euclidean errors over the $T$ predicted steps:

$$ADE = \frac{1}{T} \sum_{t=1}^{T} \sqrt{(\hat{x}_t - x_t)^2 + (\hat{y}_t - y_t)^2}, \qquad FDE = \sqrt{(\hat{x}_T - x_T)^2 + (\hat{y}_T - y_T)^2}$$
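These two metrics can be computed in a few lines of NumPy; the helper name `ade_fde` below is illustrative and not part of the released code:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error.

    pred, gt: arrays of shape (T, 2) holding (x, y) at each of T timesteps.
    """
    # Per-timestep Euclidean distance between prediction and ground truth.
    dists = np.linalg.norm(pred - gt, axis=-1)
    return dists.mean(), dists[-1]

# Example: a prediction offset from the ground truth by a constant 1 m in x.
gt = np.stack([np.arange(5.0), np.zeros(5)], axis=1)
pred = gt + np.array([1.0, 0.0])
ade, fde = ade_fde(pred, gt)
# ade == 1.0, fde == 1.0
```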

## Intended Use
- **Collision Avoidance**: Providing high-fidelity "what-if" scenarios for reactive braking and steering systems.
- **Behavior Prediction**: Classifying agent intent (e.g., lane change vs. straight driving) based on trajectory curvature.
- **Simulation**: Generating realistic agent movements for virtual testing of self-driving software stacks.
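As a toy illustration of curvature-based intent classification, the sketch below labels a trajectory from its accumulated heading change; the function name and threshold are assumptions, not the model's actual classifier:

```python
import numpy as np

def classify_intent(xy, turn_thresh=0.05):
    """Label a trajectory 'lane_change' or 'straight' from its total
    heading change (a crude stand-in for trajectory curvature).

    xy: array of shape (T, 2) of predicted positions.
    """
    deltas = np.diff(xy, axis=0)                      # step vectors
    headings = np.arctan2(deltas[:, 1], deltas[:, 0])  # heading per step
    total_turn = np.abs(np.diff(headings)).sum()       # accumulated turning
    return "lane_change" if total_turn > turn_thresh else "straight"

# A straight track and one that drifts laterally like a lane change.
straight = np.stack([np.arange(10.0), np.zeros(10)], axis=1)
swerve = np.stack([np.arange(10.0), np.linspace(0, 3.0, 10) ** 2 / 3.0], axis=1)
# classify_intent(straight) -> "straight"; classify_intent(swerve) -> "lane_change"
```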

## Limitations
- **Occlusions**: If an agent is hidden from sensors for more than 500 ms, the model's uncertainty increases drastically.
- **Unseen Environments**: Performance may degrade in regions with traffic rules significantly different from the training data (e.g., changing from right-hand to left-hand traffic).
- **Physical Constraints**: Without a post-processing kinematic filter, the model may occasionally predict physically impossible movements (e.g., instantaneous 90-degree turns at high speed).
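A minimal sketch of the kind of kinematic post-filter alluded to above, which caps the heading change between consecutive predicted steps while preserving per-step speed; the yaw-rate limit and function name are assumptions:

```python
import numpy as np

def clamp_turn_rate(xy, dt=0.1, max_yaw_rate=1.0):
    """Rebuild a predicted trajectory so heading never changes faster than
    max_yaw_rate (rad/s) between steps of duration dt, keeping each step's
    speed. A crude stand-in for a proper kinematic feasibility filter."""
    xy = np.asarray(xy, dtype=float)
    out = [xy[0], xy[1]]
    first = xy[1] - xy[0]
    prev_heading = np.arctan2(first[1], first[0])
    for a, b in zip(xy[1:-1], xy[2:]):
        step = b - a
        speed = np.linalg.norm(step)
        heading = np.arctan2(step[1], step[0])
        # Wrap the heading change into [-pi, pi], then clamp it.
        delta = (heading - prev_heading + np.pi) % (2 * np.pi) - np.pi
        delta = np.clip(delta, -max_yaw_rate * dt, max_yaw_rate * dt)
        prev_heading = prev_heading + delta
        out.append(out[-1] + speed * np.array([np.cos(prev_heading),
                                               np.sin(prev_heading)]))
    return np.array(out)

# A raw prediction with an instantaneous 90-degree turn at (2, 0).
raw = np.array([[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]], dtype=float)
filtered = clamp_turn_rate(raw)  # heading now changes by <= 0.1 rad per step
```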