Title: Data-Driven Traffic Simulation for an Intersection in a Metropolis
URL Source: https://arxiv.org/html/2408.00943
Markdown Content: Chengbo Zang, Mehmet Kerem Turkcan, Gil Zussman, Javad Ghaderi, Zoran Kostic
Electrical Engineering, Columbia University
{cz2678, mkt2126, gil.zussman, jg3465, zk2172}@columbia.edu
Abstract
We present a novel data-driven simulation environment for modeling traffic in metropolitan street intersections. Using real-world tracking data collected over an extended period of time, we train trajectory forecasting models to learn agent interactions and environmental constraints that are difficult to capture conventionally. Trajectories of new agents are first coarsely generated by sampling from the spatial and temporal generative distributions, then refined using state-of-the-art trajectory forecasting models. The simulation can run either autonomously, or under explicit human control conditioned on the generative distributions. We present experiments for a variety of model configurations. Under an iterative prediction scheme, the way-point-supervised TrajNet++ model obtained 0.36 Final Displacement Error (FDE) at 20 FPS on an NVIDIA A100 GPU.
1 Introduction
Accurate modeling and reconstruction of traffic flows in simulation environments is important for solving transportation problems in modern cities [10]. Simulation of traffic trajectories within intersections of a metropolis involves consideration of realistic car movements, human decisions and interactions, environmental constraints, and various forms of social regulations.
Conventional simulation systems are often built bottom-up where the state space, rules of interactions, and policies are unambiguously defined beforehand. This can be challenging given the complex nature of real-world applications. Moreover, most existing simulation systems target traffic flow control and optimization, while lacking realistic fine-grained details of interactions between traffic participants. It is also quite challenging for such systems to model human decisions, where the behavior of each agent can be spontaneous, or affected by other agents as well as environmental constraints in the scene.
To address these challenges, this study uses a data-driven approach leveraging the data acquired from a real traffic intersection situated in a busy urban environment. We utilize statistical priors and deep-learning-based trajectory forecasting models to capture the complex dynamics of traffic participants in real-world scenarios.
(a) Collected trajectories
(b) Categorized trajectories
(c) Coarse prior trajectories
(d) Refined trajectories
Figure 1: Overall workflow of agent generation. (a) Real-world trajectories collected from the intersection (vehicles in purple and pedestrians in orange). (b) Examples of different types of trajectories categorized by GMMs. (c) Coarse way-points sampled from GMMs and interpolated prior trajectories. (d) Final trajectories refined by deep forecasting models, compared to the coarsely sampled prior trajectories in (c).
2 Related Work
Traffic Simulation. Conventional traffic modeling methods evolved largely from statistical physics [7]. These methods require heavy simplification assumptions and precise rule definitions. Modern approaches involve Deep Reinforcement Learning [21], evolutionary algorithms [24], or other state-space models [2]. However, these techniques often struggle in real-world scenarios due to the intractable size of states and policies. Most simulation systems focus on vehicle flows and exclude the role of pedestrians. More recent work includes the modeling of vehicle-pedestrian interaction such as Social Force Model [4], which has been adopted for the prediction of pedestrian motions [15, 33].
Trajectory Forecasting. Deep Neural Networks (DNNs) are used for predicting future motions of pedestrians and vehicles [28]. The key architectural component is often a sequential model (e.g. Recurrent Neural Network or Transformer) which autoregressively generates future predictions based on past observations [3, 12]. Some models take a generative approach and predict the embeddings of future trajectories from latent distributions to account for varying data patterns or noise using Generative Adversarial Networks (GANs) [14, 19], Generative Adversarial Imitation Learning (GAIL) [6, 8] or Conditional Variational Auto-Encoders (CVAEs) [33, 30, 23, 5]. Specially designed modules are introduced when modeling interactions in multi-agent scenarios by pooling [3, 13], attention operation [29, 12], or Graph Neural Networks [20, 30, 23]. Many architectures choose to incorporate auxiliary supervision using coarse way-points [22, 9] or final destinations [17, 31] of agent trajectories to boost model performances.
3 Method
3.1 Data Collection
We utilize a high-elevation camera overlooking a metropolis intersection. We fine-tuned a YOLOv8 object detection model [16] for pedestrians and vehicles, then collected real-world trajectory data under the tracking-by-detection paradigm featuring the BoT-SORT algorithm [1].
To ensure well-defined entry and exit locations for each agent for statistical analysis, we pre-processed the collected data by filtering out trajectories that unexpectedly terminate in the middle of the intersection (due to occlusions or failures of the detection-tracking models). The filtered trajectories were then uniformly resampled to align at 30 FPS. Fig.1(a) shows several processed trajectories overlaid on top of each other. Details about the dataset are described in Sec.4.1.
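As a concrete illustration of the resampling step, the sketch below aligns one tracked trajectory to a fixed frame rate by linear interpolation. The function name and data layout are hypothetical, not from the paper's implementation:

```python
import numpy as np

def resample_track(ts, xy, fps=30.0):
    """Uniformly resample a tracked trajectory to a fixed frame rate.

    ts : (N,) timestamps in seconds (monotonic); xy : (N, 2) positions.
    Returns resampled timestamps and linearly interpolated positions.
    """
    t_new = np.arange(ts[0], ts[-1], 1.0 / fps)
    x = np.interp(t_new, ts, xy[:, 0])
    y = np.interp(t_new, ts, xy[:, 1])
    return t_new, np.stack([x, y], axis=1)
```

In practice the tracker emits detections at slightly irregular intervals, so a step like this puts all agents on a common 30 FPS time grid before further analysis.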
3.2 Statistical Analysis
The distributions of pedestrian and vehicle trajectories exhibit clear dependencies both spatially and temporally (Fig.1(b) and Fig.2). It is intuitive to model them using conditional generative models, where the new agents would be generated by sampling from the distributions during simulation. At this stage of the study, we adopted Gaussian Mixture Model (GMM) for this purpose. We will explore models such as conditional GANs [25] or CVAEs [9] in future studies.
Figure 2: Distribution of agent densities over 24 hours. The x-axis is the ToD and the y-axis is the hourly average pedestrian and vehicle counts.
Temporal Agent Density. Fig.2 gives the distribution of agent densities traveling through the intersection over different times-of-day (ToD). The bars show the collected number of agents while the dashed lines delineate the fitted pedestrian and vehicle frequencies, respectively. The x-axis is shifted to begin at 8:00 and end at 7:00 the next day for better interpretability. We assume that the ToD when agents enter the intersection is centered around a few peak hours (e.g. getting to work during the day or returning home at night) and fit the mean pedestrian and vehicle densities using two GMMs (with 4 components for pedestrians and 3 components for vehicles, values determined by experiments). We denote their time-dependent distribution by
$$N_t \sim p_{tod}(N \mid t), \tag{1}$$

where $N_t$ is the total number of agents at time $t$.
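The time-of-day fit can be sketched as follows. The entry-hour samples here are synthetic stand-ins for the collected data, and the component count follows the paper's choice of 4 for pedestrians:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hour-of-day at which each pedestrian entered the intersection
# (hypothetical data with morning and evening peaks).
rng = np.random.default_rng(0)
ped_hours = np.concatenate([rng.normal(9, 1, 500),
                            rng.normal(18, 2, 700)]) % 24

# Fit the time-of-day entry distribution p_tod with a GMM.
gmm_tod = GaussianMixture(n_components=4, random_state=0)
gmm_tod.fit(ped_hours.reshape(-1, 1))

# Expected relative agent density over the day:
hours = np.linspace(0, 24, 97).reshape(-1, 1)
density = np.exp(gmm_tod.score_samples(hours))
```

During simulation, evaluating this density at the current clock time yields the expected agent count $N_t$ (up to a scale factor calibrated on the data).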
Spatial Trajectory Categorization. In the case of urban intersections, agent trajectories are generally confined to specific patterns dictated by the layout of the intersection, traffic rules, social regulations, and environmental constraints [11]. We propose to characterize the trajectory of each agent by: 1) the position and velocity at its entry into the intersection, $\bm{x}(0), \bm{x}'(0) \in \mathbb{R}^2$; 2) the position and velocity at its exit from the intersection, $\bm{x}(T), \bm{x}'(T) \in \mathbb{R}^2$; 3) the total time elapsed $T \in \mathbb{R}$ between its entry and exit; and 4) $|\mathcal{K}| = 20$ way-points sampled evenly along the trajectory, $\bm{x}(\mathcal{K}) \in \mathbb{R}^{2|\mathcal{K}|}$, with sampling time $T/|\mathcal{K}|$.
Thus a vectorized representation of each agent is given by $\bm{z} = \left[\bm{x}(0), \bm{x}'(0), \bm{x}(T), \bm{x}'(T), \bm{x}(\mathcal{K}), T\right] \in \mathbb{R}^{2|\mathcal{K}|+9}$. We model the distribution of different types of trajectories using a GMM with $M = 12$ components, fitted separately for pedestrians and vehicles, denoted as

$$\bm{z} \sim p_{gmm}(\bm{z}) = \sum_{m=1}^{M} w_m\,\mathcal{N}(\bm{z} \mid \bm{\mu}_m, \bm{\Sigma}_m), \tag{2}$$

where $\mathcal{N}(\bm{\mu}, \bm{\Sigma})$ is a multivariate Gaussian and $w_m$ is the weight of component $m$. Examples of categorized trajectories from six GMM components are illustrated in Fig.1(b), where pedestrians and vehicles from each component are plotted in a different color.
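Building the $\bm{z}$-vector from one tracked trajectory can be sketched as below; the helper name is hypothetical, and velocities are approximated by finite differences at the trajectory ends:

```python
import numpy as np

K = 20  # number of evenly spaced way-points, as in Sec. 3.2

def featurize(ts, xy):
    """Build the z-vector for one trajectory.

    ts : (N,) timestamps; xy : (N, 2) positions. Returns z in
    R^{2K+9}: entry/exit position and velocity, K way-points,
    and the elapsed time T.
    """
    v0 = (xy[1] - xy[0]) / (ts[1] - ts[0])        # entry velocity
    vT = (xy[-1] - xy[-2]) / (ts[-1] - ts[-2])    # exit velocity
    T = ts[-1] - ts[0]
    # K way-points evenly spaced in time along the trajectory
    tk = np.linspace(ts[0], ts[-1], K)
    wps = np.stack([np.interp(tk, ts, xy[:, 0]),
                    np.interp(tk, ts, xy[:, 1])], axis=1)
    return np.concatenate([xy[0], v0, xy[-1], vT, wps.ravel(), [T]])
```

With $K = 20$ this gives a 49-dimensional vector ($2 \cdot 20 + 9$), matching the dimensionality stated above; the GMM of Eq. 2 is then fitted on these vectors.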
3.3 Generation of Prior Trajectories
Algorithm 1 Prior Trajectory Generation Function.

Input: $N$ ▷ Number of agents
Input: $\mathcal{C}$ ▷ Optional auxiliary conditions
Output: $\bm{x}_{pr}^{(1:N)}$ ▷ Generated prior trajectories

1: function PriorGen($N$, $\mathcal{C}$)
2:   for $i \leftarrow 1$ to $N$ do
3:     $\bm{z}_{\mathcal{C}}^{(i)} \sim p_{gmm}(\cdot \mid \mathcal{C})$ ▷ Agent sampling
4:     $\bm{x}_{pr}^{(i)} \leftarrow \textsc{Spline}(\bm{z}_{\mathcal{C}}^{(i)})$ ▷ Resampling
5:   end for
6:   return $\bm{x}_{pr}^{(1:N)}$
7: end function
The algorithm for generating new agents during simulation is given in Algorithm 1. We start by sampling pedestrians and vehicles from their corresponding GMMs. Auxiliary conditions can be provided to exert more control over the sampling process. For example, if one wishes to sample agents only from the GMM components (i.e. pedestrians or vehicles going in specific directions) in some set $\mathcal{C}$, then the GMM can be modified as
$$\bm{z}_{\mathcal{C}} \sim p_{gmm}(\bm{z} \mid \mathcal{C}) = \sum_{m \in \mathcal{C}} \hat{w}_m\,\mathcal{N}(\bm{z} \mid \bm{\mu}_m, \bm{\Sigma}_m), \tag{3}$$

where $\hat{w}_m = w_m / \sum_{n \in \mathcal{C}} w_n$ is the adjusted component weight. This is exemplified in Sec.4.2.
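The restricted sampling of Eq. 3 amounts to renormalizing the weights of the selected components and sampling only from them. A minimal sketch, with hypothetical parameter arrays:

```python
import numpy as np

def sample_conditional(weights, means, covs, C, n, rng):
    """Sample z from a GMM restricted to components in C (Eq. 3).

    weights : (M,); means : (M, D); covs : (M, D, D); C : list of
    allowed component indices. The restricted weights are
    renormalized to w_hat before component selection.
    """
    C = np.asarray(C)
    w_hat = weights[C] / weights[C].sum()          # adjusted weights
    comp = rng.choice(C, size=n, p=w_hat)          # pick components
    return np.stack([rng.multivariate_normal(means[m], covs[m])
                     for m in comp])
```

Passing $\mathcal{C}$ equal to all component indices recovers unconditional sampling from Eq. 2.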
The sampled $\bm{z}$ serves as a good prior that provides high-level control over agent motions. However, it is not fine-grained enough, due to the basic limitation that GMMs take no account of agent interactions or other environmental constraints. Sec.3.4 describes a deep-learning-based refinement approach.
Note that the sampling times of the way-points $T^{(i)}/|\mathcal{K}|$ are not uniform across agents, because their trajectories may have drastically different elapsed times $T^{(i)}$. Given that $\bm{z}$ also contains the position and velocity at both ends, we fit each trajectory with cubic splines [32], obtaining a piece-wise interpolating polynomial with time-continuous acceleration. We then evaluate the polynomial at a fixed time interval $\Delta t = 0.4$ s (2.5 FPS), obtaining a prior trajectory $\bm{x}_{pr}$ used as input to the deep-learning-based trajectory forecasting models.
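This interpolation step can be sketched with SciPy's clamped cubic spline, which accepts the sampled entry and exit velocities as first-derivative boundary conditions; the function name is hypothetical:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def prior_trajectory(wps, T, v0, vT, dt=0.4):
    """Interpolate sampled way-points into a prior trajectory.

    wps : (K, 2) way-points; T : elapsed time; v0, vT : (2,) entry
    and exit velocities used as clamped boundary conditions. The
    spline is evaluated on a fixed dt = 0.4 s grid (2.5 FPS).
    """
    K = len(wps)
    tk = np.linspace(0.0, T, K)                    # way-point times
    cs = CubicSpline(tk, wps, bc_type=((1, v0), (1, vT)))
    t = np.arange(0.0, T + 1e-9, dt)
    return t, cs(t)
```

Because the boundary derivatives are clamped, the resulting polynomial respects the sampled entry/exit velocities while keeping acceleration continuous between way-points.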
Fig.1(c) illustrates the prior trajectories of several pedestrians and vehicles denoted by dashed lines, where the circles are the interpolating way-points sampled from the GMM components shown in Fig.1(b).
3.4 Deep-Learning-Based Trajectory Refinement
To model agent interactions and other latent patterns in their motions, we adopt the TrajNet++ model [17], a DNN featuring an LSTM and a grid-based pooling module that handles agent interactions. The model takes $L_{ob} = 8$ steps (3.2 s) of past observations to predict $L_{pd} = 12$ steps (4.8 s) into the future. The model operates in a goal-supervised manner, i.e. the agent positions at the end of the prediction window are also provided to the model as auxiliary inputs. The choice of $L_{ob}$ and $L_{pd}$ for our dataset follows public benchmarks [26, 18, 27]. Sec.5 presents more experiments comparing different values of $L_{pd}$.
At each time-step $t$, we combine $L_{ob}$ steps of previous trajectories from all agents in the scene as $\bm{x}_{ob} := \bm{x}(\bm{t}_{ob})$ (using the sampled $\bm{x}_{pr}$ for newly generated agents with no past predictions), along with the temporal target locations (i.e. goals) taken from $\bm{x}_{pr}$ at the end of the prediction window, $\bm{x}_{tg} := \bm{x}_{pr}(t + L_{pd}\Delta t)$, as model inputs. The model then predicts

$$\bm{x}_{pd} := \bm{x}(\bm{t}_{pd}) = \text{DNN}\left(\bm{x}_{ob}, \bm{x}_{tg}\right), \tag{4}$$

with

$$\begin{cases} \bm{t}_{ob} = \left[t - (L_{ob}-1)\Delta t, \dots, t\right] \in \mathbb{R}^{L_{ob}} \\ \bm{t}_{pd} = \left[t + \Delta t, \dots, t + L_{pd}\Delta t\right] \in \mathbb{R}^{L_{pd}} \end{cases}. \tag{5}$$
The model iteratively takes previous predictions as inputs while being supervised by temporal target locations taken from the priors. Fig.1(d) shows the agent trajectories refined from Fig.1(c). We also explored several other model architectures and supervision schemes in Sec.5.
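The iterative scheme above can be sketched for a single agent as follows. The `dnn` callable is a stand-in for the goal-supervised forecasting model, not the actual TrajNet++ interface:

```python
import numpy as np

def rollout(x_prior, dnn, L_ob=8, L_pd=12):
    """Iteratively refine a prior trajectory (single agent, sketch).

    x_prior : (T, 2) prior trajectory; dnn(x_ob, x_tg) -> (L_pd, 2)
    is a stand-in for the forecasting model. Each iteration feeds
    back the last L_ob predictions and uses the prior way-point
    L_pd steps ahead as the goal.
    """
    traj = list(x_prior[:L_ob])                    # bootstrap from the prior
    T = len(x_prior)
    t = L_ob
    while t < T:
        x_ob = np.asarray(traj[-L_ob:])
        x_tg = x_prior[min(t + L_pd - 1, T - 1)]   # goal from the prior
        x_pd = dnn(x_ob, x_tg)
        steps = min(L_pd, T - t)
        traj.extend(x_pd[:steps])
        t += steps
    return np.asarray(traj)
```

In the full system the same loop runs jointly over all active agents, since the pooling module needs the whole scene to model interactions.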
3.5 Simulation Algorithm
Algorithm 2 Simulation Algorithm.

Require: $p_{tod}, p_{gmm}$ ▷ Distributions
Require: $\Delta t$ ▷ Simulation interval
Require: $\mathcal{A}_{ac} \leftarrow \emptyset$ ▷ Set of active agents

1: loop
2:   $t \leftarrow t + \Delta t$
3:   $N_t \sim p_{tod}(\cdot \mid t)$ ▷ Generate new agents
4:   if $|\mathcal{A}_{ac}| < N_t$ then
5:     $N \leftarrow N_t - |\mathcal{A}_{ac}|$
6:     $\bm{x}_{pr} \leftarrow \textsc{PriorGen}(N, \mathcal{C})$
7:     $\mathcal{A}_{ac} \leftarrow \mathcal{A}_{ac} \cup \bm{x}_{pr}$
8:   end if
9:   $\bm{x}_{ob}, \bm{x}_{tg} \leftarrow \textsc{Slice}(t, \mathcal{A}_{ac}, L_{ob}, L_{pd})$
10:  $\bm{x}_{pd} \leftarrow \textsc{Dnn}(\bm{x}_{ob}, \bm{x}_{tg})$ ▷ DNN refinement
11:  for $\bm{x}^{(i)}$ in $\mathcal{A}_{ac}$ do
12:    $\bm{x}^{(i)} \leftarrow \textsc{Concat}(\bm{x}^{(i)}, \bm{x}_{pd}^{(i)})$
13:    if $\|\bm{x}_t^{(i)} - \bm{x}_T^{(i)}\| < \epsilon$ then ▷ Check status
14:      $\mathcal{A}_{ac} \leftarrow \mathcal{A}_{ac} \setminus \bm{x}^{(i)}$
15:    end if
16:  end for
17: end loop
The simulation algorithm is summarized in Algorithm 2. It maintains a set of active agents $\mathcal{A}_{ac}$. At each iteration, we (i) obtain the expected total number of agents $N_t$ from $p_{tod}$, (ii) generate prior trajectories $\bm{x}_{pr}$ of new agents from $p_{gmm}$ accordingly, and (iii) add them to $\mathcal{A}_{ac}$. Then the observations $\bm{x}_{ob}$ and target locations $\bm{x}_{tg}$ of all agents are sliced from $\mathcal{A}_{ac}$ (as in Eq.4) to construct DNN inputs and generate refined trajectories $\bm{x}_{pd}$, which are then concatenated to the historical data in $\mathcal{A}_{ac}$.
Any agent whose current position $\bm{x}_t^{(i)} := \bm{x}^{(i)}(t)$ reaches its expected destination $\bm{x}_T^{(i)} := \bm{x}_{pr}^{(i)}(T^{(i)})$ is considered to have exited the intersection and is removed from $\mathcal{A}_{ac}$.
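One tick of this loop can be sketched as below. The agent representation and callables (`prior_gen`, `dnn_step`) are hypothetical stand-ins for PriorGen and the DNN refinement step:

```python
import numpy as np

def step(active, n_expected, prior_gen, dnn_step, eps=1.0):
    """One simulation tick of Algorithm 2 (sketch).

    active : list of dicts with 'traj' (list of (2,) positions) and
    'dest' ((2,) expected destination); n_expected : N_t drawn from
    p_tod; prior_gen(n) spawns n new agents; dnn_step(active)
    appends refined positions in place. Agents within eps of their
    destination are retired.
    """
    if len(active) < n_expected:                   # spawn to match p_tod
        active.extend(prior_gen(n_expected - len(active)))
    dnn_step(active)                               # DNN refinement
    # Retire agents that have reached their expected destination
    return [a for a in active
            if np.linalg.norm(a['traj'][-1] - a['dest']) >= eps]
```

The threshold `eps` plays the role of $\epsilon$ in the exit check of Algorithm 2.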
4 Experiments
(a) Potential collision
(b) Collision avoided
(c) Pedestrian outliers
(d) No supervision
(e) Destination supervised
(f) Way-point supervised
Figure 3: Simulation results and experiments. (a)-(b) Controlled simulation of a potential collision and the reaction of the trajectory forecasting model. (c) Outliers identified among pedestrians by thresholding the likelihood of the trajectories. (d)-(f) Comparison of different supervision schemes for TrajNet++. Under the same priors, different refined trajectories are given by (d) no supervision, (e) final destination as supervision, and (f) way-points iteratively sampled from the priors as supervision.
4.1 Dataset and Evaluations
The collected data from the intersection were organized to fit different purposes. For object detection and tracking, 13k annotated images were collected sporadically over 5 years from a high-elevation camera overlooking the intersection. The fine-tuned YOLOv8 obtained 91.6 mAP for pedestrians and 98.7 mAP for vehicles, respectively.
For trajectory forecasting, tracked objects were collected over 30 days, containing timestamps and bounding-box locations for 510k pedestrians and 250k vehicles. We uniformly sampled 10k 20-frame (8 s) scenes and 10k 40-frame (16 s) scenes for trajectory forecasting model training and evaluation. Additionally, complete trajectories of 176k pedestrians and 215k vehicles were extracted from the collected data for the statistical analysis described in Sec.3.2.
We trained trajectory forecasting models on the 20-frame scenes (with $L_{ob} = 8$, $L_{pd} = 12$) using smooth-L1 loss, and adopted the common performance metrics of Average Displacement Error (ADE, the RMSE between predictions and ground truths over all agents at all time-steps) and Final Displacement Error (FDE, the RMSE over all agents evaluated only at the last time-step of the prediction window) [17]. The models were trained on the standard scenes and evaluated in an iterative prediction scheme (described in Sec.3.4) on the extended scenes to resemble the workflow during actual simulations. TrajNet++ with way-point supervision achieved the best performance of 1.65 ADE and 0.36 FDE, measured in meters. Experiments on different model architectures and configurations are provided in Sec.5.
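For reference, the two displacement metrics can be computed as below; this sketch uses the common per-step mean Euclidean displacement formulation, which is one standard reading of ADE/FDE:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error (meters).

    pred, gt : (A, L, 2) predicted and ground-truth trajectories for
    A agents over L time-steps. ADE averages the per-step Euclidean
    displacement over all agents and steps; FDE averages only the
    displacement at the final step of the prediction window.
    """
    d = np.linalg.norm(pred - gt, axis=-1)         # (A, L) displacements
    return d.mean(), d[:, -1].mean()
```

Under the iterative scheme, predictions are first rolled out over the full extended scene and then compared to ground truth with these metrics.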
4.2 Controlled Simulation
The proposed simulation system can coarsely control agent trajectories with Eq.3 in terms of where they enter and exit the intersection as well as a prior trajectory to follow. In Fig.3(a) we purposefully sample south-bound pedestrians and left-turning vehicles whose prior trajectories meet at the middle of the crosswalk (the red ellipse), to see whether the trajectory forecasting model will correctly react to this situation.
As illustrated in Fig.3(b), the trajectory forecasting model forces both the vehicle and the crowd of pedestrians to slow down and deviate from their prior trajectories (denoted by the dashed red lines) to avoid a collision. This is a common practice that respects social norms and is expected to be observed in real-world scenarios.
4.3 Autonomous Simulation
Without inserting auxiliary conditions or other human control, the simulator is able to run autonomously and mimic different agent densities following Eq.1 and spatial locations following Eq.2.
5 Simulation Quality
| Models | $L_{pd}$ | Goal | ADE / FDE (m) | FPS |
|---|---|---|---|---|
| LSTM | 12 | - | 2.34 / 4.25 | 288 |
| LSTM | 32 | - | 1.25 / 3.77 | 355 |
| Trajectron++ | 12 | - | 1.43 / 3.63 | 5 |
| Trajectron++ | 32 | - | 2.13 / 4.96 | 5 |
| TrajNet++ | 12 | Wpts. | 1.65 / 0.36 | 20 |
| TrajNet++ | 12 | Dest. | 1.92 / 0.51 | 20 |
| TrajNet++ | 32 | Dest. | 0.59 / 1.21 | 29 |

Table 1: Comparison of model performances on the 40-frame (16 s) scenes on an NVIDIA A100. All models take $L_{ob} = 8$ frames (3.2 s) of inputs. Some predict iteratively ($L_{pd} = 12$); others predict in one shot ($L_{pd} = 32$).
5.1 Outliers
Adopting conditional generative models for trajectory categorization allows outliers in the collected trajectories to be identified by computing their likelihoods (Eq.2). In Fig.3(c), we show pedestrian outliers whose log-likelihoods are more than 20 standard deviations away from the dataset mean. Some of these outliers show a pedestrian making a turn-around; others show a pedestrian staying still in one location for an exceptionally long period of time.
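The thresholding step reduces to a one-liner once per-trajectory log-likelihoods under the fitted GMM are available. A sketch, flagging only the low-likelihood side (since outliers are implausible trajectories):

```python
import numpy as np

def outlier_mask(loglik, k=20.0):
    """Flag trajectories whose GMM log-likelihood (Eq. 2) falls more
    than k standard deviations below the dataset mean.

    loglik : (N,) per-trajectory log-likelihoods under the fitted GMM.
    Returns a boolean mask marking the outliers.
    """
    mu, sigma = loglik.mean(), loglik.std()
    return loglik < mu - k * sigma
```

Note that extreme outliers inflate the standard deviation itself, which is one reason a large multiplier such as 20 still yields a nonempty set on real data.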
5.2 Model Configuration
Beyond the standard metrics of ADE/FDE calculated over a predefined prediction window length L_pd, measuring simulation quality requires more careful consideration. Here we provide a brief discussion of system and model configurations based on the current results.
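For reference, ADE averages the per-step Euclidean error over the prediction window, while FDE takes only the final step. A minimal sketch for a single agent:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one agent.
    pred, gt: arrays of shape (L_pd, 2) in metres."""
    d = np.linalg.norm(pred - gt, axis=-1)   # per-step Euclidean error
    return d.mean(), d[-1]

# Toy example: prediction drifts off a diagonal ground truth.
pred = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gt   = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ade, fde = ade_fde(pred, gt)   # per-step errors are [0, 1, 2]
```

In Table 1, both metrics are averaged over all agents in the evaluation scenes.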
Goal Supervision. We compared the refined trajectories from TrajNet++ before and after adding goal supervision (Fig. 3(d) vs. Figs. 3(e) and 3(f)). Significant improvements in refinement quality can be observed: agents deviate less from their prior trajectories in the absence of external forces. Relevant results are quantified in Tab. 1.
It is worth noting that the supervised models are often trained with fixed-length sequences, so the agents are expected to reach their destinations in exactly L_pd steps. This raises considerable issues in real-world deployment: although it may not be difficult to know the destination of an agent (in our case they are directly sampled), it can be challenging to know exactly when the agent will get there. Supervising the model with final destinations (Fig. 3(e)) resulted in notable overshoot when an agent needs a longer window than L_pd to reach the destination, and undershoot when it needs a shorter one.
This can be mitigated by substituting the destinations with way-points taken from the prior trajectories L_pd steps ahead, with results shown in Fig. 3(f). Switching from MSE to a smooth-L1 loss also dampens the overshoot. Agent motions then remain controllable by the statistical model (i.e., p_gmm), while agents can still react and deviate when necessary (Sec. 4.2).
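A minimal sketch of these two ingredients, with `waypoint_goal` and `smooth_l1` as hypothetical helpers rather than the paper's actual code:

```python
import numpy as np

L_pd = 12  # prediction window length, as in Table 1

def waypoint_goal(prior_traj, t):
    """Goal supervision: instead of the final destination, take the
    way-point L_pd steps ahead on the prior trajectory (clipped at its end)."""
    idx = min(t + L_pd, len(prior_traj) - 1)
    return prior_traj[idx]

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss: quadratic below beta, linear above,
    which dampens the overshoot seen with MSE."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()

# Usage on a toy straight-line prior path of 40 points:
traj = np.stack([np.arange(40.0), np.zeros(40)], axis=1)
goal = waypoint_goal(traj, t=0)   # the way-point 12 steps ahead
```

The way-point target moves with the agent, so the model is never forced to cover an arbitrary distance in a fixed number of steps.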
Choice of L_pd. Using the 40-frame (16 s) scenes, we further compared predicting iteratively, by training the model with L_pd = 12 (4.8 s) and feeding previous outputs back as new inputs, against training the model to predict in one shot with L_pd = 32 (12.8 s). The results are given in Tab. 1. For TrajNet++, Wpts. denotes way-point supervision and Dest. destination supervision; LSTM and Trajectron++ use no supervision.
By comparison, the destination-supervised TrajNet++ model achieved the lowest ADE of 0.59, while the way-point-supervised version had a higher ADE but the lowest FDE of 0.36. Higher FPS was generally obtained with larger L_pd due to fewer operations beyond model inference (e.g., data preparation). Considering the aforementioned difficulties of choosing an appropriate L_pd and applying destination supervision in practice, it is reasonable to use way-point supervision with the shorter L_pd = 12 in actual applications; this is our chosen model for the experiments in Sec. 4.
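The iterative scheme can be sketched as follows, with a constant-velocity toy model standing in for the trained forecaster; `rollout`, `const_velocity`, and the parameter names are illustrative, not the paper's implementation:

```python
import numpy as np

L_ob, L_pd = 8, 12   # observation and prediction lengths from Table 1

def rollout(model, history, n_steps):
    """Iterative scheme: predict L_pd frames, append them to the
    trajectory, and feed the last L_ob frames back in as new inputs
    until n_steps future frames have been generated."""
    traj = list(history)
    while len(traj) - len(history) < n_steps:
        pred = model(np.array(traj[-L_ob:]))   # list of L_pd future points
        traj.extend(pred)
    return np.array(traj[len(history):len(history) + n_steps])

# Toy stand-in model: continue with the last observed velocity.
def const_velocity(obs):
    v = obs[-1] - obs[-2]
    return [obs[-1] + (i + 1) * v for i in range(L_pd)]

history = [np.array([float(i), 0.0]) for i in range(L_ob)]
future = rollout(const_velocity, history, 32)
```

This is how a model trained with L_pd = 12 can still cover the 32-frame evaluation horizon.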
6 Conclusion and Future Work
In this study, we propose a data-driven methodology for simulating the movement (trajectories) of agents within an intersection in a metropolis. We show that trajectory forecasting models can realistically govern agent motions under proper supervision by the statistical priors. The TrajNet++ model with way-point supervision was able to strike a balance between the length of the prediction window and overall simulation quality by performing predictions iteratively, achieving an FDE of 0.36 under controlled experiments. However, we note that the presented models were trained and evaluated on a single traffic intersection, raising reasonable concerns about overfitting, since traffic conditions may vary drastically across locations. More comprehensive evaluation is needed to address this issue.
Future work will include (a) evaluation of alternative trajectory forecasting architectures and configurations, (b) incorporation of a larger number of intersections and more diverse traffic scenarios for better generalization, (c) exploration of other potential cases of agent interaction under controlled simulation, and (d) investigation of broader applications (e.g., collision alerts, traffic light control, and more efficient deployment). We intend to integrate the model with graphics engines to reconstruct the traffic scenarios of the intersection in the digital world.
Acknowledgements
This work was supported in part by NSF grants CNS-1827923 and EEC-2133516, NSF grant CNS-2038984 and corresponding support from the Federal Highway Administration (FHA), NSF grant CNS-2148128 and by funds from federal agency and industry partners as specified in the Resilient & Intelligent NextG Systems (RINGS) program, and ARO grant W911NF2210031.
References
- [1] Nir Aharon, Roy Orfaig, and Ben-Zion Bobrovsky. BoT-SORT: Robust associations multi-pedestrian tracking.
- [2] Mourad Ahmane, Abdeljalil Abbas-Turki, Florent Perronnet, Jia Wu, Abdellah El Moudni, Jocelyn Buisson, and Renan Zeo. Modeling and controlling an isolated urban intersection based on cooperative vehicles. 28:44–62.
- [3] Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- [4] Bani Anvari. A mathematical model for driver and pedestrian interaction in shared space environments.
- [5] Apratim Bhattacharyya, Michael Hanselmann, Mario Fritz, Bernt Schiele, and Christoph-Nikolas Straehle. Conditional flow variational autoencoders for structured sequence prediction.
- [6] Seongjin Choi, Jiwon Kim, and Hwasoo Yeo. TrajGAIL: Generating urban vehicle trajectories using generative adversarial imitation learning. Transportation Research Part C: Emerging Technologies, 128:103091, 2021. arXiv:2007.14189 [cs, stat].
- [7] D Chowdhury. Statistical physics of vehicular traffic and some related systems. 329(4):199–329.
- [8] Longchao Da and Hua Wei. CrowdGAIL: A spatiotemporal aware method for agent navigation. Electronic Research Archive, 31(2):1134–1146, 2023.
- [9] Mohamed Debbagh. Learning structured output representations from attributes using deep conditional generative models.
- [10] Sergey Dorokhin, Alexander Artemov, Dmitry Likhachev, Alexey Novikov, and Evgeniy Starkov. Traffic simulation: an analytical review. 918(1):012058.
- [11] Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving : The waymo open motion dataset.
- [12] Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, and Alexandre Alahi. UniTraj: A unified framework for scalable vehicle trajectory prediction.
- [13] Ke Guo, Wenxi Liu, and Jia Pan. End-to-end trajectory distribution prediction based on occupancy grid maps.
- [14] Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social GAN: Socially acceptable trajectories with generative adversarial networks.
- [15] Dirk Helbing and Peter Molnar. Social force model for pedestrian dynamics. 51(5):4282–4286.
- [16] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLO.
- [17] Parth Kothari, Sven Kreiss, and Alexandre Alahi. Human trajectory forecasting in crowds: A deep learning perspective.
- [18] Alon Lerner, Yiorgos Chrysanthou, and Dani Lischinski. Crowds by example. 26(3):655–664.
- [19] Jiachen Li, Hengbo Ma, and Masayoshi Tomizuka. Conditional generative neural system for probabilistic trajectory prediction.
- [20] Jiachen Li, Fan Yang, Masayoshi Tomizuka, and Chiho Choi. EvolveGraph: Multi-agent trajectory prediction with dynamic relational reasoning.
- [21] Hanlin Liao. Urban intersection simulation and verification via deep reinforcement learning algorithms. 2435(1):012019.
- [22] Karttikeya Mangalam, Yang An, Harshayu Girase, and Jitendra Malik. From goals, waypoints & paths to long term human trajectory forecasting.
- [23] Ray Coden Mercurius, Ehsan Ahmadi, Soheil Mohamad Alizadeh Shabestary, and Amir Rasouli. AMEND: A mixture of experts framework for long-tailed trajectory prediction.
- [24] Adriana Simona Mihaita, Mauricio Camargo, and Pascal Lhoste. Optimization of a complex urban intersection using discrete event simulation and evolutionary algorithms. In International Federation of Automatic Control World Congress (IFAC WC 2014), Cape Town, South Africa, pages 24–29.
- [25] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets.
- [26] S Pellegrini, A Ess, K Schindler, and L Van Gool. You’ll never walk alone: Modeling social behavior for multi-target tracking. In 2009 IEEE 12th International Conference on Computer Vision, pages 261–268. IEEE.
- [27] Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human trajectory understanding in crowded scenes. In European Conference on Computer Vision.
- [28] Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M Kitani, Dariu M Gavrila, and Kai O Arras. Human motion trajectory prediction: a survey. 39(8):895–935.
- [29] Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, S.Hamid Rezatofighi, and Silvio Savarese. SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints.
- [30] Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data.
- [31] Chuhua Wang, Yuchen Wang, Mingze Xu, and David J. Crandall. Stepwise goal-driven networks for trajectory prediction. 7(2):2716–2723.
- [32] Yishen Guan, K. Yokoi, O. Stasse, and A. Kheddar. On robotic trajectory planning using polynomial interpolations. In 2005 IEEE International Conference on Robotics and Biomimetics - ROBIO, pages 111–116. IEEE.
- [33] Jiangbei Yue, Dinesh Manocha, and He Wang. Human trajectory prediction via neural social physics.