Title: A Multi-Branched Radial Basis Network Approach to Predicting Complex Chaotic Behaviours
URL Source: https://arxiv.org/html/2404.00618
Published Time: Fri, 31 May 2024 00:39:27 GMT
Abstract
In this study, we propose a multi-branched network approach to predict the dynamics of a physics attractor characterized by intricate and chaotic behavior. We introduce a neural network architecture composed of Radial Basis Function (RBF) layers combined with an attention mechanism, designed to capture the nonlinear inter-dependencies inherent in the attractor’s temporal evolution. Our results demonstrate successful prediction of the attractor’s trajectory across 100 predictions made using a real-world dataset of 36,700 time-series observations encompassing approximately 28 minutes of activity. To further illustrate the performance of the proposed technique, we provide visualizations depicting the attractor’s original and predicted behaviors alongside quantitative measures comparing observed versus estimated outcomes. Overall, this work showcases the potential of advanced machine learning algorithms in elucidating hidden structures in complex physical systems while offering practical applications in domains requiring accurate short-term forecasting capabilities.
1 Introduction
In traditional mathematics, a radial basis function is a function whose value depends only on the distance between the input and a specified point, such as the origin or a center point. A radial function is any function that satisfies this property [1].
A radial function is a function $\varphi:[0,\infty)\to\mathbb{R}$. When paired with a norm $\|\cdot\|:V\to[0,\infty)$ on a vector space $V$, the function $\varphi_{\mathbf{c}}(\mathbf{x})=\varphi(\|\mathbf{x}-\mathbf{c}\|)$ is said to be a radial kernel centered at $\mathbf{c}$. A radial function and the associated radial kernels are said to be radial basis functions if, for any set of nodes $\{\mathbf{x}_k\}_{k=1}^{n}$:
- The kernels $\varphi_{\mathbf{x}_1},\varphi_{\mathbf{x}_2},\dots,\varphi_{\mathbf{x}_n}$ are linearly independent (for example, $\varphi(r)=r^{2}$ in $V=\mathbb{R}$ is not a radial basis function).
- The kernels $\varphi_{\mathbf{x}_1},\varphi_{\mathbf{x}_2},\dots,\varphi_{\mathbf{x}_n}$ form a basis for a Haar space, meaning that the interpolation matrix

$$\begin{bmatrix}\varphi(\|\mathbf{x}_1-\mathbf{x}_1\|)&\varphi(\|\mathbf{x}_2-\mathbf{x}_1\|)&\dots&\varphi(\|\mathbf{x}_n-\mathbf{x}_1\|)\\ \varphi(\|\mathbf{x}_1-\mathbf{x}_2\|)&\varphi(\|\mathbf{x}_2-\mathbf{x}_2\|)&\dots&\varphi(\|\mathbf{x}_n-\mathbf{x}_2\|)\\ \vdots&\vdots&\ddots&\vdots\\ \varphi(\|\mathbf{x}_1-\mathbf{x}_n\|)&\varphi(\|\mathbf{x}_2-\mathbf{x}_n\|)&\dots&\varphi(\|\mathbf{x}_n-\mathbf{x}_n\|)\end{bmatrix}$$

is non-singular.
Frequently used radial basis functions include:
- Gaussian RBF:

$$\varphi(r)=\exp\left(-\frac{r^{2}}{2\sigma^{2}}\right)$$

where $r$ is the distance between the input point and the center, and $\sigma$ is a parameter controlling the width of the Gaussian.
- Multiquadric RBF:

$$\varphi(r)=\sqrt{1+\left(\frac{r}{\sigma}\right)^{2}}$$

where $r$ is the distance between the input point and the center, and $\sigma$ is a parameter controlling the shape of the function.
- Inverse Multiquadric RBF:

$$\varphi(r)=\frac{1}{\sqrt{1+\left(\frac{r}{\sigma}\right)^{2}}}$$

where $r$ is the distance between the input point and the center, and $\sigma$ is a parameter controlling the shape of the function.
- Thin Plate Spline RBF:

$$\varphi(r)=r^{2}\log(r)$$

where $r$ is the distance between the input point and the center.
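The four radial basis functions above can be written compactly in code. The following is a minimal NumPy sketch (the default $\sigma=1$ is illustrative, not from the paper):

```python
import numpy as np

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2))
    return np.exp(-r**2 / (2 * sigma**2))

def multiquadric(r, sigma=1.0):
    # phi(r) = sqrt(1 + (r/sigma)^2)
    return np.sqrt(1 + (r / sigma)**2)

def inverse_multiquadric(r, sigma=1.0):
    # phi(r) = 1 / sqrt(1 + (r/sigma)^2)
    return 1.0 / np.sqrt(1 + (r / sigma)**2)

def thin_plate_spline(r):
    # phi(r) = r^2 log(r), extended by continuity to 0 at r = 0
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, r**2 * np.log(np.maximum(r, 1e-300)), 0.0)

def rbf_kernel(x, c, phi=gaussian, **kw):
    # radial kernel centered at c: phi(||x - c||), Euclidean norm
    return phi(np.linalg.norm(x - c), **kw)
```

Note that all three $\sigma$-parameterized functions evaluate to 1 at $r=0$, while the thin plate spline vanishes there.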
Imagine a ball rolling around a landscape with hills and valleys. An attractor acts like the bottom of a valley. Regardless of where you place the ball on the landscape (starting conditions), if it rolls downhill long enough, it will eventually settle at the valley’s bottom (the attractor). This signifies that the system (the ball) tends towards a specific set of values (the valley’s position) over time. Thus, formally defining an attractor involves identifying a group of numeric values that a system naturally gravitates towards, irrespective of its initial parameters.
Mathematical definition of an attractor:
Let $t$ represent time and let $f(t,\cdot)$ be a function specifying the dynamics of the system. If $a$ is a point in an $n$-dimensional phase space, representing the initial state of the system, then $f(0,a)=a$, and for a positive value of $t$, $f(t,a)$ is the result of the evolution of this state after $t$ units of time. For example, if the system describes the evolution of a free particle in one dimension, then the phase space is the plane $\mathbb{R}^{2}$ with coordinates $(x,v)$, where $x$ is the position of the particle, $v$ is its velocity, $a=(x,v)$, and the evolution is given by
$$f(t,(x,v))=(x+tv,\;v).$$
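The free-particle flow makes the evolution map concrete. A two-line sketch, checking the initial condition $f(0,a)=a$ and, as an illustration, that evolving for $s$ then $t$ units equals evolving for $s+t$ units:

```python
def f(t, state):
    # free-particle flow on the phase plane: (x, v) -> (x + t*v, v)
    x, v = state
    return (x + t * v, v)

a = (2.0, 3.0)
assert f(0, a) == a              # initial state is unchanged at t = 0
assert f(5, f(2, a)) == f(7, a)  # composing evolutions adds the times
```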
An attractor is a subset $A$ of the phase space characterized by the following three conditions:
- 1. $A$ is forward invariant under $f$: if $a$ is an element of $A$, then so is $f(t,a)$ for all $t>0$.
- 2. There exists a neighborhood of $A$, called the basin of attraction for $A$ and denoted $B(A)$, which consists of all points $b$ that "enter" $A$ in the limit $t\to\infty$. More formally, $B(A)$ is the set of all points $b$ in the phase space with the following property: for any open neighborhood $N$ of $A$, there is a positive constant $T$ such that $f(t,b)\in N$ for all real $t>T$.
- 3. There is no proper (non-empty) subset of $A$ having the first two properties.
Since the basin of attraction contains an open set containing $A$, every point that is sufficiently close to $A$ is attracted to $A$. The definition of an attractor uses a metric on the phase space, but the resulting notion usually depends only on the topology of the phase space [4]. In the case of $\mathbb{R}^{n}$, the Euclidean norm [5] is typically used, which is defined as
$$\|\mathbf{x}\|=\sqrt{x_{1}^{2}+x_{2}^{2}+\dots+x_{n}^{2}}.$$

Using these concepts, we propose a multi-branched radial basis neural network to help predict the chaotic and random behaviours of an attractor.
2 Related Work
Radial Basis networks have been extensively studied and proven effective in various classification tasks [6][7]. They offer a versatile framework for pattern recognition and data analysis, leveraging the flexibility of radial basis functions to model complex relationships within datasets. By capturing the intricate dynamics and nonlinear interactions inherent in real-world phenomena, Radial Basis networks contribute to advancing our understanding of complex systems and facilitating informed decision-making in fields ranging from communication systems [8][9] to computational biology [10][11].
While RBF layers offer valuable capabilities in certain modeling tasks, they alone may not be sufficient for capturing the rich dynamics and predicting chaotic and random behaviors in attractors. To address the complexities inherent in chaotic systems, more sophisticated and adaptable modeling approaches are required, which may involve combining RBF layers with other architectural components and techniques tailored to the specific characteristics of chaotic dynamics.
Attention mechanisms [12] have emerged as powerful tools in the realm of neural networks, offering sophisticated mechanisms for selectively focusing on relevant parts of input data while suppressing irrelevant information. Originally inspired by human cognitive processes, attention mechanisms have found widespread applications in various domains, including natural language processing, computer vision, and sequential data modeling.
3 Dataset
We use a pre-existing Kaggle dataset [13]. This dataset comprises time-series data originating from an unidentified physics attractor, synthesized through undisclosed governing rules. Manifesting intricate and chaotic dynamics, the attractor presents a challenge for analysis.
The dataset encompasses 36,700 data points, each delineating the positions of two points in a two-dimensional space at distinct time intervals. Collected over approximately 28 minutes, the dataset offers insights into the attractor’s behavior over time. Notably, the system undergoes periodic resets, typically occurring upon reentry into a recurring loop. Table 1 shows the different variables in the dataset.
Table 1: Description of variables present in the dataset.
4 Methodology
The network architecture consists of several components:
Branches: Three separate branches are utilized, each focusing on learning the relationship between a specific pair of input columns.
- An RBFLayer: performs the radial basis function transformation on the input data; customizable parameters include the number of kernels $K$, output features $F_{\text{o}}$, radial function $\varphi$, norm function $\|\cdot\|$, and a normalization option. We use the inverse multiquadric function and the Euclidean norm.
- Dropout layer: introduced with a probability of 0.3 to mitigate overfitting.
- AttentionLayer: focuses on significant portions of the transformed data within the branch.
- Linear layers with $\operatorname{ReLU}(x)=\max(0,x)$ and $\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ activation functions for additional feature extraction and transformation.
Merging Layer: Following the processing of each pair of columns within their respective branches, the outputs are concatenated. A linear layer with a ReLU activation function integrates the combined information.
Output Layer: A final linear layer with an output size of 3 projects the merged features onto the desired three-dimensional prediction.
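The RBF layer used here builds on Russo's PyTorch implementation [16]. A minimal sketch of such a layer follows; the learnable centers, per-kernel widths, and the linear projection to $F_{\text{o}}$ features are assumptions about the parameterization, which the paper does not spell out:

```python
import torch
import torch.nn as nn

class RBFLayer(nn.Module):
    """Sketch of an RBF layer: K learnable kernel centers, inverse
    multiquadric radial function, Euclidean norm, and a linear map
    from the K kernel responses to out_features (F_o)."""
    def __init__(self, in_features, num_kernels, out_features, normalize=False):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_kernels, in_features))
        self.log_sigma = nn.Parameter(torch.zeros(num_kernels))  # per-kernel width
        self.linear = nn.Linear(num_kernels, out_features)
        self.normalize = normalize

    def forward(self, x):
        # pairwise Euclidean distances r_k = ||x - c_k||, shape (B, K)
        r = torch.cdist(x, self.centers)
        # inverse multiquadric: 1 / sqrt(1 + (r / sigma)^2)
        phi = 1.0 / torch.sqrt(1.0 + (r / self.log_sigma.exp())**2)
        if self.normalize:
            phi = phi / phi.sum(dim=-1, keepdim=True)
        return self.linear(phi)

layer = RBFLayer(in_features=2, num_kernels=8, out_features=16)
out = layer(torch.randn(4, 2))
assert out.shape == (4, 16)
```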
Denote $x\in\mathbb{R}^{3}$ as the input vector having three features. Let $\hat{y}\in\mathbb{R}^{3}$ denote the output of the model. Each branch accepts a pair of input features $x_{i},x_{j}$ with $i,j\in\{1,2,3\}$, satisfying $i\neq j$.
The forward function governs the data flow through the network:
- Input Splitting: separation of the input data $x$ into three distinct columns, representing the features: $x=(x_{1},x_{2},x_{3})$.
- Branch Processing: feeding each pair of columns into the assigned branch (branch1, branch2, or branch3); each pair is subsequently processed through the branch's constituent layers, yielding an output per pair.
- Output Concatenation: the individual branch outputs $(\mathrm{out}_{1},\mathrm{out}_{2},\mathrm{out}_{3})$ undergo concatenation along the feature dimension.
- Merging: transmission of the concatenated outputs through the merging layer produces a unified representation.
- Prediction: applying the merged features to the final output layer yields the three-dimensional prediction $(\hat{y}_{1},\hat{y}_{2},\hat{y}_{3})$.
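Putting the pieces together, the forward pass above can be sketched in PyTorch. This is an illustrative sketch only: the feature pairings $(x_1,x_2),(x_1,x_3),(x_2,x_3)$, hidden width, $K=8$ kernels, and the softmax form of the attention weighting are assumptions, since the paper does not list these hyperparameters:

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    # RBF transform -> dropout(0.3) -> attention weighting -> ReLU/tanh linears
    def __init__(self, hidden=32):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(8, 2))  # K=8 kernels per feature pair
        self.rbf_out = nn.Linear(8, hidden)
        self.drop = nn.Dropout(0.3)
        self.attn = nn.Linear(hidden, hidden)           # simple feature-attention scores
        self.fc = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                nn.Linear(hidden, hidden), nn.Tanh())

    def forward(self, pair):
        r = torch.cdist(pair, self.centers)             # Euclidean distances to centers
        phi = 1.0 / torch.sqrt(1.0 + r**2)              # inverse multiquadric
        h = self.drop(self.rbf_out(phi))
        h = h * torch.softmax(self.attn(h), dim=-1)     # weight significant features
        return self.fc(h)

class MultiBranchRBF(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.branches = nn.ModuleList(Branch(hidden) for _ in range(3))
        self.merge = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 3)                 # three-dimensional prediction

    def forward(self, x):
        # one feature pair per branch, then concatenate, merge, and predict
        pairs = [x[:, [0, 1]], x[:, [0, 2]], x[:, [1, 2]]]
        outs = [b(p) for b, p in zip(self.branches, pairs)]
        return self.out(self.merge(torch.cat(outs, dim=-1)))

model = MultiBranchRBF()
y_hat = model(torch.randn(5, 3))
assert y_hat.shape == (5, 3)
```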
Figure 1: Proposed multi layer architecture
Figure 2: Single Sequential Proposed Layer
This design enables the model to discern specific relationships amongst diverse input feature pairs while combining the learned features through the attention mechanism and merging stages for delivering the final prediction.
5 Training
We train the model on a single NVIDIA A30 GPU. Training for 2000 epochs with a batch size of 512 takes 2 hours. We use the Mean Squared Error (MSE) loss function as our criterion:
$$\mathrm{MSE}(\hat{\boldsymbol{y}},\boldsymbol{y})=\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{i}-y_{i}\right)^{2}$$
where $\hat{\boldsymbol{y}}$ represents the predicted values, $\boldsymbol{y}$ represents the actual target values, and $N$ is the total number of samples. The MSE computes the average of the squared differences between predicted and actual values, providing a measure of the model's performance in minimizing prediction errors. We use the Adam [14] optimizer for our model:
$$\begin{aligned}
m_{t}&=\beta_{1}\,m_{t-1}+(1-\beta_{1})\,g_{t},\\
v_{t}&=\beta_{2}\,v_{t-1}+(1-\beta_{2})\,g_{t}^{2},\\
\hat{m}_{t}&=\frac{m_{t}}{1-\beta_{1}^{t}},\\
\hat{v}_{t}&=\frac{v_{t}}{1-\beta_{2}^{t}},\\
\theta_{t+1}&=\theta_{t}-\frac{\eta}{\sqrt{\hat{v}_{t}}+\epsilon}\,\hat{m}_{t},
\end{aligned}$$
where $m_{t}$ and $v_{t}$ are the first and second moment estimates, $g_{t}$ is the gradient, $\beta_{1}$ and $\beta_{2}$ are the exponential decay rates for the moment estimates, $\hat{m}_{t}$ and $\hat{v}_{t}$ are the bias-corrected estimates, $\theta_{t}$ is the parameter vector at iteration $t$, $\eta$ is the learning rate, and $\epsilon$ is a small constant to prevent division by zero.
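The Adam update equations translate directly into code. A minimal NumPy sketch of a single step, using the standard default hyperparameters from [14] (the quadratic toy objective is illustrative):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update following the equations above (t starts at 1)."""
    m = b1 * m + (1 - b1) * g               # first-moment estimate
    v = b2 * v + (1 - b2) * g**2            # second-moment estimate
    m_hat = m / (1 - b1**t)                 # bias corrections
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy objective ||theta - target||^2: theta should drift toward target
target = np.array([1.0, -2.0])
theta, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 101):
    g = 2 * (theta - target)                # gradient of the quadratic
    theta, m, v = adam_step(theta, g, m, v, t)
```

Each step moves the parameters by roughly $\eta$ in the direction opposing the gradient, regardless of the gradient's magnitude, which is Adam's characteristic step-size normalization.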
Finally, for comparison purposes, we train two models with the same hyperparameters: a single-branch network (Figure 2) and the proposed multi-branched model (Figure 1). All implementations were done in PyTorch [15].
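The training setup described above (MSE criterion, Adam optimizer, batch size 512) follows the standard PyTorch loop. A hedged sketch with placeholder data and a placeholder model, since the dataset columns and the authors' exact training script are not shown:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# toy stand-ins for the attractor data: current states and next-step targets in R^3
X = torch.randn(2048, 3)
Y = torch.randn(2048, 3)
loader = DataLoader(TensorDataset(X, Y), batch_size=512, shuffle=True)

# placeholder model; the paper uses the multi-branched RBF architecture
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(3):  # the paper trains for 2000 epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```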
6 Results
- Loss over iterations of the Single Sequential Network (Figure 3): the training loss for Object 1 (blue) starts high, decreases sharply, and then fluctuates around a lower level with some spikes. The training loss for Object 2 (orange) follows a similar pattern but maintains a higher overall loss throughout training. There are large spikes in the loss for both objects early in training, indicating potential instability or difficulty in the initial learning phase. The loss stabilizes and flattens out towards the end of the training iterations shown.
- Loss over iterations of the Multi-Branched Network (Figure 4): the overall pattern is similar to Figure 3, with Object 2's loss (orange) consistently higher than Object 1's loss (blue). However, the initial large spikes in loss are more prominent and last longer than in Figure 3, and the loss curves flatten out and stabilize at a later point in training. Once they stabilize, there are fewer small fluctuations and spikes, suggesting smoother convergence.
In summary, while the overall trend of Object 2 having higher training loss is consistent across both images, the single sequential network exhibits more pronounced initial instability and takes longer to stabilize compared to the multi-branched architecture.
We next compare the outputs of the single sequential layer and the multi layered architecture. Figure 5 shows the object movement for the single sequential layer and Figure 6 shows the object movement for the multi layered architecture.
The predicted paths (black lines) for the single sequential layer (Figure 5) are relatively centralized and capture some linear segments of the trajectories. The overall pattern shows dense and tangled paths, which is typical of chaotic systems. The black lines follow the chaotic nature to some extent but may be too centralized and not dispersed enough to fully capture the randomness. The predicted paths (black lines) for the multi-layer architecture (Figure 6) are also centralized but show slight shifts compared to the output of the single sequential layer. This output also has dense and tangled paths, consistent with chaotic behavior. The black lines capture more variability and slight shifts, which may better reflect the unpredictability of chaotic systems.
Figure 3: Loss over iterations of the Single Sequential Network
Figure 4: Loss over iterations of the Multi-Branched Network
Figure 5: Output of the single sequential layer
Figure 6: Output of the multi layer
7 Conclusion
In conclusion, this paper has explored the application of Radial Basis Function Neural Networks (RBFNNs) in predicting chaotic and random behaviors. Through a comprehensive review of related work, we have highlighted the strengths and limitations of RBFNNs in capturing the complex dynamics of chaotic systems. Leveraging insights from chaos theory and neural network architecture, we have proposed novel approaches for enhancing the predictive capabilities of RBFNNs with attention mechanisms.
Our results demonstrate the effectiveness of our proposed methods in predicting chaotic and random behaviors. A comparison of object movement predictions illustrated in our visual results indicates that our enhanced RBFNN model effectively captures the inherent variability and unpredictability of chaotic systems. Specifically, in Figure 6 prediction paths exhibited greater variability and subtle shifts, closely aligning with the expected characteristics of chaotic behavior. This confirms that our model can realistically reflect the randomness and sensitivity to initial conditions typical of chaotic systems.
Overall, this paper contributes to advancing our understanding of chaotic systems and lays the groundwork for future research in utilizing RBFNNs for predictive modeling in complex dynamical systems.
8 Limitations
Chaotic systems often require ongoing monitoring and adjustments to plans. Since small changes can have significant impacts, staying updated on the current state of the system is crucial. We understand that chaos cannot be truly predicted and understood.
9 Reproducibility
Results can be reproduced from the code present in our GitHub repository.
10 Acknowledgement
We acknowledge the work of Alessio Russo who originally implemented RBFNNs in PyTorch. His work is available on his GitHub[16].
References
- [1] Contributors to Wikimedia projects. Radial basis function - Wikipedia, 2024.
- [2] Gregory E. Fasshauer. Meshfree Approximation Methods with MATLAB. World Scientific Publishing Co. Pte. Ltd., Singapore, 2007.
- [3] Holger Wendland. Scattered Data Approximation. Cambridge University Press, Cambridge, 2005.
- [4] John Milnor. On the concept of attractor. Communications in Mathematical Physics, 99(2):177–195, Jun 1985.
- [5] M.Emre Celebi, Fatih Celiker, and Hassan A. Kingravi. On euclidean norm approximations, 2010.
- [6] Yue Wu, Hui Wang, Biaobiao Zhang, and K.-L. Du. Using Radial Basis Function Networks for Function Approximation and Classification. International Scholarly Research Notices, 2012, March 2012.
- [7] James A Leonard and Mark A Kramer. Radial basis function networks for classifying process faults. IEEE Control Systems Magazine, 11(3):31–38, 1991.
- [8] Deng Jianping, Narasimhan Sundararajan, and P Saratchandran. Communication channel equalization using complex-valued minimal radial basis function neural networks. IEEE Transactions on neural networks, 13(3):687–696, 2002.
- [9] Hao Yu, Tiantian Xie, Stanisław Paszczynski, and Bogdan M Wilamowski. Advantages of radial basis function networks for dynamic system design. IEEE Transactions on Industrial Electronics, 58(12):5438–5450, 2011.
- [10] A Vande Wouwer, Christine Renotte, and Ph Bogaerts. Biological reaction modeling using radial basis function networks. Computers & chemical engineering, 28(11):2157–2164, 2004.
- [11] Yu-Yen Ou et al. Identifying the molecular functions of electron transport proteins using radial basis function networks and biochemical properties. Journal of Molecular Graphics and Modelling, 73:166–178, 2017.
- [12] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023.
- [13] NIKITRICKY. Physics attractor time series dataset, 2023.
- [14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
- [15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library, 2019.
- [16] Alessio Russo. Pytorch rbf layer, 2021.