new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

May 20

Chaos as an interpretable benchmark for forecasting and data-driven modelling

The striking fractal geometry of strange attractors underscores the generative nature of chaos: like probability distributions, chaotic systems can be repeatedly measured to produce arbitrarily-detailed information about the underlying attractor. Chaotic systems thus pose a unique challenge to modern statistical learning techniques, while retaining quantifiable mathematical properties that make them controllable and interpretable as benchmarks. Here, we present a growing database currently comprising 131 known chaotic dynamical systems spanning fields such as astrophysics, climatology, and biochemistry. Each system is paired with precomputed multivariate and univariate time series. Our dataset has comparable scale to existing static time series databases; however, our systems can be re-integrated to produce additional datasets of arbitrary length and granularity. Our dataset is annotated with known mathematical properties of each system, and we perform feature analysis to broadly categorize the diverse dynamics present across the collection. Chaotic systems inherently challenge forecasting models, and across extensive benchmarks we correlate forecasting performance with the degree of chaos present. We also exploit the unique generative properties of our dataset in several proof-of-concept experiments: surrogate transfer learning to improve time series classification, importance sampling to accelerate model training, and benchmarking symbolic regression algorithms.

  • 1 authors
·
Oct 11, 2021

Generalized Teacher Forcing for Learning Chaotic Dynamics

Chaotic dynamical systems (DS) are ubiquitous in nature and society. Often we are interested in reconstructing such systems from observed time series for prediction or mechanistic insight, where by reconstruction we mean learning geometrical and invariant temporal properties of the system in question (like attractors). However, training reconstruction algorithms like recurrent neural networks (RNNs) on such systems by gradient-descent based techniques faces severe challenges. This is mainly due to exploding gradients caused by the exponential divergence of trajectories in chaotic systems. Moreover, for (scientific) interpretability we wish to have as low dimensional reconstructions as possible, preferably in a model which is mathematically tractable. Here we report that a surprisingly simple modification of teacher forcing leads to provably strictly all-time bounded gradients in training on chaotic systems, and, when paired with a simple architectural rearrangement of a tractable RNN design, piecewise-linear RNNs (PLRNNs), allows for faithful reconstruction in spaces of at most the dimensionality of the observed system. We show on several DS that with these amendments we can reconstruct DS better than current SOTA algorithms, in much lower dimensions. Performance differences were particularly compelling on real world data with which most other methods severely struggled. This work thus led to a simple yet powerful DS reconstruction algorithm which is highly interpretable at the same time.

  • 4 authors
·
Jun 7, 2023

Model scale versus domain knowledge in statistical forecasting of chaotic systems

Chaos and unpredictability are traditionally synonymous, yet large-scale machine learning methods recently have demonstrated a surprising ability to forecast chaotic systems well beyond typical predictability horizons. However, recent works disagree on whether specialized methods grounded in dynamical systems theory, such as reservoir computers or neural ordinary differential equations, outperform general-purpose large-scale learning methods such as transformers or recurrent neural networks. These prior studies perform comparisons on few individually-chosen chaotic systems, thereby precluding robust quantification of how statistical modeling choices and dynamical invariants of different chaotic systems jointly determine empirical predictability. Here, we perform the largest to-date comparative study of forecasting methods on the classical problem of forecasting chaos: we benchmark 24 state-of-the-art forecasting methods on a crowdsourced database of 135 low-dimensional systems with 17 forecast metrics. We find that large-scale, domain-agnostic forecasting methods consistently produce predictions that remain accurate up to two dozen Lyapunov times, thereby accessing a new long-horizon forecasting regime well beyond classical methods. We find that, in this regime, accuracy decorrelates with classical invariant measures of predictability like the Lyapunov exponent. However, in data-limited settings outside the long-horizon regime, we find that physics-based hybrid methods retain a comparative advantage due to their strong inductive biases.

  • 1 authors
·
Mar 12, 2023

Almost-Linear RNNs Yield Highly Interpretable Symbolic Codes in Dynamical Systems Reconstruction

Dynamical systems (DS) theory is fundamental for many areas of science and engineering. It can provide deep insights into the behavior of systems evolving in time, as typically described by differential or recursive equations. A common approach to facilitate mathematical tractability and interpretability of DS models involves decomposing nonlinear DS into multiple linear DS separated by switching manifolds, i.e. piecewise linear (PWL) systems. PWL models are popular in engineering and a frequent choice in mathematics for analyzing the topological properties of DS. However, hand-crafting such models is tedious and only possible for very low-dimensional scenarios, while inferring them from data usually gives rise to unnecessarily complex representations with very many linear subregions. Here we introduce Almost-Linear Recurrent Neural Networks (AL-RNNs) which automatically and robustly produce most parsimonious PWL representations of DS from time series data, using as few PWL nonlinearities as possible. AL-RNNs can be efficiently trained with any SOTA algorithm for dynamical systems reconstruction (DSR), and naturally give rise to a symbolic encoding of the underlying DS that provably preserves important topological properties. We show that for the Lorenz and R\"ossler systems, AL-RNNs discover, in a purely data-driven way, the known topologically minimal PWL representations of the corresponding chaotic attractors. We further illustrate on two challenging empirical datasets that interpretable symbolic encodings of the dynamics can be achieved, tremendously facilitating mathematical and computational analysis of the underlying systems.

  • 4 authors
·
Oct 18, 2024

Mamba Integrated with Physics Principles Masters Long-term Chaotic System Forecasting

Long-term forecasting of chaotic systems from short-term observations remains a fundamental and underexplored challenge due to the intrinsic sensitivity to initial conditions and the complex geometry of strange attractors. Existing approaches often rely on long-term training data or focus on short-term sequence correlations, struggling to maintain predictive stability and dynamical coherence over extended horizons. We propose PhyxMamba, a novel framework that integrates a Mamba-based state-space model with physics-informed principles to capture the underlying dynamics of chaotic systems. By reconstructing the attractor manifold from brief observations using time-delay embeddings, PhyxMamba extracts global dynamical features essential for accurate forecasting. Our generative training scheme enables Mamba to replicate the physical process, augmented by multi-token prediction and attractor geometry regularization for physical constraints, enhancing prediction accuracy and preserving key statistical invariants. Extensive evaluations on diverse simulated and real-world chaotic systems demonstrate that PhyxMamba delivers superior long-term forecasting and faithfully captures essential dynamical invariants from short-term data. This framework opens new avenues for reliably predicting chaotic systems under observation-scarce conditions, with broad implications across climate science, neuroscience, epidemiology, and beyond. Our code is open-source at https://github.com/tsinghua-fib-lab/PhyxMamba.

  • 5 authors
·
May 29, 2025

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration

Leveraging diverse robotic data for pretraining remains a critical challenge. Existing methods typically model the dataset's action distribution using simple observations as inputs. However, these inputs are often incomplete, resulting in a dispersed conditional action distribution-an issue we refer to as coordinate system chaos and state chaos. This inconsistency significantly hampers pretraining efficiency. To address this, we propose 4D-VLA, a novel approach that effectively integrates 4D information into the input to mitigate these sources of chaos. Our model introduces depth and temporal information into visual features with sequential RGB-D inputs, aligning the coordinate systems of the robot and the scene. This alignment endows the model with strong spatiotemporal reasoning capabilities while minimizing training overhead. Additionally, we introduce memory bank sampling, a frame sampling strategy designed to extract informative frames from historical images, further improving effectiveness and efficiency. Experimental results demonstrate that our pretraining method and architectural components substantially enhance model performance. In both simulated and real-world experiments, our model achieves a significant increase in success rate over OpenVLA. To further assess spatial perception and generalization to novel views, we introduce MV-Bench, a multi-view simulation benchmark. Our model consistently outperforms existing methods, demonstrating stronger spatial understanding and adaptability.

  • 11 authors
·
Jun 27, 2025

ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

Accurate prediction of climate in the subseasonal-to-seasonal scale is crucial for disaster readiness, reduced economic risk, and improved policy-making amidst climate change. Yet, S2S prediction remains challenging due to the chaotic nature of the system. At present, existing benchmarks for weather and climate applications, tend to (1) have shorter forecasting range of up-to 14 days, (2) do not include a wide range of operational baseline forecasts, and (3) lack physics-based constraints for explainability. Thus, we propose ChaosBench, a large-scale, multi-channel, physics-based benchmark for S2S prediction. ChaosBench has over 460K frames of real-world observations and simulations, each with 60 variable-channels and spanning for up-to 45 years. We also propose several physics-based, in addition to vision-based metrics, that enables for a more physically-consistent model. Furthermore, we include a diverse set of physics-based forecasts from 4 national weather agencies as baselines to our data-driven counterpart. We establish two tasks that vary in complexity: full and sparse dynamics prediction. Our benchmark is one of the first to perform large-scale evaluation on existing models including PanguWeather, FourCastNetV2, GraphCast, and ClimaX, and finds methods originally developed for weather-scale applications fails on S2S task. We release our benchmark code and datasets at https://leap-stc.github.io/ChaosBench.

  • 7 authors
·
Feb 1, 2024

Separable neural architectures as a primitive for unified predictive and generative intelligence

Intelligent systems across physics, language and perception often exhibit factorisable structure, yet are typically modelled by monolithic neural architectures that do not explicitly exploit this structure. The separable neural architecture (SNA) addresses this by formalising a representational class that unifies additive, quadratic and tensor-decomposed neural models. By constraining interaction order and tensor rank, SNAs impose a structural inductive bias that factorises high-dimensional mappings into low-arity components. Separability need not be a property of the system itself: it often emerges in the coordinates or representations through which the system is expressed. Crucially, this coordinate-aware formulation reveals a structural analogy between chaotic spatiotemporal dynamics and linguistic autoregression. By treating continuous physical states as smooth, separable embeddings, SNAs enable distributional modelling of chaotic systems. This approach mitigates the nonphysical drift characteristics of deterministic operators whilst remaining applicable to discrete sequences. The compositional versatility of this approach is demonstrated across four domains: autonomous waypoint navigation via reinforcement learning, inverse generation of multifunctional microstructures, distributional modelling of turbulent flow and neural language modelling. These results establish the separable neural architecture as a domain-agnostic primitive for predictive and generative intelligence, capable of unifying both deterministic and distributional representations.

  • 5 authors
·
Mar 12

The Deep Arbitrary Polynomial Chaos Neural Network or how Deep Artificial Neural Networks could benefit from Data-Driven Homogeneous Chaos Theory

Artificial Intelligence and Machine learning have been widely used in various fields of mathematical computing, physical modeling, computational science, communication science, and stochastic analysis. Approaches based on Deep Artificial Neural Networks (DANN) are very popular in our days. Depending on the learning task, the exact form of DANNs is determined via their multi-layer architecture, activation functions and the so-called loss function. However, for a majority of deep learning approaches based on DANNs, the kernel structure of neural signal processing remains the same, where the node response is encoded as a linear superposition of neural activity, while the non-linearity is triggered by the activation functions. In the current paper, we suggest to analyze the neural signal processing in DANNs from the point of view of homogeneous chaos theory as known from polynomial chaos expansion (PCE). From the PCE perspective, the (linear) response on each node of a DANN could be seen as a 1^{st} degree multi-variate polynomial of single neurons from the previous layer, i.e. linear weighted sum of monomials. From this point of view, the conventional DANN structure relies implicitly (but erroneously) on a Gaussian distribution of neural signals. Additionally, this view revels that by design DANNs do not necessarily fulfill any orthogonality or orthonormality condition for a majority of data-driven applications. Therefore, the prevailing handling of neural signals in DANNs could lead to redundant representation as any neural signal could contain some partial information from other neural signals. To tackle that challenge, we suggest to employ the data-driven generalization of PCE theory known as arbitrary polynomial chaos (aPC) to construct a corresponding multi-variate orthonormal representations on each node of a DANN to obtain Deep arbitrary polynomial chaos neural networks.

  • 6 authors
·
Jun 26, 2023

Limits and Powers of Koopman Learning

Dynamical systems provide a comprehensive way to study complex and changing behaviors across various sciences. Many modern systems are too complicated to analyze directly or we do not have access to models, driving significant interest in learning methods. Koopman operators have emerged as a dominant approach because they allow the study of nonlinear dynamics using linear techniques by solving an infinite-dimensional spectral problem. However, current algorithms face challenges such as lack of convergence, hindering practical progress. This paper addresses a fundamental open question: When can we robustly learn the spectral properties of Koopman operators from trajectory data of dynamical systems, and when can we not? Understanding these boundaries is crucial for analysis, applications, and designing algorithms. We establish a foundational approach that combines computational analysis and ergodic theory, revealing the first fundamental barriers -- universal for any algorithm -- associated with system geometry and complexity, regardless of data quality and quantity. For instance, we demonstrate well-behaved smooth dynamical systems on tori where non-trivial eigenfunctions of the Koopman operator cannot be determined by any sequence of (even randomized) algorithms, even with unlimited training data. Additionally, we identify when learning is possible and introduce optimal algorithms with verification that overcome issues in standard methods. These results pave the way for a sharp classification theory of data-driven dynamical systems based on how many limits are needed to solve a problem. These limits characterize all previous methods, presenting a unified view. Our framework systematically determines when and how Koopman spectral properties can be learned.

  • 3 authors
·
Jul 8, 2024

Respecting causality is all you need for training physics-informed neural networks

While the popularity of physics-informed neural networks (PINNs) is steadily rising, to this date PINNs have not been successful in simulating dynamical systems whose solution exhibits multi-scale, chaotic or turbulent behavior. In this work we attribute this shortcoming to the inability of existing PINNs formulations to respect the spatio-temporal causal structure that is inherent to the evolution of physical systems. We argue that this is a fundamental limitation and a key source of error that can ultimately steer PINN models to converge towards erroneous solutions. We address this pathology by proposing a simple re-formulation of PINNs loss functions that can explicitly account for physical causality during model training. We demonstrate that this simple modification alone is enough to introduce significant accuracy improvements, as well as a practical quantitative mechanism for assessing the convergence of a PINNs model. We provide state-of-the-art numerical results across a series of benchmarks for which existing PINNs formulations fail, including the chaotic Lorenz system, the Kuramoto-Sivashinsky equation in the chaotic regime, and the Navier-Stokes equations in the turbulent regime. To the best of our knowledge, this is the first time that PINNs have been successful in simulating such systems, introducing new opportunities for their applicability to problems of industrial complexity.

  • 3 authors
·
Mar 14, 2022

Linear equivalence of nonlinear recurrent neural networks

Large nonlinear recurrent neural networks with random couplings generate high-dimensional, potentially chaotic activity whose structure is of interest in neuroscience and other fields. A fundamental object encoding the collective structure of this activity is the N times N covariance matrix. Prior analytical work on the covariance matrix has been limited to low-dimensional summary statistics. Recent work proposed an ansatz in which, at large N, the covariance matrix for a typical quenched realization takes the same form as that of a linear network with the same couplings, driven by independent noise, with DMFT order parameters setting the transfer function and the noise spectrum. Here, we derive this ansatz using the two-site cavity method, providing two derivations with complementary perspectives. The first decomposes each unit's activity into a linear response to its local field and a nonlinear residual, and shows that cross-covariances between residuals at distinct sites are strongly suppressed, so the residuals act as independent noise driving a linear network. The second derives a self-consistent matrix equation for the covariance matrix. A naive Gaussian closure for the joint statistics of local fields at distinct sites misses cross terms that, in a linear network, would be generated by an external drive. The cavity method recovers these terms from non-Gaussian contributions, revealing an emergent external drive. Higher-order cross-site moments follow a Wick-like decomposition into products of pairwise covariances at leading order, reducing them to the linear-equivalent form. We verify the predictions in simulations. These results extend linear equivalence from feedforward high-dimensional nonlinear systems, where the activations are independent of the weights, to recurrent networks, where the activations are correlated with the couplings that generate them.

  • 1 authors
·
May 4

A Topological and Operator Algebraic Framework for Asynchronous Lattice Dynamical Systems

I introduce a novel mathematical framework integrating topological dynamics, operator algebras, and ergodic geometry to study lattices of asynchronous metric dynamical systems. Each node in the lattice carries an internal flow represented by a one-parameter family of operators, evolving on its own time scale. I formalize stratified state spaces capturing multiple levels of synchronized behavior, define an asynchronous evolution metric that quantifies phase-offset distances between subsystems, and characterize emergent coherent topologies arising when subsystems synchronize. Within this framework, I develop formal operators for the evolution of each subsystem and give precise conditions under which phase-aligned synchronization occurs across the lattice. The main results include: (1) the existence and uniqueness of coherent (synchronized) states under a contractive coupling condition, (2) stability of these coherent states and criteria for their emergence as a collective phase transition in a continuous operator topology, and (3) the influence of symmetries, with group-invariant coupling leading to flow-invariant synchrony subspaces and structured cluster dynamics. Proofs are given for each theorem, demonstrating full mathematical rigor. In a final section, I discuss hypothetical applications of this framework to symbolic lattice systems (e.g. subshifts), to invariant group actions on dynamical lattices, and to operator fields over stratified manifolds in the spirit of noncommutative geometry. Throughout, I write in the first person to emphasize the exploratory nature of this work. The paper avoids any reference to cosmology or observers, focusing instead on clean, formal mathematics suitable for a broad array of dynamical systems.

  • 1 authors
·
May 14, 2025

PECCARY: A novel approach for characterizing orbital complexity, stochasticity, and regularity

Permutation Entropy and statistiCal Complexity Analysis for astRophYsics (PECCARY) is a computationally inexpensive, statistical method by which any time-series can be characterized as predominantly regular, complex, or stochastic. Elements of the PECCARY method have been used in a variety of physical, biological, economic, and mathematical scenarios, but have not yet gained traction in the astrophysical community. This study introduces the PECCARY technique with the specific aims to motivate its use in and optimize it for the analysis of astrophysical orbital systems. PECCARY works by decomposing a time-dependent measure, such as the x-coordinate or orbital angular momentum time-series, into ordinal patterns. Due to its unique approach and statistical nature, PECCARY is well-suited for detecting preferred and forbidden patterns (a signature of chaos), even when the chaotic behavior is short-lived or when working with a relatively short duration time-series or small sets of time-series data. A variety of examples are used to demonstrate the capabilities of PECCARY. These include mathematical examples (sine waves, varieties of noise, sums of sine waves, well-known chaotic functions), a double pendulum system, and astrophysical tracer particle simulations with potentials of varying intricacies. Since the adopted timescale used to diagnose a given time-series can affect the outcome, a method is presented to identify an ideal sampling scheme, constrained by the overall duration and the natural timescale of the system. The accompanying PECCARY Python package and its usage are discussed.

  • 3 authors
·
Jul 16, 2024

Nonlinear dynamics of a chemically-active drop: from steady to chaotic self-propulsion

Individual chemically active drops suspended in a surfactant solution were observed to self-propel spontaneously with straight, helical, or chaotic trajectories. To elucidate how these drops can exhibit such strikingly different dynamics and `decide' what to do, we propose a minimal axisymmetric model of a spherical active drop, and show that simple and linear interface properties can lead to both steady self-propulsion of the droplet as well as chaotic behavior. The model includes two different mobility mechanisms, namely, diffusiophoresis and the Marangoni effect, that convert self-generated gradients of surfactant concentration into the flow at the droplet surface. In turn, surface-driven flow initiates surfactant advection that is the only nonlinear mechanism and, thus, the only source of dynamical complexity in our model. Numerical investigation of the fully-coupled hydrodynamic and advection diffusion problems reveals that strong advection (e.g., large droplet size) may destabilize a steadily self-propelling drop; once destabilized, the droplet spontaneously stops and a symmetric extensile flow emerges. If advection is strengthened even further in comparison with molecular diffusion, the droplet may perform chaotic oscillations. Our results indicate that the thresholds of these instabilities depend heavily on the balance between diffusiophoresis and the Marangoni effect. Using linear stability analysis, we demonstrate that diffusiophoresis promotes the onset of high-order modes of monotonic instability of the motionless drop. We argue that diffusiophoresis has a similar effect on the instabilities of a moving drop.

  • 2 authors
·
Jan 8, 2019

Preliminary sonification of ENSO using traditional Javanese gamelan scales

Sonification -- the mapping of data to non-speech audio -- offers an underexplored channel for representing complex dynamical systems. We treat El Niño-Southern Oscillation (ENSO), a canonical example of low-dimensional climate chaos, as a test case for culturally-situated sonification evaluated through complex systems diagnostics. Using parameter-mapping sonification of the Niño 3.4 sea surface temperature anomaly index (1870--2024), we encode ENSO variability into two traditional Javanese gamelan pentatonic systems (pelog and slendro) across four composition strategies, then analyze the resulting audio as trajectories in a two-dimensional acoustic phase space. Recurrence-based diagnostics, convex hull geometry, and coupling analysis reveal that the sonification pipeline preserves key dynamical signatures: alternating modes produce the highest trajectory recurrence rates, echoing ENSO's quasi-periodicity; layered polyphonic modes explore the broadest phase space regions; and the two scale families induce qualitatively distinct coupling regimes between spectral brightness and energy -- predominantly anti-phase in pelog but near-independent in slendro. Phase space trajectory analysis provides a rigorous geometric framework for comparing sonification designs within a complex systems context. Perceptual validation remains necessary; we contribute the dynamical systems methodology for evaluating such mappings.

Leslie Population Models in Predator-prey and Competitive populations: theory and applications by machine learning

We introduce a new predator-prey model by replacing the growth and predation constant by a square matrix, and the population density as a population vector. The classical Lotka-Volterra model describes a population that either modulates or converges. Stability analysis of such models have been extensively studied by the works of Merdan (https://doi.org/10.1016/j.chaos.2007.06.062). The new model adds complexity by introducing an age group structure where the population of each age group evolves as prescribed by the Leslie matrix. The added complexity changes the behavior of the model such that the population either displays roughly an exponential growth or decay. We first provide an exact equation that describes a time evolution and use analytic techniques to obtain an approximate growth factor. We also discuss the variants of the Leslie model, i.e., the complex value predator-prey model and the competitive model. We then prove the Last Species Standing theorem that determines the dominant population in the large time limit. The recursive structure of the model denies the application of simple regression. We discuss a machine learning scheme that allows an admissible fit for the population evolution of Paramecium Aurelia and Paramecium Caudatum. Another potential avenue to simplify the computation is to use the machinery of quantum operators. We demonstrate the potential of this approach by computing the Hamiltonian of a simple Leslie system.

  • 5 authors
·
Dec 20, 2024

simple-idealized-1d-nlse: Pseudo-Spectral Solver for the 1D Nonlinear Schrödinger Equation

We present an open-source Python implementation of an idealized high-order pseudo-spectral solver for the one-dimensional nonlinear Schr\"odinger equation (NLSE). The solver combines Fourier spectral spatial discretization with an adaptive eighth-order Dormand-Prince time integration scheme to achieve machine-precision conservation of mass and near-perfect preservation of momentum and energy for smooth solutions. The implementation accurately reproduces fundamental NLSE phenomena including soliton collisions with analytically predicted phase shifts, Akhmediev breather dynamics, and the development of modulation instability from noisy initial conditions. Four canonical test cases validate the numerical scheme: single soliton propagation, two-soliton elastic collision, breather evolution, and noise-seeded modulation instability. The solver employs a 2/3 dealiasing rule with exponential filtering to prevent aliasing errors from the cubic nonlinearity. Statistical analysis using Shannon, R\'enyi, and Tsallis entropies quantifies the spatio-temporal complexity of solutions, while phase space representations reveal the underlying coherence structure. The implementation prioritizes code transparency and educational accessibility over computational performance, providing a valuable pedagogical tool for exploring nonlinear wave dynamics. Complete source code, documentation, and example configurations are freely available, enabling reproducible computational experiments across diverse physical contexts where the NLSE governs wave evolution, including nonlinear optics, Bose-Einstein condensates, and ocean surface waves.

  • 5 authors
·
Sep 6, 2025

Toward World Modeling of Physiological Signals with Chaos-Theoretic Balancing and Latent Dynamics

Physiological time series signals reflect complex, multi-scale dynamical processes of the human body. Existing modeling studies focus on static tasks such as classification, event forecasting, or short-horizon next step prediction, while long-horizon signal-level forecasting and predictive nature of physiological signals remain underexplored. We introduce NormWear-2, a world model that encodes both multivariate physiological signals and clinical intervention variables into a shared latent space and models their joint temporal evolution as a dynamical system. Our approach combines inference from prior pre-trained knowledge (intuition) with instant non-parametric latent state transition adaptation (insight), enabling coherent forecasting across multiple temporal scales, conditioned on heterogeneous clinical interventions. During the pretraining phase, we find that chaos-theoretic balancing of dynamical regime diversity yields more robust representations, with a smaller balanced corpus outperforming one twice its size and capturing bifurcation regimes. We evaluate the world model performance across diverse real-world physiological datasets spanning heterogeneous temporal resolutions and intervention regimes, covering daily life, point-of-care, and clinical settings, including fitness planning, hemodialysis, diabetes management, and surgical monitoring. These evaluation datasets comprise records from 8,026 subjects, spanning study durations from 3.2 hours for high-resolution signal data to 2.3 years for longitudinal clinical biomarker tracking. NormWear-2 achieves the best overall forecasting performance across time, frequency, and latent representation domains, with significant improvements over state-of-the-art time series foundation models, while maintaining competitive downstream representation quality, providing a step toward general-purpose world models for physiological signals.

  • 11 authors
·
May 13

Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artifical Cognition

Deterministic inference is a comforting ideal in classical software: the same program on the same input should always produce the same output. As large language models move into real-world deployment, this ideal has been imported wholesale into inference stacks. Recent work from the Thinking Machines Lab has presented a detailed analysis of nondeterminism in LLM inference, showing how batch-invariant kernels and deterministic attention can enforce bitwise-identical outputs, positioning deterministic inference as a prerequisite for reproducibility and enterprise reliability. In this paper, we take the opposite stance. We argue that, for LLMs, deterministic inference kills. It kills the ability to model uncertainty, suppresses emergent abilities, collapses reasoning into a single brittle path, and weakens safety alignment by hiding tail risks. LLMs implement conditional distributions over outputs, not fixed functions. Collapsing these distributions to a single canonical completion may appear reassuring, but it systematically conceals properties central to artificial cognition. We instead advocate Stochastic CHAOS, treating distributional variability as a signal to be measured and controlled. Empirically, we show that deterministic inference is systematically misleading. Single-sample deterministic evaluation underestimates both capability and fragility, masking failure probability under paraphrases and noise. Phase-like transitions associated with emergent abilities disappear under greedy decoding. Multi-path reasoning degrades when forced onto deterministic backbones, reducing accuracy and diagnostic insight. Finally, deterministic evaluation underestimates safety risk by hiding rare but dangerous behaviors that appear only under multi-sample evaluation.

  • 10 authors
·
Jan 12 2

DyMixOp: Guiding Neural Operator Design for PDEs from a Complex Dynamics Perspective with Local-Global-Mixing

A primary challenge in using neural networks to approximate nonlinear dynamical systems governed by partial differential equations (PDEs) is transforming these systems into a suitable format, especially when dealing with non-linearizable dynamics or the need for infinite-dimensional spaces for linearization. This paper introduces DyMixOp, a novel neural operator framework for PDEs that integrates insights from complex dynamical systems to address this challenge. Grounded in inertial manifold theory, DyMixOp transforms infinite-dimensional nonlinear PDE dynamics into a finite-dimensional latent space, establishing a structured foundation that maintains essential nonlinear interactions and enhances physical interpretability. A key innovation is the Local-Global-Mixing (LGM) transformation, inspired by convection dynamics in turbulence. This transformation effectively captures both fine-scale details and nonlinear interactions, while mitigating spectral bias commonly found in existing neural operators. The framework is further strengthened by a dynamics-informed architecture that connects multiple LGM layers to approximate linear and nonlinear dynamics, reflecting the temporal evolution of dynamical systems. Experimental results across diverse PDE benchmarks demonstrate that DyMixOp achieves state-of-the-art performance, significantly reducing prediction errors, particularly in convection-dominated scenarios reaching up to 86.7\%, while maintaining computational efficiency and scalability.

  • 3 authors
·
Aug 18, 2025

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Training LLM agents in multi-turn environments with sparse rewards, where completing a single task requires 30+ turns of interaction within an episode, presents a fundamental challenge for reinforcement learning. We identify a critical failure mode unique to this setting: the exploration-exploitation cascade failure. This cascade begins with early-stage policy premature convergence, where sparse feedback causes agents to commit to flawed, low-entropy strategies. Subsequently, agents enter late-stage policy collapse, where conventional entropy regularization becomes counterproductive, promoting chaotic exploration that destabilizes training. We propose Entropy-regularized Policy Optimization (EPO), a general framework that breaks this failure cycle through three synergistic mechanisms: (1) adopting entropy regularization in multi-turn settings to enhance exploration, (2) an entropy smoothing regularizer that bounds policy entropy within historical averages to prevent abrupt fluctuations, and (3) adaptive phase-based weighting that balances exploration and exploitation across training. Our analysis justifies that EPO guarantees monotonically decreasing entropy variance while maintaining convergence. EPO achieves up to 152% performance improvement on ScienceWorld and up to 19.8% on ALFWorld. Our work demonstrates that multi-turn sparse-reward settings require fundamentally different entropy control than traditional RL, with broad implications for LLM agent training.

  • 9 authors
·
Sep 26, 2025 2

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interests at multiple levels of abstraction. Our case studies with widely-used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. DendroMap is available at https://div-lab.github.io/dendromap/.

  • 7 authors
·
May 13, 2022

Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems

Machine learning architectures, including transformers and recurrent neural networks (RNNs) have revolutionized forecasting in applications ranging from text processing to extreme weather. Notably, advanced network architectures, tuned for applications such as natural language processing, are transferable to other tasks such as spatiotemporal forecasting tasks. However, there is a scarcity of ablation studies to illustrate the key components that enable this forecasting accuracy. The absence of such studies, although explainable due to the associated computational cost, intensifies the belief that these models ought to be considered as black boxes. In this work, we decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs, and attention mechanisms in transformers. Then, we synthesize and build novel hybrid architectures from the standard blocks, performing ablation studies to identify which mechanisms are effective for each task. The importance of considering these components as hyper-parameters that can augment the standard architectures is exhibited on various forecasting datasets, from the spatiotemporal chaotic dynamics of the multiscale Lorenz 96 system, the Kuramoto-Sivashinsky equation, as well as standard real world time-series benchmarks. A key finding is that neural gating and attention improves the performance of all standard RNNs in most tasks, while the addition of a notion of recurrence in transformers is detrimental. Furthermore, our study reveals that a novel, sparsely used, architecture which integrates Recurrent Highway Networks with neural gating and attention mechanisms, emerges as the best performing architecture in high-dimensional spatiotemporal forecasting of dynamical systems.

  • 3 authors
·
Oct 3, 2024

Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection

Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up the map interpretation and helps generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-ofdomain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-extensive and time-consuming. On the other hand, existing geographic data sources, such as Open- StreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles between OSM map tiles and historical maps are significantly different. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that the state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection.

  • 5 authors
·
Dec 11, 2021

Dynamical phase diagram of synchronization in one dimension: universal behavior from Edwards-Wilkinson to random deposition through Kardar-Parisi-Zhang

Synchronization in one dimension displays generic scale invariance with universal properties previously observed in surface kinetic roughening and the wider context of the Kardar-Parisi-Zhang (KPZ) universality class. This has been established for phase oscillators and also for some limit-cycle oscillators, both in the presence of columnar (quenched) disorder and of time-dependent noise, by extensive numerical simulations, and has been analytically motivated by continuum approximations in the strong oscillator coupling limit. The robustness and the precise boundaries in parameter space for such critical behavior remain unclear, however, which may preclude further developments, including the extension of these results to higher dimensions and the experimental observation of nonequilibrium criticality in synchronizing (e.g.~electronic or chemical) oscillators. We here present complete numerical phase diagrams of one-dimensional synchronization, including saturation times and values, but, most importantly, also dynamical features giving insight into the gradual emergence of synchronous dynamics, based on systems of phase oscillators with either type of randomness. In the absence of synchronization, the dynamics evolves as expected for random deposition (for time-dependent noise) or linear growth (for columnar disorder), while a crossover from Edwards-Wilkinson to Kardar-Parisi-Zhang behavior (with the corresponding type of randomness) is observed as the randomness strength, or the nonoddity of the coupling among oscillators, is increased in the synchronous region -- their combined effect being partially captured by the so-called KPZ coupling. The distortion of scaling due to phase slips near the desynchronization boundary, a feature that is likely to play a role in experimental contexts, is also discussed.

  • 2 authors
·
Apr 6

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.

  • 3 authors
·
Dec 20, 2013

Mechanistic Interpretability of RNNs emulating Hidden Markov Models

Recurrent neural networks (RNNs) provide a powerful approach in neuroscience to infer latent dynamics in neural populations and to generate hypotheses about the neural computations underlying behavior. However, past work has focused on relatively simple, input-driven, and largely deterministic behaviors - little is known about the mechanisms that would allow RNNs to generate the richer, spontaneous, and potentially stochastic behaviors observed in natural settings. Modeling with Hidden Markov Models (HMMs) has revealed a segmentation of natural behaviors into discrete latent states with stochastic transitions between them, a type of dynamics that may appear at odds with the continuous state spaces implemented by RNNs. Here we first show that RNNs can replicate HMM emission statistics and then reverse-engineer the trained networks to uncover the mechanisms they implement. In the absence of inputs, the activity of trained RNNs collapses towards a single fixed point. When driven by stochastic input, trajectories instead exhibit noise-sustained dynamics along closed orbits. Rotation along these orbits modulates the emission probabilities and is governed by transitions between regions of slow, noise-driven dynamics connected by fast, deterministic transitions. The trained RNNs develop highly structured connectivity, with a small set of "kick neurons" initiating transitions between these regions. This mechanism emerges during training as the network shifts into a regime of stochastic resonance, enabling it to perform probabilistic computations. Analyses across multiple HMM architectures - fully connected, cyclic, and linear-chain - reveal that this solution generalizes through the modular reuse of the same dynamical motif, suggesting a compositional principle by which RNNs can emulate complex discrete latent dynamics.

  • 5 authors
·
Oct 29, 2025

A Low-complexity Structured Neural Network to Realize States of Dynamical Systems

Data-driven learning is rapidly evolving and places a new perspective on realizing state-space dynamical systems. However, dynamical systems derived from nonlinear ordinary differential equations (ODEs) suffer from limitations in computational efficiency. Thus, this paper stems from data-driven learning to advance states of dynamical systems utilizing a structured neural network (StNN). The proposed learning technique also seeks to identify an optimal, low-complexity operator to solve dynamical systems, the so-called Hankel operator, derived from time-delay measurements. Thus, we utilize the StNN based on the Hankel operator to solve dynamical systems as an alternative to existing data-driven techniques. We show that the proposed StNN reduces the number of parameters and computational complexity compared with the conventional neural networks and also with the classical data-driven techniques, such as Sparse Identification of Nonlinear Dynamics (SINDy) and Hankel Alternative view of Koopman (HAVOK), which is commonly known as delay-Dynamic Mode Decomposition(DMD) or Hankel-DMD. More specifically, we present numerical simulations to solve dynamical systems utilizing the StNN based on the Hankel operator beginning from the fundamental Lotka-Volterra model, where we compare the StNN with the LEarning Across Dynamical Systems (LEADS), and extend our analysis to highly nonlinear and chaotic Lorenz systems, comparing the StNN with conventional neural networks, SINDy, and HAVOK. Hence, we show that the proposed StNN paves the way for realizing state-space dynamical systems with a low-complexity learning algorithm, enabling prediction and understanding of future states.

  • 4 authors
·
Mar 30, 2025

Measuring the Intrinsic Dimension of Objective Landscapes

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

  • 4 authors
·
Apr 24, 2018

Illuminating search spaces by mapping elites

Many fields use search algorithms, which automatically explore a search space to find high-performing solutions: chemists search through the space of molecules to discover new drugs; engineers search for stronger, cheaper, safer designs, scientists search for models that best explain data, etc. The goal of search algorithms has traditionally been to return the single highest-performing solution in a search space. Here we describe a new, fundamentally different type of algorithm that is more useful because it provides a holistic view of how high-performing solutions are distributed throughout a search space. It creates a map of high-performing solutions at each point in a space defined by dimensions of variation that a user gets to choose. This Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm illuminates search spaces, allowing researchers to understand how interesting attributes of solutions combine to affect performance, either positively or, equally of interest, negatively. For example, a drug company may wish to understand how performance changes as the size of molecules and their cost-to-produce vary. MAP-Elites produces a large diversity of high-performing, yet qualitatively different solutions, which can be more helpful than a single, high-performing solution. Interestingly, because MAP-Elites explores more of the search space, it also tends to find a better overall solution than state-of-the-art search algorithms. We demonstrate the benefits of this new algorithm in three different problem domains ranging from producing modular neural networks to designing simulated and real soft robots. Because MAP- Elites (1) illuminates the relationship between performance and dimensions of interest in solutions, (2) returns a set of high-performing, yet diverse solutions, and (3) improves finding a single, best solution, it will advance science and engineering.

  • 2 authors
·
Apr 19, 2015

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.

  • 3 authors
·
Apr 12, 2023

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? Can the learnable content in data be evaluated without considering a downstream task? On these questions, Shannon information and Kolmogorov complexity come up nearly empty-handed, in part because they assume observers with unlimited computational capacity and fail to target the useful information content. In this work, we identify and exemplify three seeming paradoxes in information theory: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching. To shed light on the tension between these results and modern practice, and to quantify the value of data, we introduce epiplexity, a formalization of information capturing what computationally bounded observers can learn from data. Epiplexity captures the structural content in data while excluding time-bounded entropy, the random unpredictable content exemplified by pseudorandom number generators and chaotic dynamical systems. With these concepts, we demonstrate how information can be created with computation, how it depends on the ordering of the data, and how likelihood modeling can produce more complex programs than present in the data generating process itself. We also present practical procedures to estimate epiplexity which we show capture differences across data sources, track with downstream performance, and highlight dataset interventions that improve out-of-distribution generalization. In contrast to principles of model selection, epiplexity provides a theoretical foundation for data selection, guiding how to select, generate, or transform data for learning systems.

  • 6 authors
·
Jan 6