Daily Papers

by AK and the research community

Jun 18

Sycophancy as Material Failure under Pushback Loading: A Multi-Axis Characterization Across Three Loading Cases and up to Seventeen Material Charges

Sycophancy in LLMs is documented across 70+ papers, but expert agreement on construct boundaries remains low (ICC = .184; Ye et al., 2026). The construct fragments because behavioral classification depends on which surface form is privileged. We adopt a materials-science framing: conversation as test specimen under load, LLM as material charge, pushback as progressive load, stance-flip as material failure. We characterize this failure across three loading cases (debate n=1000; false-presuppositions n=3400; ethical-setting n=3400; 10-17 material charges per case; 7800 specimens total) using 14 turn-level axis-measurements spanning velocity, damage accumulation, frame-drift, brittleness, and direction stability, plus three speaker-resolved axes from an independent pipeline. The measurements are Hooke-coupled (an analog of σ = E·ε) and reproduce across loading cases with effects up to |r_{rb}| = 0.35 on debate; the sign structure adds a second pattern: the ethical-setting case inverts the velocity and accumulation blocks. Variance composition partitions into two profiles: debate is charge-dominated (brittle-fracture-like: the material grade decides), while false-presuppositions and ethical-setting are topic-dominated (creep-like: the load decides); the ratios (2.03 vs 0.13/0.17) are estimator-dependent, for debate even in direction. Cross-judge reliability (GPT-4o vs Haiku 4.5) shows debate scoring is judge-robust (Cohen's κ = 0.88) while false-presupposition scoring is judge-sensitive (κ = 0.36), a caveat that single-judge benchmarks must report. This is the methodological move Ye et al.'s diagnosis calls for: a multi-axis characterization that does not depend on which surface form of the construct one privileges.
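The cross-judge reliability figures above are Cohen's κ values. As a minimal sketch of how such a figure is computed for two judges labelling the same specimens, the following uses only the standard definition of κ. The function name `cohens_kappa` and the toy label lists are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical judge outputs (NOT data from the paper).
judge_1 = ["flip", "hold", "hold", "flip", "hold", "hold", "flip", "hold"]
judge_2 = ["flip", "hold", "flip", "flip", "hold", "hold", "hold", "hold"]
kappa = cohens_kappa(judge_1, judge_2)
print(round(kappa, 3))
```

Here the judges agree on 6 of 8 items, but roughly half of that agreement is expected by chance, which is why κ comes out well below the raw agreement rate.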

  • 1 author
·
Jun 14

Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning

On-policy reinforcement learning (RL), particularly Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), has become the dominant paradigm for fine-tuning large language models (LLMs). While policy ratio clipping stabilizes training, this heuristic hard constraint incurs a fundamental cost: it indiscriminately truncates gradients from high-return yet high-divergence actions, suppressing rare but highly informative "eureka moments" in complex reasoning. Moreover, once data becomes slightly stale, hard clipping renders it unusable, leading to severe sample inefficiency. In this work, we revisit the trust-region objective in policy optimization and show that explicitly constraining the variance (second central moment) of the policy ratio provides a principled and smooth relaxation of hard clipping. This distributional constraint stabilizes policy updates while preserving gradient signals from valuable trajectories. Building on this insight, we propose R^2VPO (Ratio-Variance Regularized Policy Optimization), a novel primal-dual framework that supports stable on-policy learning and enables principled off-policy data reuse by dynamically reweighting stale samples rather than discarding them. We extensively evaluate R^2VPO on fine-tuning state-of-the-art LLMs, including DeepSeek-Distill-Qwen-1.5B and the openPangu-Embedded series (1B and 7B), across challenging mathematical reasoning benchmarks. Experimental results show that R^2VPO consistently achieves superior asymptotic performance, with average relative gains of up to 17% over strong clipping-based baselines, while requiring approximately 50% fewer rollouts to reach convergence. These findings establish ratio-variance control as a promising direction for improving both stability and data efficiency in RL-based LLM alignment.
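To make the contrast between hard clipping and a variance constraint concrete, here is a small sketch. It is not the R^2VPO objective from the paper: it is a generic Lagrangian relaxation in which the variance of the policy ratio is penalized above a budget. The names `lam` (multiplier), `delta` (variance budget), and `step` (dual step size) are assumptions introduced for illustration.

```python
import math


def policy_ratios(new_logps, old_logps):
    """Importance ratios pi_new / pi_old from per-sample log-probabilities."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def clipped_surrogate(ratios, advantages, eps=0.2):
    """Standard PPO clipped objective (to be maximized)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # For a > 0 and r > 1 + eps this term is constant in r: no gradient.
        total += min(r * a, clipped * a)
    return total / len(ratios)


def ratio_variance(ratios):
    """Second central moment of the policy ratio over the batch."""
    mean = sum(ratios) / len(ratios)
    return sum((r - mean) ** 2 for r in ratios) / len(ratios)


def variance_regularized_surrogate(ratios, advantages, lam, delta):
    """Unclipped surrogate minus a smooth penalty on ratio variance (assumed form)."""
    unclipped = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return unclipped - lam * (ratio_variance(ratios) - delta)


def dual_update(lam, ratios, delta, step=0.5):
    """Projected dual ascent: raise lam while the variance budget is exceeded."""
    return max(0.0, lam + step * (ratio_variance(ratios) - delta))


# Toy batch: one high-return action whose ratio has drifted far from 1.
ratios = [1.0, 1.0, 1.0, 2.0]
advantages = [0.0, 0.0, 0.0, 1.0]
print(clipped_surrogate(ratios, advantages))
print(variance_regularized_surrogate(ratios, advantages, lam=1.0, delta=0.05))
print(dual_update(1.0, ratios, delta=0.05))
```

In the toy batch the clipped objective caps the contribution of the drifted sample at `1 + eps`, while the penalized objective keeps its full contribution and charges a smooth cost for the spread of the ratios instead.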

  • 5 authors
·
Jan 6

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation

Knowledge distillation is central to LLM post-training, yet its design space remains poorly understood, especially alongside reinforcement learning (RL). We show that the prevailing paradigms, off-policy distillation and on-policy distillation (OPD), implicitly couple two orthogonal choices: prefix source and token-level KL direction. This follows from decomposing sequence-level KL over autoregressive response distributions: forward KL pairs teacher prefixes with token-level forward KL, and reverse KL pairs student prefixes with token-level reverse KL. We argue this coupling is not intrinsic: decoupling the two axes yields four valid objectives. We establish gradient-level identities showing forward KL gives SFT-style cross-entropy matching with teacher soft targets, whereas reverse KL gives an RL-style policy-gradient objective with a dense teacher-student log-ratio reward, connecting them to off-policy SFT, DAgger-style on-policy SFT, offline-RL-style distillation, and OPD. We conduct an extensive controlled study on math reasoning, evaluating the four objectives both as standalone methods and as initializations for subsequent RL. The results reveal three tradeoffs: KL direction induces an accuracy-entropy tradeoff, prefix source a quality-compute tradeoff, and training length an accuracy-stability tradeoff. Motivated by these findings, we propose KL mixing and an entropy-gated length curriculum. KL mixing shows long-sequence distillation requires substantial forward-KL weight to prevent entropy collapse and length inflation without sacrificing accuracy. The entropy-gated length curriculum improves Avg@k and Pass@k by 3.6 and up to 5.8 points, and cuts average response length by roughly 3x versus fixed long-horizon training. Our results provide a framework and practical methods for designing reasoning distillation objectives that balance accuracy, diversity, compute, and RL behavior.
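The token-level KL direction discussed above can be illustrated on a single next-token distribution. The sketch below computes forward KL (teacher relative to student) and reverse KL (student relative to teacher) for two hypothetical students. The distributions are toy numbers chosen for illustration, and the code says nothing about prefix source, which the abstract treats as a separate axis.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions on a shared vocabulary.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def forward_kl(teacher, student):
    """KL(teacher || student): expectation under the teacher."""
    return kl(teacher, student)


def reverse_kl(teacher, student):
    """KL(student || teacher): expectation under the student."""
    return kl(student, teacher)


# Toy next-token distributions over a 3-token vocabulary (hypothetical).
teacher = [0.49, 0.49, 0.02]
peaked_student = [0.96, 0.02, 0.02]  # commits to one teacher mode
spread_student = [0.40, 0.40, 0.20]  # covers both modes, lower confidence

for name, student in (("peaked", peaked_student), ("spread", spread_student)):
    print(name, round(forward_kl(teacher, student), 4),
          round(reverse_kl(teacher, student), 4))
```

On these numbers forward KL penalizes the peaked student far more than the spread one, since it misses a teacher mode. Reverse KL penalizes it less harshly than forward KL does, which is consistent with the accuracy-entropy tradeoff the abstract attributes to KL direction.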

  • 6 authors
·
May 15

Deep Multi-View Enhancement Hashing for Image Retrieval

Hashing is an efficient method for nearest neighbor search in large-scale data space by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. However, large-scale high-speed retrieval through binary code has a certain degree of reduction in retrieval accuracy compared to traditional retrieval methods. We have noticed that multi-view methods can well preserve the diverse characteristics of data. Therefore, we try to introduce the multi-view deep neural network into the hash learning field, and design an efficient and innovative retrieval model, which has achieved a significant improvement in retrieval performance. In this paper, we propose a supervised multi-view hash model which can enhance the multi-view information through neural networks. This is a completely new hash learning method that combines multi-view and deep learning methods. The proposed method utilizes an effective view stability evaluation method to actively explore the relationship among views, which will affect the optimization direction of the entire network. We have also designed a variety of multi-data fusion methods in the Hamming space to preserve the advantages of both convolution and multi-view. In order to avoid excessive computing resources on the enhancement procedure during retrieval, we set up a separate structure called memory network which participates in training together. The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets, and the results show that our method significantly outperforms the state-of-the-art single-view and multi-view hashing methods.

  • 4 authors
·
Feb 1, 2020

Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition

Action recognition models using deep learning are vulnerable to adversarial examples, which are transferable across other models trained on the same data modality. Existing transferable attack methods face two major challenges: 1) they heavily rely on the assumption that the decision boundaries of the surrogate (a.k.a., source) model and the target model are similar, which limits the adversarial transferability; and 2) their decision boundary difference makes the attack direction uncertain, which may result in the gradient oscillation, weakening the adversarial attack. This motivates us to propose a Background Mixup-induced Temporal Consistency (BMTC) attack method for action recognition. From the input transformation perspective, we design a model-agnostic background adversarial mixup module to reduce the surrogate-target model dependency. In particular, we randomly sample one video from each category and make its background frame, while selecting the background frame with the top attack ability for mixup with the clean frame by reinforcement learning. Moreover, to ensure an explicit attack direction, we leverage the background category as guidance for updating the gradient of adversarial example, and design a temporal gradient consistency loss, which strengthens the stability of the attack direction on subsequent frames. Empirical studies on two video datasets, i.e., UCF101 and Kinetics-400, and one image dataset, i.e., ImageNet, demonstrate that our method significantly boosts the transferability of adversarial examples across several action/image recognition models. Our code is available at https://github.com/mlvccn/BMTC_TransferAttackVid.

  • 3 authors
·
May 23, 2025

Geometric Stability: The Missing Axis of Representations

Analysis of learned representations has a blind spot: it focuses on similarity, measuring how closely embeddings align with external references, but similarity reveals only what is represented, not whether that structure is robust. We introduce geometric stability, a distinct dimension that quantifies how reliably representational geometry holds under perturbation, and present Shesha, a framework for measuring it. Across 2,463 configurations in seven domains, we show that stability and similarity are empirically uncorrelated (ρ ≈ 0.01) and mechanistically distinct: similarity metrics collapse after removing the top principal components, while stability retains sensitivity to fine-grained manifold structure. This distinction yields actionable insights: for safety monitoring, stability acts as a functional geometric canary, detecting structural drift nearly 2× more sensitively than CKA while filtering out the non-functional noise that triggers false alarms in rigid distance metrics; for controllability, supervised stability predicts linear steerability (ρ = 0.89-0.96); for model selection, stability dissociates from transferability, revealing a geometric tax that transfer optimization incurs. Beyond machine learning, stability predicts CRISPR perturbation coherence and neural-behavioral coupling. By quantifying how reliably systems maintain structure, geometric stability provides a necessary complement to similarity for auditing representations across biological and computational systems.

  • 1 author
·
Jan 14

"I May Not Have Articulated Myself Clearly": Diagnosing Dynamic Instability in LLM Reasoning at Inference Time

Reasoning failures in large language models (LLMs) are typically measured only at the end of a generation, yet many failures manifest as a process-level breakdown: the model "loses the thread" mid-reasoning. We study whether such breakdowns are detectable from inference-time observables available in standard APIs (token log probabilities), without any training or fine-tuning. We define a simple instability signal that combines consecutive-step distributional shift (JSD) and uncertainty (entropy), summarize each trace by its peak instability strength, and show that this signal reliably predicts failure. Across GSM8K and HotpotQA, instability strength predicts wrong answers with above-chance AUC and yields monotonic bucket-level accuracy decline at scale across model sizes. Crucially, we show that instability is not uniformly harmful: early instability can reflect subsequent stabilization and a correct final answer (corrective instability), whereas late instability is more often followed by failure (destructive instability), even at comparable peak magnitudes, indicating that recoverability depends not only on how strongly the distribution changes but also on when such changes occur relative to the remaining decoding horizon. The method is model-agnostic, training-free, and reproducible, and is presented as a diagnostic lens rather than a corrective or control mechanism.
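The abstract describes the instability signal only as a combination of consecutive-step Jensen-Shannon divergence and entropy, summarized by its peak. The sketch below assumes one simple combination, JSD plus a weighted entropy term, purely for illustration. The `weight` parameter, the additive form, and the toy distributions are assumptions, and the inputs are assumed to be full distributions over a shared support rather than truncated top-k log probabilities.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def _kl(u, v):
    return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0.0)


def jsd(p, q):
    """Jensen-Shannon divergence in nats (bounded above by ln 2)."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def instability_trace(step_dists, weight=1.0):
    """Per-step signal: shift from the previous step plus weighted uncertainty.

    The additive combination is an assumption, not the paper's definition.
    """
    if len(step_dists) < 2:
        raise ValueError("need at least two decoding steps")
    return [jsd(prev, cur) + weight * entropy(cur)
            for prev, cur in zip(step_dists, step_dists[1:])]


def peak_instability(step_dists, weight=1.0):
    """Return (peak value, step index at which the peak occurs)."""
    trace = instability_trace(step_dists, weight)
    peak_index = max(range(len(trace)), key=trace.__getitem__)
    return trace[peak_index], peak_index + 1


# Toy next-token distributions at four decoding steps (hypothetical).
steps = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
peak_value, peak_step = peak_instability(steps)
print(round(peak_value, 3), peak_step)
```

Returning the step index alongside the peak value matters for the distinction the abstract draws: the same peak magnitude can be corrective when it occurs early and destructive when it occurs late in the decoding horizon.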

  • 4 authors
·
Feb 2

Understanding and Diagnosing Deep Reinforcement Learning

Deep neural policies have recently been installed in a diverse range of settings, from biotechnology to automated financial systems. However, the utilization of deep neural networks to approximate the value function leads to concerns on the decision boundary stability, in particular, with regard to the sensitivity of policy decision making to indiscernible, non-robust features due to highly non-convex and complex deep neural manifolds. These concerns constitute an obstruction to understanding the reasoning made by deep neural policies, and their foundational limitations. Hence, it is crucial to develop techniques that aim to understand the sensitivities in the learnt representations of neural network policies. To achieve this we introduce a theoretically founded method that provides a systematic analysis of the unstable directions in the deep neural policy decision boundary across both time and space. Through experiments in the Arcade Learning Environment (ALE), we demonstrate the effectiveness of our technique for identifying correlated directions of instability, and for measuring how sample shifts remold the set of sensitive directions in the neural policy landscape. Most importantly, we demonstrate that state-of-the-art robust training techniques yield learning of disjoint unstable directions, with dramatically larger oscillations over time, when compared to standard training. We believe our results reveal the fundamental properties of the decision process made by reinforcement learning policies, and can help in constructing reliable and robust deep neural policies.

  • 1 author
·
Jun 23, 2024

The Cylindrical Representation Hypothesis for Language Model Steering

Steering is a widely used technique for controlling large language models, yet its effects are often unstable and hard to predict. Existing theoretical accounts are largely based on the Linear Representation Hypothesis (LRH). While LRH assumes that concepts can be orthogonalized for lossless control, this idealized mapping fails in real representations and cannot account for the observed unpredictability of steering. By relaxing LRH's orthogonality assumption while preserving linear representations, we show that overlapping concept contributions naturally yield a sample-specific axis-orthogonal structure. We formalize this as the Cylindrical Representation Hypothesis (CRH). In CRH, a central axis captures the main difference between concept absence and presence and drives concept generation. A surrounding normal plane controls steering sensitivity by determining how easily the axis can activate the target concept. Within this plane, only specific sensitive sectors strongly facilitate concept activation, while other sectors can suppress or delay it. While the surrounding normal plane can be reliably identified from difference vectors, the sensitive sector cannot, introducing intrinsic uncertainty at the sector level. This uncertainty provides a principled explanation for why steering outcomes often fluctuate even when using well-aligned directions. Our experiments verify the existence of the cylindrical structure and demonstrate that CRH provides a valid and practical way to interpret model steering behavior in real settings: https://github.com/mbzuai-nlp/CRH.

  • 10 authors
·
May 2

Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress

Genome engineering has achieved remarkable sequence-level precision, yet predicting the transcriptomic state that a cell will occupy after perturbation remains an open problem. Single-cell CRISPR screens measure how far cells move from their unperturbed state, but this effect magnitude ignores a fundamental question: do the cells move together? Two perturbations with identical magnitude can produce qualitatively different outcomes if one drives cells coherently along a shared trajectory while the other scatters them across expression space. We introduce a geometric stability metric, Shesha, that quantifies the directional coherence of single-cell perturbation responses as the mean cosine similarity between individual cell shift vectors and the mean perturbation direction. Across five CRISPR datasets (2,200+ perturbations spanning CRISPRa, CRISPRi, and pooled screens), stability correlates strongly with effect magnitude (Spearman ρ=0.75-0.97), with a calibrated cross-dataset correlation of 0.97. Crucially, discordant cases where the two metrics decouple expose regulatory architecture: pleiotropic master regulators such as CEBPA and GATA1 pay a "geometric tax," producing large but incoherent shifts, while lineage-specific factors such as KLF1 produce tightly coordinated responses. After controlling for magnitude, geometric instability is independently associated with elevated chaperone activation (HSPA5/BiP; ρ_{partial}=-0.34 and -0.21 across datasets), and the high-stability/high-stress quadrant is systematically depleted. The magnitude-stability relationship persists in scGPT foundation model embeddings, confirming it is a property of biological state space rather than linear projection. Perturbation stability provides a complementary axis for hit prioritization in screens, phenotypic quality control in cell manufacturing, and evaluation of in silico perturbation predictions.
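The metric is defined above as the mean cosine similarity between individual cell shift vectors and the mean perturbation direction. A minimal sketch of that definition follows. Taking each shift relative to the centroid of control cells is an assumption made here for illustration, as are the function names and the toy two-gene data.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_coherence(perturbed_cells, control_cells):
    """Mean cosine between each cell's shift and the mean shift direction."""
    centroid = mean_vector(control_cells)  # assumed reference state
    shifts = [[x - c for x, c in zip(cell, centroid)]
              for cell in perturbed_cells]
    mean_shift = mean_vector(shifts)
    return sum(cosine(s, mean_shift) for s in shifts) / len(shifts)


# Toy 2-gene expression profiles (hypothetical values).
control = [[0.0, 0.0], [0.0, 0.0]]
coherent = [[1.0, 0.1], [1.0, -0.1], [1.2, 0.0]]            # move together
scattered = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # same size, no shared direction

print(round(directional_coherence(coherent, control), 3))
print(round(directional_coherence(scattered, control), 3))
```

The two toy perturbations have comparable per-cell shift magnitudes but very different coherence, which is the decoupling the abstract calls out for pleiotropic versus lineage-specific factors.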

  • 1 author
·
Apr 16

The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent

In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability (Wu et al., 2018). We start by revising existing stability analyses of SGD, showing how the Frobenius norm and trace of Hessian relate to different notions of stability. Notably, if a global minimum is linearly stable for SGD, then the trace of Hessian must be less than or equal to 2/η, where η denotes the learning rate. By contrast, for gradient descent (GD), the stability imposes a similar constraint but only on the largest eigenvalue of Hessian. We then turn to analyze the generalization properties of these stable minima, focusing specifically on two-layer ReLU networks and diagonal linear networks. Notably, we establish the equivalence between these metrics of sharpness and certain parameter norms for the two models, which allows us to show that the stable minima of SGD provably generalize well. By contrast, the stability-induced regularization of GD is provably too weak to ensure satisfactory generalization. This discrepancy provides an explanation of why SGD often generalizes better than GD. Note that the learning rate (LR) plays a pivotal role in the strength of stability-induced regularization. As the LR increases, the regularization effect becomes more pronounced, elucidating why SGD with a larger LR consistently demonstrates superior generalization capabilities. Additionally, numerical experiments are provided to support our theoretical findings.
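The two conditions stated above can be checked numerically on a toy quadratic loss whose Hessian is diagonal. The sketch below is a textbook illustration, not the paper's analysis: GD on L(w) = ½ Σ λ_i w_i² contracts exactly when the largest eigenvalue is at most 2/lr. The trace bound is treated only as the necessary condition for SGD that the abstract states. The eigenvalues and learning rates are hypothetical.

```python
def gd_is_linearly_stable(eigenvalues, lr):
    """GD on a quadratic is stable iff the largest eigenvalue is <= 2 / lr."""
    return max(eigenvalues) <= 2.0 / lr


def satisfies_trace_bound(eigenvalues, lr):
    """Necessary condition quoted for SGD stability: trace(H) <= 2 / lr."""
    return sum(eigenvalues) <= 2.0 / lr


def run_gd(eigenvalues, lr, steps=200, w0=1.0):
    """Run GD on L(w) = 0.5 * sum(lam_i * w_i**2); return the largest |w_i|."""
    w = [w0] * len(eigenvalues)
    for _ in range(steps):
        # The gradient of the quadratic is lam_i * w_i in each coordinate.
        w = [(1.0 - lr * lam) * wi for lam, wi in zip(eigenvalues, w)]
    return max(abs(wi) for wi in w)


hessian_eigs = [15.0, 8.0]  # hypothetical Hessian spectrum at a minimum

for lr in (0.1, 0.15):
    print(lr,
          gd_is_linearly_stable(hessian_eigs, lr),
          satisfies_trace_bound(hessian_eigs, lr),
          run_gd(hessian_eigs, lr))
```

At `lr = 0.1` the minimum passes the GD eigenvalue test (15 ≤ 20) but fails the trace bound (23 > 20). That shows how the trace condition can exclude minima that GD would tolerate, in line with the abstract's claim that the constraint on SGD is the stronger one.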

  • 2 authors
·
May 27, 2023

VeCoR -- Velocity Contrastive Regularization for Flow Matching

Flow Matching (FM) has recently emerged as a principled and efficient alternative to diffusion models. Standard FM encourages the learned velocity field to follow a target direction; however, it may accumulate errors along the trajectory and drive samples off the data manifold, leading to perceptual degradation, especially in lightweight or low-step configurations. To enhance stability and generalization, we extend FM into a balanced attract-repel scheme that provides explicit guidance on both "where to go" and "where not to go." Formally, we propose Velocity Contrastive Regularization (VeCoR), a complementary training scheme for flow-based generative modeling that augments the standard FM objective with contrastive, two-sided supervision. VeCoR not only aligns the predicted velocity with a stable reference direction (positive supervision) but also pushes it away from inconsistent, off-manifold directions (negative supervision). This contrastive formulation transforms FM from a purely attractive, one-sided objective into a two-sided training signal, regularizing trajectory evolution and improving perceptual fidelity across datasets and backbones. On ImageNet-1K 256×256, VeCoR yields 22% and 35% relative FID reductions on SiT-XL/2 and REPA-SiT-XL/2 backbones, respectively, and achieves further FID gains (32% relative) on MS-COCO text-to-image generation, demonstrating consistent improvements in stability, convergence, and image quality, particularly in low-step and lightweight settings. Project page: https://p458732.github.io/VeCoR_Project_Page/

  • 5 authors
·
Nov 24, 2025

Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model

Robots capable of learning from demonstration (LfD) must exhibit stability while executing learned motion skills. To be effective in the real world, they should also remember multiple skills over time -- a capability lacking in current stable-LfD methods. We propose an approach to stable, continual LfD, and highlight the role of stability in improving continual learning. Our proposed hypernetwork generates the parameters of two neural networks: a trajectory learning dynamics model, and a trajectory-stabilizing Lyapunov function. These generated networks form a clock-augmented stable neural ODE solver (sNODE), a stable dynamics model that offers a superior stability-accuracy trade-off compared to the state-of-the-art. We further propose stochastic hypernetwork regularization with a single, uniformly-sampled task embedding, reducing the cumulative training time for N tasks from O(N^2) to O(N) without degrading performance on real-world tasks. We introduce high-dimensional variants of the popular LASA dataset to assess scalability and extend a dataset of robotic LfD tasks to assess real-world performance. We empirically evaluate our approach on multiple LfD datasets of varying complexity, including sequences of 7--26 tasks, trajectories of 2--32 dimensions, and real-world tasks involving position and orientation. Our thorough evaluation on multiple LfD datasets demonstrates that our approach sequentially learns and retains multiple motion skills without retraining on past demonstrations, and outperforms other relevant baselines in terms of trajectory errors, continual learning scores, and stability metrics. Notably, we show that stability greatly enhances continual learning performance, particularly in size-efficient chunked hypernetworks. Our code is available at https://github.com/sayantanauddy/clfd-snode.

  • 5 authors
·
May 10

Grokking at the Edge of Numerical Stability

Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon challenging our understanding of deep learning. Although significant progress has been made in understanding grokking, the reasons behind the delayed generalization and its dependence on regularization remain unclear. In this work, we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating point errors in the Softmax function, which we refer to as Softmax Collapse (SC). We demonstrate that SC prevents grokking and that mitigating SC enables grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not alter the model's predictions but decreases the loss by scaling the logits, typically by scaling the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, halting further learning. To validate our hypotheses, we introduce two key contributions that address the challenges in grokking tasks: StableMax, a new activation function that prevents SC and enables grokking without regularization, and perpGrad, a training algorithm that promotes quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, elucidating its delayed generalization, reliance on regularization, and the effectiveness of existing grokking-inducing methods. Code for this paper is available at https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability.
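The floating-point effect described above can be reproduced with a plain softmax. The sketch below is an illustration and not the paper's code: it scales a fixed logit vector, which leaves the predicted class unchanged, and evaluates cross-entropy in Python's 64-bit floats. Once the margin is large enough, the smaller exponentials are absorbed when added to 1.0, and the loss becomes exactly zero. Lower-precision formats reach this point at smaller margins. The logits and scale factors are hypothetical.

```python
import math


def softmax(logits):
    """Numerically shifted softmax, evaluated in 64-bit floats."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


base_logits = [2.0, 1.0, 0.0]  # class 0 is the correct, predicted class

for scale in (1.0, 5.0, 20.0, 50.0):
    scaled = [scale * z for z in base_logits]
    probs = softmax(scaled)
    # Scaling never changes the argmax, only the confidence.
    predicted = probs.index(max(probs))
    print(scale, predicted, cross_entropy(scaled, 0), probs[0] == 1.0)
```

At the largest scale the target probability is exactly 1.0, so the loss is exactly zero and the correct-class gradient term (probability minus one) vanishes. This is the kind of stall the abstract attributes to Softmax Collapse.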

  • 4 authors
·
Jan 8, 2025

Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion

Autoregressive video diffusion models enable streaming generation but often degrade over long rollouts: static scene layouts drift, while mechanisms that improve spatial stability tend to suppress motion, causing natural flows such as water, fire, or smoke to stagnate. We study this stability-motion trade-off in fixed-camera long-horizon nature video generation, where the two failure modes can be more clearly separated than in moving-camera settings. We propose Steady-Forcing, a memory and training framework combining a persistent visual anchor (V-Sink), an exponential moving-average motion memory (EMA-Sink), block-relative temporal encoding, periodic cache purification, and distillation from a Wan2.1-14B teacher with motion-rewarded priors under task-focused configurations. Together, these components are designed to preserve background identity while sustaining visually plausible fluid dynamics over multi-minute autoregressive rollouts. Evaluations across seven baselines show that Steady-Forcing improves long-horizon background consistency and imaging quality, while a blind user study indicates stronger perceived stability and motion continuity. The benchmark evaluation further suggests that generic VBench aggregate scores under-penalize fixed-camera artifacts and reward drift-induced optical flow as Dynamic Degree, while not directly penalizing texture hardening or flow stagnation, motivating future task-specific benchmarks for static-camera nature-flow evaluation. Project page: https://minar09.github.io/steadyforcing/

A review of path following control strategies for autonomous robotic vehicles: theory, simulations, and experiments

This article presents an in-depth review of the topic of path following for autonomous robotic vehicles, with a specific focus on vehicle motion in two dimensional space (2D). From a control system standpoint, path following can be formulated as the problem of stabilizing a path following error system that describes the dynamics of position and possibly orientation errors of a vehicle with respect to a path, with the errors defined in an appropriate reference frame. In spite of the large variety of path following methods described in the literature we show that, in principle, most of them can be categorized in two groups: stabilization of the path following error system expressed either in the vehicle's body frame or in a frame attached to a "reference point" moving along the path, such as a Frenet-Serret (F-S) frame or a Parallel Transport (P-T) frame. With this observation, we provide a unified formulation that is simple but general enough to cover many methods available in the literature. We then discuss the advantages and disadvantages of each method, comparing them from the design and implementation standpoint. We further show experimental results of the path following methods obtained from field trials testing with under-actuated and fully-actuated autonomous marine vehicles. In addition, we introduce open-source Matlab and Gazebo/ROS simulation toolboxes that are helpful in testing path following methods prior to their integration in the combined guidance, navigation, and control systems of autonomous vehicles.

  • 9 authors
·
Apr 14, 2022

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Despite significant progress in alignment, large language models (LLMs) remain vulnerable to adversarial attacks that elicit harmful behaviors. Activation steering techniques offer a promising inference-time intervention approach, but existing methods suffer from critical limitations: activation addition requires careful coefficient tuning and is sensitive to layer-specific norm variations, while directional ablation provides only binary control. Recent work on Angular Steering introduces continuous control via rotation in a 2D subspace, but its practical implementation violates norm preservation, causing distribution shift and generation collapse, particularly in models below 7B parameters. We propose Selective Steering, which addresses these limitations through two key innovations: (1) a mathematically rigorous norm-preserving rotation formulation that maintains activation distribution integrity, and (2) discriminative layer selection that applies steering only where feature representations exhibit opposite-signed class alignment. Experiments across nine models demonstrate that Selective Steering achieves 5.5x higher attack success rates than prior methods while maintaining zero perplexity violations and approximately 100% capability retention on standard benchmarks. Our approach provides a principled, efficient framework for controllable and stable LLM behavior modification. Code: https://github.com/knoveleng/steering
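A rotation confined to a 2D subspace preserves the activation norm when the two basis vectors of that subspace are orthonormal. The sketch below shows that generic construction (Gram-Schmidt followed by a planar rotation). It is not the formulation from the paper or its repository, and the function names, direction vectors, and toy activation are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_pair(d1, d2):
    """Gram-Schmidt: orthonormal basis (u, v) of the plane spanned by d1, d2."""
    n1 = math.sqrt(dot(d1, d1))
    u = [x / n1 for x in d1]
    proj = dot(d2, u)
    w = [x - proj * ui for x, ui in zip(d2, u)]
    n2 = math.sqrt(dot(w, w))
    if n2 == 0.0:
        raise ValueError("directions are collinear; no 2D plane defined")
    v = [x / n2 for x in w]
    return u, v


def rotate_in_plane(h, u, v, theta):
    """Rotate h by theta within span(u, v); the orthogonal part is untouched."""
    a, b = dot(h, u), dot(h, v)
    a_rot = a * math.cos(theta) - b * math.sin(theta)
    b_rot = a * math.sin(theta) + b * math.cos(theta)
    return [hi + (a_rot - a) * ui + (b_rot - b) * vi
            for hi, ui, vi in zip(h, u, v)]


# Toy activation and two hypothetical feature directions.
activation = [3.0, 4.0, 12.0]
u, v = orthonormal_pair([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
steered = rotate_in_plane(activation, u, v, math.pi / 2)

print(steered)
print(math.sqrt(dot(activation, activation)), math.sqrt(dot(steered, steered)))
```

Because only the in-plane coordinates change and a rotation keeps their combined length, the norm of the steered activation matches the original up to rounding. This is the property the abstract says Angular Steering's practical implementation violates.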

The Impact of Environment Configurations on the Stability of AI-Enabled Systems

Nowadays, software systems tend to include Artificial Intelligence (AI) components. Changes in the operational environment have been known to negatively impact the stability of AI-enabled software systems by causing unintended changes in behavior. However, how an environment configuration impacts the behavior of such systems has yet to be explored. Understanding and quantifying the degree of instability caused by different environment settings can help practitioners decide the best environment configuration for the most stable AI systems. To achieve this goal, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-enabled systems using the Travis CI platform. We determine the existence and the degree of instability introduced by each configuration using three metrics: the output of an AI component of the system (model performance), the time required to build and run the system (processing time), and the cost associated with building and running the system (expense). Our results indicate that changes in environment configurations lead to instability across all three metrics; however, it is observed more frequently with respect to processing time and expense than with respect to model performance. For example, between Linux and MacOS, instability is observed in 23%, 96.67%, and 100% of the studied projects in model performance, processing time, and expense, respectively. Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate drops in model performance and reduce the processing time and expense before deploying an AI-enabled system.

  • 5 authors
·
Aug 5, 2024

From Syntax to Semantics: Geometric Stability as the Missing Axis of Perturbation Biology

The capacity to precisely edit genomes has outpaced our ability to predict the consequences. A cell can be genetically perfect and therapeutically useless: edited exactly as intended, yet unstable, drifting toward unintended fates, or selected for properties that compromise safety. This paradox reflects a deeper gap in how we evaluate biological intervention. Current frameworks excel at measuring what was done to a cell but remain blind to what the cell has become. We argue that this blindness stems from treating cells as collections of independent variables rather than as dynamical systems occupying positions on high-dimensional state manifolds. Drawing on Waddington's epigenetic landscape, we propose geometric stability as a missing axis of evaluation: the directional coherence of cellular responses to perturbation. This metric distinguishes interventions that guide cells coherently toward stable states from those that scatter them across the state manifold. Validation across diverse perturbation datasets reveals that geometric stability captures regulatory architecture invisible to conventional metrics, discriminating pleiotropic master regulators from lineage-specific factors without prior biological annotation. As precision medicine increasingly relies on cellular reprogramming, the question shifts from "did the intervention occur?" to "is the resulting state stable?" Geometric stability provides a framework for answering.

  • 1 authors
·
Apr 24

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

Most 3D object generators focus on aesthetic quality, often neglecting physical constraints necessary in applications. One such constraint is that the 3D object should be self-supporting, i.e., remains balanced under gravity. Prior approaches to generating stable 3D objects used differentiable physics simulators to optimize geometry at test-time, which is slow, unstable, and prone to local optima. Inspired by the literature on aligning generative models to external feedback, we propose Direct Simulation Optimization (DSO), a framework to use the feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator outputs stable 3D objects directly. We construct a dataset of 3D objects labeled with a stability score obtained from the physics simulator. We can then fine-tune the 3D generator using the stability score as the alignment metric, via direct preference optimization (DPO) or direct reward optimization (DRO), a novel objective, which we introduce, to align diffusion models without requiring pairwise preferences. Our experiments show that the fine-tuned feed-forward generator, using either DPO or DRO objective, is much faster and more likely to produce stable objects than test-time optimization. Notably, the DSO framework works even without any ground-truth 3D objects for training, allowing the 3D generator to self-improve by automatically collecting simulation feedback on its own outputs.

  • 4 authors
·
Mar 28, 2025 2

Analyzing Data Quality and Decay in Mega-Constellations: A Physics-Informed Machine Learning Approach

In the era of mega-constellations, the need for accurate and publicly available information has become fundamental for satellite operators to guarantee the safety of spacecraft and the Low Earth Orbit (LEO) space environment. This study critically evaluates the accuracy and reliability of publicly available ephemeris data for a LEO mega-constellation, Starlink. The goal of this work is twofold: (i) to compare and analyze the quality of the data against high-precision numerical propagation, and (ii) to leverage Physics-Informed Machine Learning to extract relevant satellite quantities, such as non-conservative forces, during the decay process. By analyzing two months of real orbital data for approximately 1500 Starlink satellites, we identify discrepancies between high-precision numerical algorithms and the published ephemerides, recognizing the use of simplified dynamics at fixed thresholds, planned maneuvers, and limitations in uncertainty propagation. Furthermore, we compare data obtained from multiple sources to track and analyze deorbiting satellites over the same period. Empirically, we extract the acceleration profile of satellites during deorbiting and provide insights relating to the effects of non-conservative forces during reentry. For non-deorbiting satellites, the position Root Mean Square Error (RMSE) was approximately 300 m, while for deorbiting satellites it increased to about 600 m. Through this in-depth analysis, we highlight potential limitations in publicly available data for accurate and robust Space Situational Awareness (SSA), and importantly, we propose a data-driven model of satellite decay in mega-constellations.

  • 3 authors
·
Oct 13, 2025

StableWorld: Towards Stable and Consistent Long Interactive Video Generation

In this paper, we explore the overlooked challenge of stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, current methods still suffer from severe instability and temporal degradation, often leading to spatial drift and scene collapse during long-horizon interactions. To better understand this issue, we first investigate the underlying causes of instability and identify that the major source of error accumulation originates from the same scene, where generated frames gradually deviate from the initial clean state and propagate errors to subsequent frames. Building upon this observation, we propose a simple yet effective method, StableWorld, a Dynamic Frame Eviction Mechanism. By continuously filtering out degraded frames while retaining geometrically consistent ones, StableWorld effectively prevents cumulative drift at its source, leading to more stable and temporally consistent interactive generation. Promising results on multiple interactive video models, e.g., Matrix-Game, Open-Oasis, and Hunyuan-GameCraft, demonstrate that StableWorld is model-agnostic and can be applied to different interactive video generation frameworks to substantially improve stability, temporal consistency, and generalization across diverse interactive scenarios.

  • 9 authors
·
Jan 21

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature stability: for each SAE feature, we estimate the probability that a similar feature reappears in an independently trained SAE. This yields a scalable per-feature signal that separates stable from unstable features. In a large-scale study across seeds, models, layers, dictionary sizes, and SAE variants, we find a pronounced functional asymmetry: stable features carry most of the reconstruction- and prediction-relevant signal, while unstable features have weak marginal impact and are dominated by low-frequency surface-form triggers in both activation statistics and automatic explanations. Geometrically, unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting that seed dependence often reflects basis ambiguity within a shared region of activation space rather than pure noise. A controlled synthetic model makes this mechanism explicit, showing that low-rank ground-truth features can be recovered at the subspace level while remaining non-identifiable as individual SAE latents across seeds. Finally, by pooling unique cross-seed features, we construct more stable SAEs while preserving explained variance in this setting. Together, these results show that unstable features are not merely failed or noisy latents: they have weak individual functional impact, but reflect reproducible low-dimensional structure that standard SAEs resolve differently across seeds.

t-tech T-Tech
·
Jun 9 2

Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient method for fine-tuning Large Language Models. It updates the weight matrix as W=W_0+sBA, where W_0 is the original frozen weight, s is a scaling factor and A,B are trainable low-rank matrices. Despite its robust empirical effectiveness, the theoretical foundations of LoRA remain insufficiently understood, particularly with respect to feature learning stability. In this paper, we first establish that LoRA can, in principle, naturally achieve and sustain stable feature learning (i.e., be self-stabilized) under appropriate hyper-parameters and initializations of A and B. However, we also uncover a fundamental limitation: the necessary non-zero initialization of A compromises self-stability, leading to suboptimal performance. To address this challenge, we propose Stable-LoRA, a weight-shrinkage optimization strategy that dynamically enhances stability of LoRA feature learning. By progressively shrinking A during the earliest training steps, Stable-LoRA is both theoretically and empirically validated to effectively eliminate instability of LoRA feature learning while preserving the benefits of the non-zero start. Experiments show that Stable-LoRA consistently outperforms other baselines across diverse models and tasks, with no additional memory usage and only negligible computational overhead. The code is available at https://github.com/Yize-Wu/Stable-LoRA.

  • 4 authors
·
Mar 4
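
For the Stable-LoRA entry above, the update rule W=W_0+sBA can be sketched in a few lines. This is only an illustration of the standard LoRA parameterization stated in the abstract; the dimensions, the scaling value, and the helper name `lora_weight` are assumptions, and the paper's shrinkage schedule for A is deliberately not reproduced because the abstract does not specify it.

```python
import numpy as np

def lora_weight(W0, A, B, s):
    """Effective weight W = W0 + s * B @ A (hypothetical helper name)."""
    return W0 + s * (B @ A)

d_out, d_in, r = 6, 8, 2          # assumed toy dimensions, rank r << min(d_out, d_in)
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # non-zero initialization of A
B = np.zeros((d_out, r))                # zero initialization of B

# With B = 0 the adapted weight starts exactly at the pretrained weight.
W_start = lora_weight(W0, A, B, s=2.0)

# After some (here simulated) training step B is non-zero, and the update
# s * B @ A has rank at most r.
B_trained = rng.normal(size=(d_out, r))
delta = lora_weight(W0, A, B_trained, s=2.0) - W0
update_rank = np.linalg.matrix_rank(delta)
```

The sketch shows why A must start non-zero in this setup: with both A and B at zero the product BA would contribute nothing and receive no gradient signal, which is the initialization the abstract identifies as the source of the stability trade-off.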

Small-scale proxies for large-scale Transformer training instabilities

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the μParam (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration, we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms.

  • 16 authors
·
Sep 25, 2023 2

DOA Estimation for Low-Altitude Networks: HAD Architectures, Methods, and Challenges

With the rapid expansion of low-altitude economy (LAE) services and the growing demand for integrated sensing and communication (ISAC) in air-ground networks, reliable direction-of-arrival (DOA) estimation has become essential for both directional communication and sensing functions. DOA underpins beam alignment, spatial-reuse scheduling, and ISAC-critical tasks such as airspace situational awareness and multi-target monitoring. Hybrid analog-digital (HAD) architectures have emerged as a practical solution for large-aperture directional operation under stringent radio frequency (RF), analog-to-digital converter (ADC), and size, weight, and power (SWaP) constraints. However, HAD compresses antenna-domain observations through analog combining, fundamentally reshaping the measurement model and introducing new algorithmic and system-level challenges for DOA estimation. This article first reviews the principles and representative architectures of HAD, highlighting their advantages for scalable beam-centric and ISAC-oriented operation in LAE scenarios. We then provide a structured overview of HAD-enabled DOA estimation methodologies, including spatial covariance matrix (SCM) reconstruction, multi-combiner scan-based acquisition, and pilot-aided estimation, along with key design tradeoffs. Finally, we discuss open challenges and outline reliability-driven research directions toward robust, deployable HAD-enabled DOA solutions for practical ISAC-enabled low-altitude environments.

  • 7 authors
·
Mar 31

EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs

Generative adversarial networks (GANs) have proven successful in image generation tasks. However, GAN training is inherently unstable. Although many works try to stabilize it by manually modifying GAN architecture, it requires much expertise. Neural architecture search (NAS) has become an attractive solution to search GANs automatically. The early NAS-GANs search only generators to reduce search complexity but lead to a sub-optimal GAN. Some recent works try to search both generator (G) and discriminator (D), but they suffer from the instability of GAN training. To alleviate the instability, we propose an efficient two-stage evolutionary algorithm-based NAS framework to search GANs, namely EAGAN. We decouple the search of G and D into two stages, where stage-1 searches G with a fixed D and adopts the many-to-one training strategy, and stage-2 searches D with the optimal G found in stage-1 and adopts the one-to-one training and weight-resetting strategies to enhance the stability of GAN training. Both stages use the non-dominated sorting method to produce Pareto-front architectures under multiple objectives (e.g., model size, Inception Score (IS), and Fréchet Inception Distance (FID)). EAGAN is applied to the unconditional image generation task and can efficiently finish the search on the CIFAR-10 dataset in 1.2 GPU days. Our searched GANs achieve competitive results (IS=8.81±0.10, FID=9.91) on the CIFAR-10 dataset and surpass prior NAS-GANs on the STL-10 dataset (IS=10.44±0.087, FID=22.18). Source code: https://github.com/marsggbo/EAGAN.

  • 5 authors
·
Nov 29, 2021
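
The EAGAN entry above mentions selecting Pareto-front architectures under several objectives (model size, IS, FID). The following is a minimal sketch of the first step of non-dominated sorting, namely extracting the first Pareto front. It is not EAGAN's code: the candidate values are made up for illustration, and IS is negated so that every objective is minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates (first front only)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates as (model size, -IS, FID); values are illustrative only.
candidates = [
    (10.0, -8.5, 12.0),
    (12.0, -8.8, 10.0),
    (15.0, -8.4, 13.0),  # worse than the first candidate on all three objectives
]
front = pareto_front(candidates)
```

Here the third candidate is dominated by the first, while the first two trade size against quality and so both stay on the front. A full non-dominated sort would repeat this on the remaining points to assign later fronts.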

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Reinforcement learning has become essential for strengthening the reasoning abilities of large language models, yet current exploration mechanisms remain fundamentally misaligned with how these models actually learn. Entropy bonuses and external semantic comparators encourage surface-level variation but offer no guarantee that sampled trajectories differ in the update directions that shape optimization. We propose G2RL, a gradient-guided reinforcement learning framework in which exploration is driven not by external heuristics but by the model's own first-order update geometry. For each response, G2RL constructs a sequence-level feature from the model's final-layer sensitivity, obtainable at negligible cost from a standard forward pass, and measures how each trajectory would reshape the policy by comparing these features within a sampled group. Trajectories that introduce novel gradient directions receive a bounded multiplicative reward scaler, while redundant or off-manifold updates are deemphasized, yielding a self-referential exploration signal that is naturally aligned with PPO-style stability and KL control. Across math and general reasoning benchmarks (MATH500, AMC, AIME24, AIME25, GPQA, MMLUpro) on Qwen3 base 1.7B and 4B models, G2RL consistently improves pass@1, maj@16, and pass@k over entropy-based GRPO and external embedding methods. Analyzing the induced geometry, we find that G2RL expands exploration into substantially more orthogonal and often opposing gradient directions while maintaining semantic coherence, revealing that a policy's own update space provides a far more faithful and effective basis for guiding exploration in large language model reinforcement learning.

tencent Tencent
·
Dec 17, 2025 2

Direction-Preserving Number Representations

Low-precision number formats are widely used in modern machine learning systems due to their efficiency. Accurate direction representation is key to the accuracy of vector operations. This work precisely explores the extent to which the direction of a vector can be represented by selecting its scalar elements from a common finite alphabet of a given size. This is standard practice in machine learning, where low-precision significands may be narrow-width floating-point or integer values. A geometric framework is introduced for analyzing the directional coverage of such product-structured codes. This work analytically quantifies the suboptimality gap between such product-structured codes and spherical codes for the vector as a whole, in both low and asymptotically high dimensions. Furthermore, within the product code class, it is proven that the standard formats of two's complement, fixed-point, and floating-point are suboptimal, again with quantified gap, pointing to the potential to develop new scalar number formats. Such scalar alphabets are numerically optimized across multiple block dimensions for directional coverage, including the dimension used in NVIDIA's NVFP4 format. Experimental results are presented comparing the performance of standard formats and the optimized alphabet. We find that for four bits, NVIDIA's choice of E2M1 closely approximates the optimized alphabet, providing a geometric explanation for its strong performance in low-precision machine learning workloads and an analytical understanding of the link between that superiority and block size. We provide open-source formal proofs in Lean for the theorems in this work, along with the experimental code and the optimized alphabets obtained.

  • 2 authors
·
May 7

Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches

Despite the prevailing transition from single-task to multi-task approaches in video anomaly detection, we observe that many adopt sub-optimal frameworks for individual proxy tasks. Motivated by this, we contend that optimizing single-task frameworks can advance both single- and multi-task approaches. Accordingly, we leverage middle-frame prediction as the primary proxy task, and introduce an effective hybrid framework designed to generate accurate predictions for normal frames and flawed predictions for abnormal frames. This hybrid framework is built upon a bi-directional structure that seamlessly integrates both vision transformers and ConvLSTMs. Specifically, we utilize this bi-directional structure to fully analyze the temporal dimension by predicting frames in both forward and backward directions, significantly boosting the detection stability. Given the transformer's capacity to model long-range contextual dependencies, we develop a convolutional temporal transformer that efficiently associates feature maps from all context frames to generate attention-based predictions for target frames. Furthermore, we devise a layer-interactive ConvLSTM bridge that facilitates the smooth flow of low-level features across layers and time-steps, thereby strengthening predictions with fine details. Anomalies are eventually identified by scrutinizing the discrepancies between target frames and their corresponding predictions. Several experiments conducted on public benchmarks affirm the efficacy of our hybrid framework, whether used as a standalone single-task approach or integrated as a branch in a multi-task approach. These experiments also underscore the advantages of merging vision transformers and ConvLSTMs for video anomaly detection.

  • 5 authors
·
Apr 20, 2025

R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning

Large reasoning models (LRMs) aim to solve diverse and complex problems through structured reasoning. Recent advances in group-based policy optimization methods have shown promise in enabling stable advantage estimation without reliance on process-level annotations. However, these methods rely on advantage gaps induced by high-quality samples within the same batch, which makes the training process fragile and inefficient when intra-group advantages collapse under challenging tasks. To address these problems, we propose a reinforcement learning mechanism named R^3 that works along three directions: (1) a cross-context Replay strategy that maintains the intra-group advantage by recalling valuable examples from historical trajectories of the same query, (2) an in-context self-Reflection mechanism enabling models to refine outputs by leveraging past failures, and (3) a structural entropy Ranking reward, which assigns relative rewards to truncated or failed samples by ranking responses based on token-level entropy patterns, capturing both local exploration and global stability. We implement our method on Deepseek-R1-Distill-Qwen-1.5B and train it on the DeepscaleR-40k in the math domain. Experiments demonstrate our method achieves SoTA performance on several math benchmarks, representing significant improvements and fewer reasoning tokens over the base models. Code and model will be released.

  • 8 authors
·
Jan 27

Predictor-Feedback CACC for Vehicular Platoons with Actuation and Communication Delays Based on a Multiple-Predecessor-Following CTH Nominal Strategy

We develop a predictor-feedback cooperative adaptive cruise control (CACC) design relying on a multiple-predecessor-following (MPF) topology-based nominal delay-free CACC law. We consider vehicular platoons with heterogeneous vehicles, whose dynamics are described by a third-order linear system subject to actuation delay, along with vehicle-to-vehicle (V2V) communication delay. The design achieves individual vehicle stability, string stability, and zero steady-state speed/spacing tracking errors for any value of the actuation delay. The proofs of individual vehicle stability, string stability, and regulation rely on an input-output approach in the frequency domain, capitalizing on the delay-compensating property of the design, which enables us to derive explicit string stability conditions on the control and vehicle model parameters. The theoretical guarantees of string stability and the respective conditions on parameters are also illustrated numerically. We present consistent simulation results for a ten-vehicle platoon, illustrating the potential of the design for traffic throughput improvement, as compared with a predictor-feedback CACC design in which each ego vehicle's controller utilizes information only from a single preceding vehicle. We also present simulation results in a realistic scenario in which the leading vehicle's trajectory is obtained from NGSIM data.

  • 3 authors
·
Apr 6

Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity

Gradient-based algorithms are a cornerstone of artificial neural network training, yet it remains unclear whether biological neural networks use similar gradient-based strategies during learning. Experiments often discover a diversity of synaptic plasticity rules, but whether these amount to an approximation to gradient descent is unclear. Here we investigate a previously overlooked possibility: that learning dynamics may include fundamentally non-gradient "curl"-like components while still being able to effectively optimize a loss function. Curl terms naturally emerge in networks with inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity, resulting in learning dynamics that cannot be framed as gradient descent on any objective. To investigate the impact of these curl terms, we analyze feedforward networks within an analytically tractable student-teacher framework, systematically introducing non-gradient dynamics through neurons exhibiting rule-flipped plasticity. Small curl terms preserve the stability of the original solution manifold, resulting in learning dynamics similar to gradient descent. Beyond a critical value, strong curl terms destabilize the solution manifold. Depending on the network architecture, this loss of stability can lead to chaotic learning dynamics that destroy performance. In other cases, the curl terms can counterintuitively speed learning compared to gradient descent by allowing the weight dynamics to escape saddles by temporarily ascending the loss. Our results identify specific architectures capable of supporting robust learning via diverse learning rules, providing an important counterpoint to normative theories of gradient-based learning in neural networks.

  • 3 authors
·
Oct 3, 2025

JAWS: Enhancing Long-term Rollout of Neural Operators via Spatially-Adaptive Jacobian Regularization

Data-driven surrogate models improve the efficiency of simulating continuous dynamical systems, yet their autoregressive rollouts are often limited by instability and spectral blow-up. While global regularization techniques can enforce contractive dynamics, they uniformly damp high-frequency features, introducing a contraction-dissipation dilemma. Furthermore, long-horizon trajectory optimization methods that explicitly correct drift are bottlenecked by memory constraints. In this work, we propose Jacobian-Adaptive Weighting for Stability (JAWS), a probabilistic regularization strategy designed to mitigate these limitations. By framing operator learning as Maximum A Posteriori (MAP) estimation with spatially heteroscedastic uncertainty, JAWS dynamically modulates the regularization strength based on local physical complexity. This allows the model to enforce contraction in smooth regions to suppress noise, while relaxing constraints near singular features to preserve gradients, effectively realizing a behavior similar to numerical shock-capturing schemes. Experiments demonstrate that this spatially-adaptive prior serves as an effective spectral pre-conditioner, which reduces the base operator's burden of handling high-frequency instabilities. This reduction enables memory-efficient, short-horizon trajectory optimization to match or exceed the long-term accuracy of long-horizon baselines. Evaluated on the 1D viscous Burgers' equation, our hybrid approach improves long-term stability, shock fidelity, and out-of-distribution generalization while reducing training computational costs.

  • 2 authors
·
Mar 4

Brain-Grounded Axes for Reading and Steering LLM States

Interpretability methods for large language models (LLMs) typically derive directions from textual supervision, which can lack external grounding. We propose using human brain activity not as a training signal but as a coordinate system for reading and steering LLM states. Using the SMN4Lang MEG dataset, we construct a word-level brain atlas of phase-locking value (PLV) patterns and extract latent axes via ICA. We validate axes with independent lexica and NER-based labels (POS/log-frequency used as sanity checks), then train lightweight adapters that map LLM hidden states to these brain axes without fine-tuning the LLM. Steering along the resulting brain-derived directions yields a robust lexical (frequency-linked) axis in a mid TinyLlama layer, surviving perplexity-matched controls, and a brain-vs-text probe comparison shows larger log-frequency shifts (relative to the text probe) with lower perplexity for the brain axis. A function/content axis (axis 13) shows consistent steering in TinyLlama, Qwen2-0.5B, and GPT-2, with PPL-matched text-level corroboration. Layer-4 effects in TinyLlama are large but inconsistent, so we treat them as secondary (Appendix). Axis structure is stable when the atlas is rebuilt without GPT embedding-change features or with word2vec embeddings (|r|=0.64-0.95 across matched axes), reducing circularity concerns. Exploratory fMRI anchoring suggests potential alignment for embedding change and log frequency, but effects are sensitive to hemodynamic modeling assumptions and are treated as population-level evidence only. These results support a new interface: neurophysiology-grounded axes provide interpretable and controllable handles for LLM behavior.

  • 1 authors
·
Dec 22, 2025 2

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Large Language Models (LLMs) can significantly improve their reasoning capabilities by interacting with external tools, a paradigm known as Tool-Integrated Reasoning (TIR). However, extending TIR to multi-turn scenarios using Reinforcement Learning (RL) is often hindered by training instability and performance collapse. We identify that such instability is primarily caused by a distributional drift from external tool feedback, leading to the generation of low-probability tokens. This issue compounds over successive turns, causing catastrophic gradient norm explosions that derail the training process. To address this challenge, we introduce SimpleTIR, a plug-and-play algorithm that stabilizes multi-turn TIR training. Its core strategy is to identify and filter out trajectories containing void turns, i.e., turns that yield neither a code block nor a final answer. By removing these problematic trajectories from the policy update, SimpleTIR effectively blocks the harmful, high-magnitude gradients, thus stabilizing the learning dynamics. Extensive experiments show that SimpleTIR achieves state-of-the-art performance on challenging math reasoning benchmarks, notably elevating the AIME24 score from a text-only baseline of 22.1 to 50.5 when starting from the Qwen2.5-7B base model. Furthermore, by avoiding the constraints of supervised fine-tuning, SimpleTIR encourages the model to discover diverse and sophisticated reasoning patterns, such as self-correction and cross-validation.

  • 7 authors
·
Sep 2, 2025 2
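
The void-turn filter described in the SimpleTIR entry above can be sketched as follows. This is a hedged illustration, not the released implementation: the `Turn` dataclass and its two boolean fields are hypothetical stand-ins for whatever parsing the authors use to detect a code block or a final answer in a turn.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    # Hypothetical flags; real detection of code blocks and answers is not shown.
    has_code_block: bool
    has_final_answer: bool

def is_void(turn: Turn) -> bool:
    """A void turn yields neither a code block nor a final answer."""
    return not (turn.has_code_block or turn.has_final_answer)

def filter_trajectories(trajectories: List[List[Turn]]) -> List[List[Turn]]:
    """Keep only trajectories that contain no void turn."""
    return [traj for traj in trajectories if not any(is_void(t) for t in traj)]

good = [Turn(True, False), Turn(False, True)]    # tool call, then final answer
bad = [Turn(True, False), Turn(False, False)]    # second turn is void
kept = filter_trajectories([good, bad])
```

Only `good` survives the filter. In the setting the abstract describes, the dropped trajectories would be excluded from the policy update rather than merely down-weighted.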

Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens

Activation steering presupposes that task-relevant behaviors correspond to linear directions in activation space -- directions that should both steer the model and be readable along the unembedding. Function vectors (FVs), extracted as mean differences across ICL demonstrations, are the canonical test case; the prediction: steering and decoding succeed or fail together. Across 12 tasks, 6 models from 3 families, and 4,032 directed cross-template pairs, we find the opposite. FV steering routinely succeeds where the logit lens cannot decode the correct answer at any intermediate layer, while the converse -- decodable without steerable -- is nearly empty (3 of 72). The gap is not representational dialect. A diagonal tuned lens closes 1 of 14 steerable-not-decodable cases; a 2-layer MLP probe with a Hewitt & Liang control closes 5 of 10 via nonlinearly encoded structure but leaves 5 invisible to every decoder tested. Even at > 0.90 steering accuracy, projecting the FV through the unembedding yields incoherent token distributions: FVs encode computational instructions, not answer directions. A model-family asymmetry sharpens the picture. Mistral FVs rewrite intermediate representations, while Llama and Gemma FVs steer the final output without leaving a logit-lens-visible trace, corroborated by three signals (post-steering deltas, activation-patching recovery, FV norm-transfer correlations). A previously reported negative cosine-transfer correlation dissolves at scale, adding at most ΔR^2 = 0.011 beyond task identity. These results decompose the linear representation hypothesis into linear decodability and linear steerability and show they come apart opposite to intuition, with implications for safety monitoring: vocabulary-projection tools are blind to FV-style interventions on widely deployed model families.

  • 1 author
·
May 7

FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing

We propose FlowAnchor, a training-free framework for stable and efficient inversion-free, flow-based video editing. Inversion-free editing methods have recently shown impressive efficiency and structure preservation in images by directly steering the sampling trajectory with an editing signal. However, extending this paradigm to videos remains challenging, often failing in multi-object scenes or with increased frame counts. We identify the root cause as the instability of the editing signal in high-dimensional video latent spaces, which arises from imprecise spatial localization and length-induced magnitude attenuation. To overcome this challenge, FlowAnchor explicitly anchors both where to edit and how strongly to edit. It introduces Spatial-aware Attention Refinement, which enforces consistent alignment between textual guidance and spatial regions, and Adaptive Magnitude Modulation, which adaptively preserves sufficient editing strength. Together, these mechanisms stabilize the editing signal and guide the flow-based evolution toward the desired target distribution. Extensive experiments demonstrate that FlowAnchor achieves more faithful, temporally coherent, and computationally efficient video editing across challenging multi-object and fast-motion scenarios. The project page is available at https://cuc-mipg.github.io/FlowAnchor.github.io/.

SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing

Recent advances in flow-based generative models have enabled training-free, text-guided image editing by inverting an image into its latent noise and regenerating it under a new target conditional guidance. However, existing methods struggle to preserve source fidelity: higher-order solvers incur additional model inferences, truncated inversion constrains editability, and feature injection methods lack architectural transferability. To address these limitations, we propose SteerFlow, a model-agnostic editing framework with strong theoretical guarantees on source fidelity. In the forward process, we introduce an Amortized Fixed-Point Solver that implicitly straightens the forward trajectory by enforcing velocity consistency across consecutive timesteps, yielding a high-fidelity inverted latent. In the backward process, we introduce Trajectory Interpolation, which adaptively blends target-editing and source-reconstruction velocities to keep the editing trajectory anchored to the source. To further improve background preservation, we introduce an Adaptive Masking mechanism that spatially constrains the editing signal with concept-guided segmentation and source-target velocity differences. Extensive experiments on FLUX.1-dev and Stable Diffusion 3.5 Medium demonstrate that SteerFlow consistently achieves better editing quality than existing methods. Finally, we show that SteerFlow extends naturally to a complex multi-turn editing paradigm without accumulating drift.

  • 4 authors
·
Apr 1

On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering

Inference-time steering enables pretrained diffusion/flow models to be adapted to new tasks without retraining. A widely used approach is the ratio-of-densities method, which defines a time-indexed target path by reweighting probability-density trajectories from multiple models with positive, or in some cases, negative exponents. This construction, however, harbors a critical and previously unformalized failure mode: Marginal Path Collapse, where intermediate densities become non-normalizable even though endpoints remain valid. Collapse arises systematically when composing heterogeneous models trained on different noise schedules or datasets, including a common setting in molecular design where de-novo, conformer, and pocket-conditioned models must be combined for tasks such as flexible-pose scaffold decoration. We provide a novel and complete solution for the problem. First, we derive a simple path existence criterion that predicts exactly when collapse occurs from noise schedules and exponents alone. Second, we introduce Adaptive path Correction with Exponents (ACE), which extends Feynman-Kac steering to time-varying exponents and guarantees a valid probability path. On a synthetic 2D benchmark and on flexible-pose scaffold decoration, ACE eliminates collapse and enables high-guidance compositional generation, improving distributional and docking metrics over constant-exponent baselines and even specialized task-specific scaffold decoration models. Our work turns ratio-of-densities steering with heterogeneous experts from an unstable heuristic into a reliable tool for controllable generation.

  • 9 authors
·
Dec 10, 2025

Learning H-Infinity Locomotion Control

Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies only use basic domain randomization to improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced disturber and ensure their optimization with an H∞ constraint. In contrast to the actor that maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., "cost" in each iteration. To keep joint optimization between the actor and the disturber stable, our H∞ constraint bounds the ratio of the cost to the intensity of the external forces. Through reciprocal interaction throughout the training phase, the actor can acquire the capability to navigate increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree Aliengo robot, and also on a more challenging task with the Unitree A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs as if it were a bipedal robot. The simulated quantitative results show improvement against baselines, demonstrating the effectiveness of the method and each design choice. On the other hand, real-robot experiments qualitatively exhibit how robust the policy is under various disturbances on various terrains, including stairs, high platforms, slopes, and slippery terrains. All code, checkpoints, and real-world deployment guidance will be made public.

  • 6 authors
·
Apr 22, 2024 1

Optimization by Directional Attacks: Solving Problems with Neural Network Surrogates

This paper tackles optimization problems whose objective and constraints involve a trained Neural Network (NN), where the goal is to maximize f(Φ(x)) subject to c(Φ(x)) ≤ 0, with f smooth, c general and non-stringent, and Φ an already trained and possibly non-white-box NN. We address two challenges regarding this problem: identifying ascent directions for local search, and ensuring reliable convergence towards relevant local solutions. To this end, we re-purpose the notion of directional NN attacks as efficient optimization subroutines, since directional NN attacks use the neural structure of Φ to compute perturbations of x that steer Φ(x) in prescribed directions. Precisely, we develop an attack operator that computes attacks of Φ at any x along the direction ∇f(Φ(x)). Then, we propose a hybrid algorithm combining the attack operator with derivative-free optimization (DFO) techniques, designed for numerical reliability by remaining oblivious to the structure of the problem. We consider the cDSM algorithm, which offers asymptotic guarantees to converge to a local solution under mild assumptions on the problem. The resulting method alternates between attack-based steps for heuristic yet fast local intensification and cDSM steps for certified convergence and numerical reliability. Experiments on three problems show that this hybrid approach consistently outperforms standard DFO baselines.

  • 2 authors
·
Oct 1, 2025

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric learnability is difficult in high-dimensional robotic systems. Many parameter directions are weakly observable or unidentifiable, and even when identifiable directions are selected, omitted directions can still influence exploration and distort information measures. To address this challenge, we propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED (i) performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions, and (ii) modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters. We evaluate QOED on simulated and real-world navigation and manipulation tasks, where identifiable-direction selection and nuisance suppression yield performance improvements of 35.23% and 21.98%, respectively. When integrated as an exploration objective in model-based policy optimization, QOED further improves policy performance over established RL baselines.

  • 5 authors
·
May 11

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Predicting the trajectories of road agents is essential for autonomous driving systems. The recent mainstream methods follow a static paradigm, which predicts the future trajectory by using a fixed duration of historical frames. These methods make the predictions independently even at adjacent time steps, which leads to potential instability and temporal inconsistency. As successive time steps have largely overlapping historical frames, their forecasts should be intrinsically correlated: for example, overlapping predicted trajectories should be consistent, or be different but share the same motion goal, depending on the road situation. Motivated by this, we introduce HPNet, a novel dynamic trajectory forecasting method. Aiming for stable and accurate trajectory forecasting, our method leverages not only historical frames, including maps and agent states, but also historical predictions. Specifically, we design a Historical Prediction Attention module to automatically encode the dynamic relationship between successive predictions. By using historical predictions, it also extends the attention range beyond the currently visible window. The proposed Historical Prediction Attention, together with the Agent Attention and Mode Attention, is further formulated as the Triple Factorized Attention module, serving as the core design of HPNet. Experiments on the Argoverse and INTERACTION datasets show that HPNet achieves state-of-the-art performance and generates accurate and stable future trajectories. Our code is available at https://github.com/XiaolongTang23/HPNet.

  • 6 authors
·
Apr 9, 2024

Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment

We propose zero-shot Vision-and-Language Navigation with Collision Mitigation (VLN-CM). VLN-CM is composed of four modules and predicts the direction and distance of the next movement at each step. We use a large foundation model for each module. To select the direction, we use the Attention Spot Predictor (ASP), View Selector (VS), and Progress Monitor (PM). The ASP employs a Large Language Model (e.g., ChatGPT) to split navigation instructions into attention spots, which are objects or scenes at the location to move to (e.g., a yellow door). The VS selects, from panorama images provided at 30-degree intervals, the one that includes the attention spot, using CLIP similarity. We then choose the angle of the selected image as the direction to move in. The PM uses a rule-based approach to decide which attention spot to focus on next, among multiple spots derived from the instructions. If the similarity between the current attention spot and the visual observations decreases consecutively at each step, the PM determines that the agent has passed the current spot and moves on to the next one. To select the distance to move, we employ the Open Map Predictor (OMP). The OMP uses panorama depth information to predict an occupancy mask. We then select a collision-free distance in the predicted direction based on the occupancy mask. We evaluated our method using the validation data of VLN-CE. Our approach showed better performance than several baseline methods, and the OMP was effective in mitigating collisions for the agent.
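The direction-selection rules above lend themselves to a short sketch. The code below is illustrative only and assumes precomputed similarity scores (the abstract uses CLIP for these). The names `select_direction` and `ProgressMonitor` and the `patience` parameter are hypothetical, since the abstract does not state how many consecutive decreases trigger a switch.

```python
from typing import List, Optional


def select_direction(view_similarities: List[float], step_deg: int = 30) -> int:
    """View Selector: return the angle of the panorama view that best
    matches the current attention spot (views spaced step_deg apart)."""
    best = max(range(len(view_similarities)), key=lambda i: view_similarities[i])
    return best * step_deg


class ProgressMonitor:
    """Rule-based monitor: advance to the next attention spot once the
    similarity has decreased for `patience` consecutive steps.
    `patience` is an assumed parameter, not taken from the paper."""

    def __init__(self, spots: List[str], patience: int = 2) -> None:
        self.spots = spots
        self.patience = patience
        self.index = 0
        self.prev: Optional[float] = None
        self.drops = 0

    @property
    def current(self) -> str:
        return self.spots[self.index]

    def update(self, similarity: float) -> str:
        # Count consecutive decreases; any non-decrease resets the streak.
        if self.prev is not None and similarity < self.prev:
            self.drops += 1
        else:
            self.drops = 0
        self.prev = similarity
        # Enough consecutive drops: assume the spot has been passed.
        if self.drops >= self.patience and self.index < len(self.spots) - 1:
            self.index += 1
            self.prev = None
            self.drops = 0
        return self.current
```

A caller would feed `update` the best similarity observed at each step and pass the returned spot to the view selector on the next step.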

  • 4 authors
·
Oct 7, 2024

Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short-horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context. This is trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multivariate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at https://github.com/iis-esslingen/AxonAD.
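The inference-time scoring described above can be illustrated with a small sketch. The abstract does not specify how the tail is aggregated or how the two terms are combined. The mean over the last `tail` timesteps and the weighted sum with weight `alpha` are therefore assumptions, as are all function names.

```python
import math
from typing import Sequence


def cosine_deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally uninformative
    return 1.0 - dot / (norm_a * norm_b)


def query_mismatch(pred: Sequence[Sequence[float]],
                   target: Sequence[Sequence[float]],
                   tail: int = 3) -> float:
    """Mean cosine deviation over the most recent `tail` timesteps
    (mean aggregation is an assumption)."""
    devs = [cosine_deviation(p, t) for p, t in zip(pred[-tail:], target[-tail:])]
    return sum(devs) / len(devs)


def anomaly_score(recon_error: float,
                  pred: Sequence[Sequence[float]],
                  target: Sequence[Sequence[float]],
                  tail: int = 3,
                  alpha: float = 0.5) -> float:
    """Weighted sum of reconstruction error and query mismatch
    (the weighting scheme is an assumption)."""
    return alpha * recon_error + (1.0 - alpha) * query_mismatch(pred, target, tail)
```

With this form, a window whose signals reconstruct well but whose predicted queries diverge from the target queries still receives a raised score, which is the case residual-only detectors miss.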

Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers

Rotary positional embeddings (RoPE) are widely used in large language models to encode token positions through multiplicative rotations, yet their behavior at long context lengths remains poorly characterized. In this work, we reinterpret RoPE as phase modulation applied to a bank of complex oscillators, enabling analysis through classical signal processing theory. Under this formulation, we derive principled lower bounds on the RoPE base parameter that are necessary to preserve positional coherence over a target context length. These include a fundamental aliasing bound, analogous to a Nyquist limit, and a DC-component stability bound that constrains phase drift in low-frequency positional modes. We further extend this analysis to deep transformers, showing that repeated rotary modulation across layers compounds angular misalignment, tightening the base requirement as depth increases. Complementing these results, we derive a precision-dependent upper bound on the RoPE base arising from finite floating-point resolution. Beyond this limit, incremental phase updates become numerically indistinguishable, leading to positional erasure even in the absence of aliasing. Together, the lower and upper bounds define a precision- and depth-dependent feasibility region, a "Goldilocks zone," for long-context transformers. We validate the framework through a comprehensive case study of state-of-the-art models, including LLaMA, Mistral, and DeepSeek variants, showing that observed successes, failures, and community retrofits align closely with the predicted bounds. Notably, models that violate the stability bound exhibit attention collapse and long-range degradation, while attempts to scale beyond one million tokens encounter a hard precision wall independent of architecture or training.
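For readers who want a feel for the oscillator-bank view, the sketch below computes the wavelength of each rotary pair under the standard RoPE frequency schedule, theta_i = base^(-2i/d). It does not reproduce the paper's bounds, which the abstract does not state. The helper `slowest_mode_wraps` is a hypothetical, simplified check added only for illustration.

```python
import math
from typing import List


def rope_wavelengths(head_dim: int, base: float = 10000.0) -> List[float]:
    """Wavelength (in tokens) of each rotary pair under the standard
    schedule theta_i = base ** (-2 * i / head_dim)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [2.0 * math.pi * base ** (2.0 * i / head_dim)
            for i in range(head_dim // 2)]


def slowest_mode_wraps(head_dim: int, base: float, context_len: int) -> bool:
    """Illustrative heuristic (not the paper's criterion): True if even the
    lowest-frequency pair completes a full rotation within the context."""
    return rope_wavelengths(head_dim, base)[-1] < context_len
```

Raising `base` stretches the longest wavelength, which is why the base parameter is the natural quantity to bound when targeting a given context length.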

  • 1 author
·
Feb 11