A Geometric Account of Activation Steering through Angle-Norm Decomposition
Abstract
This study revisits the assumption that hidden-state norm carries no concept-relevant information in language models. It shows that concepts are represented primarily in angular structure, while norm remains important for the stability and downstream effects of steering across seven models.
Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address limitations of additive interventions, often motivated by the assumption that hidden-state norm does not carry concept-relevant information. In this work, we revisit this assumption through a controlled empirical study designed to disentangle the roles of angular and radial components. We show that steering methods differ mainly in how they couple two geometric effects: changing a token's angular alignment with a concept direction and changing its hidden-state norm. Across seven language models, we find that concepts are represented primarily in angular structure, supporting the motivation for spherical methods, but that norm remains important for the stability and downstream effects of steering. Our results explain why interventions with similar concept-level effects can behave differently, and suggest that activation steering should be parameterized by interpretable angular and radial components of the intervention, rather than by a single additive coefficient that entangles these two effects.
Community
We study activation steering, the family of interventions that modify hidden states to control language model behavior, as a geometric operation on representation space. Standard additive steering applies a scalar-weighted concept direction, which simultaneously alters two distinct properties of the hidden state: its angular alignment with the concept direction and its norm. We disentangle these effects through a controlled decomposition, and ask which is responsible for semantic control and which governs downstream stability.
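The coupling described above can be made explicit with a short derivation. The notation is an assumption chosen for illustration: h is a hidden state with norm r, s is a unit concept direction, v is a unit vector orthogonal to s, θ is the angle between h and s, and α is the additive steering coefficient. The post itself does not state these formulas.

```latex
\begin{aligned}
h &= r\,(\cos\theta\, s + \sin\theta\, v), \qquad \|s\| = \|v\| = 1,\quad s^\top v = 0,\\
h' &= h + \alpha s,\\
\|h'\|^2 &= r^2 + 2\alpha r\cos\theta + \alpha^2,\\
\cos\theta' &= \frac{s^\top h'}{\|h'\|}
            = \frac{r\cos\theta + \alpha}{\sqrt{r^2 + 2\alpha r\cos\theta + \alpha^2}}.
\end{aligned}
```

Under these assumptions, the single coefficient α sets both the new norm and the new angle, so the two effects cannot be varied independently in the additive form.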
Each hidden state is parameterized by its norm r and unit direction u, with u further decomposed into a component along the concept direction s and an orthogonal residual v. This yields a two-parameter family of interventions over an angular target γ and a radial scale β, and situates six steering methods (additive CAA, renormalized CAA-r, matched CAA-m, additive AS, spherical S and a norm-scaled variant SN) within a single framework that controls for the geometric content of the intervention.
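The parameterization above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `decompose` and `steer` are hypothetical, γ is read as the target angle between the steered state and s, and β is read as a multiplier on the original norm r. The exact closed form used in the paper may differ.

```python
import numpy as np

def decompose(h, s):
    """Split h into its norm r, its angle theta to s, and a unit residual v orthogonal to s."""
    s_hat = s / np.linalg.norm(s)              # unit concept direction
    r = np.linalg.norm(h)                      # radial component
    u = h / r                                  # unit direction of h
    c = float(np.clip(u @ s_hat, -1.0, 1.0))   # cosine of the angle to s
    resid = u - c * s_hat                      # part of u orthogonal to s
    resid_norm = np.linalg.norm(resid)
    if resid_norm < 1e-12:
        raise ValueError("h is parallel to s; the orthogonal residual is undefined")
    return r, float(np.arccos(c)), resid / resid_norm, s_hat

def steer(h, s, gamma, beta=1.0):
    """Assumed two-parameter intervention: set the angle to gamma, scale the norm by beta."""
    r, _, v, s_hat = decompose(h, s)
    return beta * r * (np.cos(gamma) * s_hat + np.sin(gamma) * v)

# Usage: rotate h to 30 degrees from s and expand its norm by 20 percent.
h = np.array([3.0, 4.0, 0.0])
s = np.array([2.0, 0.0, 0.0])
h_steered = steer(h, s, gamma=np.pi / 6, beta=1.2)
```

In this sketch, β = 1 corresponds to strict norm preservation, and β > 1 to a radial increase applied at the same angular target.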
Across seven decoder-only language models (1B to 70B parameters) and four concept datasets, we find that concept-discriminative information is encoded almost entirely in activation direction: linear probes on unit-normalized hidden states match probes on raw states, while norm-only probes remain near chance. This supports the angular hypothesis motivating recent spherical methods. However, holding the angular target γ fixed and varying only the radial scale β shows that norm is not semantically inert. At high steering strengths, strict norm preservation produces substantial increases in perplexity and losses in downstream capability, while a modest radial increase recovers much of this stability with negligible effect on the concept metric.
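The probe comparison described above can be sketched as follows. Everything here is an assumption for illustration: the data is synthetic and built so that the label lives only in direction, the probe is a plain least-squares linear classifier rather than whatever probe the authors used, and the helper names `feature_views` and `probe_accuracy` are hypothetical. The sketch shows the shape of the experiment only and does not reproduce any result from the paper.

```python
import numpy as np

def feature_views(H):
    """Return the three probe inputs: raw states, unit directions, and norms alone."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return {"raw": H, "direction": H / norms, "norm_only": norms}

def probe_accuracy(X_tr, y_tr, X_te, y_te):
    """Fit a least-squares linear probe with a bias term and return test accuracy."""
    add_bias = lambda X: np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(add_bias(X_tr), 2.0 * y_tr - 1.0, rcond=None)
    pred = (add_bias(X_te) @ w > 0).astype(int)
    return float(np.mean(pred == y_te))

# Synthetic stand-in for hidden states (NOT model activations):
# the class shifts the direction along axis 0, while norms are label-independent.
rng = np.random.default_rng(0)
n, d = 2000, 16
y = rng.integers(0, 2, size=n)
dirs = rng.normal(size=(n, d))
dirs[:, 0] += 2.0 * (2 * y - 1)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
H = rng.uniform(5.0, 15.0, size=(n, 1)) * dirs

half = n // 2
scores = {
    name: probe_accuracy(X[:half], y[:half], X[half:], y[half:])
    for name, X in feature_views(H).items()
}
```

On real activations, `H` would be replaced by hidden states collected at a chosen layer and `y` by the concept labels; the same three views then give the raw, direction-only, and norm-only probes.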
These results suggest that activation steering should be parameterized neither by a single additive coefficient nor by an angular operation under strict norm preservation, but as a two-parameter intervention in which angular and radial components are controlled independently. We further hypothesize that hidden-state norm corresponds, in part, to the effective representational capacity available at a token: under strong angular intervention, a modest expansion of this capacity relieves competition between the steered concept and other context-relevant features, accounting for the observed stability gain.