arxiv:2605.07297

Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers

Published on May 8

Abstract

Post hoc generalization bounds for Transformers use adaptive spectral complexity measures that scale more favorably with depth and dimension than traditional norm-based approaches.

AI-generated summary

Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds for Transformers remove the explicit polynomial dependence on the hidden dimension, they typically impose fixed norm constraints specified a priori and can exhibit unfavorable exponential dependence on depth. In this paper, we derive spectrum-adaptive post hoc generalization bounds for multi-layer Transformers. Under layerwise spectral norm control, the bounds are expressed in terms of layerwise Schatten quantities of the query-key, value, and feedforward weight matrices. Since the Schatten indices need not be fixed a priori and can instead be selected after training, separately for each matrix type and layer, the bounds adaptively trade off spectral complexity against the dimension- and depth-dependent factors according to the learned singular-value profiles. Empirical comparisons of BERT-adapted proxies for the leading complexity factors suggest that the proxies induced by our bounds grow more slowly with depth and hidden dimension than the corresponding norm-based proxies. Overall, our results provide a complexity-based perspective on how the spectral structure of trained Transformers is reflected in generalization analyses.
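The summary's central objects are layerwise Schatten quantities of weight matrices: the Schatten p-norm is the l_p norm of a matrix's singular values, so it interpolates between the nuclear norm (p = 1), the Frobenius norm (p = 2), and the spectral norm (p → ∞). The paper's exact complexity factors are not reproduced on this page, but a minimal NumPy sketch (the matrices and decay profiles below are illustrative, not from the paper) shows why such quantities are sensitive to the learned singular-value profile: for a fixed spectral norm, a rapidly decaying spectrum keeps small-p Schatten norms modest, while a flat spectrum makes them grow with the dimension.

```python
import numpy as np

def schatten_norm(A: np.ndarray, p: float) -> float:
    """Schatten p-norm of A: the l_p norm of its singular values.
    p=1 is the nuclear norm, p=2 the Frobenius norm."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

# Two illustrative 64x64 matrices with the same spectral norm (1.0)
# but different singular-value profiles.
rng = np.random.default_rng(0)

def with_spectrum(sv: np.ndarray) -> np.ndarray:
    """Random matrix with the prescribed singular values."""
    U, _ = np.linalg.qr(rng.standard_normal((64, 64)))
    V, _ = np.linalg.qr(rng.standard_normal((64, 64)))
    return U @ np.diag(sv) @ V.T

fast = with_spectrum(0.9 ** np.arange(64))  # rapidly decaying spectrum
slow = with_spectrum(np.ones(64))           # flat spectrum

for p in (1.0, 1.5, 2.0):
    print(f"p={p}: fast-decay {schatten_norm(fast, p):.2f}, "
          f"flat {schatten_norm(slow, p):.2f}")
```

In this toy setting the fast-decay matrix has a Schatten-1 norm near 10 versus 64 for the flat spectrum, which hints at why choosing the index p per matrix, after training, can yield a smaller complexity term than a norm fixed a priori.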


