arXiv:2602.15814

Avey-B

Published on Feb 17

Abstract

Compact pretrained bidirectional encoders based on the Avey architecture outperform Transformer-based models on token classification and information retrieval tasks while scaling more efficiently to long contexts.

AI-generated summary

Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
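For readers unfamiliar with the encoder-only paradigm the summary refers to, below is a minimal PyTorch sketch of BERT-style masked-token pretraining. Everything here is illustrative: the `PlaceholderEncoder`, `VOCAB_SIZE`, and `MASK_ID` are assumptions, not the Avey-B architecture. The actual attention-free blocks (decoupled static/dynamic parameterizations, stability-oriented normalization, neural compression) are described in the paper and the released code linked in the comments below.

```python
# Minimal sketch of BERT-style masked-token pretraining for an
# encoder-only model. The encoder below is a generic placeholder,
# NOT the Avey-B block; any module mapping token ids to a sequence
# of hidden states fits this interface.

import torch
import torch.nn as nn

VOCAB_SIZE = 30522   # hypothetical vocabulary size
MASK_ID = 103        # hypothetical [MASK] token id
D_MODEL = 256

class PlaceholderEncoder(nn.Module):
    """Stand-in bidirectional encoder: embeddings plus a small
    feed-forward mixer with a normalization layer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.mixer = nn.Sequential(
            nn.Linear(D_MODEL, D_MODEL),
            nn.GELU(),
            nn.LayerNorm(D_MODEL),
        )

    def forward(self, input_ids):
        # (batch, seq) -> (batch, seq, d_model)
        return self.mixer(self.embed(input_ids))

def mask_tokens(input_ids, mask_prob=0.15):
    """Standard MLM corruption: replace a random subset of tokens
    with [MASK]; labels are -100 (ignored by the loss) everywhere
    except at the masked positions."""
    labels = input_ids.clone()
    mask = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    labels[~mask] = -100
    corrupted = input_ids.masked_fill(mask, MASK_ID)
    return corrupted, labels

encoder = PlaceholderEncoder()
lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)

# One illustrative training step on random token ids.
input_ids = torch.randint(0, VOCAB_SIZE, (2, 16))
corrupted, labels = mask_tokens(input_ids)
logits = lm_head(encoder(corrupted))
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100
)
loss.backward()
```

Because the objective is defined at every position rather than left-to-right, any sequence model that produces bidirectional contextual states can be dropped in for `PlaceholderEncoder`, which is what makes an encoder-only adaptation of an autoregressive architecture like Avey natural.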

Community

Hi @dacharya-avey, is there an open implementation available for pretraining these models? 🤔


I found it: https://github.com/rimads/avey-b

Thanks for releasing!!!

