Avey B1 Base (Experimental)
⚠️ Warning: This model is an experimental research artifact. It is intended for academic and research purposes only to explore the capabilities of the Avey-B architecture. It is not optimized or intended for production environments.
⚠️ Compatibility Warning: This checkpoint was developed and tested using transformers v4. It is NOT guaranteed to work with transformers v5. Please pin your environment to a 4.x version.
Model Summary
Avey-B is a bi-directional sequence model based on the Avey architecture, which departs from the standard Transformer design. Instead of self-attention, Avey-B uses a Ranker-Processor architecture:
- The Ranker: Partitions the sequence into splits and retrieves the most relevant contexts for each split.
- The Neural Processor: Contextualizes these splits using a dynamic parameterization scheme and a neural compression module.
This design allows Avey-B to scale efficiently to long contexts while maintaining the bi-directional contextualization strengths of BERT-style models.
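As a rough illustration of the Ranker's split-and-retrieve flow, here is a hypothetical pure-Python sketch. This is not the actual Avey-B implementation (the real Ranker uses learned relevance scores, not token overlap); it only shows the shape of the computation: partition the sequence into splits, score every other split against the current one, and hand the top-k most relevant splits to the processor.

```python
def split_sequence(tokens, split_size):
    """Partition a token sequence into contiguous fixed-size splits."""
    return [tokens[i:i + split_size] for i in range(0, len(tokens), split_size)]

def relevance(a, b):
    """Stand-in similarity score: Jaccard overlap of token sets."""
    shared = len(set(a) & set(b))
    return shared / max(len(set(a) | set(b)), 1)

def rank_splits(splits, idx, k):
    """Return the indices of the k splits most relevant to splits[idx]."""
    scored = [(relevance(splits[idx], s), j) for j, s in enumerate(splits) if j != idx]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]

tokens = ["a", "b", "c", "a", "b", "c", "x", "y", "z"]
splits = split_sequence(tokens, 3)
print(rank_splits(splits, 0, 1))  # [1] -- the repeated "a b c" split, not "x y z"
```

In the real model, the retrieved splits are then contextualized jointly with the current split by the neural processor.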
Project Links
- Paper: Avey-B (arXiv:2602.15814)
- Code Repository: github.avey.ai/avey-b
Model Details
This checkpoint differs slightly from the configuration described in the associated research paper. It serves as a standalone release for users to experiment with the architecture.
- Architecture: Avey-B
- Dataset: FineWeb-edu (350BT split)
- Training Volume: ~220 Billion tokens
- Context Window: Unlimited
- Parameters: 164M
For detailed insights into the architectural innovations (decoupled parameterization, stability-oriented normalization, etc.) and benchmark evaluations of the architecture, please refer to the linked paper.
Tokenization & Input Formatting
Note on Tokenizer: Avey-B uses a BPE tokenizer (similar to GPT-2) rather than BERT's WordPiece. This means spaces are often treated as part of the token (e.g., " word" vs "word").
- Fine-Tuning: For standard tasks like Sequence Classification or NER, you can pass raw text directly. The tokenizer handles spacing naturally, and the model will learn the correct patterns during training.
- Manual Prompting: If you are manually constructing strings with special tokens (like [MASK]), be aware that the tokenizer is sensitive to whitespace. Unlike BERT, it is often more effective to omit the space before a special token (e.g., use "text[MASK]" instead of "text [MASK]").
In addition, the Avey-B tokenizer includes all of BERT's special tokens for compatibility, but only the [MASK] token was used during pre-training; the others, if needed, must be learned during fine-tuning.
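To make the whitespace sensitivity concrete, here is a toy greedy longest-match tokenizer over a made-up vocabulary. It is only a crude stand-in for BPE (the real Avey-B tokenizer's merges, vocabulary, and ids all differ), but it shows how a space before a special token can survive as its own token instead of being absorbed.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match segmentation, a crude stand-in for BPE."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:  # unknown character falls through as its own token
            tokens.append(text[i])
            i += 1
    return tokens

# Made-up vocabulary; note BPE-style vocabularies contain space-prefixed variants.
vocab = {"text", " text", "[MASK]", " "}

print(toy_tokenize("text[MASK]", vocab))   # ['text', '[MASK]']
print(toy_tokenize("text [MASK]", vocab))  # ['text', ' ', '[MASK]'] -- stray space token
```

With the real tokenizer the exact failure mode can differ, but the takeaway is the same: the space before a special token is not silently absorbed.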
Usage
This model is compatible with HuggingFace transformers (v4). You can use it as a drop-in replacement for BERT-based models, provided you allow remote code execution with trust_remote_code=True.
1. Inference (Feature Extraction)
Get contextualized embeddings for downstream tasks:
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "avey-ai/avey-b1-base-exp"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

text = "Avey-B offers a new approach to bi-directional encoding."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Access the last hidden state
last_hidden_states = outputs.last_hidden_state
print(f"Output shape: {last_hidden_states.shape}")
```
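If you need a single sentence-level vector rather than per-token states, a common recipe (not specific to Avey-B) is masked mean pooling over the last hidden state, averaging only over non-padding positions. The arithmetic is shown below with plain Python lists for clarity; in practice you would apply the same operation to the torch tensors, using attention_mask from the tokenizer.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average the hidden vectors of non-padding positions.

    hidden_states: list of per-token vectors, each of length dim
    attention_mask: list of 0/1 flags, one per token
    """
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for d in range(dim):
                totals[d] += vec[d]
    return [t / max(count, 1) for t in totals]

# Two real tokens and one padding position
states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(masked_mean_pool(states, mask))  # [2.0, 3.0]
```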
2. Masked Language Modeling (Pipeline)
```python
import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="avey-ai/avey-b1-base-exp",
    dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Note: no space before [MASK], per the tokenizer guidance above
input_text = "Every morning, she drinks a cup of[MASK] before going to work."
results = pipe(input_text)
pprint(results)
```
3. Fine-Tuning
Since Avey-B is compatible with the AutoModel API, it can be fine-tuned using the standard HuggingFace Trainer class or accelerate, just like BERT.
Citation
If you use this model or architecture in your research, please cite the original paper:
```bibtex
@inproceedings{2026aveyb,
  title={Avey-B},
  author={Acharya, Devang and Hammoud, Mohammad},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```