Fill-Mask
Transformers
Safetensors
English
avey-b
custom_code

Avey B1 Base (Experimental)

⚠️ Warning: This model is an experimental research artifact. It is intended for academic and research purposes only to explore the capabilities of the Avey-B architecture. It is not optimized or intended for production environments.

⚠️ Compatibility Warning: This checkpoint was developed and tested using transformers v4. It is NOT guaranteed to work with transformers v5. Please pin your environment to a 4.x version.

Model Summary

Avey-B is a bi-directional sequence model based on the Avey architecture, which departs from the standard Transformer design. Instead of self-attention, Avey-B uses a Ranker-Processor architecture:

  1. The Ranker: Partitions the sequence into splits and retrieves the most relevant contexts for each split.
  2. The Neural Processor: Contextualizes these splits using a dynamic parameterization scheme and a neural compression module.

This design allows Avey-B to scale efficiently to long contexts while maintaining the bi-directional contextualization strengths of BERT-style models.
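The two-stage flow above can be sketched conceptually in plain Python. This is illustrative only: the function names, fixed-size splitting, summary vectors, and cosine-similarity scoring are assumptions for exposition, not the actual Avey-B implementation (see the paper for the real mechanism).

```python
# Conceptual sketch of the Ranker stage: partition a sequence into
# splits, then retrieve the top-k most relevant splits for a query
# split. All names and the cosine scoring are illustrative assumptions.
import math

def split_sequence(tokens, split_size):
    """Partition a token sequence into fixed-size splits."""
    return [tokens[i:i + split_size] for i in range(0, len(tokens), split_size)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_splits(query_summary, split_summaries, top_k):
    """Ranker: return indices of the top-k splits most similar to the query."""
    order = sorted(
        range(len(split_summaries)),
        key=lambda i: cosine(query_summary, split_summaries[i]),
        reverse=True,
    )
    return order[:top_k]
```

In the real model, the selected splits would then be passed to the Neural Processor for contextualization; that stage is omitted here because its internals (dynamic parameterization, neural compression) are specific to the architecture described in the paper.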

Model Details

This checkpoint differs slightly from the configuration described in the associated research paper. It serves as a standalone release for users to experiment with the architecture.

  • Architecture: Avey-B
  • Dataset: FineWeb-edu (350BT split)
  • Training Volume: ~220 Billion tokens
  • Context Window: Unlimited
  • Parameters: 164M

For detailed insights into the architectural innovations (decoupled parameterization, stability-oriented normalization, etc.) and benchmark evaluations of the architecture, please refer to the linked paper.

Tokenization & Input Formatting

Note on Tokenizer: Avey-B uses a BPE tokenizer (similar to GPT-2) rather than BERT's WordPiece. This means spaces are often treated as part of the token (e.g., " word" vs "word").

  • Fine-Tuning: For standard tasks like Sequence Classification or NER, you can pass raw text directly. The tokenizer handles spacing naturally, and the model will learn the correct patterns during training.
  • Manual Prompting: If you are manually constructing strings with special tokens (like [MASK]), be aware that the tokenizer is sensitive to whitespace. Unlike BERT, it is often more effective to omit the space before a special token (e.g., use "text[MASK]" instead of "text [MASK]").

In addition, the Avey-B tokenizer includes all of BERT's special tokens for compatibility, but only the [MASK] token was used during pre-training; the remaining special tokens, if needed, must be learned during fine-tuning.
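The whitespace caveat above can be made concrete with a small prompt-construction example (the sentence is the same one used in the pipeline example below; the claim that a leading space becomes part of the next token mirrors GPT-2-style BPE behavior):

```python
# Preferred: no space before the special token, since the BPE tokenizer
# folds leading spaces into the following token.
mask_token = "[MASK]"
prompt = f"Every morning, she drinks a cup of{mask_token} before going to work."

# With a space, the tokenizer may emit a stray space token before
# [MASK], which can degrade fill-mask predictions.
spaced_prompt = f"Every morning, she drinks a cup of {mask_token} before going to work."

print(prompt)
```

For fine-tuning on raw text this distinction does not matter; it only affects manually constructed prompts containing special tokens.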

Usage

This model is compatible with HuggingFace transformers (v4). You can use it as a drop-in replacement for BERT-based models, provided you allow remote code execution with trust_remote_code=True.

1. Inference (Feature Extraction)

Get contextualized embeddings for downstream tasks:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "avey-ai/avey-b1-base-exp"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

text = "Avey-B offers a new approach to bi-directional encoding."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Access the last hidden state
last_hidden_states = outputs.last_hidden_state
print(f"Output shape: {last_hidden_states.shape}")

2. Masked Language Modeling (Pipeline)

import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="avey-ai/avey-b1-base-exp",
    torch_dtype=torch.bfloat16,  # v4.x kwarg; `dtype` is the v5 spelling
    trust_remote_code=True,
)

input_text = "Every morning, she drinks a cup of[MASK] before going to work." 
results = pipe(input_text)
pprint(results)

3. Fine-Tuning

Since Avey-B is compatible with the AutoModel API, it can be fine-tuned using the standard HuggingFace Trainer class or accelerate, just like BERT.
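As a hedged sketch of that workflow (the tiny in-memory dataset, label count, and hyperparameters below are placeholders, not recommended values, and it assumes the checkpoint's custom code supports the sequence-classification head), a fine-tune with the Trainer might look like:

```python
# Sketch only: assumes the `datasets` library is installed and that
# AutoModelForSequenceClassification resolves via the repo's custom code.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "avey-ai/avey-b1-base-exp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2, trust_remote_code=True
)

# Tiny in-memory dataset purely for illustration.
train_ds = Dataset.from_dict(
    {"text": ["great movie", "terrible movie"], "label": [1, 0]}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="avey-b-finetuned",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_ds.map(tokenize, batched=True),
    tokenizer=tokenizer,
)
trainer.train()
```

As noted in the Tokenization section, raw text can be passed directly here; no special whitespace handling is required for fine-tuning.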

Citation

If you use this model or architecture in your research, please cite the original paper:

@inproceedings{2026aveyb,
  title={Avey-B},
  author={Acharya, Devang and Hammoud, Mohammad},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}