# iterativebert-base
A pre-trained IterativeBert encoder.
## Model Description

IterativeBert is an encoder architecture that applies a single, weight-tied transformer layer iteratively, refining the representation over repeated cycles while keeping the parameter count of a one-layer model.
## Architecture Details
| Parameter | Value |
|---|---|
| Hidden Size | 312 |
| Attention Heads | 6 |
| Refinement Cycles (L) | 8 |
| Residual Mode | add |
| Vocab Size | 30522 |
| Max Position Embeddings | 2048 |
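
The table above can be read as a loop: one shared transformer layer applied for L = 8 cycles, with the `add` residual mode combining each cycle's output with its input. Below is a minimal sketch of that idea in PyTorch; it is a hypothetical illustration of weight-tied iterative refinement, not the actual IterativeBert implementation (the class name and use of `nn.TransformerEncoderLayer` are assumptions).

```python
import torch
import torch.nn as nn

class IterativeEncoderSketch(nn.Module):
    """Hypothetical sketch: one shared layer reused for several cycles."""

    def __init__(self, hidden_size=312, num_heads=6, num_cycles=8):
        super().__init__()
        # A single layer reused every cycle, so the parameter count
        # stays at that of a 1-layer model regardless of num_cycles.
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.num_cycles = num_cycles

    def forward(self, x):
        for _ in range(self.num_cycles):
            # Residual mode "add": each cycle's output is added to its input
            x = x + self.layer(x)
        return x

encoder = IterativeEncoderSketch()
hidden = torch.randn(2, 16, 312)  # (batch, seq_len, hidden_size)
out = encoder(hidden)
print(out.shape)  # torch.Size([2, 16, 312])
```

Because the layer is shared, increasing the number of cycles deepens the computation without adding parameters.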
## Usage

```python
from iterative_bert.model import IterativeBert

# Load the model
encoder = IterativeBert.from_pretrained("paul-english/iterativebert-base")

# Use for encoding; input_ids and attention_mask come from a
# compatible tokenizer (vocab size 30522, per the table above)
outputs = encoder(input_ids, attention_mask=attention_mask)
hidden_states = outputs.last_hidden_state
```
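
`last_hidden_state` is a per-token tensor. A common way to reduce it to a single sequence embedding (a generic technique, not specific to IterativeBert) is attention-mask-weighted mean pooling, sketched here on dummy tensors standing in for the outputs above:

```python
import torch

# Dummy stand-ins for the model outputs: (batch=2, seq_len=4, hidden=312)
hidden_states = torch.randn(2, 4, 312)
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

# Average only over real (non-padding) tokens
mask = attention_mask.unsqueeze(-1).float()            # (2, 4, 1)
embeddings = (hidden_states * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # torch.Size([2, 312])
```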
## Training Details
Training details not provided.
## Limitations
- Best used as a backbone for fine-tuning on downstream tasks
- Sequence length limited to 2048 tokens
## Citation
If you use this model, please cite:
```bibtex
@software{iterative_bert,
  title  = {Iterative Bert},
  author = {English, Paul M},
  url    = {https://github.com/paul-english/iterative_bert}
}
```
## Evaluation results

- Loss on SlimPajama-627B-DC (self-reported): 1.806