---
license: apache-2.0
library_name: transformers
language: en
tags:
- tiner
- iterative-bert
- encoder
- pytorch
model-index:
- name: iterativebert-base
  results:
  - task:
      type: fill-mask
    dataset:
      name: SlimPajama-627B-DC
      type: MBZUAI-LLM/SlimPajama-627B-DC
    metrics:
    - name: Loss
      type: loss
      value: 1.8056
---
# iterativebert-base

A pre-trained IterativeBert encoder.
## Model Description

IterativeBert is an encoder architecture that applies a single transformer layer iteratively, refining the hidden states over multiple cycles. This weight sharing yields deep representations with a minimal parameter count.
## Architecture Details
| Parameter | Value |
|---|---|
| Hidden Size | 312 |
| Attention Heads | 6 |
| L Cycles (Refinement) | 8 |
| Residual Mode | add |
| Vocab Size | 30522 |
| Max Position Embeddings | 2048 |
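The core idea (one shared layer applied for L refinement cycles, with an additive residual between cycles) can be sketched as follows. This is a hypothetical minimal illustration using the table's dimensions, not the actual IterativeBert implementation; the class and residual placement are assumptions.

```python
import torch
import torch.nn as nn

class IterativeEncoderSketch(nn.Module):
    """Sketch: one transformer layer reused for several refinement cycles."""

    def __init__(self, hidden_size=312, num_heads=6, num_cycles=8):
        super().__init__()
        # A single layer whose weights are shared across all cycles.
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.num_cycles = num_cycles

    def forward(self, x):
        # Residual mode "add" (assumed): each cycle adds its refinement
        # to the previous hidden state instead of replacing it.
        for _ in range(self.num_cycles):
            x = x + self.layer(x)
        return x

model = IterativeEncoderSketch()
hidden = model(torch.randn(1, 16, 312))
print(hidden.shape)  # torch.Size([1, 16, 312])
```

Because the same weights are reused every cycle, parameter count stays that of a single layer while the effective computational depth is 8.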
## Usage

```python
from transformers import AutoTokenizer
from iterative_bert.model import IterativeBert

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("paul-english/iterativebert-base")
encoder = IterativeBert.from_pretrained("paul-english/iterativebert-base")

# Use for encoding
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = encoder(inputs.input_ids, attention_mask=inputs.attention_mask)
hidden_states = outputs.last_hidden_state
```
## Training Details
Training details not provided.
## Limitations
- Best used as a backbone for fine-tuning on downstream tasks
- Sequence length limited to 2048 tokens
## Citation

If you use this model, please cite:

```bibtex
@software{iterative_bert,
  title  = {Iterative Bert},
  author = {English, Paul M},
  url    = {https://github.com/paul-english/iterative_bert}
}
```