---
license: apache-2.0
library_name: transformers
language: en
tags:
  - tiner
  - iterative-bert
  - encoder
  - pytorch
model-index:
  - name: iterativebert-base
    results:
      - task:
          type: fill-mask
        dataset:
          name: SlimPajama-627B-DC
          type: MBZUAI-LLM/SlimPajama-627B-DC
        metrics:
          - name: Loss
            type: loss
            value: 1.8056
---

# iterativebert-base

A pre-trained IterativeBert encoder.

## Model Description

IterativeBert is an encoder architecture that applies a single shared transformer layer repeatedly, building representation depth through iterative refinement rather than stacked layers and keeping the parameter count small.

## Architecture Details

| Parameter | Value |
|---|---|
| Hidden Size | 312 |
| Attention Heads | 6 |
| L Cycles (Refinement) | 8 |
| Residual Mode | add |
| Vocab Size | 30522 |
| Max Position Embeddings | 2048 |
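
To make the table concrete, here is a minimal PyTorch sketch of the refinement loop. The class name, the outer `hidden_states + layer(...)` reading of the `add` residual mode, and the use of `nn.TransformerEncoderLayer` are assumptions for illustration, not the actual implementation:

```python
import torch.nn as nn

class IterativeEncoderSketch(nn.Module):
    """Sketch: one shared transformer layer applied for several cycles."""

    def __init__(self, hidden_size=312, num_heads=6, num_cycles=8):
        super().__init__()
        # A single layer; depth comes from reuse, not from stacking.
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.num_cycles = num_cycles

    def forward(self, hidden_states, padding_mask=None):
        # padding_mask: True marks padding positions (PyTorch convention).
        for _ in range(self.num_cycles):
            # Assumed "add" residual mode: each refinement pass is added
            # onto the running hidden state.
            hidden_states = hidden_states + self.layer(
                hidden_states, src_key_padding_mask=padding_mask
            )
        return hidden_states
```

Because the same weights are reused for all eight cycles, the layer parameters are paid for once, which is what keeps the model small relative to its effective depth.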

## Usage

```python
from iterative_bert.model import IterativeBert

# Load the pre-trained encoder
encoder = IterativeBert.from_pretrained("paul-english/iterativebert-base")

# Encode a batch; input_ids and attention_mask come from your tokenizer
outputs = encoder(input_ids, attention_mask=attention_mask)
hidden_states = outputs.last_hidden_state
```
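
If you need a single vector per sequence, masked mean pooling over `last_hidden_state` is a common, model-agnostic recipe (the repository does not prescribe one). This continues the variables from the snippet above and assumes `attention_mask` uses 1 for real tokens:

```python
# Mean-pool token states, ignoring padding positions.
mask = attention_mask.unsqueeze(-1).float()   # (batch, seq, 1)
summed = (hidden_states * mask).sum(dim=1)    # (batch, hidden)
counts = mask.sum(dim=1).clamp(min=1e-9)      # guard against all-padding rows
sentence_embeddings = summed / counts         # (batch, hidden)
```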

## Training Details

A full training configuration is not provided. The model-index metadata above reports a loss of 1.8056 on a fill-mask objective over SlimPajama-627B-DC.

## Limitations

- Best used as a backbone for fine-tuning on downstream tasks (a minimal fine-tuning sketch follows this list)
- Sequence length limited to 2048 tokens
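
The sketch below shows one way to attach a task head, assuming the `last_hidden_state` interface from the Usage section; the wrapper class, [CLS]-style pooling, and default sizes are illustrative choices, not part of the released code:

```python
import torch.nn as nn
from iterative_bert.model import IterativeBert

class SequenceClassifier(nn.Module):
    """Hypothetical fine-tuning wrapper: encoder backbone + linear head."""

    def __init__(self, num_labels, hidden_size=312):
        super().__init__()
        self.encoder = IterativeBert.from_pretrained(
            "paul-english/iterativebert-base"
        )
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.encoder(input_ids, attention_mask=attention_mask)
        # Classify from the first token's final state ([CLS]-style pooling).
        return self.head(outputs.last_hidden_state[:, 0])
```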

## Citation

If you use this model, please cite:

```bibtex
@software{iterative_bert,
  title = {Iterative Bert},
  author = {English, Paul M},
  url = {https://github.com/paul-english/iterative_bert}
}
```