---
library_name: transformers
tags:
  - bert
  - language-model
license: apache-2.0
---

BERT-updated

Standard BERT architecture with flash_attention_2 and sdpa support added.

This is a shared code repository; it contains no pretrained weights. It is used as the code backend for biological sequence models that share the vanilla BERT architecture (post-LN transformer, learned absolute position embeddings) but have model-specific vocabularies and hyperparameters, such as Taykhoom/RNABERT and Taykhoom/UTRBERT-3mer (the two model repos used in the Usage section).

Each of those repos stores weights, tokenizer, and config; their auto_map in config.json points here for the modeling code.
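
As a rough illustration of how a model repo's auto_map can point at a separate code repo, the sketch below builds such a mapping and splits one entry back into its parts. It relies on the `repo_id--module.ClassName` reference form that transformers accepts for remote code. The code repo id, the module file names, and the class names are assumptions made for illustration; they are not copied from any actual config.json.

```python
import json

# Assumed values for illustration only: the real repo id, module file
# names, and class names may differ from what is shown here.
CODE_REPO = "Taykhoom/BERT-updated"

auto_map = {
    "AutoConfig": f"{CODE_REPO}--configuration_bert.BertConfig",
    "AutoModel": f"{CODE_REPO}--modeling_bert.BertModel",
}


def split_reference(ref):
    """Split 'repo_id--module.ClassName' into (repo_id, module, class_name)."""
    repo_id, _, class_ref = ref.partition("--")
    module, _, class_name = class_ref.rpartition(".")
    return repo_id, module, class_name


# Print the fragment as it might appear inside a model repo's config.json.
print(json.dumps({"auto_map": auto_map}, indent=2))
```

With an entry like this in place, loading a model repo with trust_remote_code=True fetches the modeling code from the code repo while weights, tokenizer, and config come from the model repo.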

What was changed from stock transformers.BertModel

The standard HF BertModel (transformers 4.57.6) supports sdpa but not flash_attention_2. This repo adds a complete attn_implementation dispatch:

| Backend | Class | Notes |
|---|---|---|
| eager | BertSelfAttention | Standard scaled dot-product, identical to original BERT |
| sdpa | BertSdpaSelfAttention | F.scaled_dot_product_attention, bool mask -> additive float mask |
| flash_attention_2 | BertFlashSelfAttention | flash_attn_varlen_func for padded inputs, flash_attn_func for unpadded |
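
The two mask-handling notes in the table can be sketched in plain Python without any tensor library. This is a conceptual sketch, not the repo's implementation: the helper names are hypothetical, and the real code operates on tensors. The first part shows why a boolean padding mask can be turned into an additive float mask (0 for real tokens, negative infinity for padding) before softmax. The second part computes the cumulative sequence lengths that variable-length kernels such as flash_attn_varlen_func take to locate each sequence in a packed batch.

```python
import math


def additive_mask(keep):
    """Boolean keep-mask (True = real token) -> additive float mask."""
    return [0.0 if k else float("-inf") for k in keep]


def softmax(scores):
    """Numerically stable softmax; assumes at least one finite score."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cumulative_seqlens(attention_mask):
    """Offsets of each sequence after padding tokens are removed."""
    offsets = [0]
    for row in attention_mask:
        offsets.append(offsets[-1] + sum(row))
    return offsets


# One query row: the last key position is padding.
keep = [True, True, True, False]
scores = [1.0, 2.0, 3.0, 9.0]
masked = [s + a for s, a in zip(scores, additive_mask(keep))]
weights = softmax(masked)  # the padding position receives weight 0.0

# Two padded sequences with 3 and 2 real tokens respectively.
batch_mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
offsets = cumulative_seqlens(batch_mask)
```

Adding the mask before softmax gives the same weights as dropping the padded keys entirely, which is why the eager and sdpa paths can agree on padded inputs.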

The rest of the architecture (embeddings, FFN, pooler, weight layout) is unchanged.

Usage

Do not load this repo directly. Load one of the model repos that point to it, for example:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)

# Flash Attention 2
model = AutoModel.from_pretrained("Taykhoom/UTRBERT-3mer", trust_remote_code=True,
                                  attn_implementation="flash_attention_2")
```
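
Since flash_attention_2 needs the flash-attn package and a CUDA GPU, one option is to choose the backend at runtime and fall back to sdpa otherwise. The helper below is a hypothetical sketch and not part of this repo; it only checks whether a package named flash_attn is importable and takes the CUDA check as an argument.

```python
import importlib.util


def pick_attn_implementation(has_cuda: bool) -> str:
    """Prefer flash_attention_2 when it looks usable, otherwise use sdpa."""
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if has_cuda and flash_installed:
        return "flash_attention_2"
    return "sdpa"
```

The result can be passed as attn_implementation to AutoModel.from_pretrained, with has_cuda set from torch.cuda.is_available().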

Credits

Modeling code authored primarily by Claude Code and reviewed manually by Taykhoom Dalal.

License

Apache 2.0.