	---
	library_name: transformers
	tags:
	- bert
	- language-model
	license: apache-2.0
	---

	# BERT-updated

	Standard BERT architecture with `flash_attention_2` and `sdpa` support added.

	This is a shared code repository and contains no pretrained weights. It serves
	as the code backend for biological sequence models that share the vanilla BERT
	architecture (post-LN transformer, learned absolute position embeddings) but have
	model-specific vocabularies and hyperparameters:

	- [Taykhoom/RNABERT](https://huggingface.co/Taykhoom/RNABERT)
	- [Taykhoom/UTRBERT-3mer](https://huggingface.co/Taykhoom/UTRBERT-3mer), [4mer](https://huggingface.co/Taykhoom/UTRBERT-4mer), [5mer](https://huggingface.co/Taykhoom/UTRBERT-5mer), [6mer](https://huggingface.co/Taykhoom/UTRBERT-6mer)
	- [Taykhoom/DNABERT-3mer](https://huggingface.co/Taykhoom/DNABERT-3mer), [4mer](https://huggingface.co/Taykhoom/DNABERT-4mer), [5mer](https://huggingface.co/Taykhoom/DNABERT-5mer), [6mer](https://huggingface.co/Taykhoom/DNABERT-6mer)

	Each of those repos stores weights, tokenizer, and config; their `auto_map` in
	`config.json` points here for the modeling code.
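
As a rough illustration of that indirection, the snippet below parses a hypothetical `config.json` excerpt and splits an `auto_map` entry of the form `repo_id--module.ClassName`, the convention `transformers` uses to reference code hosted in another repo. The module name `modeling_bert` and the class name `BertModel` are assumptions for illustration; check a model repo's actual `config.json` for the real values.

```python
import json

# Hypothetical excerpt of a model repo's config.json. The module and class
# names after "--" are assumed for illustration, not copied from the repos.
config_text = """
{
  "model_type": "bert",
  "auto_map": {
    "AutoModel": "Taykhoom/BERT-updated--modeling_bert.BertModel"
  }
}
"""


def split_auto_map_entry(entry):
    """Split 'repo_id--module.ClassName' into (repo_id, module, class_name).

    repo_id is None when the entry has no '--' prefix, i.e. the code lives
    in the same repo as the config.
    """
    repo_id, _, class_ref = entry.rpartition("--")
    module, _, class_name = class_ref.rpartition(".")
    return (repo_id or None), module, class_name


config = json.loads(config_text)
repo_id, module, class_name = split_auto_map_entry(config["auto_map"]["AutoModel"])
print(repo_id, module, class_name)
```

An entry without the `repo_id--` prefix would resolve to code in the model repo itself, which is why the prefix matters for a shared backend like this one.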

	## What was changed from stock `transformers.BertModel`

	The standard HF `BertModel` (transformers 4.57.6) supports `sdpa` but not
	`flash_attention_2`. This repo adds a complete `attn_implementation` dispatch:

	| Backend | Class | Notes |
	|---|---|---|
	| `eager` | `BertSelfAttention` | Standard scaled dot-product, identical to original BERT |
	| `sdpa` | `BertSdpaSelfAttention` | `F.scaled_dot_product_attention`, bool mask -> additive float mask |
	| `flash_attention_2` | `BertFlashSelfAttention` | `flash_attn_varlen_func` for padded inputs, `flash_attn_func` for unpadded |
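
To illustrate the mask conversion noted in the `sdpa` row, here is a minimal plain-Python sketch that turns a boolean keep-mask into an additive mask and applies it before the softmax. It is not the repo's code: the helper names are hypothetical, and using `-inf` for masked positions is an assumption (implementations often use a large finite negative value instead, since a fully masked row with `-inf` produces NaN).

```python
import math


def to_additive_mask(keep, masked_value=float("-inf")):
    """Map True (attend) to 0.0 and False (padding) to masked_value."""
    return [0.0 if k else masked_value for k in keep]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# One query attending over four key positions; the last key is padding.
keep = [True, True, True, False]
scores = [0.5, 1.0, -0.2, 3.0]  # raw attention scores (already scaled)

additive = to_additive_mask(keep)
weights = softmax([s + a for s, a in zip(scores, additive)])

# The padded position gets exactly zero weight, even though its raw score
# was the largest; the remaining weights still sum to 1.
print(weights)
```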

	The rest of the architecture (embeddings, FFN, pooler, weight layout) is unchanged.
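
For the `flash_attention_2` path on padded batches, `flash_attn_varlen_func` expects the real tokens packed into one sequence, together with cumulative sequence lengths (`cu_seqlens`) and the maximum sequence length. The sketch below derives that metadata from a 0/1 attention mask in plain Python. The helper name `unpad_metadata` is hypothetical, and the repo's own unpadding code may differ.

```python
from itertools import accumulate


def unpad_metadata(attention_mask):
    """Compute packing metadata from a padded 0/1 attention mask.

    attention_mask: list of equal-length rows, 1 = real token, 0 = padding.
    Returns (indices, cu_seqlens, max_seqlen) where indices are positions of
    real tokens in the flattened (batch * seq_len) layout.
    """
    seq_len = len(attention_mask[0])
    seqlens = [sum(row) for row in attention_mask]
    indices = [
        b * seq_len + t
        for b, row in enumerate(attention_mask)
        for t, m in enumerate(row)
        if m
    ]
    # cu_seqlens has batch + 1 entries: sequence i occupies the packed
    # range cu_seqlens[i] : cu_seqlens[i + 1].
    cu_seqlens = [0] + list(accumulate(seqlens))
    return indices, cu_seqlens, max(seqlens)


mask = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
indices, cu_seqlens, max_seqlen = unpad_metadata(mask)
print(indices)     # [0, 1, 2, 4, 5, 8, 9, 10, 11]
print(cu_seqlens)  # [0, 3, 5, 9]
print(max_seqlen)  # 4
```

In an actual call these values would be converted to `int32` tensors and passed as `cu_seqlens_q`/`cu_seqlens_k` and `max_seqlen_q`/`max_seqlen_k`; when the batch has no padding, the simpler `flash_attn_func` can be used instead.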

	## Usage

	Do not load this repo directly. Load one of the model repos listed above:

	```python
	from transformers import AutoTokenizer, AutoModel

	tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)
	model = AutoModel.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)

	# Flash Attention 2 (requires the flash-attn package and a supported GPU)
	model = AutoModel.from_pretrained(
	    "Taykhoom/UTRBERT-3mer",
	    trust_remote_code=True,
	    attn_implementation="flash_attention_2",
	)
	```
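
If the same script has to run on machines with and without Flash Attention installed, one option is to choose the backend at runtime. The helper below is a hypothetical sketch, not part of this repo: it only checks whether `torch` and `flash_attn` are importable and whether a GPU is visible, then falls back to `sdpa` or `eager`.

```python
import importlib.util


def pick_attn_implementation(prefer_flash=True):
    """Return an attn_implementation string that looks usable here."""
    if importlib.util.find_spec("torch") is None:
        # Without PyTorch nothing will load anyway; "eager" is a safe default.
        return "eager"
    import torch

    has_flash = importlib.util.find_spec("flash_attn") is not None
    if prefer_flash and has_flash and torch.cuda.is_available():
        return "flash_attention_2"
    # F.scaled_dot_product_attention exists from PyTorch 2.0 onward.
    if hasattr(torch.nn.functional, "scaled_dot_product_attention"):
        return "sdpa"
    return "eager"


attn = pick_attn_implementation()
print(attn)
```

The result can be passed straight through, for example `AutoModel.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True, attn_implementation=attn)`. Flash Attention kernels operate on fp16/bf16 tensors, so a half-precision dtype may also be needed when that backend is selected.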

	## Credits

	Modeling code authored primarily by [Claude Code](https://claude.ai/code) and reviewed
	manually by Taykhoom Dalal.

	## License

	Apache 2.0.