---
library_name: transformers
tags:
- bert
- language-model
license: apache-2.0
---
# BERT-updated
Standard BERT architecture with `flash_attention_2` and `sdpa` support added.
This is a **shared code repository** and contains no pretrained weights. It is used
as the code backend for biological sequence models that share the vanilla BERT
architecture (post-LN transformer, learned absolute position embeddings) but have
model-specific vocabularies and hyperparameters:
- [Taykhoom/RNABERT](https://huggingface.co/Taykhoom/RNABERT)
- [Taykhoom/UTRBERT-3mer](https://huggingface.co/Taykhoom/UTRBERT-3mer), [4mer](https://huggingface.co/Taykhoom/UTRBERT-4mer), [5mer](https://huggingface.co/Taykhoom/UTRBERT-5mer), [6mer](https://huggingface.co/Taykhoom/UTRBERT-6mer)
- [Taykhoom/DNABERT-3mer](https://huggingface.co/Taykhoom/DNABERT-3mer), [4mer](https://huggingface.co/Taykhoom/DNABERT-4mer), [5mer](https://huggingface.co/Taykhoom/DNABERT-5mer), [6mer](https://huggingface.co/Taykhoom/DNABERT-6mer)
Each of those repos stores weights, tokenizer, and config; their `auto_map` in
`config.json` points here for the modeling code.
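
As a rough illustration of that indirection, the sketch below parses a hypothetical `config.json` excerpt. It assumes the `repo_id--module.ClassName` form that transformers commonly uses for cross-repo `auto_map` references. The module name `modeling_bert` and the class name `BertModel` are placeholders, not copied from the actual model repos.

```python
import json

# Hypothetical config.json excerpt from one of the model repos.
# The module and class names below are assumptions for illustration only.
config_text = """
{
  "model_type": "bert",
  "auto_map": {
    "AutoModel": "Taykhoom/BERT-updated--modeling_bert.BertModel"
  }
}
"""


def split_auto_map_entry(entry):
    """Split 'repo_id--module.ClassName' into its three parts."""
    repo_id, _, class_ref = entry.partition("--")
    module_name, _, class_name = class_ref.rpartition(".")
    return repo_id, module_name, class_name


config = json.loads(config_text)
repo_id, module_name, class_name = split_auto_map_entry(config["auto_map"]["AutoModel"])
print(repo_id, module_name, class_name)
```

With an entry of this shape, loading a model repo with `trust_remote_code=True` would fetch the modeling code from this repository while the weights, tokenizer, and config come from the model repo itself.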
## What was changed from stock `transformers.BertModel`
The standard HF `BertModel` (transformers 4.57.6) supports `sdpa` but not
`flash_attention_2`. This repo adds a complete `attn_implementation` dispatch:
| Backend | Class | Notes |
|---|---|---|
| `eager` | `BertSelfAttention` | Standard scaled dot-product, identical to original BERT |
| `sdpa` | `BertSdpaSelfAttention` | `F.scaled_dot_product_attention`; boolean mask converted to an additive float mask |
| `flash_attention_2` | `BertFlashSelfAttention` | `flash_attn_varlen_func` for padded inputs, `flash_attn_func` for unpadded |
The rest of the architecture (embeddings, FFN, pooler, weight layout) is unchanged.
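
The mask handling noted in the table can be illustrated from a single padding mask. The sketch below is dependency-free Python, not the repo's tensor code: it shows how a boolean mask might become an additive float mask (the `sdpa` row) and how the same mask yields cumulative sequence lengths of the kind a variable-length kernel such as `flash_attn_varlen_func` expects (the `flash_attention_2` row). All function names here are hypothetical.

```python
import math
from itertools import accumulate

# Toy padding mask for a batch of two sequences (True = real token, False = padding).
padding_mask = [
    [True, True, True, False],
    [True, True, False, False],
]


def to_additive_mask(keep_row):
    """sdpa-style view: 0.0 where attention is allowed, -inf where it is blocked."""
    return [0.0 if keep else float("-inf") for keep in keep_row]


def masked_softmax(scores, keep_row):
    """Softmax over one row of attention scores after adding the mask."""
    masked = [s + m for s, m in zip(scores, to_additive_mask(keep_row))]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def varlen_metadata(mask):
    """Varlen-style view: sequence lengths as cumulative offsets into a packed batch."""
    seqlens = [sum(row) for row in mask]
    cu_seqlens = [0] + list(accumulate(seqlens))
    return cu_seqlens, max(seqlens)


weights = masked_softmax([2.0, 1.0, 0.5, 3.0], padding_mask[0])
cu_seqlens, max_seqlen = varlen_metadata(padding_mask)
print(weights, cu_seqlens, max_seqlen)
```

The padded position receives zero attention weight even though its raw score is the largest. A fully masked row would produce NaN with `-inf`, which is one reason tensor implementations often use a large finite negative value instead; whether this repo does so is not stated here.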
## Usage
Do not load this repo directly. Load one of the model repos listed above:
```python
from transformers import AutoModel, AutoTokenizer

# Default attention backend; the modeling code is fetched from this repo via auto_map.
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)

# Flash Attention 2 (needs the flash-attn package and a CUDA GPU)
model = AutoModel.from_pretrained(
    "Taykhoom/UTRBERT-3mer",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
```
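
Since `flash_attention_2` depends on an optional package and a GPU, it can help to choose the backend at runtime. The helper below is a minimal sketch rather than part of this repo: `pick_attn_implementation` is a hypothetical name, and it only checks whether the `flash_attn` package is importable and whether the caller reports a CUDA device.

```python
import importlib.util


def pick_attn_implementation(has_cuda, prefer_flash=True):
    """Return an attn_implementation string that is likely to load on this machine."""
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if prefer_flash and has_cuda and flash_installed:
        return "flash_attention_2"
    # sdpa ships with PyTorch, so it serves as the fallback in this sketch.
    return "sdpa"


choice = pick_attn_implementation(has_cuda=False)
print(choice)
```

In practice one would pass `torch.cuda.is_available()` as `has_cuda` and hand the result to `AutoModel.from_pretrained(..., attn_implementation=choice)`. Flash Attention kernels also expect half-precision inputs (fp16 or bf16), so the model dtype may need to be set accordingly.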
## Credits
Modeling code authored primarily by [Claude Code](https://claude.ai/code) and reviewed
manually by Taykhoom Dalal.
## License
Apache 2.0.