---
library_name: transformers
tags:
- bert
- language-model
license: apache-2.0
---
# BERT-updated

Standard BERT architecture with `flash_attention_2` and `sdpa` support added.

This is a **shared code repository**: it contains no pretrained weights. It is used
as the code backend for biological sequence models that share the vanilla BERT
architecture (post-LN transformer, learned absolute position embeddings) but have
model-specific vocabularies and hyperparameters:

- [Taykhoom/RNABERT](https://huggingface.co/Taykhoom/RNABERT)
- [Taykhoom/UTRBERT-3mer](https://huggingface.co/Taykhoom/UTRBERT-3mer), [4mer](https://huggingface.co/Taykhoom/UTRBERT-4mer), [5mer](https://huggingface.co/Taykhoom/UTRBERT-5mer), [6mer](https://huggingface.co/Taykhoom/UTRBERT-6mer)
- [Taykhoom/DNABERT-3mer](https://huggingface.co/Taykhoom/DNABERT-3mer), [4mer](https://huggingface.co/Taykhoom/DNABERT-4mer), [5mer](https://huggingface.co/Taykhoom/DNABERT-5mer), [6mer](https://huggingface.co/Taykhoom/DNABERT-6mer)

Each of those repos stores weights, tokenizer, and config; their `auto_map` in
`config.json` points here for the modeling code.
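To make the `auto_map` indirection concrete, here is a minimal sketch of how such an entry could be split into a code repository, a module, and a class. The `repo_id--module.ClassName` form is the convention Transformers uses for remote code references; the module name `modeling_bert`, the class name, and the config fragment shown are assumptions for illustration and are not copied from the model repos.

```python
from typing import Optional, Tuple


def split_auto_map_entry(entry: str) -> Tuple[Optional[str], str, str]:
    """Split an auto_map value into (repo_id, module, class_name).

    A value without "--" refers to code in the same repo, so repo_id is None.
    """
    repo_id, sep, reference = entry.rpartition("--")
    module, _, class_name = reference.rpartition(".")
    return (repo_id if sep else None), module, class_name


# Hypothetical config.json fragment for a model repo (names are assumed).
config = {
    "model_type": "bert",
    "auto_map": {"AutoModel": "Taykhoom/BERT-updated--modeling_bert.BertModel"},
}

repo_id, module, class_name = split_auto_map_entry(config["auto_map"]["AutoModel"])
print(repo_id, module, class_name)
```

With an entry of this shape, `AutoModel.from_pretrained(..., trust_remote_code=True)` on a model repo fetches the class from the code repo while the weights and tokenizer come from the model repo itself.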
## What was changed from stock `transformers.BertModel`

The standard HF `BertModel` (transformers 4.57.6) supports `sdpa` but not
`flash_attention_2`. This repo adds a complete `attn_implementation` dispatch:

| Backend | Class | Notes |
|---|---|---|
| `eager` | `BertSelfAttention` | Standard scaled dot-product, identical to original BERT |
| `sdpa` | `BertSdpaSelfAttention` | `F.scaled_dot_product_attention`, bool mask -> additive float mask |
| `flash_attention_2` | `BertFlashSelfAttention` | `flash_attn_varlen_func` for padded inputs, `flash_attn_func` for unpadded |

The rest of the architecture (embeddings, FFN, pooler, weight layout) is unchanged.
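Two notes in the table are easier to follow with numbers. The sketch below uses plain Python lists rather than tensors, so it illustrates the idea only and is not this repo's code. `additive_mask` and `softmax_row` show how a boolean padding mask becomes an additive float mask for the `sdpa` path, and `cu_seqlens_from_mask` computes the cumulative sequence lengths and maximum length that `flash_attn_varlen_func` takes for padded batches. All three helper names are hypothetical.

```python
import math
from itertools import accumulate


def additive_mask(keep):
    """Boolean keep-mask (True = real token) -> additive float mask."""
    return [0.0 if k else float("-inf") for k in keep]


def softmax_row(scores, keep):
    """Attention weights for one query row after adding the mask.

    Assumes at least one position in `keep` is True.
    """
    masked = [s + m for s, m in zip(scores, additive_mask(keep))]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def cu_seqlens_from_mask(attention_mask):
    """Padding mask (1 = real token) -> (cumulative lengths, max length)."""
    lengths = [sum(row) for row in attention_mask]
    return [0] + list(accumulate(lengths)), max(lengths)


# The padded third position receives zero attention weight.
weights = softmax_row([1.0, 2.0, 3.0], [True, True, False])
print(weights[2])  # 0.0

# Three sequences of lengths 3, 2, and 4, padded to length 4.
cu_seqlens, max_seqlen = cu_seqlens_from_mask(
    [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
)
print(cu_seqlens, max_seqlen)  # [0, 3, 5, 9] 4
```

Tensor implementations often use the dtype's most negative finite value instead of `-inf`, which avoids NaNs when an entire row is masked.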
## Usage

Do not load this repo directly. Load one of the model repos listed above:

```python
from transformers import AutoTokenizer, AutoModel

# Default attention backend
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/RNABERT", trust_remote_code=True)

# Flash Attention 2 (needs the flash-attn package and a CUDA GPU;
# the flash-attn kernels only accept fp16/bf16 inputs)
model = AutoModel.from_pretrained("Taykhoom/UTRBERT-3mer", trust_remote_code=True,
                                  attn_implementation="flash_attention_2")
```
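If the same script has to run on machines with and without `flash-attn`, one option is to pick the backend at load time. The helpers below are a hypothetical sketch, not part of this repo: `pick_attn_implementation` only checks whether the `flash_attn` package is importable and whether the caller reports a CUDA device, and falls back to `sdpa` otherwise. `load_model` is defined but not called here, since calling it downloads weights.

```python
import importlib.util


def pick_attn_implementation(cuda_available: bool) -> str:
    """Return "flash_attention_2" when it looks usable, otherwise "sdpa"."""
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    if cuda_available and has_flash_attn:
        return "flash_attention_2"
    return "sdpa"


def load_model(repo_id: str = "Taykhoom/RNABERT"):
    """Load one of the model repos with the backend chosen above."""
    # Imported lazily so pick_attn_implementation works without torch installed.
    import torch
    from transformers import AutoModel

    impl = pick_attn_implementation(torch.cuda.is_available())
    model = AutoModel.from_pretrained(
        repo_id, trust_remote_code=True, attn_implementation=impl
    )
    if impl == "flash_attention_2":
        # flash-attn kernels only accept fp16/bf16 inputs, so move and cast.
        model = model.to(device="cuda", dtype=torch.float16)
    return model, impl
```

Falling back to `sdpa` rather than `eager` is a design choice in this sketch: it keeps a fused attention kernel on machines where `flash-attn` is not installed.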
## Credits

Modeling code authored primarily by [Claude Code](https://claude.ai/code) and reviewed
manually by Taykhoom Dalal.

## License

Apache 2.0.