Upload README.md with huggingface_hub
README.md
CHANGED
@@ -53,7 +53,7 @@ Verified on GPU with PyTorch 2.7 / CUDA 11.8.
 
 ## Related Models
 
-See the full [SpliceBERT collection](<COLLECTION_URL>).
+See the full [SpliceBERT collection](https://huggingface.co/collections/Taykhoom/splicebert-6a20b72e9bec05b79ce009aa).
 
 | Model | Context | Training data | Notes |
 |---|---|---|---|
@@ -65,22 +65,19 @@ See the full [SpliceBERT collection](<COLLECTION_URL>).
 
 ### Embedding generation
 
-
-
+The tokenizer automatically handles U->T conversion and single-nucleotide spacing.
+Pass raw sequences directly.
 
 ```python
 import torch
-from transformers import
+from transformers import AutoTokenizer, AutoModel
 
-tokenizer =
-model =
+tokenizer = AutoTokenizer.from_pretrained("Taykhoom/SpliceBERT-1024nt", trust_remote_code=True)
+model = AutoModel.from_pretrained("Taykhoom/SpliceBERT-1024nt", trust_remote_code=True)
 model.eval()
 
-
-
-seq_spaced = " ".join(list(seq))
-
-enc = tokenizer(seq_spaced, return_tensors="pt")
+seq = "ACGUACGUACGUACGU"  # U->T handled automatically
+enc = tokenizer(seq, return_tensors="pt")
 
 with torch.no_grad():
     out = model(**enc, output_hidden_states=True)
@@ -98,10 +95,10 @@ layer3_emb = out.hidden_states[3] # (1, seq_len+2, 512)
 
 ```python
 import torch
-from transformers import
+from transformers import AutoTokenizer, AutoModelForMaskedLM
 
-tokenizer =
-model =
+tokenizer = AutoTokenizer.from_pretrained("Taykhoom/SpliceBERT-1024nt", trust_remote_code=True)
+model = AutoModelForMaskedLM.from_pretrained("Taykhoom/SpliceBERT-1024nt", trust_remote_code=True)
 model.eval()
 
 seq = "A C G [MASK] A C G T"
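For reviewers: the embedding example no longer spaces the sequence by hand (`" ".join(list(seq))`) because the README now says the tokenizer does U->T conversion and single-nucleotide spacing itself. The sketch below shows roughly what that preprocessing amounts to in plain Python. The helper name `manual_preprocess` is hypothetical, and the sketch assumes uppercase input; it illustrates the described behavior and is not the tokenizer's actual implementation.

```python
def manual_preprocess(seq: str) -> str:
    """Hypothetical helper mimicking the preprocessing the README attributes to the tokenizer."""
    # Map the RNA alphabet onto DNA (assumption: input is uppercase).
    dna = seq.replace("U", "T")
    # Separate nucleotides with spaces so each one becomes its own token.
    return " ".join(dna)


if __name__ == "__main__":
    print(manual_preprocess("ACGUACGUACGUACGU"))
    # prints: A C G T A C G T A C G T A C G T
```

With the updated tokenizer this step should be unnecessary for raw sequences; it may still be useful for checking that a pre-spaced input and a raw input produce the same token IDs.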