Add example usage
README.md
CHANGED

The model has been trained for a total of 17 epochs.

The loss curve is shown:

## Example Usage

```python
import torch
from transformers import PreTrainedTokenizerFast, BertForMaskedLM

model = BertForMaskedLM.from_pretrained("LofiAmazon/BarcodeBERT-Entire-BOLD")
model.eval()

tokenizer = PreTrainedTokenizerFast.from_pretrained("LofiAmazon/BarcodeBERT-Entire-BOLD")

# The DNA sequence you want to embed.
# There should be a space after every 4 characters.
# The sequence may also contain unknown characters that are not A, C, T, or G.
# The maximum DNA sequence length (not counting spaces) is 660 characters.
dna_sequence = "AACA ATGT ATTT A-T- TTCG CCCT TGTG AATT TATT ..."

inputs = tokenizer(dna_sequence, return_tensors="pt")

# Obtain a DNA embedding, a vector of length 768 that represents this
# sequence in the model's latent space. Hidden states must be requested
# explicitly, and no_grad avoids building the autograd graph at inference.
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
embedding = outputs.hidden_states[-1].mean(1).squeeze()
```
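Once you have embeddings for two sequences, a natural next step is to compare them, for example with cosine similarity. The sketch below uses random stand-in vectors purely for illustration; in practice you would substitute the 768-dimensional embeddings produced by the model as shown above.

```python
import torch
import torch.nn.functional as F

# Stand-in vectors for illustration; in practice, use the 768-dim
# embeddings computed by the model for two DNA sequences.
emb_a = torch.randn(768)
emb_b = torch.randn(768)

# Cosine similarity ranges from -1 to 1; values near 1 indicate the two
# sequences lie close together in the model's latent space.
similarity = F.cosine_similarity(emb_a, emb_b, dim=0).item()
print(f"cosine similarity: {similarity:.4f}")
```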