Update README.md
README.md CHANGED
````diff
@@ -11,7 +11,7 @@ tags:
 - promoter-prediction
 - phage
 ---
-## ProkBERT-mini-
+## ProkBERT-mini-c-phage Model

 This finetuned model is specifically designed for promoter identification and is based on the [ProkBERT-mini-c model](https://huggingface.co/neuralbioinfo/prokbert-mini-long).

@@ -37,14 +37,14 @@ The following example demonstrates how to use the ProkBERT-mini-promoter model f
 ```python
 from prokbert.prokbert_tokenizer import ProkBERTTokenizer
 from transformers import MegatronBertForSequenceClassification
-finetuned_model = "neuralbioinfo/prokbert-mini-
+finetuned_model = "neuralbioinfo/prokbert-mini-c-phage"
 kmer = 1
 shift= 1

 tok_params = {'kmer' : kmer,
               'shift' : shift}
 tokenizer = ProkBERTTokenizer(tokenization_params=tok_params)
-model =
+model = MegatronBertForSequenceClassification.from_pretrained(finetuned_model)
 sequence = 'CACCGCATGGAGATCGGCACCTACTTCGACAAGCTGGAGGCGCTGCTGAAGGAGTGGTACGAGGCGCGCGGGGGTGAGGCATGACGGACTGGCAAGAGGAGCAGCGTCAGCGC'
 inputs = tokenizer(sequence, return_tensors="pt")
 # Ensure that inputs have a batch dimension
````