pavm595 committed d15aa51 (verified) · Parent(s): cccf8f0

Update README.md

Files changed (1): README.md (+23, −3)
---
license: mit
---
# ProtBert-BFD-SS3

Pretrained model on protein sequences using a masked language modeling (MLM) objective. The model makes a per-protein (pooled) prediction of membrane versus water-soluble (2-state accuracy). The model was developed by Ahmed Elnaggar et al.; more information can be found in the [GitHub repository](https://github.com/agemagician/ProtTrans) and the [accompanying paper](https://ieeexplore.ieee.org/document/9477085). This repository is a fork of their [Hugging Face repository](https://huggingface.co/Rostlab/prot_bert_bfd_ss3).

This model is trained on uppercase amino acids: it only works with capital-letter amino acid codes.

## Model description

The model has no auxiliary tasks such as BERT's next-sentence prediction; only the main objective, MLM, was used.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
import re

# The tokenizer and model loading lines are elided in this diff hunk;
# the pipeline is constructed from this checkpoint's model and tokenizer.
pipeline = TextClassificationPipeline(
    # ... (model= and tokenizer= arguments elided in the diff)
    device=0
)

sequences_Example = ["MAKSKNHTAHNQTRKAHRNGIKKPKTYKYPSLKGVDPKFRRNHKHALHGTAKALAAAKK",
                     "MGLPVSWAPPALWVLGCCALLLSLWALCTACRRPEDAVAPRKRARRQRARLQGSATAAEASLLRRTHLCSLSKSDTRLHELHRGPRSSRALRPASMDLLRPHWLEVSRDITGPQAAPSAFPHQELPRALPAAAATAGCAGLEATYSNVGLAALPGVSLAASPVVAEYARVQKRKGTHRSPQEPQQGKTEVTPAAQVDVLYSRVCKPKRRDPGPTTDPLDPKGQGAILALAGDLAYQTLPLRALDVDSGPLENVYESIRELGDPAGRSSTCGAGTPPASSCPSLGRGWRPLPASLP"]

# Replace rare/ambiguous amino acid codes with the unknown residue X.
sequences_Example = [re.sub(r"[UZOB]", "X", sequence) for sequence in sequences_Example]

print(pipeline(sequences_Example))
```
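The `re.sub` step above maps the rare or ambiguous residue codes U, Z, O, and B to the unknown residue X before inference. A standalone sketch of that preprocessing (the helper name is illustrative, not part of this repository):

```python
import re

def mask_rare_residues(sequence: str) -> str:
    """Replace rare/ambiguous amino acid codes (U, Z, O, B) with X."""
    return re.sub(r"[UZOB]", "X", sequence)

print(mask_rare_residues("MAKUZOB"))  # -> MAKXXXX
```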

## Input

A list of strings of uppercase amino acid residues, e.g. `["PRTEINO"]`.
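Because the model only works with capital-letter amino acids, a small defensive check before inference can catch lowercase or non-residue input early (a hypothetical helper, not part of this repository):

```python
def validate_sequence(sequence: str) -> str:
    # The model is trained on uppercase amino acids only.
    if not (sequence.isalpha() and sequence.isupper()):
        raise ValueError(f"Expected an uppercase amino acid string, got: {sequence!r}")
    return sequence

print(validate_sequence("PRTEINO"))  # -> PRTEINO
```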

## Output

A list of dictionaries, one per input sequence. The keys of the dictionaries are `label` and `score`: `label` is the prediction, i.e., either `Soluble` or `Membrane`, and `score` is the model's confidence in the prediction. Prediction for the inference example: `[{'label': 'Soluble', 'score': 0.8509202003479004}, {'label': 'Membrane', 'score': 0.8588864207267761}]`.
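Assuming the output format documented above, the per-sequence predictions can be consumed like this (the `results` values are copied from the example output, not freshly computed):

```python
# Example output copied from the inference example above.
results = [{'label': 'Soluble', 'score': 0.8509202003479004},
           {'label': 'Membrane', 'score': 0.8588864207267761}]

for i, result in enumerate(results):
    print(f"Sequence {i}: {result['label']} (score {result['score']:.3f})")
```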

## Copyright

Code derived from https://github.com/agemagician/ProtTrans is licensed under the MIT License, Copyright (c) 2025 Ahmed Elnaggar. The ProtTrans pretrained models are released under the terms of the Academic Free License v3.0, Copyright (c) 2025 Ahmed Elnaggar. The remaining code is licensed under the MIT License, Copyright (c) 2025 Maksim Pavlov.