andrewdalpino committed on
Commit 54f047d · verified · 1 Parent(s): d5e4b03

Update README.md

Files changed (1)
  1. README.md +16 -2
README.md CHANGED
@@ -1,3 +1,17 @@
  # ESMC Protein Function Predictor
 
  An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM Cambrian Transformer architecture, pre-trained on [UniRef](https://www.uniprot.org/help/uniref), [MGnify](https://www.ebi.ac.uk/metagenomics), and the Joint Genome Institute's database and fine-tuned on the [AmiGO Boost](https://huggingface.co/datasets/andrewdalpino/AmiGO-Boost) protein function dataset, this model predicts the GO subgraph for a particular protein sequence - giving you insight into the molecular function, biological process, and location of the activity inside the cell.
@@ -17,7 +31,7 @@ The following pretrained models are available on HuggingFace Hub.
  | Name | Embedding Dim. | Attn. Heads | Encoder Layers | Context Length | QAT | Total Parameters |
  |---|---|---|---|---|---|---|
  | [andrewdalpino/ESMC-300M-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-300M-Protein-Function) | 960 | 15 | 30 | 2048 | None | 361M |
- | [andrewdalpino/ESMC-300M-QAT-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-300M-QAT-Protein-Function) | 960 | 15 | 30 | 2048 | int6w | 361M |
  | [andrewdalpino/ESMC-600M-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-600M-Protein-Function) | 1152 | 18 | 36 | 2048 | None | 644M |
  | [andrewdalpino/ESMC-600M-QAT-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-600M-QAT-Protein-Function) | 1152 | 18 | 36 | 2048 | int8w | 644M |
 
@@ -74,4 +88,4 @@ The training code can be found at [https://github.com/andrewdalpino/ESMC-Functio
  ## References:
 
  >- T. Hayes, et al. Simulating 500 million years of evolution with a language model, 2024.
- >- M. Ashburner, et al. Gene Ontology: tool for the unification of biology, 2000.
 
+ ---
+ datasets:
+ - andrewdalpino/AmiGO-Boost
+ metrics:
+ - precision
+ - recall
+ - f1
+ base_model:
+ - EvolutionaryScale/esmc-300m-2024-12
+ pipeline_tag: text-classification
+ tags:
+ - gene-ontology
+ ---
+
  # ESMC Protein Function Predictor
 
  An Evolutionary-scale Model (ESM) for protein function prediction from amino acid sequences using the Gene Ontology (GO). Based on the ESM Cambrian Transformer architecture, pre-trained on [UniRef](https://www.uniprot.org/help/uniref), [MGnify](https://www.ebi.ac.uk/metagenomics), and the Joint Genome Institute's database and fine-tuned on the [AmiGO Boost](https://huggingface.co/datasets/andrewdalpino/AmiGO-Boost) protein function dataset, this model predicts the GO subgraph for a particular protein sequence - giving you insight into the molecular function, biological process, and location of the activity inside the cell.
 
  | Name | Embedding Dim. | Attn. Heads | Encoder Layers | Context Length | QAT | Total Parameters |
  |---|---|---|---|---|---|---|
  | [andrewdalpino/ESMC-300M-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-300M-Protein-Function) | 960 | 15 | 30 | 2048 | None | 361M |
+ | [andrewdalpino/ESMC-300M-QAT-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-300M-QAT-Protein-Function) | 960 | 15 | 30 | 2048 | int8w | 361M |
  | [andrewdalpino/ESMC-600M-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-600M-Protein-Function) | 1152 | 18 | 36 | 2048 | None | 644M |
  | [andrewdalpino/ESMC-600M-QAT-Protein-Function](https://huggingface.co/andrewdalpino/ESMC-600M-QAT-Protein-Function) | 1152 | 18 | 36 | 2048 | int8w | 644M |
 
 
  ## References:
 
  >- T. Hayes, et al. Simulating 500 million years of evolution with a language model, 2024.
+ >- M. Ashburner, et al. Gene Ontology: tool for the unification of biology, 2000.
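
The front matter added in this commit is a YAML metadata block delimited by `---` at the top of README.md. As a rough illustration of how those fields can be read back, here is a minimal sketch using only the standard library. The helper `parse_front_matter` is hypothetical (it is not part of any Hugging Face library) and handles only the flat `key: value` and `key:` plus `- item` shapes that appear in this diff, not general YAML.

```python
def parse_front_matter(text):
    """Parse a small YAML subset: scalar values and flat lists.

    Hypothetical helper for illustration only; a real YAML parser
    should be preferred for anything beyond this simple layout.
    """
    lines = text.strip().splitlines()
    # The block must open with a '---' delimiter line.
    if not lines or lines[0].strip() != "---":
        return {}

    meta = {}
    current_key = None  # key whose list is currently being filled
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item belonging to the most recent 'key:' line.
            if current_key is not None:
                meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value  # scalar entry such as pipeline_tag
            current_key = None
        else:
            meta[key] = []  # start of a list entry such as datasets
            current_key = key
    return meta


# The metadata block exactly as added by this commit.
README_HEAD = """---
datasets:
- andrewdalpino/AmiGO-Boost
metrics:
- precision
- recall
- f1
base_model:
- EvolutionaryScale/esmc-300m-2024-12
pipeline_tag: text-classification
tags:
- gene-ontology
---

# ESMC Protein Function Predictor
"""

metadata = parse_front_matter(README_HEAD)
print(metadata["pipeline_tag"])
print(metadata["base_model"])
```

Parsing stops at the closing `---`, so the README body below the metadata block is never interpreted as metadata.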