andrewdalpino/ESMC-300M-QAT-Protein-Function

Text Classification


andrewdalpino committed on Jul 30, 2025

Commit 550efd9 (verified), 1 parent: cd256e6

Update README.md

Files changed (1)

README.md +32 -9

@@ -48,8 +48,6 @@ Then, we'll load the model weights from HuggingFace Hub and the GO graph using `
 ```python
 import torch
-import obonet
 from esm.tokenization import EsmSequenceTokenizer
 from esmc_function_classifier.model import EsmcGoTermClassifier
@@ -57,28 +55,53 @@ from esmc_function_classifier.model import EsmcGoTermClassifier
 model_name = "andrewdalpino/ESMC-300M-Protein-Function"
-# Visit https://geneontology.org/docs/download-ontology/ to download.
-go_db_path = "./dataset/go-basic.obo"
 sequence = "MPPKGHKKTADGDFRPVNSAGNTIQAKQKYSIDDLLYPKSTIKNLAKETLPDDAIISKDALTAIQRAATLFVSYMASHGNASAEAGGRKKIT"
 top_p = 0.5
-graph = obonet.read_obo(go_db_path)
 tokenizer = EsmSequenceTokenizer()
 model = EsmcGoTermClassifier.from_pretrained(model_name)
-model.load_gene_ontology(graph)
 out = tokenizer(sequence, max_length=2048, truncation=True)
 input_ids = torch.tensor(out["input_ids"], dtype=torch.int64)
 subgraph, go_term_probabilities = model.predict_subgraph(
     input_ids, top_p=top_p
 )
 ```
 ## Code Repository

 ```python
 import torch
 from esm.tokenization import EsmSequenceTokenizer
 from esmc_function_classifier.model import EsmcGoTermClassifier
 model_name = "andrewdalpino/ESMC-300M-Protein-Function"
 sequence = "MPPKGHKKTADGDFRPVNSAGNTIQAKQKYSIDDLLYPKSTIKNLAKETLPDDAIISKDALTAIQRAATLFVSYMASHGNASAEAGGRKKIT"
 top_p = 0.5
 tokenizer = EsmSequenceTokenizer()
 model = EsmcGoTermClassifier.from_pretrained(model_name)
 out = tokenizer(sequence, max_length=2048, truncation=True)
 input_ids = torch.tensor(out["input_ids"], dtype=torch.int64)
+go_term_probabilities = model.predict_terms(
+    input_ids, top_p=top_p
+)
+```
+You can also output the Gene Ontology (GO) subgraph for a given sequence as a `networkx` graph, as in the example below. This requires an up-to-date GO database, which you can load with the `obonet` package.
+```python
+import networkx as nx
+import obonet
+# Visit https://geneontology.org/docs/download-ontology/ to download.
+go_db_path = "./dataset/go-basic.obo"
+graph = obonet.read_obo(go_db_path)
+model.load_gene_ontology(graph)
 subgraph, go_term_probabilities = model.predict_subgraph(
     input_ids, top_p=top_p
 )
+json = nx.node_link_data(subgraph)
+print(json)
+```
+### Quantized Model
+To quantize the model weights to int8, call the `quantize_weights()` method. Any model can be quantized, but for the best performance we recommend one that has been quantization-aware trained (QAT). The `group_size` argument controls the granularity at which quantization scales are computed.
+```python
+model.quantize_weights(group_size=64)
 ```
 ## Code Repository
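The README's subgraph example prints the dictionary returned by `nx.node_link_data`. If you would rather save the subgraph, one option is to write that dictionary with the standard `json` module and rebuild it later with `nx.node_link_graph`. This is a minimal sketch under stated assumptions: the small hand-built graph is a hypothetical stand-in for the output of `predict_subgraph()`, and the file location is arbitrary. Naming the dictionary `graph_json` rather than `json` keeps the standard-library `json` module usable.

```python
import json
import os
import tempfile

import networkx as nx

# Hypothetical stand-in for the subgraph returned by model.predict_subgraph().
# Edges point from a child term to its parent, the direction obonet uses.
subgraph = nx.DiGraph()
subgraph.add_node("GO:0008150", name="biological_process")
subgraph.add_node("GO:0009987", name="cellular process")
subgraph.add_node("GO:0008152", name="metabolic process")
subgraph.add_edge("GO:0009987", "GO:0008150")
subgraph.add_edge("GO:0008152", "GO:0008150")

# Convert the graph into a JSON-serializable dict of nodes and links.
graph_json = nx.node_link_data(subgraph)

# Assumed output location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "subgraph.json")
with open(path, "w") as f:
    json.dump(graph_json, f, indent=2)

# Round trip: rebuild an equivalent directed graph from the saved file.
with open(path) as f:
    restored = nx.node_link_graph(json.load(f))

print(sorted(restored.nodes))  # ['GO:0008150', 'GO:0008152', 'GO:0009987']
```

Node attributes such as `name` survive the round trip, so the restored graph can be queried the same way as the original.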
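To picture what `group_size` controls, it can help to look at a generic group-wise scheme. The sketch below is not the implementation behind `quantize_weights()`. It assumes symmetric absmax int8 quantization with one floating-point scale per run of `group_size` consecutive weights in each row, and the helpers `quantize_int8_grouped` and `dequantize_int8_grouped` are hypothetical names used only for illustration.

```python
import torch


def quantize_int8_grouped(weight: torch.Tensor, group_size: int = 64):
    """Hypothetical helper: symmetric int8 quantization with one scale
    per run of `group_size` consecutive weights along each row."""
    out_features, in_features = weight.shape
    if in_features % group_size != 0:
        raise ValueError("in_features must be divisible by group_size")
    groups = weight.reshape(out_features, in_features // group_size, group_size)
    # Each group's largest magnitude is mapped to 127 (absmax scaling).
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(groups / scales), -127, 127).to(torch.int8)
    return q, scales


def dequantize_int8_grouped(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Undo quantize_int8_grouped, returning floats in the original 2-D shape."""
    return (q.to(scales.dtype) * scales).reshape(q.shape[0], -1)


torch.manual_seed(0)
weight = torch.randn(8, 256)

for group_size in (16, 64, 256):
    q, scales = quantize_int8_grouped(weight, group_size)
    max_err = (dequantize_int8_grouped(q, scales) - weight).abs().max().item()
    # Smaller groups store more scales, each fitted to fewer values.
    print(f"group_size={group_size:>3}  scales={scales.numel():>3}  max_abs_error={max_err:.5f}")
```

With an 8-by-256 weight matrix, `group_size=64` yields 4 scales per row, 32 in total. Rounding moves each weight by at most half of its group's scale, and because a smaller group's absmax can never exceed that of the larger group containing it, smaller groups tighten that bound at the cost of storing more scales.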