Update README.md
README.md
CHANGED
|
@@ -5,8 +5,16 @@ license: bsd-3-clause
|
|
| 5 |
# ProCALM
|
| 6 |
[ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.
|
| 7 |
|
| 8 |
-
ProCALM models share `tokenizer.json`
|
| 9 |
-
|
| 10 |
| 11 |
|
| 12 |
More usage details can be found on [GitHub](https://github.com/jsunn-y/ProCALM/tree/main) and in our paper.
|
|
|
|
| 5 |
# ProCALM
|
| 6 |
[ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.
|
| 7 |
|
| 8 |
+
ProCALM models share `tokenizer.json`, and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 billion and 9 billion tokens, respectively.
|
|
|
|
| 9 |
|
| 10 |
+
| Name | Description |
|
| 11 |
+
|:--------|:-------:|
|
| 12 |
+
| progen2-base | Original ProGen2 model with ~760 million parameters |
|
| 13 |
+
| ec-onehot-uniref | Trained with onehot-encoded EC conditioning, on ~29e6 enzymes from UniRef |
|
| 14 |
+
| ec-onehot-swissprot | Trained with onehot-encoded EC conditioning, on ~150e3 enzymes from Swissprot Train |
|
| 15 |
+
| tax-swissprot | Trained with onehot-encoded taxonomy conditioning, on ~150e3 enzymes from Swissprot Train |
|
| 16 |
+
| ec+tax-swissprot | Trained jointly on onehot-encoded EC conditioning and onehot-encoded taxonomy conditioning with parallel adapters, on ~150e3 enzymes from Swissprot Train |
|
| 17 |
+
| ec-drfp-swissprot | Trained with DRFP-encoded EC conditioning, on ~150e3 enzymes from Swissprot Train |
|
| 18 |
+
| ec-creep-swissprot | Trained with CREEP-encoded EC conditioning, on ~150e3 enzymes from Swissprot Train |
|
| 19 |
|
| 20 |
More usage details can be found on [GitHub](https://github.com/jsunn-y/ProCALM/tree/main) and in our paper.
|
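As a rough sketch of how the layout described in the README (one shared `tokenizer.json` plus one subfolder per model) might be used, the snippet below builds a download filter that fetches only the shared tokenizer and a single model's subfolder. It assumes the subfolder names match the Name column of the table, which the README does not state explicitly. The repo id `your-org/ProCALM` is a placeholder, and the helper names `patterns_for` and `download_model` are hypothetical. `snapshot_download` and its `allow_patterns` argument come from the `huggingface_hub` package.

```python
# Model names copied from the README table; treating them as subfolder
# names is an assumption.
MODEL_NAMES = {
    "progen2-base",
    "ec-onehot-uniref",
    "ec-onehot-swissprot",
    "tax-swissprot",
    "ec+tax-swissprot",
    "ec-drfp-swissprot",
    "ec-creep-swissprot",
}


def patterns_for(model_name: str) -> list[str]:
    """Return file patterns for the shared tokenizer plus one model subfolder."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    return ["tokenizer.json", f"{model_name}/*"]


def download_model(repo_id: str, model_name: str) -> str:
    """Download only the selected model's files and return the local path.

    Requires the `huggingface_hub` package and network access.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns_for(model_name),
    )


if __name__ == "__main__":
    # Placeholder repo id; replace it with the actual model repository.
    local_dir = download_model("your-org/ProCALM", "ec-onehot-swissprot")
    print(local_dir)
```

Filtering by pattern avoids pulling every checkpoint in the repository when only one model is needed. Loading the weights themselves follows the instructions in the GitHub repository.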