Update README.md
README.md
CHANGED
|
@@ -1,3 +1,20 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
---
|
| 4 |
+
### Model Overview
|
| 5 |
+
|
| 6 |
+
The **OphPred** model is a machine learning tool that predicts the optimal pH of enzyme activity directly from protein sequences. It combines the ESM-2 protein language model with KNN (k-nearest neighbors) and XGBoost algorithms to produce predictions across various enzyme classes. The model was validated under several train-validation splitting strategies: random, homology-based, PFAM-based, and EC-based splits. OphPred is designed to be fast and efficient, which makes it suitable for high-throughput screening of large protein libraries.
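
As a rough illustration of the KNN component described above, the sketch below predicts a pH value by averaging the labels of the nearest training vectors. It is not the OphPred implementation: the 3-dimensional tuples stand in for ESM-2 sequence representations, and `knn_predict`, the toy data, the Euclidean distance, and `k=2` are all assumptions made for this example.

```python
import math


def knn_predict(train_vecs, train_ph, query, k=2):
    """Average the pH labels of the k training vectors closest to `query`."""
    if not 1 <= k <= len(train_vecs):
        raise ValueError("k must be between 1 and the number of training vectors")
    # Euclidean distance from the query to every training vector.
    dists = [math.dist(vec, query) for vec in train_vecs]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    return sum(train_ph[i] for i in nearest) / k


# Hypothetical stand-ins for sequence embeddings and measured optimal pH values.
train_vecs = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0), (1.1, 1.0, 1.0)]
train_ph = [4.0, 5.0, 8.0, 9.0]

prediction = knn_predict(train_vecs, train_ph, query=(0.05, 0.0, 0.0))
print(f"predicted optimal pH: {prediction:.2f}")
```

A real pipeline would replace the toy tuples with high-dimensional embeddings and would likely use a library regressor rather than this hand-written loop; the sketch only shows the nearest-neighbor averaging idea.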
|
| 7 |
+
|
| 8 |
+
### Key Features
|
| 9 |
+
- **Input**: Protein sequences.
|
| 10 |
+
- **Output**: Predicted optimal pH range for enzyme activity.
|
| 11 |
+
- **Performance**: Mean absolute error (MAE) as low as 0.6 pH units and Spearman correlation up to 0.77 when the training data is enriched with additional data.
|
| 12 |
+
- **Use Cases**: Useful for protein engineering, enzyme optimization in biotechnology, and exploring protein space for desired enzymatic properties.
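
The two metrics quoted under **Performance** can be computed as in the sketch below. This is a plain-Python illustration, not the evaluation code behind the reported numbers: the function names and the four measured/predicted pH values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |true - predicted|, in the same units as the inputs (pH units here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical measured and predicted optimal pH values.
measured = [4.0, 5.5, 7.0, 8.5]
predicted = [4.5, 5.0, 7.5, 8.0]

print("MAE:", mean_absolute_error(measured, predicted))
print("Spearman:", spearman(measured, predicted))
```

MAE measures how far predictions are from measured values on average, while Spearman correlation only checks whether the predictions rank the enzymes in the right order, so the two numbers answer different questions.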
|
| 13 |
+
|
| 14 |
+
### Citation
|
| 15 |
+
If you use this model, please cite the authors as follows:
|
| 16 |
+
|
| 17 |
+
Zaretckii, M.; Buslaev, P.; Kozlovskii, I.; Morozov, A.; Popov, P. Approaching Optimal pH Enzyme Prediction with Large Language Models. *ACS Synth. Biol.* **2024**, *10*. DOI: 10.1021/acssynbio.4c00465.
|
| 18 |
+
|
| 19 |
+
### Further Reading
|
| 20 |
+
The full paper describing the development and validation of OphPred is available at [doi.org/10.1021/acssynbio.4c00465](https://doi.org/10.1021/acssynbio.4c00465).
|