Upload README.md with huggingface_hub
README.md
CHANGED
@@ -9,16 +9,20 @@ tags:
 library_name: pytorch
 ---
 
-# EMFP: ESM-2 Micropeptide Functional
+# EMFP: ESM-2 Micropeptide Predictor for Canonical Functional Proteins
 
-
+EMFP is designed to identify peptide sequences that may encode canonical functional proteins, as defined by molecular function annotations in the UniProt database. While many peptides can be bioactive, EMFP specifically focuses on distinguishing peptides with protein-like molecular functions from those with other or unknown mechanisms of action.
+
+Fine-tuned ESM-2 (650M) model for predicting peptides encoding canonical functional proteins.
 
 ## Performance
 
 | Task | EMFP | Random Forest | ESM+MLP | ProtBERT+MLP |
 |------|------|---------------|---------|--------------|
 | Authenticity | **0.967** | 0.718 | 0.892 | 0.856 |
-
+| Canonical Protein Function | **0.932** | 0.505 | 0.827 | 0.791 |
+
+**Note**: "Canonical Protein Function" refers to peptides encoding proteins with molecular function annotations (enzyme activity, binding activity, etc.) as defined in UniProt.
 
 ## Usage
 
@@ -57,14 +61,14 @@ labels, strs, tokens = batch_converter(data)
 with torch.no_grad():
     logits = classifier(tokens)
     probs = torch.softmax(logits, dim=1)
-print(f"
+print(f"Probability of encoding canonical functional protein: {probs[0, 1].item():.4f}")
 ```
 
 ## Model Details
 
-- Base: ESM-2 650M
-- Training: 26,626 sequences
-- Optimizer: AdamW, FP16
+- Base: ESM-2 650M (`esm2_t33_650M_UR50D`)
+- Training: 26,626 peptide sequences from UniProt with molecular function annotations
+- Optimizer: AdamW, FP16 precision
 - Size: 7.4 GB
 
 ## Download
@@ -75,13 +79,13 @@ huggingface-cli download huangruihua/EMFP best_model.pt --local-dir ./
 
 ## GitHub
 
-Full code: https://github.com/huangruihua/EMFP
+Full code: [https://github.com/huangruihua/EMFP](https://github.com/huangruihua/EMFP)
 
 ## Citation
 
 ```bibtex
 @software{emfp_2026,
-title={EMFP: ESM-2 Micropeptide Functional
+title={EMFP: ESM-2 Micropeptide Predictor for Canonical Functional Proteins},
 author={Huang, Rui-Hua},
 year={2026},
 url={https://github.com/huangruihua/EMFP}
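
The usage snippet touched by this change applies `torch.softmax(logits, dim=1)` and prints `probs[0, 1]`. Below is a minimal sketch of what that indexing means. It assumes the classifier emits two logits per sequence, with index 1 as the positive class (encodes a canonical functional protein). The new print statement suggests this, but the diff does not state it outright. The logits are made up, and `softmax_rows` is a hypothetical plain-Python stand-in for `torch.softmax(..., dim=1)` so the example runs without PyTorch or the 7.4 GB checkpoint.

```python
import math


def softmax_rows(logits):
    """Row-wise softmax over a list of rows, mirroring torch.softmax(logits, dim=1)."""
    probs = []
    for row in logits:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - shift) for x in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


# Made-up logits for a batch of two sequences, laid out as
# [class 0, class 1]; the class order is an assumption.
example_logits = [[-1.2, 2.3], [0.4, -0.7]]
example_probs = softmax_rows(example_logits)

for i, row in enumerate(example_probs):
    # row[1] plays the role of probs[i, 1] in the README snippet.
    print(f"sequence {i}: P(class 1) = {row[1]:.4f}")
```

Each row sums to 1, so with two classes the value at index 1 depends only on the difference between the two logits. If the real model orders its classes differently, the index in the print statement would need to change accordingly.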