fondress committed on
Commit 0d2ebd0 · verified · 1 Parent(s): e5da61c

Upload README.md with huggingface_hub

  1. README.md +113 -0
README.md ADDED
@@ -0,0 +1,113 @@
+ # PDeepPP: A Comprehensive Protein Language Model Hub
+
+ PDeepPP is a hybrid protein language model designed to predict post-translational modification (PTM) sites, analyze biologically relevant features, and support a wide range of protein sequence analysis tasks. This repository is the central hub for the specialized PDeepPP models, each fine-tuned for a specific task such as PTM site prediction or bioactivity analysis.
+
+ ## Overview
+
+ PDeepPP integrates transformer-based self-attention with convolutional neural networks (CNNs) to capture both global and local features in protein sequences. It builds on pretrained embeddings from `ESM` and on modular architecture components.
+
+ This repository contains links to multiple task-specific PDeepPP models. These models are pre-trained or fine-tuned on publicly available datasets and are hosted on Hugging Face for easy access.
+
+ ---
+
+ ## Key Features
+
+ - **Flexible Architecture**: Combines self-attention and convolutional operations for robust feature extraction.
+ - **Task-Specific Models**: Includes pre-trained models for PTM prediction, bioactivity classification, and more.
+ - **Dataset Support**: Models are validated on datasets such as PTM and BPS.
+ - **Extensibility**: Users can fine-tune the models on custom datasets for new tasks.
+
+ ---
+
+ ## Available Models
+
+ ### General Models
+ - [PDeepPP Main](https://huggingface.co/fondress/PDeepPP)
+
+ ### Task-Specific Models
+
+ #### Post-Translational Modifications (PTMs)
+ - [PDeepPP Phosphorylation (Serine)](https://huggingface.co/fondress/PDeepPP_Phosphoserine)
+ - [PDeepPP Phosphorylation (Tyrosine)](https://huggingface.co/fondress/PDeepPP_Phosphorylation-Y)
+ - [PDeepPP Glycosylation (N-linked)](https://huggingface.co/fondress/PDeepPP_N-linked-glycosylation-N)
+ - [PDeepPP Glycosylation (O-linked)](https://huggingface.co/fondress/PDeepPP_O-linked-glycosylation)
+ - [PDeepPP Methylation (Lysine)](https://huggingface.co/fondress/PDeepPP_Methylation-K)
+ - [PDeepPP Methylation (Arginine)](https://huggingface.co/fondress/PDeepPP_Methylation-R)
+ - [PDeepPP SUMOylation](https://huggingface.co/fondress/PDeepPP_SUMOylation)
+ - [PDeepPP Ubiquitin](https://huggingface.co/fondress/PDeepPP_Ubiquitin)
+
+ #### Bioactivity Prediction
+ - [PDeepPP ACE](https://huggingface.co/fondress/PDeepPP_ACE)
+ - [PDeepPP BBP](https://huggingface.co/fondress/PDeepPP_BBP)
+ - [PDeepPP DPPIV](https://huggingface.co/fondress/PDeepPP_DPPIV)
+ - [PDeepPP Toxicity](https://huggingface.co/fondress/PDeepPP_Toxicity)
+ - [PDeepPP Antimalarial](https://huggingface.co/fondress/PDeepPP_Antimalarial-main)
+ - [PDeepPP Anticancer](https://huggingface.co/fondress/PDeepPP_Anticancer-main)
+ - [PDeepPP Antiviral](https://huggingface.co/fondress/PDeepPP_Antiviral)
+ - [PDeepPP Antioxidant](https://huggingface.co/fondress/PDeepPP_Antioxidant)
+ - [PDeepPP Antibacterial](https://huggingface.co/fondress/PDeepPP_Antibacterial)
+ - [PDeepPP Antifungal](https://huggingface.co/fondress/PDeepPP_Antifungal)
+ - [PDeepPP Bitter](https://huggingface.co/fondress/PDeepPP_bitter)
+ - [PDeepPP Umami](https://huggingface.co/fondress/PDeepPP_umami)
+ - [PDeepPP Quorum](https://huggingface.co/fondress/PDeepPP_Quorum)
+ - [PDeepPP TTCA](https://huggingface.co/fondress/PDeepPP_TTCA)
+
+ ---
+
+ ## Model Architecture
+
+ PDeepPP is built on a hybrid architecture that includes:
+
+ - **Self-Attention Global Features**: Captures long-range dependencies in protein sequences.
+ - **TransConv1d Module**: Combines transformer layers with convolutional layers for local feature extraction.
+ - **PosCNN Module**: Incorporates position-aware convolutional operations to enhance sequence representation.
+
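The split between a global branch and a local branch described above can be illustrated with a toy, dependency-free sketch. This is not the PDeepPP implementation: the function names (`self_attention`, `conv1d_same`, `hybrid_features`), the identity query/key/value projections, the fixed smoothing kernel, and the final concatenation are all assumptions made for illustration. The actual `TransConv1d` and `PosCNN` modules are defined by the code shipped with the model repositories.

```python
import math


def self_attention(x):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption for brevity: queries, keys, and values are the inputs
    themselves (no learned projections).
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position to every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of all positions: a global summary.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def conv1d_same(x, kernel):
    """Per-channel 1D convolution along the sequence with zero padding.

    Assumes an odd kernel length so the output keeps the input length.
    """
    k = len(kernel)
    pad = k // 2
    d = len(x[0])
    zero = [0.0] * d
    padded = [zero] * pad + x + [zero] * pad
    return [
        [sum(kernel[t] * padded[i + t][j] for t in range(k)) for j in range(d)]
        for i in range(len(x))
    ]


def hybrid_features(x, kernel=(0.25, 0.5, 0.25)):
    """Concatenate global (attention) and local (convolution) features per position."""
    global_feats = self_attention(x)
    local_feats = conv1d_same(x, list(kernel))
    return [g + l for g, l in zip(global_feats, local_feats)]


# Toy input: three positions with two features each (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = hybrid_features(toy_embeddings)
```

Each position ends up with twice as many features as it started with: the first half mixes information from the whole sequence, the second half only from its immediate neighbours.
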
+ ---
+
+ ## How to Use
+
+ To use any of the models, install the required dependencies, `torch` and `transformers`:
+
+ ```bash
+ # The cu118 index URL installs CUDA 11.8 builds; pick the index that matches your platform.
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+ pip install transformers
+ ```
+
+ Here’s a quick example of how to load and use a model:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ # Load the model (trust_remote_code=True runs the custom model code shipped in the repository)
+ model_name = "fondress/PDeepPP_ACE"
+ model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
+
+ # Example input
+ protein_sequence = "VELYP"
+
+ # Preprocess the sequence into `processed_input`. The steps are model-specific
+ # (refer to the specific model documentation), so `processed_input` is not defined here.
+
+ # Forward pass (no gradients are needed for inference)
+ with torch.no_grad():
+     outputs = model(input_ids=processed_input)
+ logits = outputs.logits
+ ```
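
The `logits` returned by the forward pass are raw scores, so a typical next step is to turn them into a probability and a label. The sketch below assumes a two-class head whose scores are ordered (negative, positive); that ordering, the helper names `softmax` and `predict_positive`, and the 0.5 threshold are assumptions, so check the specific model card before relying on them. If `logits` is a PyTorch tensor with a leading batch dimension, something like `logits[0].tolist()` would give the plain list used here.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_positive(two_class_logits, threshold=0.5):
    """Return (probability of the positive class, boolean label).

    Assumes index 1 is the positive class; this is not confirmed by the source.
    """
    probs = softmax(two_class_logits)
    return probs[1], probs[1] >= threshold


# Made-up logits for a single sequence, for illustration only.
prob, is_positive = predict_positive([-1.2, 2.3])
```
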
+
+ ## Training and Customization
+
+ You can fine-tune PDeepPP for custom tasks using your own datasets. The model supports:
+
+ - **Custom PTM types**: Extend the model to predict additional post-translational modifications.
+ - **Sequence classification tasks**: Adapt the model to classify protein sequences based on custom labels.
+ - **Feature extraction for downstream analyses**: Use PDeepPP to generate embeddings for tasks like clustering or similarity calculation.
+
+ Refer to the `PDeepPPConfig` class in the source repository for details on available hyperparameters and customization options.
+
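When building a dataset for a custom PTM type, a common convention in PTM-site prediction is to cut a fixed-length window around each candidate residue and label each window. The sketch below shows only that idea. The helper name `extract_windows`, the flank size, and the `X` padding character are assumptions; the input format PDeepPP actually expects should be taken from the source repository.

```python
def extract_windows(sequence, residue, flank=16, pad="X"):
    """Return (1-based position, window) pairs centered on each `residue`.

    Windows have length 2 * flank + 1. Positions near the sequence ends are
    padded with `pad` so every window has the same length.
    """
    sequence = sequence.upper()
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, amino_acid in enumerate(sequence):
        if amino_acid == residue:
            # Index i in `sequence` sits at index i + flank in `padded`,
            # so the slice below is centered on the candidate residue.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


# Made-up sequence: collect candidate serine sites with a short flank.
candidates = extract_windows("MSTSK", "S", flank=2)
```

Each window can then be paired with a binary label (modified or not) from the user's own annotations before fine-tuning.
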
+ ---
+
+ ## Citation
+
+ If you use any of the PDeepPP models in your research, please cite the associated paper or repository:
+
+ ```
+ @article{your_reference,
+   title={PDeepPP: A Hybrid Model for Protein Sequence Analysis},
+   author={Author Name},
+   journal={Journal Name},
+   year={2025}
+ }
+ ```