MuthuS97 committed 48d872f (verified; parent: f4c9b19)

Create README.md

Files changed (1): README.md (added, +80 -0)

---
language:
- en
library_name: pyod
tags:
- protein-structure
- anomaly-detection
- one-class
- autoencoder
- protease-inhibitor
- structural-filtering
- biology
- bioinformatics
- unsupervised-learning

datasets:
- MEROPS
- UniProt
- AlphaFold

model_name: Structural_module-protease_inhibitor

model_description: |
  Structural_module-protease_inhibitor is an unsupervised, one-class deep learning model for filtering out protein
  structures that are structurally inconsistent with curated protease inhibitors (PIs). It operates on structural
  embeddings from the RCSB embedding model (github.com/rcsb/rcsb-embedding-model), computed for known protease
  inhibitors collected from protease inhibitor databases. The model learns the structural embedding manifold of known
  protease inhibitors and assigns higher reconstruction error to structurally dissimilar inputs.
user_interface: |
  Easy-to-use inference interface (Google Colab notebook):
  [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1JLhLpvXG4plzPtIliG_CJnYui6P8Pu1J?usp=sharing)
training_data_description: |
  The model was trained on 17,889 curated protease inhibitor structures. Protease inhibitor sequences from the
  MEROPS database were mapped via similarity search to taxonomy-restricted UniProt datasets (fungi, plants,
  bacteria), and the corresponding predicted structures were retrieved from the AlphaFold Protein Structure
  Database.

input_format: |
  Fixed-length, continuous protein structure embeddings derived from three-dimensional structural features
  using the RCSB embedding model (github.com/rcsb/rcsb-embedding-model). Embeddings must be standardized with
  the provided scaler.pkl before inference.

model_architecture: |
  Fully connected autoencoder implemented in PyTorch via the PyOD library, with an encoder whose layer widths
  decrease geometrically, a latent bottleneck, and a symmetric decoder. It uses batch normalization, dropout
  regularization, and a mean squared error reconstruction loss.

training_procedure: |
  The model was trained with mini-batch gradient descent using the Adam optimizer with weight decay.
  Hyperparameters were optimized with Bayesian optimization (Optuna, TPE sampler) on an independent 10%
  validation split (~1,789 structures), minimizing reconstruction error on these unseen but structurally
  valid protease inhibitor examples. The final model was then retrained on the full dataset with the
  selected hyperparameters and fixed random seeds.

outputs: |
  The model outputs a reconstruction-based anomaly score, an outlier probability, and a confidence estimate.
  Low anomaly scores indicate structural consistency with known protease inhibitor folds, while high scores
  indicate structural dissimilarity.

intended_use: |
  Structural filtering and pre-selection of protease inhibitor-like protein structures in large-scale
  datasets.

limitations: |
  Novel PI folds absent from the training data may be incorrectly rejected.

not_intended_use: |
  Functional annotation and clinical decision-making.

reproducibility: |
  All preprocessing parameters and training configurations are provided to enable exact reproduction of
  the results.

citation: |
  Please cite the associated publication and acknowledge the PyOD library, the MEROPS protease inhibitor
  database, and the AlphaFold Protein Structure Database.

---
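The `input_format` field requires embeddings to be standardized with the provided `scaler.pkl` before inference. The card does not state what kind of object that file holds. If it is a scikit-learn `StandardScaler` (an assumption), `scaler.transform(X)` subtracts the per-feature training mean and divides by the per-feature standard deviation. The NumPy sketch below reproduces that arithmetic so it runs without the real file. `ZScoreScaler` is a hypothetical stand-in, the embeddings are random toy data, and the 16-dimensional size is illustrative only.

```python
import numpy as np


class ZScoreScaler:
    """Hypothetical stand-in for a StandardScaler-style scaler.pkl (assumption)."""

    def fit(self, X: np.ndarray) -> "ZScoreScaler":
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)  # population std (ddof=0), as StandardScaler uses
        # Zero-variance features get scale 1 so they are not divided by zero.
        self.scale_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) / self.scale_


rng = np.random.default_rng(0)
train_emb = rng.normal(loc=3.0, scale=2.0, size=(500, 16))  # toy "training" embeddings
scaler = ZScoreScaler().fit(train_emb)

# Query embeddings are transformed with the frozen training statistics, never refit.
new_emb = rng.normal(loc=3.0, scale=2.0, size=(10, 16))
new_std = scaler.transform(new_emb)
```

The key design point is that query embeddings reuse statistics frozen at training time. Refitting the scaler on a query batch would shift the inputs and distort their reconstruction errors.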
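The `model_architecture` field describes an encoder with geometrically decreasing widths, a latent bottleneck, and a symmetric decoder. The helper below sketches one way to compute such a width schedule. The input dimension, shrink factor, and depth are illustrative assumptions, not the trained model's hyperparameters, which were selected with Optuna. `autoencoder_widths` is a hypothetical name.

```python
def autoencoder_widths(input_dim: int, shrink: float, n_hidden: int, min_width: int = 2) -> list[int]:
    """Layer widths: input -> geometric encoder -> bottleneck -> mirrored decoder -> input.

    The last encoder layer plays the role of the latent bottleneck.
    """
    encoder = []
    width = input_dim
    for _ in range(n_hidden):
        width = max(min_width, int(width * shrink))  # shrink by a constant factor
        encoder.append(width)
    decoder = encoder[-2::-1]  # mirror the encoder, excluding the bottleneck itself
    return [input_dim] + encoder + decoder + [input_dim]


print(autoencoder_widths(1024, 0.5, 3))
# [1024, 512, 256, 128, 256, 512, 1024]
```

If you reproduce the model with PyOD 2.x, the encoder part of such a schedule (here `[512, 256, 128]`) is the kind of list its `AutoEncoder` accepts as `hidden_neuron_list`. Check the signature of your installed version, because the PyOD API changed between major releases.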
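The `training_procedure` field holds out 10% of the 17,889 training structures (~1,789) for hyperparameter tuning. The sketch below shows how a seeded 90/10 split produces those counts. It assumes a simple random permutation, because the card does not say how the split was drawn. The function name and the seed value are hypothetical. The validation size is rounded up, which matches how scikit-learn's `train_test_split` handles a float `test_size`.

```python
import math

import numpy as np


def split_indices(n_samples: int, val_fraction: float = 0.1, seed: int = 42):
    """Return (train_idx, val_idx) for a seeded random hold-out split (hypothetical helper)."""
    n_val = math.ceil(val_fraction * n_samples)  # 0.1 * 17,889 = 1,788.9 -> 1,789
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible split
    perm = rng.permutation(n_samples)
    return perm[n_val:], perm[:n_val]


train_idx, val_idx = split_indices(17_889)
print(len(train_idx), len(val_idx))  # 16100 1789
```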
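The `outputs` field lists a reconstruction-based anomaly score and an outlier probability. The NumPy sketch below shows one way to derive both. It assumes a per-sample mean squared reconstruction error and uses the min-max ("linear") scaling that PyOD's `predict_proba` applies by default: test scores are scaled to the range of the training scores, then clipped to [0, 1]. The card does not specify the exact score. PyOD's AutoEncoder may instead score with the Euclidean distance between input and reconstruction, but for a fixed embedding dimension that distance is a monotone function of the MSE, so both rank structures identically. The reconstructions here are synthetic, not outputs of the trained model, and both helper names are hypothetical.

```python
import numpy as np


def reconstruction_scores(X: np.ndarray, X_hat: np.ndarray) -> np.ndarray:
    """Per-sample mean squared reconstruction error (higher = more anomalous)."""
    return np.mean((X - X_hat) ** 2, axis=1)


def linear_outlier_probability(test_scores: np.ndarray, train_scores: np.ndarray) -> np.ndarray:
    """Min-max scale test scores to the training-score range, clipped to [0, 1]."""
    lo, hi = train_scores.min(), train_scores.max()
    return np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)


rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 8))
# Stand-in for autoencoder output: training data reconstructed with small errors.
X_train_hat = X_train + rng.normal(scale=0.1, size=X_train.shape)
train_scores = reconstruction_scores(X_train, X_train_hat)

# Two toy queries: one reconstructed perfectly, one reconstructed badly.
X_query = np.zeros((2, 8))
X_query_hat = np.vstack([np.zeros(8), np.full(8, 2.0)])
query_scores = reconstruction_scores(X_query, X_query_hat)  # [0.0, 4.0]
query_proba = linear_outlier_probability(query_scores, train_scores)  # [0.0, 1.0]
```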
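The `outputs` field also lists a confidence estimate. PyOD documents its `predict_confidence` method as based on ExCeeD, which estimates how likely a prediction is to stay the same if the detector were retrained on slightly different data. The sketch below follows that idea using only the training-score distribution and a binomial tail probability. It illustrates the idea under that assumption and is not a copy of the library code. `exceed_confidence`, the toy scores, and the 10% contamination are hypothetical.

```python
import math

import numpy as np


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space to avoid overflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    return min(1.0, sum(math.exp(t) for t in log_terms))


def exceed_confidence(test_scores, train_scores, contamination: float) -> np.ndarray:
    """ExCeeD-style confidence that each inlier/outlier prediction would not change."""
    train_scores = np.asarray(train_scores, dtype=float)
    n = len(train_scores)
    threshold = np.percentile(train_scores, 100 * (1 - contamination))
    k = n - int(n * contamination)  # number of training samples expected to be inliers
    confidences = []
    for s in np.asarray(test_scores, dtype=float):
        n_below = int(np.count_nonzero(train_scores <= s))
        # Smoothed fraction of training scores at or below s (uniform Beta prior).
        posterior = (1 + n_below) / (2 + n)
        conf_outlier = 1.0 - binom_cdf(k, n, posterior)
        is_outlier = s > threshold
        confidences.append(conf_outlier if is_outlier else 1.0 - conf_outlier)
    return np.array(confidences)


train_scores = np.arange(1, 101, dtype=float)  # toy training scores 1..100
conf = exceed_confidence([0.5, 90.0, 1000.0], train_scores, contamination=0.1)
```

Scores far from the decision threshold receive confidence near 1. Scores close to it receive noticeably lower confidence, which makes borderline structures natural candidates for manual review.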
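For the `intended_use` of structural pre-selection, continuous scores must be turned into keep/reject decisions. PyOD detectors store a `threshold_` at the (1 - contamination) percentile of the training scores and flag samples that score above it. The sketch below mimics that rule in NumPy. The contamination value, candidate IDs, and scores are hypothetical. By construction, a threshold chosen this way also rejects roughly that fraction of the genuine protease inhibitors in the training set.

```python
import numpy as np


def fit_threshold(train_scores: np.ndarray, contamination: float) -> float:
    """Score above which a sample is flagged as an outlier."""
    return float(np.percentile(train_scores, 100 * (1 - contamination)))


def filter_structures(ids, scores, threshold: float) -> list:
    """Keep IDs whose anomaly score does not exceed the threshold (PI-like structures)."""
    return [i for i, s in zip(ids, scores) if s <= threshold]


train_scores = np.arange(1, 101, dtype=float)  # toy training scores 1..100
threshold = fit_threshold(train_scores, contamination=0.05)  # ~95.05

candidate_ids = ["cand_A", "cand_B", "cand_C"]  # hypothetical structure IDs
candidate_scores = np.array([12.0, 96.5, 40.0])
kept = filter_structures(candidate_ids, candidate_scores, threshold)
print(kept)  # ['cand_A', 'cand_C']
```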