Biocoder09 committed
Commit 6d46837 · verified · 1 Parent(s): ebab832

Create README

Files changed (1)
  1. README +107 -0
README ADDED
@@ -0,0 +1,107 @@
+ # 🧬 CANLoc — Protein Subcellular Localization Predictor
+
+ CANLoc is a machine learning web application that predicts the subcellular localization of proteins directly from their amino acid sequences.
+ It provides accurate, fast, and interpretable predictions through a deep-learning–assisted pipeline and an interactive web interface.
+
+ ---
+
+ ## 🔬 Model Overview
+
+ CANLoc combines:
+
+ - **ESM2 (Transformer-based protein language model)**
+   Extracts per-residue sequence embeddings without requiring a sequence alignment.
+
+ - **Mean pooling of residue embeddings**
+   Averages the per-residue embeddings into one fixed-length feature vector per protein, regardless of sequence length.
+
+ - **XGBoost classifier**
+   Trained on curated protein datasets for robust multiclass prediction.
+
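The mean-pooling step above can be sketched in a few lines. This is a minimal illustration, not CANLoc's actual code: the function name `mean_pool` and the toy 3-dimensional embeddings are assumptions made for the example, and a real ESM2 embedding has hundreds of dimensions per residue.

```python
from typing import Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> list[float]:
    """Average per-residue vectors (L x D) into a single D-dimensional vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    length = len(residue_embeddings)
    # zip(*rows) walks the matrix column by column, one embedding dimension at a time.
    return [sum(column) / length for column in zip(*residue_embeddings)]


# Toy inputs (hypothetical values): two "proteins" of different lengths, width 3.
short_protein = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
long_protein = [[0.0, 0.0, 6.0], [3.0, 0.0, 0.0], [0.0, 9.0, 0.0]]

print(mean_pool(short_protein))  # [2.0, 3.0, 4.0]
print(mean_pool(long_protein))   # [1.0, 3.0, 2.0]
```

Both outputs have the same length even though the inputs differ in length, which is what lets a fixed-input classifier such as XGBoost consume them.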
+ ### Predicted Classes
+ - Cytoplasm
+ - Nucleus
+ - Membrane
+ - Mitochondria
+
+ Each prediction includes **class probabilities** and a **confidence visualization**.
+
+ ---
+
+ ## 📊 Features
+
+ - Single sequence prediction
+ - Batch prediction via FASTA file upload
+ - Probability bar chart and radar plot
+ - Confidence-based interpretation
+ - Clean, responsive bioinformatics-style UI
+ - Dockerized for reproducible deployment
+ - FastAPI backend + modern frontend
+
+ ---
+
+ ## 🧪 Input Formats
+
+ ### Single Sequence
+ Paste a raw amino acid sequence in one-letter code, for example: MVKFKKYGIP...
+
+
+ ### FASTA File
+ Upload a standard FASTA file with one or more sequences. Each record is a `>` header line followed by its sequence:
+
+     >sp|P25296|CANB_YEAST
+     MSLIHPDTAKYPFKFEPF...
+
+
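For readers preparing batch input, the FASTA layout described above can be read with a small parser like the following. This is a hedged sketch rather than CANLoc's own upload handler: `read_fasta` is a hypothetical helper, and the two short records are invented for illustration.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # sequences may wrap over several lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        yield header, "".join(chunks)


# Invented example records; real uploads would come from a file.
example = io.StringIO(">seq1 example\nMKT\nAYI\n\n>seq2\ngavl\n")
records = list(read_fasta(example))
print(records)  # [('seq1 example', 'MKTAYI'), ('seq2', 'GAVL')]
```

Wrapped sequence lines are joined and lower-case residues are upper-cased, so each record reaches the model as one continuous string.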
+ ---
+
+ ## 📈 Output Interpretation
+
+ - **Predicted Location**
+   The most probable subcellular class.
+
+ - **Class Probabilities**
+   Displayed as percentages for all four classes.
+
+ - **Confidence Levels**
+   - High: ≥ 75%
+   - Medium: 60% to < 75%
+   - Low: < 60% (interpret with caution)
+
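The interpretation rules above can be expressed as a short function. The sketch assumes that the confidence level is derived from the probability of the top class and that probabilities arrive in the order Cytoplasm, Nucleus, Membrane, Mitochondria; the README states neither, so both are assumptions, as are the names `interpret` and `confidence_level`.

```python
# Assumed class order; the README lists the classes but not the model's index order.
CLASSES = ("Cytoplasm", "Nucleus", "Membrane", "Mitochondria")


def confidence_level(top_probability: float) -> str:
    """Map the top-class probability to the README's confidence bands."""
    if top_probability >= 0.75:
        return "High"
    if top_probability >= 0.60:
        return "Medium"
    return "Low"


def interpret(probabilities: list[float]) -> tuple[str, float, str]:
    """Return (predicted class, its probability, confidence level)."""
    if len(probabilities) != len(CLASSES):
        raise ValueError("expected one probability per class")
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    top = probabilities[best]
    return CLASSES[best], top, confidence_level(top)


print(interpret([0.10, 0.80, 0.06, 0.04]))  # ('Nucleus', 0.8, 'High')
print(interpret([0.30, 0.05, 0.62, 0.03]))  # ('Membrane', 0.62, 'Medium')
print(interpret([0.40, 0.35, 0.15, 0.10]))  # ('Cytoplasm', 0.4, 'Low')
```

A top probability of exactly 75% falls in the High band here, matching the "≥ 75%" rule.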
+ ---
+
+ ## ⚙️ Evaluation & Validation
+
+ The model was evaluated using:
+ - Train/test split
+ - 10-fold stratified cross-validation
+ - Precision, recall, F1-score
+ - Sensitivity and specificity analysis
+ - ROC curves per class
+
+ These evaluations support CANLoc’s use in academic and research workflows.
+
+
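For a multiclass model, sensitivity and specificity are usually computed one class at a time by treating that class as positive and all others as negative. The sketch below illustrates that one-vs-rest calculation on invented labels; it is not CANLoc's evaluation script, and `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity (recall) and specificity for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Guard against classes that never occur as positives or negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented labels for eight proteins, purely for illustration.
y_true = ["Nucleus", "Nucleus", "Nucleus", "Nucleus",
          "Cytoplasm", "Cytoplasm", "Membrane", "Mitochondria"]
y_pred = ["Nucleus", "Nucleus", "Nucleus", "Cytoplasm",
          "Cytoplasm", "Nucleus", "Membrane", "Mitochondria"]

print(sensitivity_specificity(y_true, y_pred, "Nucleus"))  # (0.75, 0.75)
```

Here three of four true Nucleus proteins are recovered (sensitivity 0.75), and three of four non-Nucleus proteins are correctly kept out of the Nucleus class (specificity 0.75).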
+ ---
+
+ ## 🚀 Deployment
+
+ CANLoc is containerized and deployed using **Docker**.
+
+ ## 📄 License
+ This project is licensed under the Apache License 2.0.
+
+ > - Free for academic and commercial use
+ > - Includes an express patent grant from contributors
+ > - Permits deployment and modification, provided the license text and attribution notices are retained and modified files are marked as changed
+
+ See the LICENSE file for details.
+
+
+ ## 📬 Contact
+
+ For questions, bug reports, or feedback:
+ majidkhan>jssmsc@gmail.com
+
+ ## 📌 Citation
+
+ If you use CANLoc in academic work, please cite appropriately.