# 🧬 CANLoc — Protein Subcellular Localization Predictor

CANLoc is a machine-learning web application that predicts the subcellular localization of a protein directly from its amino acid sequence.  
It pairs a protein language model with a gradient-boosted classifier and reports per-class probabilities through an interactive web interface.

---

## 🔬 Model Overview

CANLoc combines:

- **ESM2 (Transformer-based protein language model)**  
  Used for extracting rich sequence embeddings without alignment.

- **Mean pooling of residue embeddings**  
  Produces fixed-length feature vectors.

- **XGBoost classifier**  
  Trained on curated protein datasets for robust multiclass prediction.
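
The mean-pooling step can be illustrated with a minimal, dependency-free sketch. It assumes the per-residue embeddings have already been computed (in CANLoc, by ESM2) and are available as a list of equal-length vectors. The function name `mean_pool` and the toy 3-dimensional vectors are illustrative only and are not taken from the CANLoc code base.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector."""
    if not residue_embeddings:
        raise ValueError("cannot pool an empty sequence")
    dim = len(residue_embeddings[0])
    if any(len(vec) != dim for vec in residue_embeddings):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(residue_embeddings)
    # Column-wise mean: one value per embedding dimension.
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


# Toy example (hypothetical values): a 4-residue protein, 3-dimensional embeddings.
toy = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [1.0, 4.0, 2.0],
    [3.0, 4.0, 2.0],
]
pooled = mean_pool(toy)
```

The pooled vector has the same length as a single residue embedding regardless of protein length, which is what allows a tabular classifier such as XGBoost to handle proteins of different sizes.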

### Predicted Classes
- Cytoplasm  
- Nucleus  
- Membrane  
- Mitochondria  

Each prediction includes **class probabilities** and a **confidence visualization**.

---

## 📊 Features

- Single sequence prediction
- Batch prediction via FASTA file upload
- Probability bar chart and radar plot
- Confidence-based interpretation
- Clean, responsive bioinformatics-style UI
- Dockerized for reproducible deployment
- FastAPI backend + modern frontend

---

## 🧪 Input Formats

### Single Sequence
Paste a raw amino acid sequence, for example `MVKFKKYGIP...`


### FASTA File
Upload a standard FASTA file containing one or more sequences. Each record starts with a `>` header line, followed by the sequence:

`>sp|P25296|CANB_YEAST`  
`MSLIHPDTAKYPFKFEPF...`
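
To make the FASTA format concrete, here is a minimal parsing sketch using only the Python standard library. It is not CANLoc's actual upload handler: the names `parse_fasta` and `is_valid_protein`, the demo records, and the choice to accept only the 20 standard amino acid letters are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Assumption: only the 20 standard amino acids are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for each record in FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


def is_valid_protein(seq: str) -> bool:
    """True if seq is non-empty and contains only standard residues."""
    return bool(seq) and set(seq) <= VALID_RESIDUES


# Hypothetical demo records, not real database entries.
demo = [">demo_1", "MKTAY", "IAKQR", "", ">demo_2", "mvkf"]
records = parse_fasta(demo)
```

Records with duplicate headers overwrite each other in this sketch; a production parser would need to decide how to handle that case.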


---

## 📈 Output Interpretation

- **Predicted Location**  
  The most probable subcellular class.

- **Class Probabilities**  
  Displayed as percentages for all four classes.

- **Confidence Levels**
  - High: ≥ 75%
  - Medium: 60–75%
  - Low: < 60% (interpret with caution)
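
The mapping from class probabilities to a predicted location and confidence level can be sketched as follows. This is an illustrative reading of the thresholds above, not the application's own code: the function name `interpret` is hypothetical, the example probabilities are made up, and treating exactly 75% as High and exactly 60% as Medium is an assumption.

```python
from typing import Dict, Tuple


def interpret(probabilities: Dict[str, float]) -> Tuple[str, float, str]:
    """Return (predicted class, top probability in percent, confidence level)."""
    # The predicted location is the class with the highest probability.
    label = max(probabilities, key=probabilities.get)
    percent = probabilities[label] * 100.0
    if percent >= 75.0:    # assumption: exactly 75% counts as High
        level = "High"
    elif percent >= 60.0:  # assumption: exactly 60% counts as Medium
        level = "Medium"
    else:
        level = "Low"
    return label, percent, level


# Made-up probabilities for the four classes (fractions summing to 1).
example = {
    "Cytoplasm": 0.10,
    "Nucleus": 0.80,
    "Membrane": 0.06,
    "Mitochondria": 0.04,
}
label, percent, level = interpret(example)
```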

---

## ⚙️ Evaluation & Validation

The model was evaluated using:
- Train/test split
- 10-fold stratified cross-validation
- Precision, recall, F1-score
- Sensitivity and specificity analysis
- ROC curves per class

These evaluations support CANLoc’s use in academic and research workflows.
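
For readers unfamiliar with the per-class metrics listed above, the sketch below computes precision, recall (sensitivity), specificity, and F1 for one class in a one-vs-rest fashion. It is a generic illustration, not the evaluation script used for CANLoc; the function name `per_class_metrics` is hypothetical and the labels are toy data, not model results.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[str], y_pred: List[str], label: str) -> Dict[str, float]:
    """One-vs-rest precision, recall, specificity and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)       # also called sensitivity
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Toy labels for illustration only.
y_true = ["Nucleus", "Nucleus", "Cytoplasm", "Membrane", "Nucleus", "Cytoplasm"]
y_pred = ["Nucleus", "Cytoplasm", "Cytoplasm", "Membrane", "Nucleus", "Nucleus"]
nucleus_metrics = per_class_metrics(y_true, y_pred, "Nucleus")
```

In a multiclass setting these numbers are computed once per class and can then be averaged, for example with a macro average, to summarize overall performance.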


---

## 🚀 Deployment

CANLoc is containerized and deployed using **Docker**.

## 📄 License
This project is licensed under the Apache License 2.0.

> - Free for academic and commercial use
> - Includes an express patent grant
> - Permits deployment and modification, provided the license and attribution notices are retained

See the LICENSE file for details.


## 📬 Contact

For questions, bug reports, or feedback:
majidkhan.jssmsc@gmail.com

## 📌 Citation

If you use CANLoc in academic work, please cite appropriately.