# 🧬 CANLoc: Protein Subcellular Localization Predictor
CANLoc is a machine learning web application for predicting the subcellular localization of proteins directly from protein sequences.
It provides accurate, fast, and interpretable predictions through a modern deep-learning–assisted pipeline and an interactive web interface.
---
## 🔬 Model Overview
CANLoc combines:
- **ESM2 (Transformer-based protein language model)**
Used for extracting rich sequence embeddings without alignment.
- **Mean pooling of residue embeddings**
Produces fixed-length feature vectors.
- **XGBoost classifier**
Trained on curated protein datasets for robust multiclass prediction.
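
The mean-pooling step can be illustrated with a minimal, dependency-free sketch. It assumes the per-residue embeddings are already available as rows of floats; in the real pipeline they would come from ESM2 as tensors, and whether CANLoc excludes special tokens or padding before averaging is not stated here. The function name `mean_pool` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length feature vector.

    Hypothetical helper: the input stands in for ESM2 residue embeddings
    (one row per residue, one column per embedding dimension).
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(vec) != dim for vec in residue_embeddings):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(residue_embeddings)
    # Column-wise mean: the output length depends only on `dim`,
    # not on the number of residues in the sequence.
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


# Toy stand-in for model output: 3 residues, embedding dimension 2.
toy_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(toy_embeddings)
print(pooled)
```

Because the output length is independent of sequence length, proteins of any size map to feature vectors that a tabular classifier such as XGBoost can consume.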
### Predicted Classes
- Cytoplasm
- Nucleus
- Membrane
- Mitochondria
Each prediction includes **class probabilities** and a **confidence visualization**.
---
## 📊 Features
- Single sequence prediction
- Batch prediction via FASTA file upload
- Probability bar chart and radar plot
- Confidence-based interpretation
- Clean, responsive bioinformatics-style UI
- Dockerized for reproducible deployment
- FastAPI backend + modern frontend
---
## 🧪 Input Formats
### Single Sequence
Paste a raw amino acid sequence in single-letter code, for example `MVKFKKYGIP...`
### FASTA File
Upload a standard FASTA file with one or more sequences. Each record starts with a `>` header line, followed by the sequence:

    >sp|P25296|CANB_YEAST
    MSLIHPDTAKYPFKFEPF...

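
For readers preparing batch input, the following is a minimal sketch of how a multi-record FASTA file could be split into header and sequence pairs. It is not CANLoc's own parser: `parse_fasta` is a hypothetical helper, the toy sequences are placeholders, and it assumes each record begins with a `>` header line and that sequence lines may wrap.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate FASTA header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines may be wrapped; join them and normalise case.
            records[header] += line.upper()
    return records


# Placeholder records, not real proteins.
example = [">seq1 toy record", "MKT", "LLV", "", ">seq2", "ggs"]
parsed = parse_fasta(example)
print(parsed)
```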
---
## 📈 Output Interpretation
- **Predicted Location**
The most probable subcellular class.
- **Class Probabilities**
Displayed as percentages for all four classes.
- **Confidence Levels**
  - High: ≥ 75%
  - Medium: 60% to < 75%
  - Low: < 60% (interpret with caution)
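
The thresholds above can be expressed as a short sketch. It assumes the confidence level is derived from the top class probability, that probabilities are given as fractions between 0 and 1, and that a value of exactly 75% counts as High; the function name `interpret` and the example numbers are hypothetical.

```python
from typing import Dict, Tuple


def interpret(probabilities: Dict[str, float]) -> Tuple[str, float, str]:
    """Return (predicted class, top probability, confidence level)."""
    # The predicted location is the class with the highest probability.
    label = max(probabilities, key=probabilities.get)
    top = probabilities[label]
    if top >= 0.75:
        level = "High"
    elif top >= 0.60:
        level = "Medium"
    else:
        level = "Low"  # interpret with caution
    return label, top, level


# Made-up probabilities for the four classes.
example = {"Cytoplasm": 0.10, "Nucleus": 0.80, "Membrane": 0.05, "Mitochondria": 0.05}
print(interpret(example))
```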
---
## ⚙️ Evaluation & Validation
The model was evaluated using:
- Train/test split
- 10-fold stratified cross-validation
- Precision, recall, F1-score
- Sensitivity and specificity analysis
- ROC curves per class
These evaluations support CANLoc's use in academic and research workflows.
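
As an illustration of the per-class sensitivity and specificity analysis, here is a minimal one-vs-rest sketch using the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). It is not the project's evaluation script; the function name and the toy labels are hypothetical and do not represent CANLoc results.

```python
from typing import Dict, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Compute (sensitivity, specificity) per class, treating each class one-vs-rest."""
    results: Dict[str, Tuple[float, float]] = {}
    pairs = list(zip(y_true, y_pred))
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = sum(t != cls and p != cls for t, p in pairs)
        # NaN marks a class with no positives (or no negatives) in y_true.
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        results[cls] = (sens, spec)
    return results


# Toy labels for illustration only.
y_true = ["Nucleus", "Nucleus", "Cytoplasm", "Membrane"]
y_pred = ["Nucleus", "Cytoplasm", "Cytoplasm", "Membrane"]
metrics = sensitivity_specificity(y_true, y_pred, ["Cytoplasm", "Nucleus", "Membrane"])
for name, (sens, spec) in metrics.items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```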
---
## 🚀 Deployment
CANLoc is containerized and deployed using **Docker**.
## 📄 License
This project is licensed under the Apache License 2.0.
>Free for academic and commercial use
>Includes an express patent grant from contributors
>Permits deployment and modification, provided the license text and attribution notices are retained
See the LICENSE file for details.
## 📬 Contact
For questions, bug reports, or feedback:
majidkhan.jssmsc@gmail.com
## 📌 Citation
If you use CANLoc in academic work, please cite appropriately.