# 🧬 CANLoc — Protein Subcellular Localization Predictor
CANLoc is a machine learning web application for predicting the subcellular localization of proteins directly from protein sequences.
It provides accurate, fast, and interpretable predictions through a modern deep-learning–assisted pipeline and an interactive web interface.
---
## 🔬 Model Overview
CANLoc combines:
- **ESM2 (Transformer-based protein language model)**
Used for extracting rich sequence embeddings without alignment.
- **Mean pooling of residue embeddings**
Produces fixed-length feature vectors.
- **XGBoost classifier**
Trained on curated protein datasets for robust multiclass prediction.
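
The mean-pooling step can be illustrated with a minimal, dependency-free sketch. It does not load ESM2; the toy 3-dimensional vectors below are hypothetical stand-ins for real per-residue embeddings, and `mean_pool` is an illustrative name rather than a function from the CANLoc codebase. The point is that proteins of different lengths map to vectors of the same length, which is what a tabular classifier such as XGBoost needs.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n_residues = len(residue_embeddings)
    # zip(*rows) walks over embedding dimensions (columns), so each
    # output entry is the mean of one dimension across all residues.
    return [sum(column) / n_residues for column in zip(*residue_embeddings)]


# Toy stand-ins for per-residue embeddings (hypothetical values).
short_protein = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]            # 2 residues
long_protein = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0],
                [4.0, 4.0, 4.0], [6.0, 6.0, 6.0]]             # 4 residues

pooled_short = mean_pool(short_protein)  # [2.0, 3.0, 4.0]
pooled_long = mean_pool(long_protein)    # [3.0, 3.0, 3.0]
```

In a real pipeline the same average would typically be computed as a tensor operation over the model's last hidden state, excluding special and padding tokens.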
### Predicted Classes
- Cytoplasm
- Nucleus
- Membrane
- Mitochondria
Each prediction includes **class probabilities** and a **confidence visualization**.
---
## 📊 Features
- Single sequence prediction
- Batch prediction via FASTA file upload
- Probability bar chart and radar plot
- Confidence-based interpretation
- Clean, responsive bioinformatics-style UI
- Dockerized for reproducible deployment
- FastAPI backend + modern frontend
---
## 🧪 Input Formats
### Single Sequence
Paste a raw amino acid sequence: MVKFKKYGIP...
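
As a rough sketch of how pasted input could be normalized before prediction, the snippet below strips whitespace, uppercases the sequence, and rejects anything outside the 20 standard amino acid codes. Which residue codes CANLoc actually accepts is not stated here, so the allowed alphabet and the `clean_sequence` helper are assumptions for illustration.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, uppercase, and validate a pasted sequence."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"unsupported residue codes: {', '.join(invalid)}")
    return sequence


cleaned = clean_sequence("acde fgh\nIK")  # "ACDEFGHIK"
```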
### FASTA File
Upload a standard FASTA file with one or multiple sequences:
    >sp|P25296|CANB_YEAST
    MSLIHPDTAKYPFKFEPF...
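
For batch input, a multi-record FASTA file has to be split into header and sequence pairs. The following is a minimal sketch of such a parser, not CANLoc's own loader; `parse_fasta` and the `demo_*` records are hypothetical. It assumes each record starts with a `>` header line and that a sequence may wrap across several lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


# Hypothetical two-record input with a wrapped first sequence.
demo_text = """>demo_1 first toy protein
MKTAYIAK
QRQISFVK
>demo_2
MSDNGT
"""
demo_records = parse_fasta(demo_text.splitlines())
```

Each value in `demo_records` can then be passed to the predictor one sequence at a time.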
---
## 📈 Output Interpretation
- **Predicted Location**
The most probable subcellular class.
- **Class Probabilities**
Displayed as percentages for all four classes.
- **Confidence Levels**
- High: > 75%
- Medium: 60–75%
- Low: < 60% (interpret with caution)
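
The mapping from class probabilities to a predicted location and confidence level can be sketched as follows. This assumes the confidence level is derived from the top class probability and that a value of exactly 75% falls in the Medium band; both are assumptions, and `interpret` is an illustrative helper, not the application's actual code.

```python
from typing import Sequence, Tuple

CLASSES = ("Cytoplasm", "Nucleus", "Membrane", "Mitochondria")


def interpret(probabilities: Sequence[float]) -> Tuple[str, str]:
    """Return (predicted location, confidence level) for one protein."""
    if len(probabilities) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} probabilities")
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    top = probabilities[best]
    # Assumed boundary handling: exactly 0.75 is treated as Medium.
    if top > 0.75:
        level = "High"
    elif top >= 0.60:
        level = "Medium"
    else:
        level = "Low"
    return CLASSES[best], level


high = interpret([0.10, 0.80, 0.05, 0.05])    # ("Nucleus", "High")
medium = interpret([0.65, 0.20, 0.10, 0.05])  # ("Cytoplasm", "Medium")
low = interpret([0.40, 0.30, 0.20, 0.10])     # ("Cytoplasm", "Low")
```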
---
## ⚙️ Evaluation & Validation
The model was evaluated using:
- Train/test split
- 10-fold stratified cross-validation
- Precision, recall, F1-score
- Sensitivity and specificity analysis
- ROC curves per class
These evaluations support CANLoc's reliability for academic and research workflows.
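
For a multiclass model, sensitivity and specificity are usually reported per class in a one-vs-rest fashion. The sketch below shows that calculation on made-up labels; it is an illustration of the metric definitions, not the project's evaluation script, and the toy predictions are not CANLoc results.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Per-class (sensitivity, specificity), treating each class one-vs-rest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = Counter(zip(y_true, y_pred))
    total = len(y_true)
    results = {}
    for label in labels:
        tp = pairs[(label, label)]
        fn = sum(c for (t, p), c in pairs.items() if t == label and p != label)
        fp = sum(c for (t, p), c in pairs.items() if t != label and p == label)
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        results[label] = (sensitivity, specificity)
    return results


# Toy labels for illustration only.
labels = ["Cytoplasm", "Nucleus", "Membrane", "Mitochondria"]
y_true = ["Nucleus", "Nucleus", "Cytoplasm", "Membrane", "Mitochondria", "Cytoplasm"]
y_pred = ["Nucleus", "Cytoplasm", "Cytoplasm", "Membrane", "Mitochondria", "Cytoplasm"]

metrics = sensitivity_specificity(y_true, y_pred, labels)
```

Here one Nucleus protein is misclassified as Cytoplasm, which lowers Nucleus sensitivity and Cytoplasm specificity while leaving the other two classes unaffected.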
---
## 🚀 Deployment
CANLoc is containerized and deployed using **Docker**.
## 📄 License
This project is licensed under the Apache License 2.0.
> - Free for academic and commercial use
> - Includes an express patent grant
> - Permits deployment and modification, subject to the license's attribution and notice conditions
See the LICENSE file for details.
## 📬 Contact
For questions, bug reports, or feedback:
majidkhan.jssmsc@gmail.com
## 📌 Citation
If you use CANLoc in academic work, please cite appropriately.