| # 🧬 CANLoc — Protein Subcellular Localization Predictor | |
| CANLoc is a machine learning web application for predicting the subcellular localization of proteins directly from protein sequences. | |
| It provides accurate, fast, and interpretable predictions through a modern deep-learning–assisted pipeline and an interactive web interface. | |
| ## 🔬 Model Overview | |
| CANLoc combines: | |
| - **ESM2 (Transformer-based protein language model)** | |
| Used for extracting rich sequence embeddings without alignment. | |
| - **Mean pooling of residue embeddings** | |
| Produces fixed-length feature vectors. | |
| - **XGBoost classifier** | |
| Trained on curated protein datasets for robust multiclass prediction. | |
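The mean-pooling step above can be illustrated with a minimal sketch. It assumes the per-residue embeddings are already available as plain lists of floats; in the real pipeline they would come from ESM2 as tensors, and the pooled vector would then be passed to the XGBoost classifier. The function name `mean_pool` and the toy values are hypothetical and are not taken from the CANLoc code.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector.

    Hypothetical helper: the output length equals the embedding dimension,
    regardless of how many residues the protein has.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(vec) != dim for vec in residue_embeddings):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(residue_embeddings)
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


# Toy input: 3 residues with 4-dimensional embeddings (made-up numbers;
# real ESM2 embeddings are much wider).
toy = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
pooled = mean_pool(toy)
print(pooled)  # [2.0, 1.0, 2.0, 2.0]
```

Because the output length depends only on the embedding dimension, proteins of different lengths map to feature vectors of the same size, which is what a tabular classifier such as XGBoost expects.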
| ### Predicted Classes | |
| - Cytoplasm | |
| - Nucleus | |
| - Membrane | |
| - Mitochondria | |
| Each prediction includes **class probabilities** and a **confidence visualization**. | |
| ## 📊 Features | |
| - Single sequence prediction | |
| - Batch prediction via FASTA file upload | |
| - Probability bar chart and radar plot | |
| - Confidence-based interpretation | |
| - Clean, responsive bioinformatics-style UI | |
| - Dockerized for reproducible deployment | |
| - FastAPI backend + modern frontend | |
| ## 🧪 Input Formats | |
| ### Single Sequence | |
| Paste a raw amino acid sequence: MVKFKKYGIP... | |
| ### FASTA File | |
| Upload a standard FASTA file with one or multiple sequences: | |
| >sp|P25296|CANB_YEAST | |
| MSLIHPDTAKYPFKFEPF... | |
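For readers preparing batch input, here is a minimal sketch of how a multi-record FASTA file can be read into header/sequence pairs. It is an illustration only and not CANLoc's actual parser; `parse_fasta` and the toy records are hypothetical. It relies on the standard FASTA convention that each record starts with a `>` header line, followed by one or more sequence lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


# Made-up toy records, not real proteins.
example = """>toy_protein_1
MKT
AYI
>toy_protein_2
gavl
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, seq)
```

Each parsed sequence could then be submitted one by one, which is the behavior a batch upload would presumably reproduce on the server side.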
| ## 📈 Output Interpretation | |
| - **Predicted Location** | |
| The most probable subcellular class. | |
| - **Class Probabilities** | |
| Displayed as percentages for all four classes. | |
| - **Confidence Levels** | |
| - High: ≥ 75% | |
| - Medium: 60% to < 75% | |
| - Low: < 60% (interpret with caution) | |
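The thresholds above can be expressed as a small function. This sketch assumes the confidence level is derived from the probability of the top-ranked class, which the text implies but does not state outright; `interpret` and the example probabilities are hypothetical.

```python
from typing import Dict, Tuple


def interpret(probabilities: Dict[str, float]) -> Tuple[str, float, str]:
    """Return (predicted class, its probability, confidence level).

    Assumption: confidence is judged on the top class probability,
    with probabilities given as fractions between 0 and 1.
    """
    label = max(probabilities, key=probabilities.get)
    top = probabilities[label]
    if top >= 0.75:
        level = "High"
    elif top >= 0.60:
        level = "Medium"
    else:
        level = "Low"  # interpret with caution
    return label, top, level


# Made-up probabilities for the four classes.
example = {
    "Cytoplasm": 0.10,
    "Nucleus": 0.68,
    "Membrane": 0.12,
    "Mitochondria": 0.10,
}
print(interpret(example))  # ('Nucleus', 0.68, 'Medium')
```

A probability of exactly 75% is treated as High here, so the Medium band covers 60% up to but not including 75%.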
| ## ⚙️ Evaluation & Validation | |
| The model was evaluated using: | |
| - Train/test split | |
| - 10-fold stratified cross-validation | |
| - Precision, recall, F1-score | |
| - Sensitivity and specificity analysis | |
| - ROC curves per class | |
| These evaluations confirm CANLoc’s reliability for academic and research workflows. | |
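To make the listed metrics concrete, the sketch below computes one-vs-rest precision, recall (sensitivity), specificity, and F1 for each class from true and predicted labels. It is not CANLoc's evaluation code, and the labels in the example are made-up toy data rather than results from the model; `per_class_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall (sensitivity), specificity, and F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    report: Dict[str, Dict[str, float]] = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = total - tp - fp - fn
        # Undefined ratios (zero denominator) are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
    return report


labels = ["Cytoplasm", "Nucleus", "Membrane", "Mitochondria"]
# Toy labels for illustration only.
y_true = ["Nucleus", "Nucleus", "Cytoplasm", "Membrane", "Mitochondria", "Cytoplasm"]
y_pred = ["Nucleus", "Cytoplasm", "Cytoplasm", "Membrane", "Mitochondria", "Cytoplasm"]

report = per_class_metrics(y_true, y_pred, labels)
for label in labels:
    print(label, report[label])
```

In a 10-fold stratified cross-validation, these per-class values would be computed on each held-out fold and then averaged.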
| ## 🚀 Deployment | |
| CANLoc is containerized and deployed using **Docker**. | |
| ## 📄 License | |
| This project is licensed under the Apache License 2.0. | |
| > Free for academic and commercial use | |
| > Includes an express patent grant from contributors | |
| > Permits deployment and modification, provided the license text and attribution notices are retained | |
| See the LICENSE file for details. | |
| ## 📬 Contact | |
| For questions, bug reports, or feedback: | |
| majidkhan.jssmsc@gmail.com | |
| ## 📌 Citation | |
| If you use CANLoc in academic work, please cite appropriately. | |