# 🧬 CANLoc: Protein Subcellular Localization Predictor

CANLoc is a machine learning web application for predicting the subcellular localization of proteins directly from their amino acid sequences. It provides accurate, fast, and interpretable predictions through a deep-learning-assisted pipeline and an interactive web interface.

---

## 🔬 Model Overview

CANLoc combines:

- **ESM2 (Transformer-based protein language model)**: extracts rich sequence embeddings without requiring alignment.
- **Mean pooling of residue embeddings**: produces fixed-length feature vectors.
- **XGBoost classifier**: trained on curated protein datasets for robust multiclass prediction.

### Predicted Classes

- Cytoplasm
- Nucleus
- Membrane
- Mitochondria

Each prediction includes **class probabilities** and a **confidence visualization**.

---

## 📊 Features

- Single sequence prediction
- Batch prediction via FASTA file upload
- Probability bar chart and radar plot
- Confidence-based interpretation
- Clean, responsive bioinformatics-style UI
- Dockerized for reproducible deployment
- FastAPI backend + modern frontend

---

## 🧪 Input Formats

### Single Sequence

Paste a raw amino acid sequence:

    MVKFKKYGIP...

### FASTA File

Upload a standard FASTA file with one or more sequences. Each record starts with a `>` header line:

    >sp|P25296|CANB_YEAST
    MSLIHPDTAKYPFKFEPF...

---

## 📈 Output Interpretation

- **Predicted Location**: the most probable subcellular class.
- **Class Probabilities**: displayed as percentages for all four classes.
- **Confidence Levels**
  - High: ≥ 75%
  - Medium: ≥ 60% and < 75%
  - Low: < 60% (interpret with caution)

---

## ⚙️ Evaluation & Validation

The model was evaluated using:

- Train/test split
- 10-fold stratified cross-validation
- Precision, recall, F1-score
- Sensitivity and specificity analysis
- ROC curves per class

These evaluations support CANLoc's suitability for academic and research workflows.

---

## 🚀 Deployment

CANLoc is containerized and deployed using **Docker**.
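
## 🧩 Illustrative Sketch

The FASTA input handling and the confidence levels described under Output Interpretation can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and not CANLoc's actual code: the names `parse_fasta`, `confidence_level`, and `summarize` are invented for illustration, probabilities are assumed to be fractions between 0 and 1, and the example values are made up.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def confidence_level(probability: float) -> str:
    """Map the top class probability to the documented confidence tiers."""
    if probability >= 0.75:
        return "High"
    if probability >= 0.60:
        return "Medium"
    return "Low"


def summarize(probabilities: Dict[str, float]) -> Tuple[str, float, str]:
    """Return (predicted class, its probability, confidence tier)."""
    label = max(probabilities, key=probabilities.get)
    top = probabilities[label]
    return label, top, confidence_level(top)


if __name__ == "__main__":
    # Made-up probabilities, used only to show the tier mapping.
    example = {"Cytoplasm": 0.10, "Nucleus": 0.70,
               "Membrane": 0.15, "Mitochondria": 0.05}
    print(summarize(example))
```

Treating the tiers as half-open intervals (Medium covers 60% up to, but not including, 75%) is an assumption made here so that every probability falls into exactly one tier.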

## 📄 License

This project is licensed under the Apache License 2.0.

> - Free for academic and commercial use
> - Includes an express patent grant
> - Deployment and modification are permitted, subject to the license's attribution and notice requirements

See the LICENSE file for details.

## 📬 Contact

For questions, bug reports, or feedback: majidkhan.jssmsc@gmail.com

## 📌 Citation

If you use CANLoc in academic work, please cite it appropriately.