{% extends "base.html" %} {% block content %}

About CANLoc

CANLoc is a machine-learning system designed to predict the subcellular localization of proteins directly from the protein sequence. It combines transformer-based embeddings from the ESM2 model with an optimized XGBoost classifier trained on curated protein datasets.

Performance & Evaluation

CANLoc achieves high accuracy, precision, recall, and F1 scores across all four classes. The model is additionally validated using:

  • Train/test split evaluation
  • 10-fold stratified cross-validation
  • ROC curves for each class
  • Sensitivity and specificity analysis

These evaluations indicate that CANLoc predictions are reliable enough for academic research workflows.
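
As an illustration of the sensitivity and specificity analysis listed above, the following minimal sketch computes both values per class using a one-vs-rest reading of the predictions. It is not CANLoc's evaluation code: the function name and the labels "A" to "D" are placeholders, since the four class names are not listed here.

```python
def per_class_sensitivity_specificity(y_true, y_pred):
    """Return {label: (sensitivity, specificity)} from one-vs-rest counts.

    Hypothetical helper for illustration; not part of CANLoc.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = fn = fp = tn = 0
        for truth, pred in zip(y_true, y_pred):
            if truth == label and pred == label:
                tp += 1  # correctly assigned to this class
            elif truth == label:
                fn += 1  # belongs to this class but was missed
            elif pred == label:
                fp += 1  # wrongly assigned to this class
            else:
                tn += 1  # correctly kept out of this class
        # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = (sensitivity, specificity)
    return results


# Placeholder labels standing in for the four localization classes.
y_true = ["A", "A", "B", "B", "C", "D"]
y_pred = ["A", "B", "B", "B", "C", "A"]
metrics = per_class_sensitivity_specificity(y_true, y_pred)
for label, (sens, spec) in metrics.items():
    print(f"{label}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting the two values per class, rather than a single averaged number, makes it easier to see whether one class is systematically missed or over-predicted.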

Intended Use

  • Functional protein studies
  • Design of localization-oriented drug delivery strategies

Model Strengths

  • Fast and scalable for single or batch prediction
  • Transformer embeddings provide rich biological context
  • High accuracy with interpretable confidence metrics
  • No alignment or preprocessing required; the raw sequence is the only input
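
The exact confidence metrics CANLoc reports are not specified on this page. As a hedged sketch of what such metrics often look like for a multi-class classifier, the example below derives a predicted label, its probability, and the margin over the runner-up from a set of class probabilities. The function name and the labels `class_1` to `class_4` are hypothetical.

```python
def summarize_prediction(probabilities):
    """Return (label, confidence, margin) from a {label: probability} dict.

    Hypothetical helper for illustration; not CANLoc's actual output format.
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")

    # Normalize defensively, then rank classes from most to least likely.
    ranked = sorted(
        ((label, p / total) for label, p in probabilities.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    top_label, top_prob = ranked[0]
    # Margin: gap between the best and second-best class (small = ambiguous).
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_label, top_prob, top_prob - runner_up


# Placeholder probabilities for four unnamed classes.
probs = {"class_1": 0.70, "class_2": 0.20, "class_3": 0.06, "class_4": 0.04}
label, confidence, margin = summarize_prediction(probs)
print(f"{label}: confidence={confidence:.2f}, margin={margin:.2f}")
```

A small margin flags predictions where two compartments are nearly tied, which can be more informative than the top probability alone.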

Limitations

  • Performance depends on sequence length and quality
  • Ambiguous sequences may reduce confidence
  • Designed for four major localization classes only
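
Because performance depends on sequence length and quality, it can help to screen sequences before submitting them. The sketch below is one possible pre-check, not part of CANLoc: the function name is hypothetical and the `min_length` and `max_length` defaults are arbitrary placeholders, not CANLoc's actual limits. It treats the 20 standard one-letter amino-acid codes as valid and the IUPAC codes B, J, Z, and X as ambiguous.

```python
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")
AMBIGUOUS_RESIDUES = frozenset("BJZX")  # IUPAC ambiguity codes


def check_sequence(sequence, min_length=20, max_length=1000):
    """Return a small quality report for a raw protein sequence.

    min_length and max_length are illustrative placeholders only.
    """
    # Strip whitespace and line breaks, and normalize to upper case.
    seq = "".join(sequence.split()).upper()

    ambiguous = sum(1 for residue in seq if residue in AMBIGUOUS_RESIDUES)
    unknown = sorted(
        {r for r in seq if r not in STANDARD_RESIDUES and r not in AMBIGUOUS_RESIDUES}
    )

    warnings = []
    if len(seq) < min_length:
        warnings.append(f"sequence is shorter than {min_length} residues")
    if len(seq) > max_length:
        warnings.append(f"sequence is longer than {max_length} residues")
    if ambiguous:
        warnings.append(f"{ambiguous} ambiguous residue(s) found")
    if unknown:
        warnings.append(f"unrecognized characters: {''.join(unknown)}")

    return {
        "sequence": seq,
        "length": len(seq),
        "ambiguous": ambiguous,
        "unknown": unknown,
        "warnings": warnings,
    }


report = check_sequence("mkt ayiak\nQRX*")
for warning in report["warnings"]:
    print("warning:", warning)
```

A report with an empty `warnings` list only means the sequence passed these basic checks; it says nothing about how confident the model will be.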

CANLoc balances modern deep learning with classical machine learning, yielding a system that is both reliable and lightweight enough to deploy in real-world web applications.

Figure: Eukaryotic cell diagram showing subcellular locations
{% endblock %}