{% extends "base.html" %}
{% block content %}

<div class="main-wrap">
  <section class="card about-flex">

    <div class="about-text">
      <h2>About CANLoc</h2>

      <p>
        CANLoc is a machine-learning system that predicts the subcellular
        localization of a protein directly from its amino acid sequence. It
        combines transformer-based embeddings from the <b>ESM2</b> protein
        language model with an optimized <b>XGBoost</b> classifier trained on
        curated protein datasets.
      </p>

      <h3>Performance & Evaluation</h3>

      <p>
        CANLoc achieves high accuracy, precision, recall, and F1-score across all
        four localization classes. We additionally validate the model using:
      </p>

      <ul>
        <li>Train/test split evaluation</li>
        <li>10-fold stratified cross-validation</li>
        <li>ROC curves for each class</li>
        <li>Sensitivity and specificity analysis</li>
      </ul>

      <p>
        Together, these evaluations indicate that CANLoc predictions are reliable
        for academic and research workflows.
      </p>

      <h3>Intended Use</h3>

      <ul>
        <li>Functional protein studies</li>
        <li>Designing localization-oriented drug delivery strategies</li>
      </ul>

      <h3>Model Strengths</h3>
      <ul>
        <li>Fast and scalable for single or batch prediction</li>
        <li>Transformer embeddings provide rich biological context</li>
        <li>High accuracy with interpretable confidence metrics</li>
        <li>No alignment or preprocessing required beyond the raw sequence</li>
      </ul>

      <h3>Limitations</h3>
      <ul>
        <li>Performance depends on sequence length and quality</li>
        <li>Ambiguous sequences may lower prediction confidence</li>
        <li>Limited to four major localization classes</li>
      </ul>

      <p>
        CANLoc balances modern deep learning with classical machine learning,
        producing a system that is both <b>reliable</b> and
        <b>lightweight enough to deploy</b> in real-world web applications.
      </p>
    </div>

    <div class="about-image">
      <figure>
        <img src="/static/images/cell_diagram.png"
             alt="Eukaryotic cell diagram showing subcellular locations">
      </figure>
    </div>

  </section>
</div>

{% endblock %}