add knowledge graph visualization section to user guide
- ProtHGT_app.py +4 -2
- pages/About.py +1 -1
- pages/User_Guide.py +202 -11
ProtHGT_app.py
CHANGED
|
@@ -22,7 +22,7 @@ import random
|
|
| 22 |
# with open(".setup_done", "w") as f:
|
| 23 |
# f.write("done")
|
| 24 |
|
| 25 |
-
# #
|
| 26 |
# loading_placeholder.empty()
|
| 27 |
|
| 28 |
from run_prothgt_app import *
|
|
@@ -70,10 +70,12 @@ with st.expander("Upcoming Features"):
|
|
| 70 |
|
| 71 |
- **Real-time data retrieval for new proteins**: Currently, ProtHGT can only generate predictions for proteins that already exist in our knowledge graph. We are developing a new feature that will allow users to **predict functions for entirely new proteins starting from their sequences**. This will work by **retrieving relevant relationship data in real time from external source databases** (e.g., UniProt, STRING, and other biological repositories). The system will dynamically construct a knowledge graph for the query protein, incorporating its interactions, domains, pathways, and other biological associations before running function prediction. This approach will enable ProtHGT to analyze newly discovered or less-studied proteins even if they are not pre-annotated in our dataset.
|
| 72 |
- **Expanded embedding options**: Currently, this application represents proteins using **TAPE embeddings**, which serve as the initial numerical representations of protein sequences before being processed in the heterogeneous graph model. We are working on integrating **ProtT5** and **ESM-2** as alternative initial embeddings, allowing users to choose different sequence representations that may enhance performance for specific tasks. A detailed comparison of how these embeddings influence function prediction accuracy will be included in our upcoming publication.
|
| 73 |
-
- **Knowledge graph visualization for interpretability**: To improve model explainability, we are developing an interactive **knowledge graph visualization** feature. This will allow users to explore the biological relationships that contributed to ProtHGT's predictions for a given protein. Users will be able to inspect **protein interactions, GO annotations, domains, pathways, and other key connections** in a structured graphical format, making it easier to interpret and validate predictions.
|
| 74 |
|
| 75 |
Stay tuned for updates and future publications!
|
| 76 |
""")
|
| 77 |
|
| 78 |
with st.sidebar:
|
| 79 |
|
|
|
|
| 22 |
# with open(".setup_done", "w") as f:
|
| 23 |
# f.write("done")
|
| 24 |
|
| 25 |
+
# # Remove the info message after initialization is complete
|
| 26 |
# loading_placeholder.empty()
|
| 27 |
|
| 28 |
from run_prothgt_app import *
|
|
|
|
| 70 |
|
| 71 |
- **Real-time data retrieval for new proteins**: Currently, ProtHGT can only generate predictions for proteins that already exist in our knowledge graph. We are developing a new feature that will allow users to **predict functions for entirely new proteins starting from their sequences**. This will work by **retrieving relevant relationship data in real time from external source databases** (e.g., UniProt, STRING, and other biological repositories). The system will dynamically construct a knowledge graph for the query protein, incorporating its interactions, domains, pathways, and other biological associations before running function prediction. This approach will enable ProtHGT to analyze newly discovered or less-studied proteins even if they are not pre-annotated in our dataset.
|
| 72 |
- **Expanded embedding options**: Currently, this application represents proteins using **TAPE embeddings**, which serve as the initial numerical representations of protein sequences before being processed in the heterogeneous graph model. We are working on integrating **ProtT5** and **ESM-2** as alternative initial embeddings, allowing users to choose different sequence representations that may enhance performance for specific tasks. A detailed comparison of how these embeddings influence function prediction accuracy will be included in our upcoming publication.
|
|
|
|
| 73 |
|
| 74 |
Stay tuned for updates and future publications!
|
| 75 |
""")
|
| 76 |
+
st.success("""
|
| 77 |
+
**Knowledge graph visualization for interpretability** - Now available! Explore the biological relationships behind ProtHGT's predictions interactively. For each query protein, you can inspect protein interactions, GO annotations, domains, pathways, and other key connections in a structured graphical format. Navigate to the **"View Knowledge Graphs"** tab after generating predictions to try it out.
|
| 78 |
+
""")
|
| 79 |
|
| 80 |
with st.sidebar:
|
| 81 |
|
pages/About.py
CHANGED
|
@@ -31,7 +31,7 @@ ProtHGT is a **heterogeneous graph transformer-based model** for automated prote
|
|
| 31 |
|
| 32 |
Using transformer-based message passing, ProtHGT models complex biological relationships by propagating information across proteins and their functional associations in a structured graph. The model represents proteins using initial embeddings from **advanced protein language models (e.g., TAPE, ProtT5)** while integrating contextual information from pathways, domains, and molecular interactions. By employing knowledge **graph attention mechanisms**, ProtHGT learns to prioritize the most biologically relevant connections, improving prediction accuracy and interpretability.
|
| 33 |
|
| 34 |
-
ProtHGT outperforms existing sequence- and graph-based methods, as demonstrated in evaluations on
|
| 35 |
|
| 36 |
Overall workflow of ProtHGT is shown below.
|
| 37 |
""")
|
|
|
|
| 31 |
|
| 32 |
Using transformer-based message passing, ProtHGT models complex biological relationships by propagating information across proteins and their functional associations in a structured graph. The model represents proteins using initial embeddings from **advanced protein language models (e.g., TAPE, ProtT5)** while integrating contextual information from pathways, domains, and molecular interactions. By employing knowledge **graph attention mechanisms**, ProtHGT learns to prioritize the most biologically relevant connections, improving prediction accuracy and interpretability.
|
| 33 |
|
| 34 |
+
ProtHGT outperforms existing sequence- and graph-based methods, as demonstrated in evaluations on DeepHGAT and PROBE datasets. By incorporating a broader biological context through its knowledge graph, **the model improves function prediction across all Gene Ontology (GO) sub-ontologies**. Additionally, its attention-based framework allows researchers to trace predictions back to key contributing relationships in the graph, making it possible to explore new functional links, validate known annotations, and generate testable biological hypotheses.
|
| 35 |
|
| 36 |
Overall workflow of ProtHGT is shown below.
|
| 37 |
""")
|
pages/User_Guide.py
CHANGED
|
@@ -3,14 +3,14 @@ import streamlit as st
|
|
| 3 |
st.sidebar.markdown('''
|
| 4 |
# Sections
|
| 5 |
- [How to use](#how-to-use)
|
|
|
|
|
|
|
| 6 |
''', unsafe_allow_html=True)
|
| 7 |
|
| 8 |
st.markdown('''
|
| 9 |
# ProtHGT User Guide
|
| 10 |
''')
|
| 11 |
|
| 12 |
-
import streamlit as st
|
| 13 |
-
|
| 14 |
st.markdown("""
|
| 15 |
ProtHGT is a web-based tool for **automated protein function prediction** using heterogeneous graph transformers and knowledge graphs. Follow the steps below to generate predictions for your proteins.
|
| 16 |
""")
|
|
@@ -18,6 +18,7 @@ ProtHGT is a web-based tool for **automated protein function prediction** using
|
|
| 18 |
st.subheader("1. Select Proteins")
|
| 19 |
st.markdown("""
|
| 20 |
In the **sidebar**, choose how to input your proteins:
|
|
|
|
| 21 |
- **Search Proteins**: Select or search UniProt IDs from the available dataset.
|
| 22 |
- **Upload a File**: Upload a text file (.txt) containing UniProt IDs (one per line, max 100).
|
| 23 |
""")
|
|
@@ -34,31 +35,221 @@ Select which **Gene Ontology (GO) sub-ontology** to use for function prediction:
|
|
| 34 |
- **All Categories** - Runs predictions for all three categories
|
| 35 |
""")
|
| 36 |
|
| 37 |
-
st.subheader("3.
|
| 38 |
st.markdown("""
|
| 39 |
Click **"Generate Predictions"** to start the analysis. The model will process the selected proteins and return predicted functional annotations.
|
| 40 |
|
| 41 |
-
**Processing time**: A few minutes (depending on input size).
|
| 42 |
""")
|
| 43 |
|
| 44 |
-
st.subheader("
|
| 45 |
st.markdown("""
|
| 46 |
Once predictions are generated, use the filter options to refine the output:
|
| 47 |
- **Filter by Protein** (UniProt ID)
|
| 48 |
- **Filter by GO Category**
|
| 49 |
-
- **
|
|
|
|
| 50 |
|
| 51 |
-
Results are displayed in a sortable table, with **probabilities** indicating prediction confidence.
|
| 52 |
""")
|
| 53 |
|
| 54 |
st.info("📥 Filtered predictions can be downloaded as a CSV file.")
|
| 55 |
|
| 56 |
-
st.subheader("
|
| 57 |
st.markdown("""
|
| 58 |
-
After generating predictions, you can start a new query by selecting different options from the sidebar.
|
| 59 |
""")
|
| 60 |
|
| 61 |
-
st.
|
| 62 |
st.markdown("""
|
| 63 |
-
|
| 64 |
""")
|
|
|
|
| 3 |
st.sidebar.markdown('''
|
| 4 |
# Sections
|
| 5 |
- [How to use](#how-to-use)
|
| 6 |
+
- [Knowledge Graph Visualization](#knowledge-graph-visualization)
|
| 7 |
+
- [Running Locally](#running-locally)
|
| 8 |
''', unsafe_allow_html=True)
|
| 9 |
|
| 10 |
st.markdown('''
|
| 11 |
# ProtHGT User Guide
|
| 12 |
''')
|
| 13 |
|
|
|
|
|
|
|
| 14 |
st.markdown("""
|
| 15 |
ProtHGT is a web-based tool for **automated protein function prediction** using heterogeneous graph transformers and knowledge graphs. Follow the steps below to generate predictions for your proteins.
|
| 16 |
""")
|
|
|
|
| 18 |
st.subheader("1. Select Proteins")
|
| 19 |
st.markdown("""
|
| 20 |
In the **sidebar**, choose how to input your proteins:
|
| 21 |
+
- **Use example query**: Loads a random set of 5 proteins from the dataset to quickly explore the tool.
|
| 22 |
- **Search Proteins**: Select or search UniProt IDs from the available dataset.
|
| 23 |
- **Upload a File**: Upload a text file (.txt) containing UniProt IDs (one per line, max 100).
|
| 24 |
""")
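Review note on the upload format above: the guide specifies a .txt file with one UniProt ID per line and a limit of 100. A minimal sketch of a parser for that format follows. `parse_uniprot_ids` is a hypothetical helper, and the whitespace stripping, de-duplication, and the error raised when the limit is exceeded are assumptions rather than behaviour confirmed by this diff.

```python
MAX_IDS = 100  # limit stated in the user guide


def parse_uniprot_ids(text: str, max_ids: int = MAX_IDS) -> list[str]:
    """Return unique, non-empty IDs in file order (hypothetical helper)."""
    seen: set[str] = set()
    ids: list[str] = []
    for line in text.splitlines():
        candidate = line.strip()
        # Skip blank lines and repeated IDs (assumed behaviour).
        if not candidate or candidate in seen:
            continue
        seen.add(candidate)
        ids.append(candidate)
    if len(ids) > max_ids:
        raise ValueError(f"Expected at most {max_ids} IDs, got {len(ids)}")
    return ids
```

The app itself may truncate or warn instead of raising; the sketch only illustrates the one-ID-per-line contract.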
|
|
|
|
| 35 |
- **All Categories** - Runs predictions for all three categories
|
| 36 |
""")
|
| 37 |
|
| 38 |
+
st.subheader("3. Set Generation Threshold (Optional)")
|
| 39 |
+
st.markdown("""
|
| 40 |
+
Use the **Generation threshold** slider to filter predictions at the point of generation. Only predictions with a probability equal to or above the threshold will be included in the output.
|
| 41 |
+
|
| 42 |
+
Setting a threshold above 0 can significantly **reduce output size** and **speed up the app**, especially when processing many proteins or all GO categories.
|
| 43 |
+
""")
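Review note on the generation threshold above: the rule described is "keep a prediction if its probability is equal to or above the threshold". A minimal sketch of that rule, assuming each prediction is a dict with a `probability` key; the function name and the record layout are hypothetical and are not taken from `run_prothgt_app`.

```python
def apply_generation_threshold(predictions, threshold=0.0):
    """Keep predictions whose probability is >= threshold (inclusive)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # With threshold == 0.0 every prediction passes, so nothing is dropped.
    return [p for p in predictions if p["probability"] >= threshold]
```

Because the comparison is inclusive, a prediction sitting exactly at the threshold is retained, which matches the wording of the guide.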
|
| 44 |
+
|
| 45 |
+
st.subheader("4. Generate Predictions")
|
| 46 |
st.markdown("""
|
| 47 |
Click **"Generate Predictions"** to start the analysis. The model will process the selected proteins and return predicted functional annotations.
|
| 48 |
|
| 49 |
+
**Processing time**: A few minutes (depending on input size). On the **first run of a new session**, the knowledge graph file (~1 GB) will be downloaded automatically, which may take a few additional minutes.
|
| 50 |
""")
|
| 51 |
|
| 52 |
+
st.subheader("5. View and Filter Results")
|
| 53 |
st.markdown("""
|
| 54 |
Once predictions are generated, use the filter options to refine the output:
|
| 55 |
- **Filter by Protein** (UniProt ID)
|
| 56 |
- **Filter by GO Category**
|
| 57 |
+
- **Filter by GO Term ID** - Enter a specific GO term (e.g., GO:0003674) to search for it directly
|
| 58 |
+
- **Set Probability Range** - Adjust prediction confidence thresholds
|
| 59 |
|
| 60 |
+
Results are displayed in a sortable table, with **probabilities** indicating prediction confidence. UniProt IDs and GO IDs are clickable links that open the corresponding entry in UniProt and QuickGO, respectively.
|
| 61 |
""")
|
| 62 |
|
| 63 |
st.info("📥 Filtered predictions can be downloaded as a CSV file.")
|
| 64 |
|
| 65 |
+
st.subheader("6. Start a New Query")
|
| 66 |
st.markdown("""
|
| 67 |
+
After generating predictions, you can start a new query by selecting different options from the sidebar. Changing the protein selection or GO category will automatically reset the results.
|
| 68 |
""")
|
| 69 |
|
| 70 |
+
st.divider()
|
| 71 |
+
|
| 72 |
+
st.markdown("""## Knowledge Graph Visualization <span style="background-color: #28a745; color: white; font-size: 14px; font-weight: bold; padding: 3px 10px; border-radius: 12px; vertical-align: middle;">NEW</span>""", unsafe_allow_html=True)
|
| 73 |
+
|
| 74 |
st.markdown("""
|
| 75 |
+
After generating predictions, switch to the **"View Knowledge Graphs"** tab to explore the biological context behind the model's predictions. This feature is available when **10 or fewer proteins** are selected.
|
| 76 |
+
""")
|
| 77 |
+
|
| 78 |
+
st.subheader("Generating a Visualization")
|
| 79 |
+
st.markdown("""
|
| 80 |
+
Each query protein gets its own subtab. For each protein:
|
| 81 |
+
|
| 82 |
+
1. Use the **"Maximum neighbors per edge type (first-degree)"** slider to control how many direct neighbors are shown per relationship type. Higher values show a denser graph.
|
| 83 |
+
2. Use the **"Maximum neighbors per edge type (second-degree)"** slider to control neighbors-of-neighbors when second-degree edges are enabled. This is intentionally kept low (2–10) to avoid cluttered graphs.
|
| 84 |
+
3. Click **"Generate Visualization"** to render the graph.
|
| 85 |
+
""")
|
| 86 |
+
|
| 87 |
+
st.subheader("What the Graph Shows")
|
| 88 |
+
st.markdown("""
|
| 89 |
+
The visualization renders a **subgraph of the full ProtHGT knowledge graph**, centered on the query protein. Edges and nodes are selected as follows:
|
| 90 |
+
|
| 91 |
+
- **First-degree edges**: All relationship types that directly connect the query protein to other nodes in the knowledge graph are included (e.g., protein-domain, PPI, GO term annotations). For GO term edges, nodes are ranked by predicted probability and the top-N are shown based on the slider setting.
|
| 92 |
+
- **Second-degree edges**: When enabled via the **"Include second-degree edges"** checkbox, neighbor nodes are also expanded: their own connections (excluding edges back to the query protein) are added to the graph, again limited by the second-degree slider.
|
| 93 |
""")
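Review note on the selection rules above: they can be read as "top-N per edge type around the query, then optionally top-N per edge type around each neighbor, skipping edges that lead back to the query". A minimal sketch under stated assumptions: edges are directed `(source, edge_type, target, score)` tuples, every edge carries a score, and ranking by score is applied to all edge types (the guide only states this for GO term edges). All names are hypothetical and the app's real implementation may differ.

```python
from collections import defaultdict

Edge = tuple[str, str, str, float]  # (source, edge_type, target, score)


def top_n_per_type(edges: list[Edge], n: int) -> list[Edge]:
    """Keep the n highest-scoring edges of each edge type."""
    by_type: dict[str, list[Edge]] = defaultdict(list)
    for edge in edges:
        by_type[edge[1]].append(edge)
    kept: list[Edge] = []
    for edge_type in sorted(by_type):  # sorted only for deterministic output
        ranked = sorted(by_type[edge_type], key=lambda e: e[3], reverse=True)
        kept.extend(ranked[:n])
    return kept


def select_subgraph(edges: list[Edge], query: str,
                    first_n: int, second_n: int = 0) -> list[Edge]:
    """Select first-degree edges and, if second_n > 0, second-degree edges."""
    first = top_n_per_type([e for e in edges if e[0] == query], first_n)
    if second_n <= 0:
        return first
    selected = list(first)
    # Expand each distinct neighbor once, in the order it was selected.
    for neighbor in dict.fromkeys(e[2] for e in first):
        outgoing = [e for e in edges
                    if e[0] == neighbor and e[2] != query]  # no edges back
        selected.extend(top_n_per_type(outgoing, second_n))
    return selected
```

Keeping the second-degree limit small, as the guide recommends, bounds the graph at roughly `first_n * second_n` extra edges per edge type.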
|
| 94 |
+
|
| 95 |
+
st.subheader("Reading the Graph")
|
| 96 |
+
|
| 97 |
+
st.components.v1.html("""
|
| 98 |
+
<style>
|
| 99 |
+
.kg-legend {
|
| 100 |
+
margin-top: 10px;
|
| 101 |
+
margin-bottom: 10px;
|
| 102 |
+
padding: 20px;
|
| 103 |
+
border: 1px solid #ddd;
|
| 104 |
+
border-radius: 5px;
|
| 105 |
+
font-family: Arial, sans-serif;
|
| 106 |
+
display: flex;
|
| 107 |
+
gap: 20px;
|
| 108 |
+
}
|
| 109 |
+
.legend-section-nodes {
|
| 110 |
+
flex: 2;
|
| 111 |
+
}
|
| 112 |
+
.legend-section-edges {
|
| 113 |
+
flex: 1;
|
| 114 |
+
}
|
| 115 |
+
.legend-title {
|
| 116 |
+
margin-bottom: 15px;
|
| 117 |
+
color: #333;
|
| 118 |
+
font-size: 16px;
|
| 119 |
+
font-weight: bold;
|
| 120 |
+
}
|
| 121 |
+
.nodes-grid {
|
| 122 |
+
display: grid;
|
| 123 |
+
grid-template-columns: repeat(2, 1fr);
|
| 124 |
+
gap: 12px;
|
| 125 |
+
}
|
| 126 |
+
.edges-grid {
|
| 127 |
+
display: grid;
|
| 128 |
+
grid-template-columns: 1fr;
|
| 129 |
+
gap: 12px;
|
| 130 |
+
}
|
| 131 |
+
.legend-item {
|
| 132 |
+
display: flex;
|
| 133 |
+
align-items: center;
|
| 134 |
+
padding: 4px;
|
| 135 |
+
}
|
| 136 |
+
.node-indicator {
|
| 137 |
+
width: 15px;
|
| 138 |
+
height: 15px;
|
| 139 |
+
border-radius: 50%;
|
| 140 |
+
margin-right: 10px;
|
| 141 |
+
flex-shrink: 0;
|
| 142 |
+
}
|
| 143 |
+
.edge-indicator {
|
| 144 |
+
width: 40px;
|
| 145 |
+
height: 3px;
|
| 146 |
+
margin-right: 10px;
|
| 147 |
+
flex-shrink: 0;
|
| 148 |
+
}
|
| 149 |
+
.legend-label {
|
| 150 |
+
font-size: 14px;
|
| 151 |
+
}
|
| 152 |
+
</style>
|
| 153 |
+
<div class="kg-legend">
|
| 154 |
+
<div class="legend-section-nodes">
|
| 155 |
+
<div class="legend-title">Node Types</div>
|
| 156 |
+
<div class="nodes-grid">
|
| 157 |
+
<div class="legend-item">
|
| 158 |
+
<div class="node-indicator" style="background-color: #079dbb;"></div>
|
| 159 |
+
<span class="legend-label">Disease</span>
|
| 160 |
+
</div>
|
| 161 |
+
<div class="legend-item">
|
| 162 |
+
<div class="node-indicator" style="background-color: #58d0e8;"></div>
|
| 163 |
+
<span class="legend-label">Phenotype</span>
|
| 164 |
+
</div>
|
| 165 |
+
<div class="legend-item">
|
| 166 |
+
<div class="node-indicator" style="background-color: #815ac0;"></div>
|
| 167 |
+
<span class="legend-label">Drug</span>
|
| 168 |
+
</div>
|
| 169 |
+
<div class="legend-item">
|
| 170 |
+
<div class="node-indicator" style="background-color: #d2b7e5;"></div>
|
| 171 |
+
<span class="legend-label">Compound</span>
|
| 172 |
+
</div>
|
| 173 |
+
<div class="legend-item">
|
| 174 |
+
<div class="node-indicator" style="background-color: #6bbf59;"></div>
|
| 175 |
+
<span class="legend-label">Domain</span>
|
| 176 |
+
</div>
|
| 177 |
+
<div class="legend-item">
|
| 178 |
+
<div class="node-indicator" style="background-color: #ff8800;"></div>
|
| 179 |
+
<span class="legend-label">Biological Process</span>
|
| 180 |
+
</div>
|
| 181 |
+
<div class="legend-item">
|
| 182 |
+
<div class="node-indicator" style="background-color: #ffaa00;"></div>
|
| 183 |
+
<span class="legend-label">Molecular Function</span>
|
| 184 |
+
</div>
|
| 185 |
+
<div class="legend-item">
|
| 186 |
+
<div class="node-indicator" style="background-color: #ffc300;"></div>
|
| 187 |
+
<span class="legend-label">Cellular Component</span>
|
| 188 |
+
</div>
|
| 189 |
+
<div class="legend-item">
|
| 190 |
+
<div class="node-indicator" style="background-color: #720026;"></div>
|
| 191 |
+
<span class="legend-label">Pathway</span>
|
| 192 |
+
</div>
|
| 193 |
+
<div class="legend-item">
|
| 194 |
+
<div class="node-indicator" style="background-color: #ce4257;"></div>
|
| 195 |
+
<span class="legend-label">EC Number</span>
|
| 196 |
+
</div>
|
| 197 |
+
<div class="legend-item">
|
| 198 |
+
<div class="node-indicator" style="background-color: #3aa6a4;"></div>
|
| 199 |
+
<span class="legend-label">Protein</span>
|
| 200 |
+
</div>
|
| 201 |
+
</div>
|
| 202 |
+
</div>
|
| 203 |
+
<div class="legend-section-edges">
|
| 204 |
+
<div class="legend-title">Edge Colors</div>
|
| 205 |
+
<div class="edges-grid">
|
| 206 |
+
<div class="legend-item">
|
| 207 |
+
<div class="edge-indicator" style="background-color: #8338ec;"></div>
|
| 208 |
+
<span class="legend-label">Confirmed Prediction (Found in Ground Truth)</span>
|
| 209 |
+
</div>
|
| 210 |
+
<div class="legend-item">
|
| 211 |
+
<div class="edge-indicator" style="background-color: #c1121f;"></div>
|
| 212 |
+
<span class="legend-label">Novel Prediction (Not in Ground Truth)</span>
|
| 213 |
+
</div>
|
| 214 |
+
<div class="legend-item">
|
| 215 |
+
<div class="edge-indicator" style="background-color: #219ebc;"></div>
|
| 216 |
+
<span class="legend-label">Existing GO Term Annotation</span>
|
| 217 |
+
</div>
|
| 218 |
+
<div class="legend-item">
|
| 219 |
+
<div class="edge-indicator" style="background-color: #666666;"></div>
|
| 220 |
+
<span class="legend-label">Other Relationships</span>
|
| 221 |
+
</div>
|
| 222 |
+
</div>
|
| 223 |
+
</div>
|
| 224 |
+
</div>
|
| 225 |
+
""", height=320)
|
| 226 |
+
|
| 227 |
+
st.markdown("""
|
| 228 |
+
**Node colors** indicate the biological entity type of each node in the graph. The **query protein** is always shown as a larger node with a white background and red border, making it easy to identify at a glance.
|
| 229 |
+
|
| 230 |
+
**Edge colors** indicate the nature of each connection:
|
| 231 |
+
- **Purple** - The predicted GO term was also found in the ground truth annotations, confirming the model's prediction.
|
| 232 |
+
- **Red** - The predicted GO term was not in the ground truth, so this is a novel prediction made by the model.
|
| 233 |
+
- **Blue** - An existing GO term annotation already present in the knowledge graph, included for context but not generated as a new prediction.
|
| 234 |
+
- **Gray** - Other biological relationships such as protein-protein interactions, protein-domain associations, drug-target links, and pathway memberships.
|
| 235 |
+
|
| 236 |
+
Hovering over any node or edge reveals a **tooltip** with the entity name, type, and prediction probability where applicable. Nodes and GO IDs are clickable links to their respective database entries (UniProt, QuickGO, InterPro, etc.).
|
| 237 |
+
""")
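Review note on the edge colors above: the four cases form a small decision tree. A minimal sketch of that mapping, using the hex values from the legend in this diff. The function name, its boolean parameters, and the order of the checks are assumptions made for illustration, not the app's actual code.

```python
# Hex values copied from the legend HTML in this diff.
EDGE_COLORS = {
    "confirmed": "#8338ec",  # purple: prediction found in ground truth
    "novel": "#c1121f",      # red: prediction not in ground truth
    "existing": "#219ebc",   # blue: existing GO annotation, not a prediction
    "other": "#666666",      # gray: any non-GO relationship
}


def edge_color(is_go_edge: bool, is_prediction: bool = False,
               in_ground_truth: bool = False) -> str:
    """Map an edge's properties to its legend color (hypothetical helper)."""
    if not is_go_edge:
        return EDGE_COLORS["other"]
    if not is_prediction:
        return EDGE_COLORS["existing"]
    return EDGE_COLORS["confirmed"] if in_ground_truth else EDGE_COLORS["novel"]
```

Checking the GO flag first means the ground-truth flag is ignored for non-GO edges, which keeps protein-protein, domain, drug, and pathway edges gray regardless of other attributes.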
|
| 238 |
+
|
| 239 |
+
st.subheader("Controls and Downloads")
|
| 240 |
+
st.markdown("""
|
| 241 |
+
- **Include second-degree edges** checkbox - Toggles between first-degree-only and expanded graph views.
|
| 242 |
+
- **Regenerate Visualization** - Re-renders the graph with updated slider settings. Use this after changing the neighbor limits.
|
| 243 |
+
- **Download Visualized Edges** - Downloads a JSON file containing all edges shown in the current visualization, including source/target node IDs and prediction probabilities.
|
| 244 |
+
""")
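Review note on the edge download above: the guide says the JSON file holds the visualized edges with source/target node IDs and prediction probabilities. A minimal sketch of such a serialization follows. The field names (`source`, `target`, `edge_type`, `probability`) and the use of `null` for edges without a prediction are assumptions; the actual file layout is not shown in this diff.

```python
import json


def edges_to_json(edges):
    """Serialize (source, edge_type, target, probability) tuples to JSON.

    probability may be None for edges that are not predictions;
    it is then written as null (assumed convention).
    """
    records = [
        {
            "source": source,
            "target": target,
            "edge_type": edge_type,
            "probability": probability,
        }
        for source, edge_type, target, probability in edges
    ]
    return json.dumps(records, indent=2)
```

In a Streamlit app, a string like this could be handed to `st.download_button` as the `data` argument, with a `.json` file name and the `application/json` MIME type.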
|
| 245 |
+
|
| 246 |
+
st.warning("If you change the slider settings after generating a visualization, a warning will appear. Click **'Regenerate Visualization'** to apply the new settings.")
|
| 247 |
+
|
| 248 |
+
st.info("For deeper exploration beyond second-degree connections, the complete knowledge graph can be downloaded from the link provided in the visualization tab.")
|
| 249 |
+
|
| 250 |
+
st.divider()
|
| 251 |
+
|
| 252 |
+
st.markdown("## Running Locally")
|
| 253 |
+
st.markdown("""
|
| 254 |
+
For **larger datasets** or **custom analyses**, you can run ProtHGT locally using our [**GitHub repository**](https://github.com/HUBioDataLab/ProtHGT).
|
| 255 |
+
""")
|