Valmbd committed on
Commit b47954d · 1 Parent(s): 442fb8f

Add Streamlit explorer app with Docker deployment
.gitignore ADDED
@@ -0,0 +1 @@
+ __pycache__/
Dockerfile ADDED
@@ -0,0 +1,28 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # System deps (curl is needed by the HEALTHCHECK below)
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential git curl && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Python deps
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy app code
+ COPY . .
+
+ # Install PETIMOT package
+ RUN pip install --no-cache-dir -e .
+
+ # Streamlit config (printf interprets \n portably, unlike some echo implementations)
+ RUN mkdir -p /root/.streamlit
+ RUN printf '[server]\nheadless = true\nport = 7860\nenableCORS = false\nenableXsrfProtection = false\n' > /root/.streamlit/config.toml
+
+ EXPOSE 7860
+
+ HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health
+
+ ENTRYPOINT ["streamlit", "run", "app/app.py", "--server.port=7860", "--server.address=0.0.0.0"]
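As a local sanity check, the Streamlit config that the Dockerfile writes can be reproduced with `printf` (which interprets `\n` escapes portably, unlike some `echo` implementations). The `/tmp` path below is illustrative; the image writes to `/root/.streamlit/config.toml`:

```shell
# Recreate the config the image writes, under /tmp so it can be
# inspected without building the image
mkdir -p /tmp/streamlit-check
printf '[server]\nheadless = true\nport = 7860\nenableCORS = false\nenableXsrfProtection = false\n' \
    > /tmp/streamlit-check/config.toml
cat /tmp/streamlit-check/config.toml
```

The port in this file must match both `EXPOSE` and the `--server.port` flag in the `ENTRYPOINT`.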
README.md CHANGED
@@ -1,71 +1,46 @@
- # PETIMOT: Protein Motion Inference from Sparse Data
-
- PETIMOT (Protein sEquence and sTructure-based Inference of MOTions) predicts protein conformational changes using SE(3)-equivariant graph neural networks and pre-trained protein language models.
-
- ## Installation
-
- ```bash
- # Create and activate conda environment
- conda create -n petimot python=3.9
- conda activate petimot
-
- # Clone and install
- git clone https://github.com/PhyloSofS-Team/PETIMOT.git
- cd petimot
- pip install -r requirements.txt
- ```
-
+ ---
+ title: PETIMOT Explorer
+ emoji: 🧬
+ colorFrom: indigo
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: "1.40.0"
+ app_file: app/app.py
+ pinned: true
+ license: gpl-3.0
+ tags:
+ - protein
+ - motion
+ - GNN
+ - bioinformatics
+ - structural-biology
+ short_description: "Explore SE(3)-equivariant protein motion predictions"
+ ---
+
+ # 🧬 PETIMOT Explorer
+
+ **Protein Motion Inference from Sparse Data**
+
+ Interactive explorer for protein motion predictions using SE(3)-equivariant Graph Neural Networks.
+
+ ## Features
+
+ - 🔍 **Explorer** — Browse pre-computed predictions for ~36K proteins
+ - 🔮 **Inference** — Predict motion for any protein (PDB ID or upload)
+ - 📊 **Statistics** — Dataset-wide analysis and distributions
+ - 🎨 **3D Viewer** — Interactive motion visualization with displacement arrows
+ - 🧬 **Sequence View** — Per-residue displacement heatmap with coverage overlay
+
+ ## Paper
+
+ > Lombard, Grudinin & Laine — *PETIMOT: SE(3)-Equivariant GNNs for Protein Motion Prediction*
+ > [arXiv 2504.02839](https://arxiv.org/abs/2504.02839)

  ## Usage

- ### Reproduce paper results
-
- 1. Download resources from [Figshare](https://figshare.com/s/ab400d852b4669a83b64):
-    - Download `default_2025-02-07_21-54-02_epoch_33.pt` into the `weights/` directory
-    - Download and extract `ground_truth.zip` into the `ground_truth/` directory
-
- 2. Run inference and evaluation:
- ```bash
- python -m petimot infer_and_evaluate \
-     --model-path weights/default_2025-02-07_21-54-02_epoch_33.pt \
-     --list-path eval_list.txt \
-     --ground-truth-path ground_truth/ \
-     --prediction-path predictions/ \
-     --evaluation-path evaluation/
- ```
-
- ### Compare with baseline methods
-
- 1. Download baseline predictions from [Figshare](https://figshare.com/s/ab400d852b4669a83b64):
-    - Download and extract `baseline_predictions.zip` into the `baselines/` directory
-
- 2. Run evaluation:
  ```bash
- python -m petimot evaluate \
-     --prediction-path baselines/alphaflow_pdb_distilled/ \
-     --ground-truth-path ground_truth/ \
-     --output-path evaluation/
- ```
-
- Available baseline predictions:
- - AlphaFlow (distilled)
- - ESMFlow (distilled)
- - Normal Mode Analysis
-
- ### Predict motions for your own PDB files
-
- ```bash
- # Single PDB structure
- python -m petimot infer \
-     --model-path weights/default_2025-02-07_21-54-02_epoch_33.pt \
-     --list-path protein.pdb \
-     --output-path predictions/
-
- # Multiple structures (provide paths in a text file)
- python -m petimot infer \
-     --model-path weights/default_2025-02-07_21-54-02_epoch_33.pt \
-     --list-path protein_list.txt \
-     --output-path predictions/
+ git clone https://github.com/PhyloSofS-Team/PETIMOT
+ cd PETIMOT
+ pip install -r app/requirements.txt
+ streamlit run app/app.py
  ```
app/app.py ADDED
@@ -0,0 +1,145 @@
+ import streamlit as st
+ import os, sys
+
+ # ── Page Config ──
+ st.set_page_config(
+     page_title="PETIMOT Explorer",
+     page_icon="🧬",
+     layout="wide",
+     initial_sidebar_state="expanded",
+ )
+
+ # ── Ensure PETIMOT is importable ──
+ PETIMOT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ if PETIMOT_ROOT not in sys.path:
+     sys.path.insert(0, PETIMOT_ROOT)
+
+ # ── Custom CSS ──
+ st.markdown("""
+ <style>
+ /* Dark theme overrides */
+ .stApp { background-color: #0f0d1a; }
+ .block-container { padding-top: 1rem; }
+
+ /* Sidebar styling */
+ section[data-testid="stSidebar"] {
+     background-color: #1a1730;
+     border-right: 1px solid #2d2b55;
+ }
+
+ /* Headers */
+ h1, h2, h3 { color: #c4b5fd !important; }
+
+ /* Metric cards */
+ [data-testid="stMetric"] {
+     background-color: #1e1b4b;
+     border: 1px solid #312e81;
+     border-radius: 12px;
+     padding: 12px 16px;
+ }
+ [data-testid="stMetricLabel"] { color: #a5b4fc !important; }
+ [data-testid="stMetricValue"] { color: #e0e7ff !important; }
+
+ /* Dataframe */
+ .stDataFrame { border-radius: 8px; overflow: hidden; }
+
+ /* Tabs */
+ .stTabs [data-baseweb="tab"] {
+     background-color: #1e1b4b;
+     border-radius: 8px 8px 0 0;
+     color: #a5b4fc;
+ }
+ .stTabs [data-baseweb="tab"][aria-selected="true"] {
+     background-color: #312e81;
+     color: white;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # ── Sidebar ──
+ with st.sidebar:
+     st.image("https://raw.githubusercontent.com/PhyloSofS-Team/PETIMOT/main/logo.png",
+              use_container_width=True)
+     st.markdown("# 🧬 PETIMOT")
+     st.markdown("**Protein Motion from Sparse Data**")
+     st.markdown("SE(3)-Equivariant GNNs")
+     st.divider()
+
+     # Global settings
+     st.markdown("### ⚙️ Settings")
+
+     weights_dir = os.path.join(PETIMOT_ROOT, "weights")
+     pt_files = []
+     if os.path.isdir(weights_dir):
+         for root, dirs, files in os.walk(weights_dir):
+             for f in files:
+                 if f.endswith(".pt"):
+                     pt_files.append(os.path.join(root, f))
+
+     if pt_files:
+         selected_weights = st.selectbox(
+             "Model weights",
+             pt_files,
+             format_func=lambda x: os.path.basename(x),
+             key="weights"
+         )
+     else:
+         selected_weights = None
+         st.warning("No weights found in `weights/`")
+
+     st.divider()
+     st.markdown("""
+     **Links**
+     - [Paper](https://arxiv.org/abs/2504.02839)
+     - [GitHub](https://github.com/PhyloSofS-Team/PETIMOT)
+     - [Data](https://figshare.com/s/ab400d852b4669a83b64)
+     """)
+     st.caption("GPL-3.0 · Lombard, Grudinin & Laine")
+
+ # ── Main Page ──
+ st.title("🧬 PETIMOT Explorer")
+ st.markdown("""
+ Explore protein motion predictions from the PETIMOT framework.
+ Navigate using the sidebar pages:
+
+ | Page | Description |
+ |------|-------------|
+ | 🔍 **Explorer** | Browse pre-computed predictions for ~36K proteins |
+ | 🔮 **Inference** | Predict motion for a new protein (PDB ID or upload) |
+ | 📊 **Statistics** | Dataset-wide analysis and distributions |
+ """)
+
+ # ── Data Status ──
+ from app.utils.download import check_data_status, ensure_weights
+
+ status = check_data_status(PETIMOT_ROOT)
+
+ col1, col2, col3 = st.columns(3)
+ with col1:
+     st.metric("Ground Truth", f"{status['ground_truth']:,}",
+               delta="✅" if status['has_gt'] else "Missing")
+ with col2:
+     st.metric("Predictions", f"{status['predictions']:,}",
+               delta="✅" if status['has_predictions'] else "Not yet computed")
+ with col3:
+     st.metric("Model Weights", "4.7M params",
+               delta="✅" if status['has_weights'] else "Missing")
+
+ # Auto-download if missing
+ if not status['has_weights']:
+     st.divider()
+     st.warning("⚠️ Model weights not found.")
+     if st.button("⬇️ Download weights from Figshare (18 MB)", type="primary"):
+         with st.spinner("Downloading..."):
+             wt = ensure_weights(PETIMOT_ROOT)
+         if wt:
+             st.success(f"✅ Weights downloaded: {os.path.basename(wt)}")
+             st.rerun()
+         else:
+             st.error("Download failed. Please manually download from "
+                      "[Figshare](https://figshare.com/s/ab400d852b4669a83b64) "
+                      "and place in `weights/`")
+
+ if not status['has_predictions'] and status['has_weights']:
+     st.info("💡 No pre-computed predictions yet. Use the **Inference** page to predict "
+             "individual proteins, or run batch inference from the Colab notebook.")
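The sidebar's recursive `.pt` discovery can be exercised outside Streamlit. A minimal sketch using `pathlib.Path.rglob`, equivalent to the `os.walk` loop in `app.py` (the throwaway directory stands in for the repo's `weights/`; file names are illustrative):

```python
import tempfile
from pathlib import Path

# Throwaway stand-in for the repo's weights/ directory
weights_dir = Path(tempfile.mkdtemp()) / "weights"
(weights_dir / "nested").mkdir(parents=True)
(weights_dir / "default_epoch_33.pt").touch()
(weights_dir / "nested" / "other.pt").touch()
(weights_dir / "notes.txt").touch()  # non-weight file, must be skipped

# Recursive *.pt scan, mirroring the os.walk loop above
pt_files = sorted(p.name for p in weights_dir.rglob("*.pt"))
print(pt_files)  # ['default_epoch_33.pt', 'other.pt']
```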
app/components/__init__.py ADDED
File without changes
app/components/mode_panel.py ADDED
@@ -0,0 +1,139 @@
+ """Mode selection panel with per-mode statistics."""
+ import streamlit as st
+ import numpy as np
+ import plotly.graph_objects as go
+
+
+ def render_mode_panel(
+     modes: dict,
+     seq: str = "",
+     eigenvalues: np.ndarray = None,
+ ) -> int:
+     """Render mode selector with per-mode stats. Returns selected mode index.
+
+     Args:
+         modes: {k: np.ndarray (N, 3)} displacement vectors per mode
+         seq: Amino acid sequence
+         eigenvalues: Ground truth eigenvalues (if available)
+
+     Returns:
+         Selected mode index
+     """
+     n_modes = len(modes)
+     if n_modes == 0:
+         st.warning("No modes available")
+         return 0
+
+     # Mode tabs
+     tabs = st.tabs([f"Mode {k}" for k in range(n_modes)])
+
+     selected = 0
+     for k in range(n_modes):
+         with tabs[k]:
+             vecs = modes[k]
+             mags = np.linalg.norm(vecs, axis=1)
+             n_res = len(mags)
+
+             # Stats columns
+             c1, c2, c3, c4 = st.columns(4)
+             c1.metric("Mean", f"{mags.mean():.3f} Å")
+             c2.metric("Max", f"{mags.max():.3f} Å")
+             c3.metric("Std", f"{mags.std():.3f} Å")
+
+             if eigenvalues is not None and k < len(eigenvalues):
+                 c4.metric("λ", f"{eigenvalues[k]:.4f}")
+             else:
+                 c4.metric("Residues", f"{n_res}")
+
+             # Top mobile residues
+             top5 = np.argsort(mags)[-5:][::-1]
+             top_data = []
+             for idx in top5:
+                 aa = seq[idx] if idx < len(seq) else "?"
+                 top_data.append({
+                     "Residue": f"{aa}{idx + 1}",
+                     "Displacement": f"{mags[idx]:.3f} Å",
+                     "Rank": f"#{np.where(np.argsort(mags)[::-1] == idx)[0][0] + 1}",
+                 })
+             st.markdown("**Most mobile residues:**")
+             st.dataframe(top_data, use_container_width=True, hide_index=True)
+
+     selected = st.session_state.get("_active_mode_tab", 0)
+     return selected
+
+
+ def render_mode_correlation(modes: dict):
+     """Render mode correlation matrix as Plotly heatmap."""
+     n_modes = len(modes)
+     if n_modes < 2:
+         return
+
+     # Compute displacement profile correlation
+     profiles = []
+     for k in sorted(modes.keys()):
+         mags = np.linalg.norm(modes[k], axis=1)
+         profiles.append(mags)
+
+     corr = np.corrcoef(profiles)
+
+     fig = go.Figure(go.Heatmap(
+         z=corr,
+         x=[f"M{k}" for k in range(n_modes)],
+         y=[f"M{k}" for k in range(n_modes)],
+         colorscale="RdBu_r",
+         zmin=-1, zmax=1,
+         text=np.round(corr, 2),
+         texttemplate="%{text:.2f}",
+         textfont={"size": 12},
+     ))
+
+     fig.update_layout(
+         title="Mode Displacement Correlation",
+         template="plotly_dark",
+         height=300, width=300,
+         paper_bgcolor="rgba(0,0,0,0)",
+         plot_bgcolor="rgba(30,27,75,0.5)",
+         margin=dict(l=30, r=30, t=40, b=30),
+     )
+     st.plotly_chart(fig, use_container_width=False)
+
+
+ def render_eigenvalue_spectrum(eigenvalues: np.ndarray):
+     """Render eigenvalue bar chart with cumulative variance line."""
+     if eigenvalues is None or len(eigenvalues) == 0:
+         return
+
+     fig = go.Figure()
+
+     # Bars
+     fig.add_trace(go.Bar(
+         x=[f"λ{k+1}" for k in range(len(eigenvalues))],
+         y=eigenvalues,
+         marker_color="#6366f1",
+         name="Eigenvalue",
+     ))
+
+     # Cumulative variance line
+     cum = np.cumsum(eigenvalues) / eigenvalues.sum() * 100
+     fig.add_trace(go.Scatter(
+         x=[f"λ{k+1}" for k in range(len(eigenvalues))],
+         y=cum,
+         mode="lines+markers",
+         name="Cumul. variance %",
+         marker=dict(color="#ef4444", size=6),
+         line=dict(color="#ef4444", width=2),
+         yaxis="y2",
+     ))
+
+     fig.update_layout(
+         title="Eigenvalue Spectrum",
+         template="plotly_dark",
+         height=250,
+         paper_bgcolor="rgba(0,0,0,0)",
+         plot_bgcolor="rgba(30,27,75,0.5)",
+         yaxis=dict(title="Eigenvalue"),
+         yaxis2=dict(title="Cumul. %", overlaying="y", side="right", range=[0, 105]),
+         legend=dict(orientation="h", y=1.15),
+         margin=dict(l=40, r=40, t=40, b=30),
+     )
+     st.plotly_chart(fig, use_container_width=True)
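The correlation heatmap above reduces each mode to its per-residue magnitude profile before calling `np.corrcoef`. The numerical core, separated from the Plotly rendering (synthetic modes for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# 3 synthetic modes, each a (50, 3) array of per-residue displacement vectors
modes = {k: rng.normal(size=(50, 3)) for k in range(3)}

# Per-residue displacement magnitude profile for each mode
profiles = [np.linalg.norm(modes[k], axis=1) for k in sorted(modes)]

# Pearson correlation between mode profiles: symmetric, ones on the diagonal
corr = np.corrcoef(profiles)
print(corr.shape)  # (3, 3)
```

A correlation near 1 between two modes means their displacement hotspots coincide along the sequence, even if the motion directions differ.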
app/components/prediction_analysis.py ADDED
@@ -0,0 +1,320 @@
+ """Enhanced prediction analysis — sign-invariant modes and per-residue normalization."""
+ import numpy as np
+ import streamlit as st
+ import plotly.graph_objects as go
+ from plotly.subplots import make_subplots
+
+
+ def canonicalize_sign(modes: dict) -> dict:
+     """Make eigenvectors sign-consistent.
+
+     Eigenvectors are defined up to ±1 global sign. We canonicalize by choosing
+     the sign such that the component with the largest absolute value is positive.
+     This ensures consistent visualization across different runs/proteins.
+     """
+     canonical = {}
+     for k, vecs in modes.items():
+         # Flatten to (3N,), find component with max absolute value
+         flat = vecs.flatten()
+         max_idx = np.argmax(np.abs(flat))
+         if flat[max_idx] < 0:
+             canonical[k] = -vecs  # Flip sign
+         else:
+             canonical[k] = vecs.copy()
+     return canonical
+
+
+ def per_residue_relative_norm(vecs: np.ndarray) -> np.ndarray:
+     """Normalize displacement magnitudes to [0, 1] relative to max.
+
+     Args:
+         vecs: (N, 3) displacement vectors
+
+     Returns:
+         (N,) relative magnitudes in [0, 1]
+     """
+     mags = np.linalg.norm(vecs, axis=1)
+     max_m = mags.max()
+     return mags / max_m if max_m > 1e-12 else mags
+
+
+ def per_residue_direction(vecs: np.ndarray, ca_coords: np.ndarray) -> np.ndarray:
+     """Compute relative direction of displacement vs protein backbone.
+
+     Projects displacement onto local backbone direction (CA_i → CA_{i+1}).
+     Returns signed projection: positive = along backbone, negative = against.
+
+     Args:
+         vecs: (N, 3) displacement vectors
+         ca_coords: (N, 3) CA coordinates
+
+     Returns:
+         (N,) signed projections normalized by displacement magnitude
+     """
+     n = len(vecs)
+     projections = np.zeros(n)
+
+     for i in range(n):
+         # Local backbone direction
+         if i < n - 1:
+             backbone = ca_coords[i + 1] - ca_coords[i]
+         else:
+             backbone = ca_coords[i] - ca_coords[i - 1]
+
+         bb_norm = np.linalg.norm(backbone)
+         if bb_norm < 1e-8:
+             continue
+
+         disp_mag = np.linalg.norm(vecs[i])
+         if disp_mag < 1e-8:
+             continue
+
+         # Cosine angle between displacement and backbone direction
+         projections[i] = np.dot(vecs[i], backbone) / (disp_mag * bb_norm)
+
+     return projections
+
+
+ def render_prediction_analysis(
+     modes: dict,
+     seq: str,
+     ca_coords: np.ndarray = None,
+     coverage: np.ndarray = None,
+     eigenvalues: np.ndarray = None,
+     gt_modes: dict = None,
+     protein_name: str = "",
+ ):
+     """Comprehensive prediction analysis panel.
+
+     Shows:
+     1. Normalized displacement heatmap (all modes × residues)
+     2. Sign-canonical direction analysis
+     3. Prediction vs ground truth comparison (if available)
+     4. Per-residue statistics table
+     """
+     # Canonicalize signs
+     modes_c = canonicalize_sign(modes)
+     n_modes = len(modes_c)
+     n_res = len(list(modes_c.values())[0])
+
+     if coverage is None:
+         coverage = np.ones(n_res)
+
+     # ── Tab layout ──
+     tab_norm, tab_dir, tab_compare, tab_table = st.tabs([
+         "📊 Normalized Displacement", "🧭 Direction Analysis",
+         "⚖️ Pred vs GT", "📋 Per-Residue Table"
+     ])
+
+     # ═════════════════════════════════════════════
+     # Tab 1: Normalized displacement heatmap
+     # ═════════════════════════════════════════════
+     with tab_norm:
+         # Compute relative norms for all modes
+         rel_norms = np.zeros((n_modes, n_res))
+         abs_mags = np.zeros((n_modes, n_res))
+         for k in range(n_modes):
+             abs_mags[k] = np.linalg.norm(modes_c[k], axis=1)
+             rel_norms[k] = per_residue_relative_norm(modes_c[k])
+
+         # Hover text with sequence
+         hover = [[f"{seq[j] if j < len(seq) else '?'}{j+1}<br>"
+                   f"Abs: {abs_mags[k][j]:.3f}Å<br>"
+                   f"Rel: {rel_norms[k][j]:.2%}<br>"
+                   f"Cov: {coverage[j]:.2f}"
+                   for j in range(n_res)] for k in range(n_modes)]
+
+         fig = make_subplots(rows=3, cols=1, row_heights=[0.4, 0.4, 0.2],
+                             shared_xaxes=True, vertical_spacing=0.06,
+                             subplot_titles=["Absolute Displacement (Å)",
+                                             "Relative Displacement (0-1)",
+                                             "Coverage"])
+
+         # Absolute heatmap
+         fig.add_trace(go.Heatmap(
+             z=abs_mags, colorscale="YlOrRd",
+             y=[f"Mode {k}" for k in range(n_modes)],
+             text=hover, hovertemplate="%{text}<extra></extra>",
+             colorbar=dict(title="Å", x=1.01, len=0.35, y=0.85),
+         ), row=1, col=1)
+
+         # Relative heatmap
+         fig.add_trace(go.Heatmap(
+             z=rel_norms, colorscale="Viridis", zmin=0, zmax=1,
+             y=[f"Mode {k}" for k in range(n_modes)],
+             text=hover, hovertemplate="%{text}<extra></extra>",
+             colorbar=dict(title="Rel", x=1.08, len=0.35, y=0.5),
+         ), row=2, col=1)
+
+         # Coverage bar
+         fig.add_trace(go.Bar(
+             x=list(range(n_res)), y=coverage[:n_res],
+             marker_color=["#10b981" if c > 0.5 else "#ef4444" for c in coverage[:n_res]],
+             hovertemplate="Res %{x}<br>Coverage: %{y:.3f}<extra></extra>",
+             showlegend=False,
+         ), row=3, col=1)
+
+         # Sequence ticks
+         step = max(1, n_res // 50)
+         tick_vals = list(range(0, n_res, step))
+         tick_text = [f"{seq[i] if i < len(seq) else '?'}{i+1}" for i in tick_vals]
+         fig.update_xaxes(tickvals=tick_vals, ticktext=tick_text, tickangle=45,
+                          tickfont=dict(size=8), row=3, col=1)
+
+         fig.update_layout(
+             template="plotly_dark", height=550,
+             paper_bgcolor="rgba(0,0,0,0)",
+             plot_bgcolor="rgba(30,27,75,0.3)",
+             margin=dict(l=60, r=80, t=30, b=50),
+         )
+         st.plotly_chart(fig, use_container_width=True)
+
+         # Key insight
+         for k in range(min(n_modes, 4)):
+             top3 = np.argsort(abs_mags[k])[-3:][::-1]
+             top_str = ", ".join([f"**{seq[i] if i < len(seq) else '?'}{i+1}** ({abs_mags[k][i]:.2f}Å)"
+                                  for i in top3])
+             st.markdown(f"Mode {k} hotspots: {top_str}")
+
+     # ═════════════════════════════════════════════
+     # Tab 2: Direction analysis
+     # ═════════════════════════════════════════════
+     with tab_dir:
+         if ca_coords is not None and len(ca_coords) == n_res:
+             st.markdown("""
+             **Direction Analysis**: Projects displacement onto the local backbone direction (CA→CA).
+             - 🔵 **Blue** = motion along backbone (stretching/compressing)
+             - 🔴 **Red** = motion perpendicular to backbone (lateral/hinge)
+             - Sign is arbitrary for eigenvectors → we show absolute cosine similarity
+             """)
+
+             fig_dir = go.Figure()
+             colors = ["#6366f1", "#ef4444", "#10b981", "#f59e0b"]
+             # Matching translucent fills for each line color
+             fills = ["rgba(99,102,241,0.1)", "rgba(239,68,68,0.1)",
+                      "rgba(16,185,129,0.1)", "rgba(245,158,11,0.1)"]
+
+             for k in range(min(n_modes, 4)):
+                 proj = per_residue_direction(modes_c[k], ca_coords)
+                 # Show absolute cosine (sign-invariant)
+                 abs_proj = np.abs(proj)
+
+                 fig_dir.add_trace(go.Scatter(
+                     x=list(range(1, n_res + 1)), y=abs_proj,
+                     mode="lines", name=f"Mode {k}",
+                     line=dict(color=colors[k], width=1.5),
+                     fill="tozeroy",
+                     fillcolor=fills[k],
+                     hovertemplate="Res %{x}<br>|cos θ|: %{y:.3f}<extra>Mode " + str(k) + "</extra>",
+                 ))
+
+             fig_dir.add_hline(y=0.5, line_dash="dash", line_color="#94a3b8",
+                               annotation_text="isotropic threshold")
+
+             fig_dir.update_layout(
+                 template="plotly_dark", height=350,
+                 paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(30,27,75,0.3)",
+                 xaxis_title="Residue", yaxis_title="|cos θ| (backbone projection)",
+                 yaxis_range=[0, 1.05],
+                 margin=dict(l=50, r=20, t=30, b=50),
+             )
+             st.plotly_chart(fig_dir, use_container_width=True)
+
+             # Direction heatmap
+             st.markdown("**Per-residue × mode direction matrix:**")
+             dir_matrix = np.zeros((n_modes, n_res))
+             for k in range(n_modes):
+                 dir_matrix[k] = np.abs(per_residue_direction(modes_c[k], ca_coords))
+
+             fig_dh = go.Figure(go.Heatmap(
+                 z=dir_matrix, colorscale="RdBu_r", zmin=0, zmax=1,
+                 y=[f"Mode {k}" for k in range(n_modes)],
+                 colorbar=dict(title="|cos θ|"),
+             ))
+             fig_dh.update_layout(
+                 template="plotly_dark", height=200,
+                 paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(30,27,75,0.3)",
+                 margin=dict(l=60, r=60, t=10, b=30),
+             )
+             st.plotly_chart(fig_dh, use_container_width=True)
+         else:
+             st.info("Direction analysis requires CA coordinates (ground truth or PDB needed)")
+
+     # ═════════════════════════════════════════════
+     # Tab 3: Prediction vs Ground Truth
+     # ═════════════════════════════════════════════
+     with tab_compare:
+         if gt_modes is not None and len(gt_modes) > 0:
+             gt_c = canonicalize_sign(gt_modes)
+             n_gt = len(gt_c)
+
+             st.markdown("**Pred vs GT displacement profiles (sign-canonicalized):**")
+
+             for k in range(min(n_modes, n_gt, 4)):
+                 pred_mag = np.linalg.norm(modes_c[k], axis=1)
+                 gt_mag = np.linalg.norm(gt_c[k], axis=1)
+
+                 # Normalize both to [0, 1]
+                 pred_rel = pred_mag / (pred_mag.max() + 1e-12)
+                 gt_rel = gt_mag / (gt_mag.max() + 1e-12)
+
+                 fig_cmp = go.Figure()
+                 fig_cmp.add_trace(go.Scatter(
+                     x=list(range(1, n_res + 1)), y=gt_rel,
+                     mode="lines", name="Ground Truth",
+                     line=dict(color="#10b981", width=2),
+                 ))
+                 fig_cmp.add_trace(go.Scatter(
+                     x=list(range(1, n_res + 1)), y=pred_rel,
+                     mode="lines", name="Prediction",
+                     line=dict(color="#6366f1", width=2, dash="dot"),
+                 ))
+
+                 # Correlation
+                 corr = np.corrcoef(pred_rel, gt_rel)[0, 1]
+                 rmse = np.sqrt(np.mean((pred_rel - gt_rel) ** 2))
+
+                 fig_cmp.update_layout(
+                     template="plotly_dark", height=200,
+                     title=f"Mode {k} — r={corr:.3f}, RMSE={rmse:.3f}",
+                     paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(30,27,75,0.3)",
+                     margin=dict(l=40, r=20, t=40, b=30),
+                     legend=dict(orientation="h", y=1.15),
+                 )
+                 st.plotly_chart(fig_cmp, use_container_width=True)
+         else:
+             st.info("No ground truth available for comparison. "
+                     "Ground truth is only available for proteins in the training database.")
+
+     # ═════════════════════════════════════════════
+     # Tab 4: Per-residue table
+     # ═════════════════════════════════════════════
+     with tab_table:
+         import pandas as pd
+
+         rows = []
+         for i in range(n_res):
+             row = {
+                 "Residue": i + 1,
+                 "AA": seq[i] if i < len(seq) else "?",
+                 "Coverage": f"{coverage[i]:.3f}" if i < len(coverage) else "—",
+             }
+             for k in range(min(n_modes, 4)):
+                 mag = np.linalg.norm(modes_c[k][i])
+                 rel = per_residue_relative_norm(modes_c[k])[i]
+                 row[f"M{k} (Å)"] = f"{mag:.3f}"
+                 row[f"M{k} rel"] = f"{rel:.2%}"
+             rows.append(row)
+
+         df = pd.DataFrame(rows)
+         st.dataframe(df, use_container_width=True, height=500,
+                      column_config={
+                          "Residue": st.column_config.NumberColumn(width="small"),
+                          "AA": st.column_config.TextColumn(width="small"),
+                      })
+
+         # Download CSV
+         csv = df.to_csv(index=False)
+         st.download_button("📥 Download CSV", csv,
+                            f"{protein_name}_analysis.csv", "text/csv")
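The point of the sign convention in `canonicalize_sign` is that the two representatives `v` and `-v` of the same eigenvector map to one canonical array. A standalone sketch of that invariant (the helper is restated here so the snippet runs without the app package):

```python
import numpy as np

def canonicalize_sign(modes: dict) -> dict:
    """Flip each mode so its largest-magnitude component is positive."""
    canonical = {}
    for k, vecs in modes.items():
        flat = vecs.flatten()
        if flat[np.argmax(np.abs(flat))] < 0:
            canonical[k] = -vecs
        else:
            canonical[k] = vecs.copy()
    return canonical

rng = np.random.default_rng(1)
v = rng.normal(size=(10, 3))  # one synthetic (N, 3) mode

a = canonicalize_sign({0: v})[0]
b = canonicalize_sign({0: -v})[0]
assert np.array_equal(a, b)  # v and -v canonicalize to the same array
```

Magnitude-based views (heatmaps, profiles) are already sign-invariant; canonicalization matters for anything that draws the vectors themselves, such as displacement arrows.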
app/components/sequence_viewer.py ADDED
@@ -0,0 +1,268 @@
+ """Interactive sequence viewer with per-residue displacement heatmap."""
+ import streamlit as st
+ import numpy as np
+
+
+ # Amino acid property classification
+ AA_PROPS = {
+     "A": "hydrophobic", "I": "hydrophobic", "L": "hydrophobic", "M": "hydrophobic",
+     "F": "hydrophobic", "W": "hydrophobic", "V": "hydrophobic", "P": "hydrophobic",
+     "D": "charged", "E": "charged", "K": "charged", "R": "charged", "H": "charged",
+     "S": "polar", "T": "polar", "N": "polar", "Q": "polar", "C": "polar", "Y": "polar",
+     "G": "special", "X": "unknown",
+ }
+
+ PROP_COLORS = {
+     "hydrophobic": "#f59e0b",
+     "charged": "#ef4444",
+     "polar": "#10b981",
+     "special": "#94a3b8",
+     "unknown": "#64748b",
+ }
+
+
+ def render_sequence_viewer(
+     seq: str,
+     displacements: np.ndarray,
+     coverage: np.ndarray = None,
+     mode_label: str = "Mode 0",
+     max_chars_per_row: int = 80,
+ ):
+     """Render interactive HTML sequence viewer with displacement coloring.
+
+     Each residue is displayed as a colored cell where:
+     - Background: displacement magnitude (dark purple → red gradient)
+     - Border: coverage (thick = low coverage)
+     - Tooltip: residue info
+
+     Args:
+         seq: Amino acid sequence (1-letter codes)
+         displacements: Per-residue displacement magnitudes (N,)
+         coverage: Per-residue coverage (N,) in [0, 1]
+         mode_label: Label for the current mode
+         max_chars_per_row: Characters per row before wrapping
+     """
+     n = len(seq)
+     if coverage is None:
+         coverage = np.ones(n)
+
+     max_d = displacements.max() + 1e-8
+
+     html = f"""
+     <style>
+     .seq-container {{
+         font-family: 'Consolas', 'Monaco', monospace;
+         background: #1e1b4b;
+         border-radius: 8px;
+         padding: 12px;
+         margin: 8px 0;
+     }}
+     .seq-header {{
+         color: #a5b4fc;
+         font-size: 13px;
+         margin-bottom: 8px;
+         font-weight: bold;
+     }}
+     .seq-row {{
+         display: flex;
+         flex-wrap: wrap;
+         gap: 1px;
+         margin-bottom: 2px;
+     }}
+     .res {{
+         display: inline-flex;
+         align-items: center;
+         justify-content: center;
+         width: 14px;
+         height: 22px;
+         font-size: 10px;
+         font-weight: bold;
+         border-radius: 2px;
+         cursor: pointer;
+         transition: transform 0.1s;
+         position: relative;
+     }}
+     .res:hover {{
+         transform: scale(1.8);
+         z-index: 10;
+         box-shadow: 0 0 8px rgba(99, 102, 241, 0.8);
+     }}
+     .res:hover::after {{
+         content: attr(data-tooltip);
+         position: absolute;
+         top: -38px;
+         left: 50%;
+         transform: translateX(-50%);
+         background: #312e81;
+         color: white;
+         padding: 4px 8px;
+         border-radius: 4px;
+         font-size: 10px;
+         white-space: nowrap;
+         z-index: 100;
+         border: 1px solid #6366f1;
+     }}
+     .seq-ruler {{
+         display: flex;
+         gap: 1px;
+         margin-bottom: 1px;
+     }}
+     .ruler-mark {{
+         width: 14px;
+         font-size: 7px;
+         color: #64748b;
+         text-align: center;
+     }}
+     .legend {{
+         display: flex;
+         gap: 16px;
+         margin-top: 8px;
+         font-size: 11px;
+         color: #94a3b8;
+     }}
+     .legend-item {{
+         display: flex;
+         align-items: center;
+         gap: 4px;
+     }}
+     .legend-swatch {{
+         width: 12px;
+         height: 12px;
+         border-radius: 2px;
+     }}
+     </style>
+     <div class="seq-container">
+     <div class="seq-header">{mode_label} — Per-residue displacement ({n} residues)</div>
+     """
+
+     # Build rows
+     for row_start in range(0, n, max_chars_per_row):
+         row_end = min(row_start + max_chars_per_row, n)
+
+         # Ruler
+         html += '<div class="seq-ruler">'
+         for i in range(row_start, row_end):
+             if (i + 1) % 10 == 0:
+                 html += f'<div class="ruler-mark">{i + 1}</div>'
+             elif (i + 1) % 5 == 0:
+                 html += '<div class="ruler-mark">·</div>'
+             else:
+                 html += '<div class="ruler-mark"></div>'
+         html += "</div>"
+
+         # Residues
+         html += '<div class="seq-row">'
+         for i in range(row_start, row_end):
+             aa = seq[i] if i < len(seq) else "X"
+             d = displacements[i]
+             c = coverage[i] if i < len(coverage) else 1.0
+             t = d / max_d  # Normalized displacement
+
+             # Background: displacement heatmap (dark purple → bright red)
+             r = int(30 + 225 * t)
+             g = int(27 + 20 * (1 - t))
+             b = int(75 - 50 * t)
+             bg = f"rgb({r},{g},{b})"
+
+             # Text color: white for high displacement, light for low
+             txt_color = "white" if t > 0.3 else "#a5b4fc"
+
+             # Border: thicker = lower coverage
+             border_w = max(0, int(3 * (1 - c)))
+             border = f"{border_w}px solid #ef4444" if border_w > 0 else "none"
+
+             prop = AA_PROPS.get(aa, "unknown")
+             tooltip = f"{aa}{i+1} | {d:.3f}Å | cov={c:.2f} | {prop}"
+
+             html += (
+                 f'<div class="res" style="background:{bg};color:{txt_color};'
+                 f'border:{border}" data-tooltip="{tooltip}">{aa}</div>'
+             )
+         html += "</div>"
+
+     # Legend
+     html += """
+     <div class="legend">
+         <div class="legend-item">
+             <div class="legend-swatch" style="background:linear-gradient(90deg,#1e1b4b,#ff3030)"></div>
+             Low → High displacement
+         </div>
+         <div class="legend-item">
+             <div class="legend-swatch" style="border:2px solid #ef4444;background:none"></div>
+             Red border = low coverage
+         </div>
+     </div>
+     </div>
+     """
+
+     st.markdown(html, unsafe_allow_html=True)
+
+
+ def render_displacement_chart(
+     displacements: dict,
+     seq: str = "",
+     coverage: np.ndarray = None,
+ ):
+     """Render interactive displacement profile chart using Plotly.
+
+     Args:
+         displacements: {mode_idx: np.ndarray of per-residue magnitudes}
+         seq: Amino acid sequence
+         coverage: Per-residue coverage
+     """
+     import plotly.graph_objects as go
+     from plotly.subplots import make_subplots
+
+     n_modes = len(displacements)
+     n_res = len(list(displacements.values())[0])
218
+ residues = np.arange(1, n_res + 1)
219
+
220
+ # Hover text with AA identity
221
+ hover_text = [f"{seq[i] if i < len(seq) else '?'}{i+1}" for i in range(n_res)]
222
+
223
+ fig = make_subplots(
224
+ rows=2, cols=1, row_heights=[0.75, 0.25],
225
+ shared_xaxes=True, vertical_spacing=0.08,
226
+ subplot_titles=["Displacement by Mode", "Coverage"]
227
+ )
228
+
229
+ colors = ["#6366f1", "#ef4444", "#10b981", "#f59e0b", "#ec4899", "#8b5cf6"]
230
+
231
+ for k, d in displacements.items():
232
+ mags = np.linalg.norm(d, axis=1) if d.ndim == 2 else d
233
+ fig.add_trace(go.Scatter(
234
+ x=residues, y=mags,
235
+ mode="lines",
236
+ name=f"Mode {k} (ฮผ={mags.mean():.3f}ร…)",
237
+ line=dict(color=colors[k % len(colors)], width=1.5),
238
+ fill="tozeroy",
239
+ fillcolor=colors[k % len(colors)].replace(")", ",0.1)").replace("rgb", "rgba"),
240
+ text=hover_text,
241
+ hovertemplate="%{text}<br>Displacement: %{y:.3f}ร…<extra>Mode " + str(k) + "</extra>",
242
+ ), row=1, col=1)
243
+
244
+ # Coverage
245
+ if coverage is not None:
246
+ fig.add_trace(go.Scatter(
247
+ x=residues, y=coverage[:n_res],
248
+ mode="lines",
249
+ name="Coverage",
250
+ line=dict(color="#94a3b8", width=1.5),
251
+ fill="tozeroy",
252
+ fillcolor="rgba(148,163,184,0.15)",
253
+ hovertemplate="%{x}<br>Coverage: %{y:.3f}<extra></extra>",
254
+ ), row=2, col=1)
255
+
256
+ fig.update_layout(
257
+ template="plotly_dark",
258
+ height=400,
259
+ paper_bgcolor="rgba(0,0,0,0)",
260
+ plot_bgcolor="rgba(30,27,75,0.5)",
261
+ legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
262
+ margin=dict(l=50, r=20, t=40, b=30),
263
+ )
264
+ fig.update_xaxes(title_text="Residue", row=2, col=1)
265
+ fig.update_yaxes(title_text="Displacement (ร…)", row=1, col=1)
266
+ fig.update_yaxes(title_text="Coverage", range=[0, 1.1], row=2, col=1)
267
+
268
+ st.plotly_chart(fig, use_container_width=True, key=f"disp_chart")
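The heatmap colors in the sequence viewer above are computed inline; as a sanity check outside Streamlit, the same dark-purple-to-red mapping can be factored into a small helper (the name `heatmap_rgb` is ours, not part of the app):

```python
def heatmap_rgb(t: float) -> tuple[int, int, int]:
    """Map a normalized displacement t in [0, 1] to the dark-purple -> red
    gradient used by the sequence heatmap (same arithmetic as the app)."""
    r = int(30 + 225 * t)
    g = int(27 + 20 * (1 - t))
    b = int(75 - 50 * t)
    return r, g, b

# Endpoints of the gradient
low = heatmap_rgb(0.0)   # dark purple end
high = heatmap_rgb(1.0)  # bright red end
```

At `t=0` this yields the dark-purple end of the legend's gradient; at `t=1` the bright red end.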
app/components/viewer_3d.py ADDED
@@ -0,0 +1,203 @@
+ """3D protein motion visualization using py3Dmol via stmol."""
+ import streamlit as st
+ import numpy as np
+
+ try:
+     from stmol import showmol
+     import py3Dmol
+     HAS_STMOL = True
+ except ImportError:
+     HAS_STMOL = False
+
+
+ def render_motion_viewer(
+     pdb_text: str,
+     ca_coords: np.ndarray,
+     mode_vecs: np.ndarray,
+     seq: str = "",
+     amplitude: float = 3.0,
+     arrow_scale: float = 1.0,
+     color_scheme: str = "magnitude",
+     show_cartoon: bool = True,
+     show_labels: bool = True,
+     min_displacement: float = 0.01,
+     width: int = 800,
+     height: int = 500,
+     key: str = "viewer",
+ ):
+     """Render an interactive 3D motion viewer with displacement arrows.
+
+     Args:
+         pdb_text: PDB file content as a string
+         ca_coords: CA coordinates (N, 3)
+         mode_vecs: Displacement vectors for one mode (N, 3)
+         seq: Amino acid sequence (1-letter)
+         amplitude: Arrow length multiplier
+         arrow_scale: Arrow thickness multiplier
+         color_scheme: "magnitude"|"rainbow"|"bfactor"|"chain"|"residue_type"
+         show_cartoon: Show cartoon backbone
+         show_labels: Label top mobile residues
+         min_displacement: Hide arrows below this threshold
+         width, height: Viewer dimensions
+         key: Streamlit widget key
+     """
+     if not HAS_STMOL:
+         st.error("Install `stmol`: `pip install stmol`")
+         return
+
+     n_res = len(ca_coords)
+     mags = np.linalg.norm(mode_vecs, axis=1)
+     max_mag = mags.max() + 1e-8
+
+     view = py3Dmol.view(width=width, height=height)
+     view.addModel(pdb_text, "pdb")
+
+     # Backbone style
+     if show_cartoon:
+         view.setStyle({"cartoon": {"color": "#e2e8f0", "opacity": 0.45}})
+     else:
+         view.setStyle({"stick": {"radius": 0.06, "color": "#94a3b8"}})
+
+     # Draw displacement arrows
+     for i in range(n_res):
+         if mags[i] < min_displacement:
+             continue
+
+         s = ca_coords[i]
+         d = mode_vecs[i] * amplitude
+         e = s + d
+         t = mags[i] / max_mag  # Normalized intensity [0, 1]
+
+         # Color assignment
+         col = _get_color(i, t, n_res, seq, color_scheme)
+
+         # Arrow shaft — radius proportional to displacement
+         base_r = 0.08 * arrow_scale
+         shaft_r = base_r + base_r * t
+         view.addCylinder({
+             "start": {"x": float(s[0]), "y": float(s[1]), "z": float(s[2])},
+             "end": {"x": float(e[0]), "y": float(e[1]), "z": float(e[2])},
+             "radius": shaft_r,
+             "color": col,
+             "fromCap": True,
+         })
+
+         # Arrow tip (cone-like)
+         dn = d / (np.linalg.norm(d) + 1e-8)
+         tip = e + dn * 0.25 * amplitude * arrow_scale
+         tip_r = shaft_r * 2.2
+         view.addCylinder({
+             "start": {"x": float(e[0]), "y": float(e[1]), "z": float(e[2])},
+             "end": {"x": float(tip[0]), "y": float(tip[1]), "z": float(tip[2])},
+             "radius": tip_r,
+             "color": col,
+             "toCap": True,
+         })
+
+     # Label top-5 mobile residues
+     if show_labels and n_res > 0:
+         top_n = min(5, n_res)
+         top_idx = np.argsort(mags)[-top_n:][::-1]
+         for idx in top_idx:
+             if mags[idx] < min_displacement:
+                 continue
+             pos = ca_coords[idx]
+             aa = seq[idx] if idx < len(seq) else "?"
+             view.addLabel(
+                 f"{aa}{idx + 1}\n{mags[idx]:.2f}Å",
+                 {
+                     "position": {"x": float(pos[0]), "y": float(pos[1] + 2.5), "z": float(pos[2])},
+                     "fontSize": 11,
+                     "fontColor": "white",
+                     "backgroundColor": "#312e81",
+                     "backgroundOpacity": 0.85,
+                     "borderColor": "#6366f1",
+                     "borderThickness": 1,
+                 },
+             )
+
+     view.zoomTo()
+     showmol(view, height=height, width=width)
+
+
+ def _get_color(idx: int, intensity: float, n_res: int, seq: str, scheme: str) -> str:
+     """Get the color for a residue under the given color scheme."""
+     if scheme == "magnitude":
+         # Blue → Purple → Red gradient
+         r = int(99 + 156 * intensity)
+         g = int(102 - 62 * intensity)
+         b = int(241 - 180 * intensity)
+         return f"rgb({r},{g},{b})"
+
+     elif scheme == "rainbow":
+         import colorsys
+         h = idx / max(n_res - 1, 1)
+         r, g, b = [int(255 * c) for c in colorsys.hsv_to_rgb(h, 0.85, 0.92)]
+         return f"rgb({r},{g},{b})"
+
+     elif scheme == "residue_type":
+         aa = seq[idx] if idx < len(seq) else "X"
+         hydrophobic = "AILMFWVP"
+         charged = "DEKRH"
+         polar = "STNQCY"
+         if aa in hydrophobic:
+             return "#f59e0b"
+         if aa in charged:
+             return "#ef4444"
+         if aa in polar:
+             return "#10b981"
+         return "#94a3b8"
+
+     elif scheme == "bfactor":
+         r = int(255 * intensity)
+         g = int(100 * (1 - intensity))
+         b = int(50 * (1 - intensity))
+         return f"rgb({r},{g},{b})"
+
+     else:
+         return "#6366f1"
+
+
+ def render_mode_comparison(
+     pdb_text: str,
+     ca_coords: np.ndarray,
+     modes: dict,
+     seq: str = "",
+     amplitude: float = 3.0,
+     arrow_scale: float = 1.0,
+     width: int = 900,
+     height: int = 350,
+ ):
+     """Render a side-by-side mode comparison grid."""
+     if not HAS_STMOL:
+         st.error("Install `stmol`")
+         return
+
+     n_modes = min(4, len(modes))
+     if n_modes == 0:
+         st.warning("No modes to display")
+         return
+
+     colors = ["#6366f1", "#ef4444", "#10b981", "#f59e0b"]
+
+     cols = st.columns(n_modes)
+     for k in range(n_modes):
+         with cols[k]:
+             vecs = modes[k]
+             mags = np.linalg.norm(vecs, axis=1)
+             st.caption(f"**Mode {k}** · μ={mags.mean():.3f}Å · max={mags.max():.3f}Å")
+
+             view = py3Dmol.view(width=width // n_modes, height=height)
+             view.addModel(pdb_text, "pdb")
+             view.setStyle({"cartoon": {"color": "#e2e8f0", "opacity": 0.35}})
+
+             max_m = mags.max() + 1e-8
+             for i in range(len(ca_coords)):
+                 if mags[i] < 0.01:
+                     continue
+                 s = ca_coords[i]
+                 d = vecs[i] * amplitude
+                 e = s + d
+                 t = mags[i] / max_m
+                 view.addCylinder({
+                     "start": {"x": float(s[0]), "y": float(s[1]), "z": float(s[2])},
+                     "end": {"x": float(e[0]), "y": float(e[1]), "z": float(e[2])},
+                     "radius": 0.08 * arrow_scale + 0.05 * t * arrow_scale,
+                     "color": colors[k],
+                 })
+             view.zoomTo()
+             showmol(view, height=height, width=width // n_modes)
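The arrow geometry in `render_motion_viewer` (shaft from the CA position along the scaled mode vector, plus a short cone-like tip along the same direction) can be checked in isolation; a minimal numpy sketch, where the helper name `arrow_segments` is ours:

```python
import numpy as np

def arrow_segments(ca, vec, amplitude=3.0, arrow_scale=1.0):
    """Start/end of the shaft and the head tip for one displacement arrow,
    mirroring the geometry used by render_motion_viewer."""
    s = np.asarray(ca, dtype=float)
    d = np.asarray(vec, dtype=float) * amplitude
    e = s + d                                   # shaft end
    dn = d / (np.linalg.norm(d) + 1e-8)         # unit direction
    tip = e + dn * 0.25 * amplitude * arrow_scale  # cone tip past the shaft
    return s, e, tip

# Unit displacement along x, drawn at amplitude 2
s, e, tip = arrow_segments([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], amplitude=2.0)
```

The tip extends a fixed fraction (0.25 of the amplitude, scaled by arrow thickness) beyond the shaft end.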
app/pages/1_🔍_Explorer.py ADDED
@@ -0,0 +1,169 @@
+ """🔍 Database Explorer — Browse pre-computed PETIMOT predictions."""
+ import streamlit as st
+ import os, sys
+ import numpy as np
+
+ # Imports
+ ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ PETIMOT_ROOT = os.path.dirname(ROOT)
+ if PETIMOT_ROOT not in sys.path:
+     sys.path.insert(0, PETIMOT_ROOT)
+
+ from app.utils.data_loader import (
+     find_predictions_dir, load_prediction_index, load_modes, load_ground_truth
+ )
+ from app.components.viewer_3d import render_motion_viewer, render_mode_comparison
+ from app.components.sequence_viewer import render_sequence_viewer, render_displacement_chart
+ from app.components.mode_panel import render_mode_correlation, render_eigenvalue_spectrum
+ from app.components.prediction_analysis import render_prediction_analysis
+
+ st.header("🔍 Database Explorer")
+
+ # ── Find predictions ──
+ pred_dir = find_predictions_dir(PETIMOT_ROOT)
+ gt_dir = os.path.join(PETIMOT_ROOT, "ground_truth")
+
+ if not pred_dir:
+     st.error("No predictions found. Run inference first (notebook cell 4.3 or Inference page).")
+     st.stop()
+
+ # ── Load index ──
+ with st.spinner("Loading prediction index..."):
+     df = load_prediction_index(pred_dir)
+
+ if df.empty:
+     st.warning("No predictions in index.")
+     st.stop()
+
+ st.success(f"**{len(df):,}** proteins indexed from `{os.path.basename(pred_dir)}`")
+
+ # ── Filters ──
+ with st.expander("🔧 Filters", expanded=False):
+     col1, col2, col3 = st.columns(3)
+     with col1:
+         search = st.text_input("🔍 Search by name", "", placeholder="e.g. 1ake")
+     with col2:
+         len_range = st.slider("Sequence length", int(df.seq_len.min()), int(df.seq_len.max()),
+                               (int(df.seq_len.min()), int(df.seq_len.max())))
+     with col3:
+         sort_by = st.selectbox("Sort by", ["name", "seq_len", "mean_disp_m0", "max_disp_m0"],
+                                index=2)
+
+ mask = (df.seq_len >= len_range[0]) & (df.seq_len <= len_range[1])
+ if search:
+     mask &= df.name.str.contains(search, case=False, na=False)
+ df_filtered = df[mask].sort_values(sort_by, ascending=(sort_by == "name"))
+ st.markdown(f"Showing **{len(df_filtered)}** / {len(df)} proteins")
+
+ # ── Table ──
+ selected_idx = st.dataframe(
+     df_filtered[["name", "seq_len", "n_modes", "mean_disp_m0", "max_disp_m0", "top_residue"]].rename(
+         columns={"name": "Protein", "seq_len": "Length", "n_modes": "Modes",
+                  "mean_disp_m0": "Mean Δ (M0)", "max_disp_m0": "Max Δ (M0)", "top_residue": "Top Res"}
+     ),
+     use_container_width=True, hide_index=True,
+     on_select="rerun", selection_mode="single-row", height=350,
+ )
+
+ # ── Protein detail panel ──
+ selected_rows = selected_idx.selection.rows if selected_idx.selection.rows else []
+ if not selected_rows:
+     st.info("👆 Click a row to view detailed analysis")
+     st.stop()
+
+ protein_name = df_filtered.iloc[selected_rows[0]]["name"]
+ st.divider()
+ st.subheader(f"🧬 {protein_name}")
+
+ # Load data
+ modes = load_modes(pred_dir, protein_name)
+ gt = load_ground_truth(gt_dir, protein_name)
+
+ if not modes:
+     st.error(f"No mode files found for {protein_name}")
+     st.stop()
+
+ n_res = len(list(modes.values())[0])
+ seq = gt.get("seq", "X" * n_res) if gt else "X" * n_res
+ ca = gt["bb"][:, 1] if gt and "bb" in gt else np.zeros((n_res, 3))
+ coverage = gt.get("coverage", np.ones(n_res)) if gt else np.ones(n_res)
+ eigenvalues = gt.get("eigvals", None) if gt else None
+ pdb_text = None
+
+ pdb_path = os.path.join(PETIMOT_ROOT, "pdbs", f"{protein_name}.pdb")
+ if os.path.exists(pdb_path):
+     with open(pdb_path) as f:
+         pdb_text = f.read()
+
+ # ── Sidebar controls ──
+ with st.sidebar:
+     st.divider()
+     st.markdown(f"### 🎛️ {protein_name}")
+     mode_idx = st.slider("Mode", 0, len(modes) - 1, 0, key="mode_sel")
+     amplitude = st.slider("Arrow amplitude", 0.5, 15.0, 3.0, 0.5, key="amp")
+     arrow_scale = st.slider("Arrow thickness", 0.3, 3.0, 1.0, 0.1, key="arrow_s")
+     color_scheme = st.selectbox("Color", ["magnitude", "rainbow", "residue_type", "bfactor"], key="col_s")
+     show_labels = st.checkbox("Show labels", True, key="labels")
+     min_disp = st.slider("Min displacement", 0.0, 0.1, 0.01, 0.005, key="min_d")
+
+ # ── 3D viewer + stats ──
+ col_3d, col_info = st.columns([2, 1])
+
+ with col_3d:
+     current_mode = modes[mode_idx]
+     mags = np.linalg.norm(current_mode, axis=1)
+
+     if pdb_text and ca is not None and np.any(ca != 0):
+         render_motion_viewer(
+             pdb_text=pdb_text, ca_coords=ca, mode_vecs=current_mode, seq=seq,
+             amplitude=amplitude, arrow_scale=arrow_scale, color_scheme=color_scheme,
+             show_labels=show_labels, min_displacement=min_disp,
+             width=700, height=480,
+         )
+     else:
+         st.info("No PDB structure — showing displacement data only.")
+
+ with col_info:
+     st.metric("Residues", n_res)
+     st.metric(f"Mode {mode_idx} mean Δ", f"{mags.mean():.3f} Å")
+     st.metric(f"Mode {mode_idx} max Δ", f"{mags.max():.3f} Å")
+     top5 = np.argsort(mags)[-5:][::-1]
+     st.markdown("**Top mobile residues:**")
+     for rank, idx in enumerate(top5):
+         aa = seq[idx] if idx < len(seq) else "?"
+         st.markdown(f"`#{rank+1}` **{aa}{idx+1}** — {mags[idx]:.3f} Å")
+     if eigenvalues is not None:
+         render_eigenvalue_spectrum(eigenvalues)
+
+ # ── Sequence viewer ──
+ st.markdown("### 🧬 Sequence × Displacement")
+ render_sequence_viewer(seq, mags, coverage, mode_label=f"Mode {mode_idx}")
+
+ # ── Displacement chart ──
+ st.markdown("### 📈 Displacement Profiles")
+ render_displacement_chart(modes, seq, coverage)
+
+ # ── Mode comparison ──
+ st.markdown("### 🔀 Mode Comparison")
+ col_corr, col_grid = st.columns([1, 2])
+ with col_corr:
+     render_mode_correlation(modes)
+ with col_grid:
+     if pdb_text and np.any(ca != 0):
+         render_mode_comparison(pdb_text, ca, modes, seq, amplitude, arrow_scale)
+
+ # ── Deep Analysis ──
+ st.divider()
+ st.markdown("### 🔬 Prediction Analysis")
+
+ gt_modes = None
+ if gt and "eigvects" in gt:
+     n_gt_modes = min(4, gt["eigvects"].shape[1] if gt["eigvects"].ndim == 2 else 4)
+     ev = gt["eigvects"][:, :n_gt_modes]  # (3N, K)
+     ev = ev.reshape(-1, 3, n_gt_modes).transpose(0, 2, 1)  # (N, K, 3)
+     gt_modes = {k: ev[:, k] for k in range(n_gt_modes)}
+
+ render_prediction_analysis(
+     modes=modes, seq=seq, ca_coords=ca, coverage=coverage,
+     eigenvalues=eigenvalues, gt_modes=gt_modes, protein_name=protein_name
+ )
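The `(3N, K) -> (N, K, 3)` eigenvector reshape on the Explorer page assumes the flattened rows are ordered x1, y1, z1, x2, y2, z2, ...; a toy example with hand-checkable values makes that layout assumption explicit:

```python
import numpy as np

# Toy eigvects: 2 residues (3N = 6 rows) and K = 2 modes.
# Row order is x1, y1, z1, x2, y2, z2 -- the layout the reshape assumes.
eigvects = np.array([
    [1.0, 10.0],   # x of residue 0, modes 0 and 1
    [2.0, 20.0],   # y of residue 0
    [3.0, 30.0],   # z of residue 0
    [4.0, 40.0],   # x of residue 1
    [5.0, 50.0],   # y of residue 1
    [6.0, 60.0],   # z of residue 1
])
K = eigvects.shape[1]
ev = eigvects.reshape(-1, 3, K).transpose(0, 2, 1)  # (N, K, 3)
gt_modes = {k: ev[:, k] for k in range(K)}          # per-mode (N, 3) arrays
```

Mode 0 comes out as `[[1, 2, 3], [4, 5, 6]]`: one 3-vector per residue, which is the shape the viewer components expect.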
app/pages/2_🔮_Inference.py ADDED
@@ -0,0 +1,162 @@
+ """🔮 Inference — Predict motion for a new protein."""
+ import streamlit as st
+ import os, sys, tempfile
+ import numpy as np
+
+ ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ PETIMOT_ROOT = os.path.dirname(ROOT)
+ if PETIMOT_ROOT not in sys.path:
+     sys.path.insert(0, PETIMOT_ROOT)
+
+ from app.utils.inference import run_inference, download_pdb
+ from app.components.viewer_3d import render_motion_viewer, render_mode_comparison
+ from app.components.sequence_viewer import render_sequence_viewer, render_displacement_chart
+ from app.components.mode_panel import render_mode_correlation, render_eigenvalue_spectrum
+
+ st.header("🔮 Custom Inference")
+ st.markdown("Predict protein motion modes for any structure. Runs on CPU (~5-30s).")
+
+ # ── Input method ──
+ input_mode = st.radio("Input method", ["PDB ID (RCSB)", "Upload PDB file"], horizontal=True)
+
+ pdb_path = None
+
+ if input_mode == "PDB ID (RCSB)":
+     col1, col2 = st.columns([3, 1])
+     with col1:
+         pdb_id = st.text_input("PDB ID", "1akeA", placeholder="e.g. 1akeA, 4akeA",
+                                help="4-char PDB code + optional chain letter")
+     with col2:
+         st.markdown("<br>", unsafe_allow_html=True)
+         fetch = st.button("🔍 Fetch", use_container_width=True)
+
+     if fetch and pdb_id:
+         with st.spinner(f"Downloading {pdb_id} from RCSB..."):
+             pdb_path = download_pdb(pdb_id)
+         if pdb_path:
+             st.success(f"Downloaded {pdb_id}")
+         else:
+             st.error(f"Could not download {pdb_id}. Check the PDB ID.")
+
+ else:
+     uploaded = st.file_uploader("Upload PDB", type=["pdb"], key="pdb_upload")
+     if uploaded:
+         tmp = tempfile.NamedTemporaryFile(suffix=".pdb", delete=False)
+         tmp.write(uploaded.read())
+         tmp.close()
+         pdb_path = tmp.name
+         st.success(f"Uploaded: {uploaded.name}")
+
+ # Persist the path across Streamlit reruns -- clicking the Run button below
+ # triggers a rerun in which the Fetch button reads as False again
+ if pdb_path:
+     st.session_state["pdb_path"] = pdb_path
+ pdb_path = pdb_path or st.session_state.get("pdb_path")
+
+ # ── Weights selection ──
+ weights_path = st.session_state.get("weights", None)
+ if not weights_path:
+     weights_dir = os.path.join(PETIMOT_ROOT, "weights")
+     pt_files = []
+     if os.path.isdir(weights_dir):
+         for root, dirs, files in os.walk(weights_dir):
+             for f in files:
+                 if f.endswith(".pt"):
+                     pt_files.append(os.path.join(root, f))
+     if pt_files:
+         weights_path = pt_files[0]
+
+ if not weights_path:
+     st.error("No model weights found in `weights/`. Download them from Figshare.")
+     st.stop()
+
+ # ── Run inference ──
+ if pdb_path:
+     if st.button("🚀 Run PETIMOT Inference", type="primary", use_container_width=True):
+         with st.spinner("Running inference... (loading embeddings + forward pass)"):
+             try:
+                 result = run_inference(pdb_path, weights_path)
+             except Exception as e:
+                 st.error(f"Inference failed: {e}")
+                 st.exception(e)
+                 st.stop()
+
+             st.session_state["inference_result"] = result
+             st.success(f"✅ Predicted {len(result['modes'])} modes for {result['name']} ({result['n_res']} residues)")
+
+ # ── Display results ──
+ result = st.session_state.get("inference_result", None)
+ if result:
+     modes = result["modes"]
+     ca = result["ca_coords"]
+     seq = result["seq"]
+     pdb_text = result["pdb_text"]
+     n_res = result["n_res"]
+
+     st.divider()
+     st.subheader(f"🧬 {result['name']} — {n_res} residues, {len(modes)} modes")
+
+     # Controls
+     with st.sidebar:
+         st.divider()
+         st.markdown(f"### 🎛️ {result['name']}")
+         mode_idx = st.slider("Mode", 0, max(0, len(modes) - 1), 0, key="inf_mode")
+         amplitude = st.slider("Amplitude", 0.5, 15.0, 3.0, 0.5, key="inf_amp")
+         arrow_scale = st.slider("Arrow size", 0.3, 3.0, 1.0, 0.1, key="inf_arrow")
+         color_scheme = st.selectbox("Colors",
+                                     ["magnitude", "rainbow", "residue_type", "bfactor"],
+                                     key="inf_color")
+
+     # 3D viewer
+     current_mode = modes.get(mode_idx, list(modes.values())[0])
+     mags = np.linalg.norm(current_mode, axis=1)
+
+     col_3d, col_stats = st.columns([2, 1])
+     with col_3d:
+         render_motion_viewer(
+             pdb_text=pdb_text, ca_coords=ca, mode_vecs=current_mode, seq=seq,
+             amplitude=amplitude, arrow_scale=arrow_scale, color_scheme=color_scheme,
+             width=700, height=500, key="inf_viewer",
+         )
+     with col_stats:
+         st.metric("Residues", n_res)
+         st.metric(f"Mode {mode_idx} mean", f"{mags.mean():.3f} Å")
+         st.metric(f"Mode {mode_idx} max", f"{mags.max():.3f} Å")
+         top3 = np.argsort(mags)[-3:][::-1]
+         st.markdown("**Top mobile:**")
+         for idx in top3:
+             aa = seq[idx] if idx < len(seq) else "?"
+             st.markdown(f"**{aa}{idx+1}** — {mags[idx]:.3f} Å")
+
+     # Sequence viewer
+     st.markdown("### 🧬 Sequence × Displacement")
+     render_sequence_viewer(seq, mags, mode_label=f"Mode {mode_idx}")
+
+     # Displacement chart
+     st.markdown("### 📈 Profiles")
+     render_displacement_chart(modes, seq)
+
+     # Mode comparison
+     if len(modes) > 1:
+         st.markdown("### 🔀 Mode Comparison")
+         render_mode_comparison(pdb_text, ca, modes, seq, amplitude, arrow_scale)
+         render_mode_correlation(modes)
+
+     # Export
+     st.divider()
+     st.markdown("### 💾 Export")
+     col_e1, col_e2 = st.columns(2)
+     with col_e1:
+         # CSV export
+         import pandas as pd
+         export_data = []
+         for k, v in modes.items():
+             m = np.linalg.norm(v, axis=1)
+             for i in range(len(m)):
+                 export_data.append({
+                     "residue": i + 1,
+                     "aa": seq[i] if i < len(seq) else "?",
+                     "mode": k,
+                     "dx": v[i, 0], "dy": v[i, 1], "dz": v[i, 2],
+                     "magnitude": m[i],
+                 })
+         csv = pd.DataFrame(export_data).to_csv(index=False)
+         st.download_button("📥 Download modes (CSV)", csv,
+                            f"{result['name']}_modes.csv", "text/csv")
+     with col_e2:
+         st.download_button("📥 Download PDB", pdb_text,
+                            f"{result['name']}.pdb", "chemical/x-pdb")
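The CSV export on the Inference page flattens the mode dictionary into long format, one row per (residue, mode) pair; a self-contained sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Toy prediction: 2 modes over 3 residues (values are invented)
modes = {0: np.array([[1.0, 0.0, 0.0],
                      [0.0, 2.0, 0.0],
                      [0.0, 0.0, 2.0]]),
         1: np.zeros((3, 3))}
seq = "ACD"

rows = []
for k, v in modes.items():
    mags = np.linalg.norm(v, axis=1)        # per-residue magnitude
    for i in range(len(mags)):
        rows.append({"residue": i + 1, "aa": seq[i], "mode": k,
                     "dx": v[i, 0], "dy": v[i, 1], "dz": v[i, 2],
                     "magnitude": mags[i]})
df = pd.DataFrame(rows)  # long format: one row per (residue, mode)
```

Note that selecting the mode column requires `df["mode"]` rather than `df.mode`, since `.mode` is an existing `DataFrame` method.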
app/pages/3_📊_Statistics.py ADDED
@@ -0,0 +1,158 @@
+ """📊 Statistics — Dataset-wide analysis of PETIMOT predictions."""
+ import streamlit as st
+ import os, sys
+ import numpy as np
+ import plotly.graph_objects as go
+ from plotly.subplots import make_subplots
+
+ ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ PETIMOT_ROOT = os.path.dirname(ROOT)
+ if PETIMOT_ROOT not in sys.path:
+     sys.path.insert(0, PETIMOT_ROOT)
+
+ from app.utils.data_loader import find_predictions_dir, load_prediction_index
+
+ st.header("📊 Dataset Statistics")
+
+ pred_dir = find_predictions_dir(PETIMOT_ROOT)
+ if not pred_dir:
+     st.warning("No predictions found. Run batch inference first.")
+     st.stop()
+
+ with st.spinner("Building index..."):
+     df = load_prediction_index(pred_dir)
+
+ if df.empty:
+     st.warning("Empty index")
+     st.stop()
+
+ st.success(f"**{len(df):,}** proteins analyzed")
+
+ # ── Summary metrics ──
+ c1, c2, c3, c4, c5 = st.columns(5)
+ c1.metric("Proteins", f"{len(df):,}")
+ c2.metric("Median length", f"{df.seq_len.median():.0f}")
+ c3.metric("Mean Δ (M0)", f"{df.mean_disp_m0.mean():.3f} Å")
+ c4.metric("Max Δ (M0)", f"{df.max_disp_m0.max():.3f} Å")
+ c5.metric("Avg modes", f"{df.n_modes.mean():.1f}")
+
+ # ── Distribution plots ──
+ st.markdown("### 📈 Distributions")
+
+ fig = make_subplots(rows=2, cols=2,
+                     subplot_titles=[
+                         "Sequence Length Distribution",
+                         "Mean Mode-0 Displacement",
+                         "Max Mode-0 Displacement",
+                         "Length vs Mean Displacement",
+                     ])
+
+ # 1. Sequence length
+ fig.add_trace(go.Histogram(
+     x=df.seq_len, nbinsx=60, marker_color="#6366f1", name="Length",
+     hovertemplate="Length: %{x}<br>Count: %{y}<extra></extra>",
+ ), row=1, col=1)
+ fig.add_vline(x=df.seq_len.median(), line_dash="dash", line_color="#ef4444",
+               annotation_text=f"Med={df.seq_len.median():.0f}", row=1, col=1)
+
+ # 2. Mean displacement
+ fig.add_trace(go.Histogram(
+     x=df.mean_disp_m0, nbinsx=60, marker_color="#10b981", name="Mean Δ",
+     hovertemplate="Mean Δ: %{x:.3f}Å<br>Count: %{y}<extra></extra>",
+ ), row=1, col=2)
+
+ # 3. Max displacement
+ fig.add_trace(go.Histogram(
+     x=df.max_disp_m0, nbinsx=60, marker_color="#f59e0b", name="Max Δ",
+     hovertemplate="Max Δ: %{x:.3f}Å<br>Count: %{y}<extra></extra>",
+ ), row=2, col=1)
+
+ # 4. Scatter: length vs displacement
+ fig.add_trace(go.Scattergl(
+     x=df.seq_len, y=df.mean_disp_m0,
+     mode="markers",
+     marker=dict(size=3, color=df.max_disp_m0, colorscale="Viridis",
+                 showscale=True, colorbar=dict(title="Max Δ")),
+     name="Proteins",
+     hovertemplate="%{text}<br>Length: %{x}<br>Mean Δ: %{y:.3f}Å<extra></extra>",
+     text=df.name,
+ ), row=2, col=2)
+
+ fig.update_layout(
+     template="plotly_dark",
+     height=600,
+     showlegend=False,
+     paper_bgcolor="rgba(0,0,0,0)",
+     plot_bgcolor="rgba(30,27,75,0.5)",
+     margin=dict(l=40, r=40, t=40, b=30),
+ )
+ st.plotly_chart(fig, use_container_width=True)
+
+ # ── Ground truth analysis (if available) ──
+ gt_dir = os.path.join(PETIMOT_ROOT, "ground_truth")
+ if os.path.isdir(gt_dir) and len(os.listdir(gt_dir)) > 0:
+     st.markdown("### 🎯 Ground Truth Analysis")
+     st.info("Loading eigenvalue statistics from ground truth...")
+
+     import torch, glob
+     gt_files = sorted(glob.glob(os.path.join(gt_dir, "*.pt")))[:2000]
+
+     eigendata = []
+     for gf in gt_files:
+         try:
+             d = torch.load(gf, map_location="cpu", weights_only=True)
+             ev = d["eigvals"].numpy() if "eigvals" in d else None
+             cov = d["coverage"].numpy() if "coverage" in d else None
+             if ev is not None:
+                 eigendata.append({
+                     "name": os.path.basename(gf).replace(".pt", ""),
+                     "n_res": len(d["bb"]),
+                     "eigval_0": float(ev[0]),
+                     "eigval_ratio": float(ev[0] / (ev[1] + 1e-12)) if len(ev) > 1 else 0,
+                     "mode1_var_frac": float(ev[0] / (ev.sum() + 1e-12)),
+                     "mean_coverage": float(cov.mean()) if cov is not None else 1.0,
+                 })
+         except Exception:
+             continue
+
+     if eigendata:
+         import pandas as pd
+         df_gt = pd.DataFrame(eigendata)
+
+         fig2 = make_subplots(rows=1, cols=3,
+                              subplot_titles=["Dominant Eigenvalue (λ₁)", "Mode Dominance (λ₁/λ₂)", "Mode 1 Variance %"])
+
+         fig2.add_trace(go.Histogram(
+             x=df_gt.eigval_0, nbinsx=60, marker_color="#ec4899", name="λ₁",
+         ), row=1, col=1)
+
+         fig2.add_trace(go.Histogram(
+             x=df_gt.eigval_ratio, nbinsx=60, marker_color="#8b5cf6", name="λ₁/λ₂",
+         ), row=1, col=2)
+
+         fig2.add_trace(go.Histogram(
+             x=df_gt.mode1_var_frac * 100, nbinsx=50, marker_color="#06b6d4", name="Var %",
+         ), row=1, col=3)
+
+         fig2.update_layout(
+             template="plotly_dark", height=300, showlegend=False,
+             paper_bgcolor="rgba(0,0,0,0)",
+             plot_bgcolor="rgba(30,27,75,0.5)",
+             margin=dict(l=40, r=20, t=40, b=30),
+         )
+         st.plotly_chart(fig2, use_container_width=True)
+
+         # Summary
+         st.markdown(f"""
+         | Metric | Value |
+         |--------|-------|
+         | Samples analyzed | {len(df_gt):,} |
+         | λ₁ median | {df_gt.eigval_0.median():.4f} |
+         | λ₁/λ₂ median | {df_gt.eigval_ratio.median():.2f} |
+         | Mode 1 variance | {df_gt.mode1_var_frac.median()*100:.1f}% median |
+         | Coverage | {df_gt.mean_coverage.mean():.3f} ± {df_gt.mean_coverage.std():.3f} |
+         """)
+
+ # ── Data table (searchable) ──
+ st.markdown("### 📋 Full Data Table")
+ st.dataframe(df, use_container_width=True, height=400)
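The per-protein spectrum summaries computed on the Statistics page (dominance ratio λ₁/λ₂ and mode-1 variance fraction λ₁/Σλ) reduce to two lines of numpy; a small sketch, where the helper name `eigen_summary` is ours:

```python
import numpy as np

def eigen_summary(eigvals):
    """Spectrum summaries as used on the Statistics page: how strongly the
    first mode dominates the second, and its share of total variance."""
    ev = np.asarray(eigvals, dtype=float)
    ratio = ev[0] / (ev[1] + 1e-12) if len(ev) > 1 else 0.0  # lambda1/lambda2
    var_frac = ev[0] / (ev.sum() + 1e-12)                    # lambda1 / sum
    return ratio, var_frac

# Toy spectrum: first mode twice the second, half the total variance
ratio, var_frac = eigen_summary([4.0, 2.0, 1.0, 1.0])
```

A `ratio` well above 1 and a large `var_frac` both indicate a motion dominated by a single collective mode.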
app/requirements.txt ADDED
@@ -0,0 +1,15 @@
+ # PETIMOT Streamlit Explorer
+ streamlit>=1.30.0
+ stmol>=0.0.9
+ py3Dmol>=2.0.0
+ plotly>=5.18.0
+ pandas>=2.0.0
+ numpy>=1.24.0
+ torch>=2.0.0
+ torch_geometric>=2.0.0
+ transformers>=4.30.0
+ sentencepiece>=0.1.99
+ scipy>=1.10.0
+ biopython>=1.80
+ requests>=2.28.0
+ tqdm>=4.65.0
app/utils/__init__.py ADDED
File without changes
app/utils/data_loader.py ADDED
@@ -0,0 +1,87 @@
1
+ """Data loading utilities for pre-computed PETIMOT predictions."""
2
+ import os, json, glob, torch
3
+ import numpy as np
4
+ import pandas as pd
5
+ from pathlib import Path
6
+ from functools import lru_cache
7
+
8
+
9
+ def find_predictions_dir(root: str) -> str | None:
10
+ """Find the predictions directory (most recent model)."""
11
+ pred_root = os.path.join(root, "predictions")
12
+ if not os.path.isdir(pred_root):
13
+ return None
14
+ subdirs = [os.path.join(pred_root, d) for d in os.listdir(pred_root)
15
+ if os.path.isdir(os.path.join(pred_root, d))]
16
+ if not subdirs:
17
+ return None
18
+ return max(subdirs, key=os.path.getmtime)
19
+
20
+
21
+ @lru_cache(maxsize=1)
22
+ def load_prediction_index(pred_dir: str) -> pd.DataFrame:
23
+ """Build index of all predicted proteins with metadata."""
24
+ rows = []
25
+ mode_files = glob.glob(os.path.join(pred_dir, "*_mode_0.txt"))
26
+
27
+ for mf in mode_files:
28
+ base = os.path.basename(mf).replace("_mode_0.txt", "")
29
+ # Load mode 0 for stats
30
+ try:
31
+ vecs = np.loadtxt(mf)
32
+ n_res = len(vecs)
33
+ mag = np.linalg.norm(vecs, axis=1)
34
+
35
+ # Count available modes
36
+ n_modes = 0
37
+ for k in range(10):
38
+ if os.path.exists(os.path.join(pred_dir, f"{base}_mode_{k}.txt")):
39
+ n_modes += 1
40
+ else:
41
+ break
42
+
43
+ rows.append({
44
+ "name": base,
45
+ "seq_len": n_res,
46
+ "n_modes": n_modes,
47
+ "mean_disp_m0": float(mag.mean()),
48
+ "max_disp_m0": float(mag.max()),
49
+ "top_residue": int(np.argmax(mag)) + 1,
50
+ })
51
+ except Exception:
52
+ continue
53
+
54
+ return pd.DataFrame(rows).sort_values("name").reset_index(drop=True)
55
+
56
+
57
+ def load_modes(pred_dir: str, name: str) -> dict[int, np.ndarray]:
58
+ """Load all mode files for a protein."""
59
+ modes = {}
60
+ for k in range(10):
61
+ for pfx in [f"extracted_{name}", name]:
62
+ mf = os.path.join(pred_dir, f"{pfx}_mode_{k}.txt")
63
+ if os.path.exists(mf):
64
+ modes[k] = np.loadtxt(mf)
65
+ break
66
+ return modes
67
+
68
+
69
+ def load_ground_truth(gt_dir: str, name: str) -> dict | None:
70
+ """Load ground truth data for a protein."""
71
+ path = os.path.join(gt_dir, f"{name}.pt")
72
+ if not os.path.exists(path):
73
+ return None
74
+ try:
75
+ data = torch.load(path, map_location="cpu", weights_only=True)
76
+ return {k: v.numpy() if isinstance(v, torch.Tensor) else v
77
+ for k, v in data.items()}
78
+ except Exception:
79
+ return None
80
+
81
+
82
+ def load_pdb_text(pdb_path: str) -> str | None:
83
+ """Load PDB file as text."""
84
+ if not os.path.exists(pdb_path):
85
+ return None
86
+ with open(pdb_path) as f:
87
+ return f.read()
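The per-protein statistics stored by `load_prediction_index` can be reproduced from a single mode file. A self-contained sketch with a synthetic `(N, 3)` displacement file (the file name is made up):

```python
import os
import tempfile
import numpy as np

# Write a synthetic (N, 3) displacement mode, as the inference step would.
tmp = tempfile.mkdtemp()
vecs = np.array([[0.0, 0.0, 0.0],
                 [0.3, 0.4, 0.0],   # |v| = 0.5
                 [0.0, 0.0, 1.0]])  # |v| = 1.0 -> largest displacement
path = os.path.join(tmp, "demo_mode_0.txt")
np.savetxt(path, vecs)

# Same statistics the Explorer index stores per protein.
loaded = np.loadtxt(path)
mag = np.linalg.norm(loaded, axis=1)       # per-residue displacement magnitude
stats = {
    "seq_len": len(loaded),
    "mean_disp_m0": float(mag.mean()),
    "max_disp_m0": float(mag.max()),
    "top_residue": int(np.argmax(mag)) + 1,  # 1-based residue index
}
print(stats)
```

`top_residue` is 1-based so it can be shown directly next to PDB residue numbering in the UI.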
app/utils/download.py ADDED
@@ -0,0 +1,121 @@
+ """Auto-download PETIMOT data from Figshare on first run."""
+ import os, zipfile, requests
+ from pathlib import Path
+ from tqdm import tqdm
+
+
+ FIGSHARE_PRIVATE_KEY = "ab400d852b4669a83b64"
+ FIGSHARE_FILES = {
+     "ground_truth.zip": "52349453",
+     "default_2025-02-07_21-54-02_epoch_33.pt": "52349456",
+     "baseline_predictions.zip": "52349480",
+ }
+
+
+ def download_file(url: str, dest: str, desc: str = "") -> bool:
+     """Download a file with a progress bar."""
+     try:
+         r = requests.get(url, stream=True, allow_redirects=True, timeout=60)
+         r.raise_for_status()
+         total = int(r.headers.get("content-length", 0))
+         with open(dest, "wb") as f:
+             with tqdm(total=total, unit="B", unit_scale=True, desc=desc) as pbar:
+                 for chunk in r.iter_content(8192):
+                     f.write(chunk)
+                     pbar.update(len(chunk))
+         if os.path.getsize(dest) < 1000:
+             # A very small file is an error page, not the data
+             os.remove(dest)
+             return False
+         return True
+     except Exception as e:
+         print(f"Download failed: {e}")
+         return False
+
+
+ def ensure_weights(root: str) -> str | None:
+     """Ensure model weights are available. Returns the weights path or None."""
+     weights_dir = os.path.join(root, "weights")
+     os.makedirs(weights_dir, exist_ok=True)
+
+     # Check for existing weights
+     for f in os.listdir(weights_dir):
+         if f.endswith(".pt"):
+             return os.path.join(weights_dir, f)
+
+     # Try downloading from Figshare
+     wt_name = "default_2025-02-07_21-54-02_epoch_33.pt"
+     wt_path = os.path.join(weights_dir, wt_name)
+     fid = FIGSHARE_FILES[wt_name]
+     url = f"https://figshare.com/ndownloader/files/{fid}?private_link={FIGSHARE_PRIVATE_KEY}"
+
+     print(f"⬇️ Downloading model weights ({wt_name})...")
+     if download_file(url, wt_path, "weights"):
+         print(f"✅ Weights saved to {wt_path}")
+         return wt_path
+
+     # Fall back to the Figshare API
+     try:
+         api_url = "https://api.figshare.com/v2/articles/28679143/files"
+         r = requests.get(api_url, timeout=10)
+         if r.ok:
+             for f in r.json():
+                 if "epoch" in f["name"] and f["name"].endswith(".pt"):
+                     if download_file(f["download_url"], wt_path, "weights"):
+                         return wt_path
+     except Exception:
+         pass
+
+     return None
+
+
+ def ensure_ground_truth(root: str) -> bool:
+     """Ensure ground-truth data is available."""
+     gt_dir = os.path.join(root, "ground_truth")
+     os.makedirs(gt_dir, exist_ok=True)
+
+     if len(list(Path(gt_dir).rglob("*.pt"))) > 0:
+         return True
+
+     # Try downloading
+     zip_path = os.path.join(root, "ground_truth.zip")
+     fid = FIGSHARE_FILES["ground_truth.zip"]
+     url = f"https://figshare.com/ndownloader/files/{fid}?private_link={FIGSHARE_PRIVATE_KEY}"
+
+     print("⬇️ Downloading ground truth (958 MB)...")
+     if download_file(url, zip_path, "ground_truth"):
+         print("📦 Extracting...")
+         with zipfile.ZipFile(zip_path) as z:
+             z.extractall(root)
+         os.remove(zip_path)
+         return True
+     return False
+
+
+ def check_data_status(root: str) -> dict:
+     """Check what data is available."""
+     gt_dir = os.path.join(root, "ground_truth")
+     weights_dir = os.path.join(root, "weights")
+     pred_dir = os.path.join(root, "predictions")
+
+     n_gt = len(list(Path(gt_dir).rglob("*.pt"))) if os.path.isdir(gt_dir) else 0
+
+     n_weights = 0
+     if os.path.isdir(weights_dir):
+         n_weights = len([f for f in os.listdir(weights_dir) if f.endswith(".pt")])
+
+     n_pred = 0
+     if os.path.isdir(pred_dir):
+         for d in os.listdir(pred_dir):
+             dp = os.path.join(pred_dir, d)
+             if os.path.isdir(dp):
+                 n_pred = len([f for f in os.listdir(dp) if f.endswith("_mode_0.txt")])
+                 break
+
+     return {
+         "ground_truth": n_gt,
+         "weights": n_weights,
+         "predictions": n_pred,
+         "has_weights": n_weights > 0,
+         "has_gt": n_gt > 0,
+         "has_predictions": n_pred > 0,
+     }
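`check_data_status` only counts files by suffix, so it can be exercised on a synthetic tree without any downloads. A sketch replicating its counting logic (directory and file names below are hypothetical):

```python
import os
import tempfile
from pathlib import Path

# Synthetic data tree mirroring the layout the status check scans.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "weights"))
os.makedirs(os.path.join(root, "predictions", "default_run"))
os.makedirs(os.path.join(root, "ground_truth"))  # left empty on purpose
Path(root, "weights", "best.pt").touch()
Path(root, "predictions", "default_run", "1abcA_mode_0.txt").touch()

# Same counting logic as check_data_status().
n_gt = len(list(Path(root, "ground_truth").rglob("*.pt")))
n_weights = len([f for f in os.listdir(os.path.join(root, "weights"))
                 if f.endswith(".pt")])
n_pred = len([f for f in os.listdir(os.path.join(root, "predictions", "default_run"))
              if f.endswith("_mode_0.txt")])
status = {"has_gt": n_gt > 0, "has_weights": n_weights > 0,
          "has_predictions": n_pred > 0}
print(status)
```

The app uses these booleans to decide which pages (Explorer, Inference, Statistics) can be enabled.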
app/utils/inference.py ADDED
@@ -0,0 +1,98 @@
+ """PETIMOT inference utilities for custom proteins."""
+ import os, sys, torch
+ import numpy as np
+
+ # Ensure PETIMOT is importable
+ PETIMOT_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+ if PETIMOT_ROOT not in sys.path:
+     sys.path.insert(0, PETIMOT_ROOT)
+
+ EMBEDDING_DIM_MAP = {"prostt5": 1024, "esmc_300m": 960, "esmc_600m": 1152}
+
+
+ def run_inference(pdb_path: str, weights_path: str, config_path: str | None = None,
+                   output_dir: str = "/tmp/petimot_pred") -> dict:
+     """Run PETIMOT inference on a single PDB file.
+
+     Args:
+         pdb_path: Path to the input PDB file
+         weights_path: Path to the model weights (.pt)
+         config_path: Path to the config YAML (default: configs/default.yaml)
+         output_dir: Where to save predictions
+
+     Returns:
+         dict with modes, ca_coords, seq, etc.
+     """
+     from petimot.infer.infer import infer
+     from petimot.data.pdb_utils import load_backbone_coordinates
+
+     if config_path is None:
+         config_path = os.path.join(PETIMOT_ROOT, "configs", "default.yaml")
+
+     os.makedirs(output_dir, exist_ok=True)
+
+     # Run inference
+     infer(model_path=weights_path, config_file=config_path,
+           input_list=[pdb_path], output_path=output_dir)
+
+     # Collect results
+     stem = os.path.splitext(os.path.basename(weights_path))[0]
+     pred_subdir = os.path.join(output_dir, stem)
+     basename = os.path.splitext(os.path.basename(pdb_path))[0]
+
+     # Load structure
+     bb_data = load_backbone_coordinates(pdb_path, allow_hetatm=True)
+     ca = bb_data["bb"][:, 1].numpy()
+     seq = bb_data.get("seq", "X" * len(ca))
+     if not isinstance(seq, str):
+         seq = "X" * len(ca)
+
+     # Load predicted modes
+     modes = {}
+     for k in range(10):
+         for pfx in [f"extracted_{basename}", basename]:
+             mf = os.path.join(pred_subdir, f"{pfx}_mode_{k}.txt")
+             if os.path.exists(mf):
+                 modes[k] = np.loadtxt(mf)
+                 break
+
+     with open(pdb_path) as f:
+         pdb_text = f.read()
+
+     return {
+         "name": basename,
+         "ca_coords": ca,
+         "seq": seq,
+         "modes": modes,
+         "pdb_text": pdb_text,
+         "pred_dir": pred_subdir,
+         "n_res": len(ca),
+     }
+
+
+ def download_pdb(pdb_id: str, output_dir: str = "/tmp/petimot_pdbs") -> str | None:
+     """Download a PDB entry from RCSB, optionally filtered to one chain."""
+     import requests
+
+     os.makedirs(output_dir, exist_ok=True)
+     code4 = pdb_id[:4].lower()
+     chain = pdb_id[4:].upper() if len(pdb_id) > 4 else ""
+     out_path = os.path.join(output_dir, f"{pdb_id}.pdb")
+
+     if os.path.exists(out_path):
+         return out_path
+
+     r = requests.get(f"https://files.rcsb.org/download/{code4}.pdb", timeout=30)
+     if not r.ok:
+         return None
+
+     lines = r.text.split("\n")
+     if chain:
+         # Keep ATOM records of the requested chain; drop other chains and HETATM
+         lines = [l for l in lines
+                  if (l.startswith("ATOM") and len(l) > 21 and l[21] == chain)
+                  or not l.startswith(("ATOM", "HETATM"))]
+
+     with open(out_path, "w") as f:
+         f.write("\n".join(lines))
+     return out_path
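The chain filter in `download_pdb` relies on the fixed-column PDB format: the chain identifier is character 22 (0-based index 21) of an `ATOM` record. A standalone sketch with toy records:

```python
# Toy PDB records (not a real structure); column 22 carries the chain ID.
lines = [
    "HEADER    TOY ENTRY",
    "ATOM      1  CA  ALA A   1      11.104  13.207   2.100  1.00  0.00           C",
    "ATOM      2  CA  GLY B   1       8.130  12.000   3.420  1.00  0.00           C",
    "HETATM    3  O   HOH A 101       5.000   5.000   5.000  1.00  0.00           O",
    "END",
]

chain = "A"
# Same predicate as download_pdb(): keep ATOM records of the requested chain,
# keep non-coordinate records, drop everything else (including HETATM).
kept = [l for l in lines
        if (l.startswith("ATOM") and len(l) > 21 and l[21] == chain)
        or not l.startswith(("ATOM", "HETATM"))]
print(kept)
```

Note that with a chain suffix given, HETATM records (waters, ligands) are dropped entirely, which matches the filter above.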
claude.md ADDED
@@ -0,0 +1,203 @@
+ # PETIMOT — Development State
+
+ > Last updated: 2026-03-20
+ > Maintainer: Valentin Lombard (valou82160@gmail.com)
+
+ ## Project Overview
+
+ **PETIMOT** = SE(3)-equivariant GNN for protein motion prediction from sparse data.
+ - Paper: [arXiv 2504.02839](https://arxiv.org/abs/2504.02839) — Lombard, Grudinin & Laine
+ - Public repo: `PhyloSofS-Team/PETIMOT`
+ - Private repos: `Vlmbd/petimot_temp`, `Vlmbd/petimot` (contain training code, identical to each other)
+
+ ## Architecture
+
+ - **Model**: `ProteinMotionMPNN` in `petimot/model/neural_net.py`
+   - SE(3)-equivariant message-passing neural network
+   - Input: PLM embeddings (ProstT5 1024d, or ESM 960d/1152d) + KNN graph
+   - Output: K motion modes as (N, 3) displacement vectors per protein
+   - 15 independent layers, ~4.7M params (default config)
+
+ - **Loss functions** in `petimot/model/loss.py`:
+   - `compute_nsse_loss`: Normalized SSE with Hungarian assignment (per-mode matching)
+   - `compute_rmsip_loss`: Root Mean Square Inner Product (subspace similarity)
+   - `compute_ortho_loss`: Independence Score (orthogonality regularizer)
+   - Default weights: 0.5×NSSE + 0.5×RMSIP + 0.0×IS
+
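For intuition, RMSIP measures the overlap between two K-dimensional mode subspaces. A minimal sketch of the standard definition, sqrt((1/K) Σᵢⱼ (uᵢ·vⱼ)²) — illustrative only, not the exact `compute_rmsip_loss` implementation:

```python
import numpy as np

def rmsip(U, V):
    """RMSIP between two orthonormal mode bases of shape (3N, K)."""
    overlaps = U.T @ V  # (K, K) matrix of mode inner products
    return np.sqrt((overlaps ** 2).sum() / U.shape[1])

# Identical subspaces give RMSIP = 1; disjoint subspaces give RMSIP = 0.
U = np.eye(6)[:, :2]   # modes e1, e2
V = np.eye(6)[:, :2]   # same subspace
W = np.eye(6)[:, 2:4]  # orthogonal subspace
print(rmsip(U, V), rmsip(U, W))
```

Because it compares subspaces rather than individual modes, RMSIP is insensitive to mode ordering, which is why it complements the per-mode Hungarian-matched NSSE.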
+ - **Data** in `petimot/data/`:
+   - `BaseDataset` / `InferenceDataset` in `data_set.py` (public repo)
+   - `TrainingDataset` in private repos — extends `BaseDataset` with ground-truth `.pt` loading
+   - Ground-truth `.pt` format: `{eigvects: (3N,K), eigvals: (K,), bb: (N,4,3), seq: str, coverage: (N,)}`
+   - Embeddings: cached as `{name}_{model}.pt` files in the `embeddings/` directory
+
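The ground-truth eigenvector layout can be unpacked into per-residue, per-mode 3-vectors; a toy sketch of the `(3N, K)` → `(N, K, 3)` reshape with `√N` scaling described under Training Details (assuming xyz triplets are contiguous per residue, as that reshape pattern implies):

```python
import numpy as np

N, K = 4, 3  # residues, modes (toy sizes)
rng = np.random.default_rng(0)
eigvects = rng.normal(size=(3 * N, K))  # ground-truth layout: (3N, K)

# (3N, K) -> (N, 3, K) -> (N, K, 3), scaled by sqrt(N)
modes = eigvects.reshape(N, 3, K).transpose(0, 2, 1) * np.sqrt(N)

# Residue i, mode k now holds rows 3i..3i+2 of eigvects column k, rescaled
print(modes.shape)
```

After this transform, `modes[i, k]` is directly a 3D displacement arrow for residue `i` in mode `k`, matching the model's `(N, 3)` per-mode output.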
+ ## Repository Structure
+
+ ```
+ PETIMOT/
+ ├── configs/default.yaml       # Default hyperparameters
+ ├── petimot/
+ │   ├── __main__.py            # CLI: infer, evaluate (+ train in private repo)
+ │   ├── model/
+ │   │   ├── neural_net.py      # ProteinMotionMPNN
+ │   │   ├── loss.py            # NSSE, RMSIP, IS losses
+ │   │   └── optimizer.py       # get_optimizer factory
+ │   ├── data/
+ │   │   ├── data_set.py        # BaseDataset, InferenceDataset
+ │   │   ├── embeddings.py      # EmbeddingManager (ProstT5/ESM)
+ │   │   └── pdb_utils.py       # load_backbone_coordinates
+ │   ├── infer/infer.py         # Inference pipeline
+ │   ├── eval/eval.py           # Evaluation with metrics
+ │   └── utils/
+ │       ├── seeding.py         # set_seed (reproducibility)
+ │       └── rigid_utils.py     # Quaternion/rotation utilities
+ ├── full_train_list.txt        # ~26K training samples
+ ├── full_val_list.txt          # ~5K validation samples
+ ├── eval_list.txt              # Test set
+ ├── split_script.py            # Family-based split generation
+ ├── PETIMOT_workflow.ipynb     # ⭐ Main deliverable — Colab notebook
+ └── weights/                   # Pretrained models
+ ```
+
+ ## Colab Notebook (`PETIMOT_workflow.ipynb`)
+
+ ### Current State: ✅ Functional, iterating on polish
+
+ The notebook is the primary deliverable. It is built via Python patch scripts in `/tmp/` and contains 37 cells across 9 sections:
+
+ | # | Section | Status | Notes |
+ |---|---------|--------|-------|
+ | 0 | Setup | ✅ | Install (no torch reinstall), GPU check, Drive mount, WandB |
+ | 1 | Data | ✅ | Figshare manual download + auto-extract, rich dataset stats (6 panels) |
+ | 2 | Training | ✅ | Full loop with gradient norms, ETA, best/last checkpoints |
+ | 3 | Monitoring | ✅ | Plotly dashboard, per-sample validation analysis |
+ | 4 | Inference | ✅ | Single PDB, batch, upload, auto-detect weights |
+ | 5 | Visualization | ✅ | 5-panel analysis dashboard, 3D py3Dmol, animation, mode grid |
+ | 6 | Trajectory Export | ✅ | Multi-model PDB for PyMOL/ChimeraX |
+ | 7 | Evaluation | ✅ | Test-set eval, CSV export, baseline comparison |
+ | 8 | Ablations | ✅ | 10 config presets, comparison plots |
+
+ ### Known Issues & Workarounds
+
+ 1. **Figshare download**: Private-link URLs don't support programmatic download (wget/curl/requests all fail). Solution: manual browser download + Colab upload. Cell 1.1 auto-detects and extracts uploads.
+
+ 2. **numpy binary incompatibility**: Installing packages can break numpy. Solution: after cell 0.1, do Runtime → Restart session, then skip to 0.2.
+
+ 3. **`torch.linalg.eigh` cusolver error**: Happens with AMP (FP16) in the `rigid_utils.py` quaternion computation. Solution: cell 2.4 sets `torch.backends.cuda.preferred_linalg_library("magma")`.
+
+ 4. **Split-file mismatch**: The config references `val_list.txt` but the repo has `full_val_list.txt`. Cell 2.2 auto-detects the correct files.
+
+ 5. **ProstT5 embedding computation**: Takes ~10 min on an A100, longer on a T4. Embeddings are cached in `embeddings/` (symlinked to Drive if mounted). The first run is slow; subsequent runs are fast.
+
+ 6. **Batch size**: The default is 16 (too small for large GPUs). On an RTX PRO 6000 (96 GB), use 128-256.
+
+ ### How the Notebook is Built
+
+ The notebook is modified via Python scripts that:
+ 1. Load the `.ipynb` JSON
+ 2. Find cells by title (e.g., `'5.1' in line`)
+ 3. Replace the source with new code, ensuring proper `\n` line terminators
+ 4. Save back to disk
+
+ **Critical**: Each line in the `source` array MUST end with `\n` (except the last line) for Colab to execute it. This was a major early bug.
+
+ Example patch-script pattern:
+ ```python
+ import json
+
+ with open('PETIMOT_workflow.ipynb') as f:
+     nb = json.load(f)
+ for i, c in enumerate(nb['cells']):
+     if any('CELL_ID' in l for l in c['source']):
+         lines = new_src.split("\n")
+         nb['cells'][i]['source'] = [l + "\n" for l in lines[:-1]] + [lines[-1]]
+         break
+ with open('PETIMOT_workflow.ipynb', 'w') as f:
+     json.dump(nb, f, indent=1)
+ ```
+
+ ## Training Details (from private repo)
+
+ - **TrainingDataset** (cell 2.1):
+   - Loads `.pt` ground-truth files listed in a text file
+   - Eigenvectors reshaped: `(3N, K)` → `(N, 3, K)` → `(N, K, 3)`, scaled by `√N`
+   - Multiplicative Gaussian noise on eigvects + embeddings for augmentation
+   - Random-embeddings option for ablation (`rand_emb=True`)
+
+ - **process_epoch** (cell 2.3):
+   - `set_grad_enabled(training)` per loss component
+   - AMP with GradScaler, gradient clipping at 10
+   - `optimizer.zero_grad(set_to_none=True)` for speed
+   - Tracks: NSSE, min_NSSE, RMSIP, ortho, success rate, gradient norms
+
+ - **train_petimot** (cell 2.3):
+   - AdamW + ReduceLROnPlateau (factor=0.5, patience from config)
+   - Saves `best.pt` + `last.pt` in `weights/{run_name}/`
+   - Auto-loads the best model after training
+   - Optional WandB logging, optional resume from checkpoint
+
+ ## Data Sources
+
+ - **Figshare** (private link): https://figshare.com/s/ab400d852b4669a83b64
+   - `ground_truth.zip` (958 MB, ~36K `.pt` files)
+   - `default_2025-02-07_21-54-02_epoch_33.pt` (18 MB, pretrained weights)
+   - `baseline_predictions.zip` (23 MB — AlphaFlow, ESMFlow, NMA)
+   - File IDs: ground_truth=52349453, weights=52349456, baselines=52349480
+
+ - **Local** (user's machine):
+   - `/Users/valentin/Documents/Petimot/` — public repo clone
+   - `/Users/valentin/Documents/petimot_private/` — private repo clone
+   - `/Users/valentin/Documents/petimot_temp/` — private repo (has weights `.pt` files)
+
+ ## Target Audience
+
+ The notebook is designed for **ML/bioinformatics professors and researchers**. Key design decisions:
+ - Rich, publication-quality visualizations (not toy demos)
+ - Extensive inline comments and docstrings with tensor shapes
+ - Interactive Colab form controls for parameters
+ - Auto-detect/auto-extract for user-friendly data setup
+ - All 8 sections are independent after setup (you can run inference without training)
+
+ ## Streamlit App (`app/`)
+
+ ### Current State: ✅ Built, needs testing
+
+ Interactive web explorer for PETIMOT predictions. Replaces notebook sections 5-8.
+
+ ```
+ app/
+ ├── app.py                  # Main entry + sidebar + dark theme
+ ├── requirements.txt        # Dependencies
+ ├── pages/
+ │   ├── 1_🔍_Explorer.py    # Browse pre-computed DB (search, filter, 3D, sequence)
+ │   ├── 2_🔮_Inference.py   # Upload PDB → predict → visualize + export
+ │   └── 3_📊_Statistics.py  # Dataset-wide distributions + eigenvalue analysis
+ ├── components/
+ │   ├── viewer_3d.py        # py3Dmol with arrows, labels, mode comparison grid
+ │   ├── sequence_viewer.py  # HTML sequence heatmap + Plotly displacement chart
+ │   └── mode_panel.py       # Mode tabs, correlation matrix, eigenvalue spectrum
+ └── utils/
+     ├── data_loader.py      # Load predictions index, modes, ground truth
+     └── inference.py        # PETIMOT inference wrapper + RCSB PDB download
+ ```
+
+ **Run locally:** `cd PETIMOT && streamlit run app/app.py`
+
+ **Key features:**
+ - Dark theme with a purple accent palette
+ - Real-time sliders for amplitude, arrow size, color scheme
+ - Searchable/sortable protein table with click-to-detail
+ - HTML sequence viewer with displacement-colored cells + hover tooltips
+ - Plotly charts (displacement profiles, eigenvalue spectrum, correlations)
+ - CSV + PDB export for inference results
+ - PDB ID fetch from RCSB or file upload
+
+ **Dependencies:** stmol (py3Dmol for Streamlit), plotly, streamlit
+
+ **Needs:** Pre-computed predictions in the `predictions/` directory for the Explorer page to work.
+
+ ## Dependencies
+
+ ```
+ torch>=2.0.0, torch_geometric, torch_scatter, torch_sparse
+ transformers==4.48.3, sentencepiece==0.2.0
+ scipy, typer==0.15.1, tqdm, numpy
+ wandb, plotly, py3Dmol, biopython, pandas, ipywidgets, gdown, requests
+ ```
requirements.txt CHANGED
@@ -1,9 +1,18 @@
+ # PETIMOT Explorer — HuggingFace Spaces
+ streamlit>=1.30.0
+ stmol>=0.0.9
+ py3Dmol>=2.0.0
+ plotly>=5.18.0
+ pandas>=2.0.0
+ numpy>=1.24.0
  torch>=2.0.0
  torch_geometric>=2.0.0
- wandb>=0.19.0
- transformers==4.48.3
- sentencepiece==0.2.0
+ torch_scatter
+ torch_sparse
+ transformers>=4.30.0
+ sentencepiece>=0.1.99
+ scipy>=1.10.0
+ biopython>=1.80
+ requests>=2.28.0
  tqdm>=4.65.0
- scipy>=1.13.0
- typer==0.15.1
- numpy<2
+ typer>=0.9.0