Space: lighteternal/BioAssayAlign-Compatibility-Explorer


lighteternal committed on Mar 9 · commit f1158c7 (verified) · 1 parent: 2ad7575

Polish UX, examples, and result explainability


Files changed (5)

README.md +35 -9
__pycache__/app.cpython-310.pyc +0 -0
__pycache__/space_runtime.cpython-310.pyc +0 -0
app.py +209 -53
space_runtime.py +57 -19

README.md CHANGED

@@ -1,20 +1,20 @@
 ---
 title: BioAssayAlign Compatibility Explorer
 emoji: 🧪
-colorFrom: blue
-colorTo: gray
 sdk: gradio
 sdk_version: 6.9.0
 python_version: "3.10"
 app_file: app.py
 pinned: false
 license: mit
-short_description: Rank candidate molecules for a bioassay.
 ---
 # BioAssayAlign Compatibility Explorer
-This Space is a scientist-facing demo for **assay-conditioned compound ranking**.
 You provide:
 - a bioassay description and optional metadata
@@ -27,7 +27,7 @@ The model returns:
 ## What It Is
-This is not a chatbot and it is not a potency predictor.
 It is a **ranking model** trained on a frozen public bioassay dataset built from PubChem BioAssay and ChEMBL. It is designed to answer:
@@ -35,9 +35,27 @@ It is a **ranking model** trained on a frozen public bioassay dataset built from
 ## What The Score Means
-- Higher score = the model believes the molecule is more compatible with the assay than lower-ranked candidates in the same list.
-- The score is **not** a probability.
-- The score is best used for **ranking**, not absolute decision thresholds.
 ## Recommended Input Style
@@ -58,6 +76,14 @@ You can paste SMILES directly or upload a CSV with a `smiles` or `canonical_smil
 - triaging compounds before a more expensive downstream model or wet-lab step
 - testing how sensitive rankings are to assay wording and metadata
 ## Limits
 - This is a public-data model, not a medicinal chemistry oracle.
@@ -66,7 +92,7 @@ You can paste SMILES directly or upload a CSV with a `smiles` or `canonical_smil
 ## Runtime Notes
-- The first request can be slower because the Space has to load the model.
 - Large candidate lists increase runtime. For interactive use, start with a few hundred molecules.
 ## Model

 ---
 title: BioAssayAlign Compatibility Explorer
 emoji: 🧪
+colorFrom: green
+colorTo: red
 sdk: gradio
 sdk_version: 6.9.0
 python_version: "3.10"
 app_file: app.py
 pinned: false
 license: mit
+short_description: Rank a candidate molecule list against a bioassay.
 ---
 # BioAssayAlign Compatibility Explorer
+BioAssayAlign is an **assay-conditioned molecule ranking** tool.
 You provide:
 - a bioassay description and optional metadata
 ## What It Is
+This is not a chatbot. It is not a potency predictor.
 It is a **ranking model** trained on a frozen public bioassay dataset built from PubChem BioAssay and ChEMBL. It is designed to answer:
 ## What The Score Means
+- The app shows a **priority band** and a **list-relative score** first.
+- Those values explain the ranking better than the raw model score.
+- The raw score is **not** a probability. Use it only for debugging.
+- The strongest molecule in your submitted list will be near the top of the `0–100` relative scale.
+## How To Use It
+1. Enter the assay title and description in plain scientific language.
+2. Add metadata if you know it:
+   - organism
+   - readout
+   - assay format
+   - assay type
+   - target UniProt ID
+3. Paste one SMILES per line or upload a CSV with a `smiles` column.
+4. Run ranking.
+5. Read the output in this order:
+   - `priority`
+   - `relative score`
+   - chemistry context columns (`MolWt`, `logP`, `TPSA`)
+   - raw model score only if needed
 ## Recommended Input Style
 - triaging compounds before a more expensive downstream model or wet-lab step
 - testing how sensitive rankings are to assay wording and metadata
+## Example Assays Included In The UI
+- BTK binding sanity check
+- JAK2 cell assay
+- ALDH1A1 fluorescence assay
+These examples call the live model. They are not screenshots or mocked outputs.
 ## Limits
 - This is a public-data model, not a medicinal chemistry oracle.
 ## Runtime Notes
+- The first request can be slower because the Space warms the model in the background.
 - Large candidate lists increase runtime. For interactive use, start with a few hundred molecules.
 ## Model
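The README describes the list-relative score and the priority bands only in prose. The sketch below restates the rule from this commit's `_decorate_valid_rows` and `_priority_band` as two standalone functions. The names `relative_scores` and `priority_band` and the sample scores are illustrative, not part of the Space. It assumes, as the app's code does, that the raw scores arrive sorted best-first.

```python
from __future__ import annotations


def relative_scores(raw_scores: list[float]) -> list[float]:
    """Min-max rescale raw model scores (sorted best-first) onto 0-100.

    Follows the rule in this commit: if every score is (nearly) identical,
    the first row gets 100 and every other row gets 50.
    """
    if not raw_scores:
        return []
    low, high = min(raw_scores), max(raw_scores)
    spread = high - low
    if spread <= 1e-8:
        return [100.0] + [50.0] * (len(raw_scores) - 1)
    return [100.0 * (s - low) / spread for s in raw_scores]


def priority_band(relative: float, rank: int, total: int) -> str:
    """Map a relative score to the UI's four bands (thresholds taken from the diff)."""
    if total <= 3:
        # Very short lists are banded by rank instead of by score.
        return "Screen first" if rank == 1 else ("Worth a look" if rank == 2 else "Low priority")
    if relative >= 85:
        return "Screen first"
    if relative >= 60:
        return "Worth a look"
    if relative >= 35:
        return "Middle pack"
    return "Low priority"


# Hypothetical raw scores, already sorted best-first.
raw = [3.0, 2.0, 1.0, -1.0]
rel = relative_scores(raw)  # [100.0, 75.0, 50.0, 0.0]
bands = [priority_band(r, i + 1, len(rel)) for i, r in enumerate(rel)]
```

Because this is min-max scaling over the submitted rows, the top candidate always gets exactly 100 and the bottom one exactly 0 unless every score ties. The relative score of a molecule therefore depends on its neighbours: adding one very weak candidate lowers the minimum and pushes every other candidate up the scale.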

__pycache__/app.cpython-310.pyc ADDED

Binary file (16.9 kB).

__pycache__/space_runtime.cpython-310.pyc ADDED

Binary file (21.3 kB).

app.py CHANGED

@@ -3,46 +3,59 @@ from __future__ import annotations
 import csv
 import os
 import tempfile
 from pathlib import Path
 from typing import Any
 import gradio as gr
 import pandas as pd
-from space_runtime import AssayQuery, load_compatibility_model_from_hub, rank_compounds, serialize_assay_query
 MODEL_REPO_ID = os.getenv("MODEL_REPO_ID", "lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility")
 MAX_INPUT_SMILES = int(os.getenv("MAX_INPUT_SMILES", "3000"))
 DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "50"))
 CSS = """
-@import url('https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Source+Serif+4:wght@500;600;700&display=swap');
 :root {
-  --paper: #f4efe6;
-  --ink: #122033;
-  --ink-soft: #4f6073;
-  --accent: #0f5fd7;
-  --accent-soft: #d9e8ff;
-  --line: #c9d1db;
   --warning: #8a4b0f;
   --good: #0e6b48;
 }
 .gradio-container {
   font-family: "IBM Plex Sans", sans-serif;
   background:
-    radial-gradient(circle at top right, rgba(15,95,215,0.08), transparent 24rem),
-    linear-gradient(180deg, #faf7f0 0%, var(--paper) 100%);
   color: var(--ink);
 }
 #hero {
   border: 1px solid var(--line);
-  background: linear-gradient(135deg, rgba(255,255,255,0.9), rgba(239,245,255,0.92));
-  border-radius: 24px;
-  padding: 1.25rem 1.4rem;
-  box-shadow: 0 20px 40px rgba(18,32,51,0.08);
 }
 .eyebrow {
@@ -50,12 +63,12 @@ CSS = """
   font-size: 0.78rem;
   letter-spacing: 0.08em;
   text-transform: uppercase;
-  color: var(--accent);
 }
 .hero-title {
-  font-family: "Source Serif 4", serif;
-  font-size: 2.2rem;
   line-height: 1.05;
   margin: 0.2rem 0 0.5rem 0;
 }
@@ -68,7 +81,7 @@ CSS = """
 .panel-note {
   border-left: 4px solid var(--accent);
-  background: rgba(15,95,215,0.06);
   padding: 0.9rem 1rem;
   border-radius: 12px;
 }
@@ -81,7 +94,7 @@ CSS = """
 .metric-card {
   border: 1px solid var(--line);
-  background: rgba(255,255,255,0.75);
   padding: 0.8rem 0.9rem;
   border-radius: 16px;
 }
@@ -91,10 +104,28 @@ CSS = """
   font-size: 1.15rem;
   margin-top: 0.15rem;
 }
 """
 EXAMPLES = {
-    "BTK binding": {
         "title": "BTK kinase inhibitor binding assay",
         "description": "In vitro kinase-domain binding assay for Bruton's tyrosine kinase inhibitor ranking.",
         "organism": "Homo sapiens",
@@ -105,13 +136,28 @@ EXAMPLES = {
         "smiles": "\n".join(
             [
                 "CC1=NC(=O)N(C)C(=O)N1",
-                "CCOc1ccc2nc(N3CCN(C)CC3)n(C)c(=O)c2c1",
-                "CC(=O)Nc1ncc(C#N)c(Nc2ccc(F)c(Cl)c2)n1",
                 "c1ccccc1",
                 "CCO",
             ]
         ),
     },
     "ALDH1A1 fluorescence": {
         "title": "ALDH1A1 inhibition assay",
         "description": "Cell-based fluorescence assay measuring ALDH1A1 inhibition in human cells.",
@@ -122,10 +168,9 @@ EXAMPLES = {
         "target_uniprot": "P00352",
         "smiles": "\n".join(
             [
                 "CC1=CC(=O)N(C)C(=O)N1",
-                "COC1=CC=C(C=C1)C(=O)O",
                 "CCN(CC)CCOC1=CC=CC=C1",
-                "CCOC1=CC=CC=C1",
                 "CCO",
             ]
         ),
@@ -176,21 +221,78 @@ def _load_model():
     return load_compatibility_model_from_hub(MODEL_REPO_ID)
 def _build_summary(query_text: str, valid_rows: list[dict[str, Any]], invalid_rows: list[dict[str, Any]], warning: str | None) -> str:
     best = valid_rows[0] if valid_rows else None
     chunks = [
-        "### Run Summary",
         f"- Model repo: `{MODEL_REPO_ID}`",
-        f"- Assay prompt length: `{len(query_text.split())}` tokens-equivalent words",
         f"- Valid molecules ranked: `{len(valid_rows)}`",
         f"- Invalid molecules rejected: `{len(invalid_rows)}`",
     ]
     if best is not None:
-        chunks.append(f"- Top hit: `{best['canonical_smiles']}` with score `{best['score']:.3f}`")
     if warning:
         chunks.append(f"- Warning: {warning}")
     chunks.append("")
-    chunks.append("Higher scores mean the model ranks the molecule as more compatible with this assay than lower-scored candidates in the same list. Scores are ranking signals, not calibrated probabilities.")
     return "\n".join(chunks)
@@ -199,17 +301,40 @@ def _results_to_csv(valid_rows: list[dict[str, Any]], invalid_rows: list[dict[st
     if not rows:
         return None
     handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
-    writer = csv.DictWriter(handle, fieldnames=["rank", "input_smiles", "canonical_smiles", "smiles_hash", "score", "valid", "error"])
     writer.writeheader()
     rank = 1
     for row in valid_rows:
         writer.writerow(
             {
                 "rank": rank,
                 "input_smiles": row["input_smiles"],
                 "canonical_smiles": row["canonical_smiles"],
                 "smiles_hash": row["smiles_hash"],
-                "score": row["score"],
                 "valid": True,
                 "error": "",
             }
@@ -222,7 +347,11 @@ def _results_to_csv(valid_rows: list[dict[str, Any]], invalid_rows: list[dict[st
                 "input_smiles": row["input_smiles"],
                 "canonical_smiles": "",
                 "smiles_hash": "",
-                "score": "",
                 "valid": False,
                 "error": row.get("error", "invalid_smiles"),
             }
@@ -260,14 +389,19 @@ def run_ranking(
     ranked = rank_compounds(model, assay_text=assay_text, smiles_list=smiles_values, top_k=top_k or None)
     valid_rows = [row for row in ranked if row["valid"]]
     invalid_rows = [row for row in ranked if not row["valid"]]
     display_rows = [
         {
             "rank": idx + 1,
-            "input_smiles": row["input_smiles"],
             "canonical_smiles": row["canonical_smiles"],
-            "smiles_hash": row["smiles_hash"],
-            "score": round(float(row["score"]), 4),
         }
         for idx, row in enumerate(valid_rows)
     ]
@@ -294,7 +428,7 @@ def load_example(example_name: str):
     )
-with gr.Blocks(title="BioAssayAlign Compatibility Explorer") as demo:
     gr.Markdown(
         """
 <style>
@@ -303,11 +437,11 @@ with gr.Blocks(title="BioAssayAlign Compatibility Explorer") as demo:
         + """
 </style>
 <div id="hero">
-  <div class="eyebrow">BioAssayAlign · scientist-facing ranking demo</div>
-  <div class="hero-title">Rank candidate molecules for a bioassay</div>
   <div class="hero-copy">
-    Build an assay query from structured fields, paste or upload a candidate molecule list, and get a ranked output from the current BioAssayAlign compatibility model.
-    This app is designed for triage and prioritization, not for direct potency claims.
   </div>
 </div>
 """
@@ -318,7 +452,7 @@ with gr.Blocks(title="BioAssayAlign Compatibility Explorer") as demo:
             gr.Markdown(
                 """
 <div class="panel-note">
-Use the structured fields if you have them. Missing fields are allowed, but species, readout, and target metadata usually help.
 </div>
 """
             )
@@ -327,17 +461,28 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
                 f"""
 <div class="metric-strip">
   <div class="metric-card"><span>Default model</span><strong>{MODEL_REPO_ID}</strong></div>
-  <div class="metric-card"><span>Expected use</span><strong>ranking, not probability</strong></div>
-  <div class="metric-card"><span>Interactive cap</span><strong>{MAX_INPUT_SMILES} SMILES</strong></div>
 </div>
 """
             )
     with gr.Tab("Rank Compounds"):
         with gr.Row():
             with gr.Column(scale=6):
-                example_name = gr.Dropdown(choices=list(EXAMPLES.keys()), value="BTK binding", label="Load an example")
                 load_example_btn = gr.Button("Load Example", variant="secondary")
                 assay_title = gr.Textbox(label="Assay title")
                 description = gr.Textbox(label="Description", lines=6, placeholder="Describe the assay in practical lab language.")
                 with gr.Row():
@@ -352,17 +497,17 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
                 smiles_text = gr.Textbox(
                     label="Candidate SMILES",
                     lines=14,
-                    placeholder="Paste one SMILES per line. CSV upload is optional and will be merged.",
                 )
                 upload_file = gr.File(label="Upload CSV / TXT / SMI", file_count="single", file_types=[".csv", ".txt", ".smi", ".smiles"])
                 top_k = gr.Slider(label="Top-K rows to display", minimum=5, maximum=200, step=5, value=DEFAULT_TOP_K)
-                run_btn = gr.Button("Rank Molecules", variant="primary")
                 clear_btn = gr.ClearButton(value="Clear", components=[assay_title, description, organism, readout, assay_format, assay_type, target_uniprot, smiles_text, upload_file])
         summary = gr.Markdown()
         with gr.Accordion("Serialized assay text used by the model", open=False):
             assay_preview = gr.Textbox(lines=12, label="Model-facing assay text")
-        ranked_df = gr.Dataframe(label="Ranked molecules", interactive=False, wrap=True)
         invalid_df = gr.Dataframe(label="Rejected inputs", interactive=False, wrap=True)
         download_file = gr.File(label="Download CSV")
@@ -380,24 +525,30 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
     with gr.Tab("How To Use This"):
         gr.Markdown(
             """
-### Recommended workflow
 1. Describe the assay in plain scientific language.
 2. Add metadata if you know it: organism, readout, format, assay type, target UniProt.
 3. Paste a candidate list or upload a CSV with a `smiles` column.
-4. Rank the list and inspect the top molecules first.
-### What the score means
-- The score is a ranking signal.
-- Higher means “more compatible than the other molecules in this submitted list”.
-- It is **not** a calibrated activity probability and it is **not** an IC50 prediction.
 ### Good input habits
 - Prefer parent, neutralized, chemically sensible SMILES.
 - Keep assay descriptions concrete.
 - If the assay is target-defined, add the UniProt ID.
 ### What this Space is not
@@ -409,4 +560,9 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
 if __name__ == "__main__":
-    demo.queue(default_concurrency_limit=4).launch(show_error=True)

 import csv
 import os
 import tempfile
+import threading
 from pathlib import Path
 from typing import Any
 import gradio as gr
+import numpy as np
 import pandas as pd
+from space_runtime import (
+    AssayQuery,
+    load_compatibility_model_from_hub,
+    molecule_ui_metrics,
+    rank_compounds,
+    serialize_assay_query,
+)
 MODEL_REPO_ID = os.getenv("MODEL_REPO_ID", "lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility")
 MAX_INPUT_SMILES = int(os.getenv("MAX_INPUT_SMILES", "3000"))
 DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "50"))
 CSS = """
+@import url('https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,600;9..144,700&display=swap');
 :root {
+  --paper: #f4efe4;
+  --ink: #132128;
+  --ink-soft: #56656e;
+  --accent: #135a52;
+  --accent-soft: #d9ece8;
+  --accent-warm: #ab5936;
+  --line: #c8cfc7;
   --warning: #8a4b0f;
   --good: #0e6b48;
+  --card: rgba(255,255,255,0.82);
 }
 .gradio-container {
   font-family: "IBM Plex Sans", sans-serif;
   background:
+    radial-gradient(circle at top right, rgba(19,90,82,0.12), transparent 24rem),
+    radial-gradient(circle at bottom left, rgba(171,89,54,0.10), transparent 22rem),
+    linear-gradient(180deg, #faf7ef 0%, var(--paper) 100%);
   color: var(--ink);
 }
 #hero {
   border: 1px solid var(--line);
+  background:
+    linear-gradient(135deg, rgba(255,255,255,0.95), rgba(240,246,244,0.90)),
+    linear-gradient(90deg, rgba(19,90,82,0.03), rgba(171,89,54,0.03));
+  border-radius: 28px;
+  padding: 1.35rem 1.5rem;
+  box-shadow: 0 24px 50px rgba(19,33,40,0.08);
 }
 .eyebrow {
   font-size: 0.78rem;
   letter-spacing: 0.08em;
   text-transform: uppercase;
+  color: var(--accent-warm);
 }
 .hero-title {
+  font-family: "Fraunces", serif;
+  font-size: 2.35rem;
   line-height: 1.05;
   margin: 0.2rem 0 0.5rem 0;
 }
 .panel-note {
   border-left: 4px solid var(--accent);
+  background: rgba(19,90,82,0.06);
   padding: 0.9rem 1rem;
   border-radius: 12px;
 }
 .metric-card {
   border: 1px solid var(--line);
+  background: var(--card);
   padding: 0.8rem 0.9rem;
   border-radius: 16px;
 }
   font-size: 1.15rem;
   margin-top: 0.15rem;
 }
+.guide-grid {
+  display: grid;
+  grid-template-columns: repeat(3, minmax(0, 1fr));
+  gap: 0.8rem;
+}
+.guide-card {
+  border: 1px solid var(--line);
+  background: var(--card);
+  padding: 0.9rem 1rem;
+  border-radius: 16px;
+}
+.guide-card strong {
+  display: block;
+  margin-bottom: 0.2rem;
+}
 """
 EXAMPLES = {
+    "BTK binding sanity check": {
         "title": "BTK kinase inhibitor binding assay",
         "description": "In vitro kinase-domain binding assay for Bruton's tyrosine kinase inhibitor ranking.",
         "organism": "Homo sapiens",
         "smiles": "\n".join(
             [
                 "CC1=NC(=O)N(C)C(=O)N1",
                 "c1ccccc1",
                 "CCO",
             ]
         ),
     },
+    "JAK2 cell assay": {
+        "title": "JAK2 inhibition assay",
+        "description": "Cell-based luminescence assay measuring JAK2 inhibition in HEK293 cells.",
+        "organism": "Homo sapiens",
+        "readout": "luminescence",
+        "assay_format": "cell-based",
+        "assay_type": "inhibition",
+        "target_uniprot": "O60674",
+        "smiles": "\n".join(
+            [
+                "CC1=CC(=O)N(C)C(=O)N1",
+                "CC(=O)Nc1ncc(C#N)c(Nc2ccc(F)c(Cl)c2)n1",
+                "CCOc1ccc2nc(N3CCN(C)CC3)n(C)c(=O)c2c1",
+                "CCO",
+            ]
+        ),
+    },
     "ALDH1A1 fluorescence": {
         "title": "ALDH1A1 inhibition assay",
         "description": "Cell-based fluorescence assay measuring ALDH1A1 inhibition in human cells.",
         "target_uniprot": "P00352",
         "smiles": "\n".join(
             [
+                "CCOC1=CC=CC=C1",
                 "CC1=CC(=O)N(C)C(=O)N1",
                 "CCN(CC)CCOC1=CC=CC=C1",
                 "CCO",
             ]
         ),
     return load_compatibility_model_from_hub(MODEL_REPO_ID)
+def _warm_model_background() -> None:
+    try:
+        _load_model()
+    except Exception:
+        # Keep the app usable even if warmup fails; the request path will raise the real error.
+        return
+def _priority_band(relative_score: float, rank: int, total: int) -> str:
+    if total <= 3:
+        return "Screen first" if rank == 1 else ("Worth a look" if rank == 2 else "Low priority")
+    if relative_score >= 85:
+        return "Screen first"
+    if relative_score >= 60:
+        return "Worth a look"
+    if relative_score >= 35:
+        return "Middle pack"
+    return "Low priority"
+def _decorate_valid_rows(valid_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    if not valid_rows:
+        return []
+    scores = np.array([float(row["score"]) for row in valid_rows], dtype=np.float32)
+    minimum = float(scores.min())
+    maximum = float(scores.max())
+    spread = maximum - minimum
+    decorated: list[dict[str, Any]] = []
+    for idx, row in enumerate(valid_rows):
+        score = float(row["score"])
+        relative_score = 100.0 if spread <= 1e-8 and idx == 0 else (50.0 if spread <= 1e-8 else 100.0 * (score - minimum) / spread)
+        metrics = molecule_ui_metrics(row["canonical_smiles"])
+        decorated.append(
+            {
+                **row,
+                "relative_score": round(relative_score, 1),
+                "priority_band": _priority_band(relative_score, idx + 1, len(valid_rows)),
+                "mol_wt": round(float(metrics["mol_wt"]), 1),
+                "logp": round(float(metrics["logp"]), 2),
+                "tpsa": round(float(metrics["tpsa"]), 1),
+                "heavy_atoms": int(metrics["heavy_atoms"]),
+            }
+        )
+    return decorated
 def _build_summary(query_text: str, valid_rows: list[dict[str, Any]], invalid_rows: list[dict[str, Any]], warning: str | None) -> str:
     best = valid_rows[0] if valid_rows else None
+    score_range = None
+    if valid_rows:
+        raw_scores = [float(row["score"]) for row in valid_rows]
+        score_range = max(raw_scores) - min(raw_scores)
     chunks = [
+        "### Ranking Summary",
         f"- Model repo: `{MODEL_REPO_ID}`",
+        f"- Assay fields serialized into `{len(query_text.split())}` words",
         f"- Valid molecules ranked: `{len(valid_rows)}`",
         f"- Invalid molecules rejected: `{len(invalid_rows)}`",
     ]
     if best is not None:
+        chunks.append(
+            f"- Top hit: `{best['canonical_smiles']}` · `{best['priority_band']}` · list-relative score `{best['relative_score']:.1f}/100`"
+        )
+    if score_range is not None:
+        chunks.append(f"- Score spread across this submitted list: `{score_range:.2f}` model-score units")
     if warning:
         chunks.append(f"- Warning: {warning}")
     chunks.append("")
+    chunks.append(
+        "Use the **priority band** and **list-relative score** first. The raw model score is only a debugging value. "
+        "A candidate with `relative score 100` is the strongest item in your submitted list, not in all chemistry."
+    )
     return "\n".join(chunks)
     if not rows:
         return None
     handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
+    writer = csv.DictWriter(
+        handle,
+        fieldnames=[
+            "rank",
+            "priority_band",
+            "relative_score_100",
+            "input_smiles",
+            "canonical_smiles",
+            "smiles_hash",
+            "mol_wt",
+            "logp",
+            "tpsa",
+            "heavy_atoms",
+            "model_score",
+            "valid",
+            "error",
+        ],
+    )
     writer.writeheader()
     rank = 1
     for row in valid_rows:
         writer.writerow(
             {
                 "rank": rank,
+                "priority_band": row["priority_band"],
+                "relative_score_100": row["relative_score"],
                 "input_smiles": row["input_smiles"],
                 "canonical_smiles": row["canonical_smiles"],
                 "smiles_hash": row["smiles_hash"],
+                "mol_wt": row["mol_wt"],
+                "logp": row["logp"],
+                "tpsa": row["tpsa"],
+                "heavy_atoms": row["heavy_atoms"],
+                "model_score": row["score"],
                 "valid": True,
                 "error": "",
             }
                 "input_smiles": row["input_smiles"],
                 "canonical_smiles": "",
                 "smiles_hash": "",
+                "mol_wt": "",
+                "logp": "",
+                "tpsa": "",
+                "heavy_atoms": "",
+                "model_score": "",
                 "valid": False,
                 "error": row.get("error", "invalid_smiles"),
             }
     ranked = rank_compounds(model, assay_text=assay_text, smiles_list=smiles_values, top_k=top_k or None)
     valid_rows = [row for row in ranked if row["valid"]]
     invalid_rows = [row for row in ranked if not row["valid"]]
+    valid_rows = _decorate_valid_rows(valid_rows)
     display_rows = [
         {
             "rank": idx + 1,
+            "priority": row["priority_band"],
+            "relative_score_100": row["relative_score"],
             "canonical_smiles": row["canonical_smiles"],
+            "mol_wt": row["mol_wt"],
+            "logp": row["logp"],
+            "tpsa": row["tpsa"],
+            "heavy_atoms": row["heavy_atoms"],
+            "model_score": round(float(row["score"]), 4),
         }
         for idx, row in enumerate(valid_rows)
     ]
     )
+with gr.Blocks(title="BioAssayAlign Compatibility Explorer", analytics_enabled=False) as demo:
     gr.Markdown(
         """
 <style>
         + """
 </style>
 <div id="hero">
+  <div class="eyebrow">BioAssayAlign · assay-conditioned molecule ranking</div>
+  <div class="hero-title">Prioritize a candidate list against an assay</div>
   <div class="hero-copy">
+    Enter assay context, submit a candidate molecule list, and get a ranked shortlist from the current BioAssayAlign compatibility model.
+    The output is designed for triage: which molecules look strongest relative to the other candidates you submitted.
   </div>
 </div>
 """
             gr.Markdown(
                 """
 <div class="panel-note">
+Use structured assay fields when possible. Missing fields are allowed, but species, readout, format, and target metadata usually improve ranking quality.
 </div>
 """
             )
                 f"""
 <div class="metric-strip">
   <div class="metric-card"><span>Default model</span><strong>{MODEL_REPO_ID}</strong></div>
+  <div class="metric-card"><span>Use the output for</span><strong>ranking, not probability</strong></div>
+  <div class="metric-card"><span>Interactive cap</span><strong>{MAX_INPUT_SMILES} molecules</strong></div>
 </div>
 """
             )
+    gr.Markdown(
+        """
+<div class="guide-grid">
+  <div class="guide-card"><strong>1. Define the assay</strong>Use plain scientific language. Add UniProt, readout, and organism if you know them.</div>
+  <div class="guide-card"><strong>2. Submit candidates</strong>Paste one SMILES per line or upload a CSV with a <code>smiles</code> column.</div>
+  <div class="guide-card"><strong>3. Read the ranking</strong>Use <em>priority</em> and <em>relative score</em> first. Ignore the raw model score unless you are debugging.</div>
+</div>
+"""
+    )
     with gr.Tab("Rank Compounds"):
         with gr.Row():
             with gr.Column(scale=6):
+                example_name = gr.Dropdown(choices=list(EXAMPLES.keys()), value="BTK binding sanity check", label="Live example")
                 load_example_btn = gr.Button("Load Example", variant="secondary")
+                gr.Markdown("These example inputs run against the live model. The outputs are not cached screenshots.")
                 assay_title = gr.Textbox(label="Assay title")
                 description = gr.Textbox(label="Description", lines=6, placeholder="Describe the assay in practical lab language.")
                 with gr.Row():
                 smiles_text = gr.Textbox(
                     label="Candidate SMILES",
                     lines=14,
+                    placeholder="Paste one candidate molecule per line. Example: CCO",
                 )
                 upload_file = gr.File(label="Upload CSV / TXT / SMI", file_count="single", file_types=[".csv", ".txt", ".smi", ".smiles"])
                 top_k = gr.Slider(label="Top-K rows to display", minimum=5, maximum=200, step=5, value=DEFAULT_TOP_K)
+                run_btn = gr.Button("Run Ranking", variant="primary")
                 clear_btn = gr.ClearButton(value="Clear", components=[assay_title, description, organism, readout, assay_format, assay_type, target_uniprot, smiles_text, upload_file])
         summary = gr.Markdown()
         with gr.Accordion("Serialized assay text used by the model", open=False):
             assay_preview = gr.Textbox(lines=12, label="Model-facing assay text")
+        ranked_df = gr.Dataframe(label="Ranked candidates", interactive=False, wrap=True)
         invalid_df = gr.Dataframe(label="Rejected inputs", interactive=False, wrap=True)
         download_file = gr.File(label="Download CSV")
     with gr.Tab("How To Use This"):
         gr.Markdown(
             """
+### Input recipe
 1. Describe the assay in plain scientific language.
 2. Add metadata if you know it: organism, readout, format, assay type, target UniProt.
 3. Paste a candidate list or upload a CSV with a `smiles` column.
+4. Run ranking and inspect the top band first.
+### How to read the result table
+- **priority** is the first thing to read:
+  - `Screen first`
+  - `Worth a look`
+  - `Middle pack`
+  - `Low priority`
+- **relative_score_100** rescales the submitted list so the strongest candidate is near `100` and the weakest is near `0`.
+- **model_score** is the raw internal score. It is useful for debugging, not for scientific interpretation.
+- **mol_wt / logp / tpsa** are quick chemistry context columns so you can sanity-check what the model surfaced.
 ### Good input habits
 - Prefer parent, neutralized, chemically sensible SMILES.
 - Keep assay descriptions concrete.
 - If the assay is target-defined, add the UniProt ID.
+- If you upload a CSV, use one SMILES per row in a column named `smiles` or `canonical_smiles`.
 ### What this Space is not
 if __name__ == "__main__":
+    threading.Thread(target=_warm_model_background, daemon=True).start()
+    demo.queue(default_concurrency_limit=4).launch(
+        show_error=True,
+        quiet=True,
+        footer_links=["gradio"],
+    )

space_runtime.py CHANGED

@@ -1,7 +1,10 @@
 from __future__ import annotations
 import hashlib
 import json
 import re
 from dataclasses import dataclass
 from functools import lru_cache
@@ -12,12 +15,19 @@ import numpy as np
 import torch
 import torch.nn.functional as F
 from huggingface_hub import snapshot_download
 from rdkit import Chem, DataStructs, RDLogger
 from rdkit.Chem import AllChem, Crippen, Descriptors, Lipinski, MACCSkeys, rdMolDescriptors
 from rdkit.Chem.MolStandardize import rdMolStandardize
 from sentence_transformers import SentenceTransformer
 from torch import nn
 from transformers import AutoModel, AutoTokenizer
 RDLogger.DisableLog("rdApp.*")
@@ -90,6 +100,13 @@ def smiles_sha256(smiles: str) -> str:
     return hashlib.sha256(smiles.encode("utf-8")).hexdigest()
 @lru_cache(maxsize=1_000_000)
 def _standardize_smiles_v2_cached(smiles: str) -> str | None:
     mol = Chem.MolFromSmiles(smiles)
@@ -251,6 +268,24 @@ def _molecule_descriptor_vector(mol, *, names: tuple[str, ...] = DEFAULT_DESCRIP
     return np.array([values[name] for name in names], dtype=np.float32)
 class CompatibilityHead(nn.Module):
     def __init__(self, *, assay_dim: int, molecule_dim: int, projection_dim: int, hidden_dim: int, dropout: float) -> None:
         super().__init__()
@@ -349,15 +384,16 @@ class SpaceCompatibilityModel:
         if not self.molecule_transformer_model_name or self._molecule_transformer_model is not None:
             return
         dtype = torch.float16 if self._molecule_transformer_device.type == "cuda" else torch.float32
-        self._molecule_transformer_tokenizer = AutoTokenizer.from_pretrained(
-            self.molecule_transformer_model_name,
-            trust_remote_code=True,
-        )
-        self._molecule_transformer_model = AutoModel.from_pretrained(
-            self.molecule_transformer_model_name,
-            trust_remote_code=True,
-            torch_dtype=dtype,
-        ).to(self._molecule_transformer_device)
         self._molecule_transformer_model.eval()
     def _encode_molecule_transformer_batch(self, smiles_values: list[str]) -> np.ndarray | None:
@@ -413,11 +449,12 @@ class SpaceCompatibilityModel:
 def _load_sentence_transformer(model_name: str) -> SentenceTransformer:
     dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
-    encoder = SentenceTransformer(
-        model_name,
-        trust_remote_code=True,
-        model_kwargs={"torch_dtype": dtype},
-    )
     if getattr(encoder, "tokenizer", None) is not None:
         encoder.tokenizer.padding_side = "left"
     return encoder
@@ -489,11 +526,12 @@ def load_compatibility_model(model_dir: str | Path) -> SpaceCompatibilityModel:
 @lru_cache(maxsize=1)
 def load_compatibility_model_from_hub(model_repo_id: str) -> SpaceCompatibilityModel:
-    model_dir = snapshot_download(
-        repo_id=model_repo_id,
-        repo_type="model",
-        allow_patterns=["best_model.pt", "training_metadata.json", "README.md"],
-    )
     return load_compatibility_model(model_dir)
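`load_compatibility_model_from_hub` is wrapped in `@lru_cache(maxsize=1)`. Repeated calls with the same `model_repo_id` therefore reuse the already-loaded model instead of downloading and rebuilding it, and a call with a different repo ID evicts the previous entry. The sketch below shows that behaviour, assuming a hypothetical stand-in loader (`load_model_stub`) in place of `snapshot_download` and `load_compatibility_model`. It does not touch the Hub.

```python
from functools import lru_cache

# Records every time the (hypothetical) expensive load actually runs.
load_calls: list[str] = []


@lru_cache(maxsize=1)
def load_model_stub(model_repo_id: str) -> dict[str, str]:
    """Hypothetical stand-in for load_compatibility_model_from_hub."""
    load_calls.append(model_repo_id)  # only reached on a cache miss
    return {"repo_id": model_repo_id}


first = load_model_stub("org/model-a")
second = load_model_stub("org/model-a")   # hit: same object, no reload
third = load_model_stub("org/model-b")    # miss: evicts model-a (maxsize=1)
fourth = load_model_stub("org/model-a")   # miss again: model-a was evicted

print(first is second)                    # True
print(load_calls)                         # ['org/model-a', 'org/model-b', 'org/model-a']
print(load_model_stub.cache_info())       # CacheInfo(hits=1, misses=3, maxsize=1, currsize=1)
```

For a Space that only ever serves one model, `maxsize=1` is enough. The trade-off is that alternating between two repos would reload the model on every switch.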

 from __future__ import annotations
+import contextlib
 import hashlib
+import io
 import json
+import os
 import re
 from dataclasses import dataclass
 from functools import lru_cache
 import torch
 import torch.nn.functional as F
 from huggingface_hub import snapshot_download
+from huggingface_hub.utils import disable_progress_bars
 from rdkit import Chem, DataStructs, RDLogger
 from rdkit.Chem import AllChem, Crippen, Descriptors, Lipinski, MACCSkeys, rdMolDescriptors
 from rdkit.Chem.MolStandardize import rdMolStandardize
 from sentence_transformers import SentenceTransformer
 from torch import nn
 from transformers import AutoModel, AutoTokenizer
+from transformers.utils import logging as transformers_logging
+os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
+os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
+disable_progress_bars()
+transformers_logging.set_verbosity_error()
 RDLogger.DisableLog("rdApp.*")
     return hashlib.sha256(smiles.encode("utf-8")).hexdigest()
+@contextlib.contextmanager
+def _silent_imports():
+    buffer = io.StringIO()
+    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
+        yield
 @lru_cache(maxsize=1_000_000)
 def _standardize_smiles_v2_cached(smiles: str) -> str | None:
     mol = Chem.MolFromSmiles(smiles)
     return np.array([values[name] for name in names], dtype=np.float32)
+def molecule_ui_metrics(smiles: str) -> dict[str, float | int]:
+    canonical = standardize_smiles_v2(smiles) or smiles
+    mol = Chem.MolFromSmiles(canonical)
+    if mol is None:
+        return {
+            "mol_wt": 0.0,
+            "logp": 0.0,
+            "tpsa": 0.0,
+            "heavy_atoms": 0,
+        }
+    return {
+        "mol_wt": float(Descriptors.MolWt(mol)),
+        "logp": float(Crippen.MolLogP(mol)),
+        "tpsa": float(rdMolDescriptors.CalcTPSA(mol)),
+        "heavy_atoms": int(mol.GetNumHeavyAtoms()),
+    }
 class CompatibilityHead(nn.Module):
     def __init__(self, *, assay_dim: int, molecule_dim: int, projection_dim: int, hidden_dim: int, dropout: float) -> None:
         super().__init__()
         if not self.molecule_transformer_model_name or self._molecule_transformer_model is not None:
             return
         dtype = torch.float16 if self._molecule_transformer_device.type == "cuda" else torch.float32
+        with _silent_imports():
+            self._molecule_transformer_tokenizer = AutoTokenizer.from_pretrained(
+                self.molecule_transformer_model_name,
+                trust_remote_code=True,
+            )
+            self._molecule_transformer_model = AutoModel.from_pretrained(
+                self.molecule_transformer_model_name,
+                trust_remote_code=True,
+                torch_dtype=dtype,
+            ).to(self._molecule_transformer_device)
         self._molecule_transformer_model.eval()
     def _encode_molecule_transformer_batch(self, smiles_values: list[str]) -> np.ndarray | None:
 def _load_sentence_transformer(model_name: str) -> SentenceTransformer:
     dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
+    with _silent_imports():
+        encoder = SentenceTransformer(
+            model_name,
+            trust_remote_code=True,
+            model_kwargs={"torch_dtype": dtype},
+        )
     if getattr(encoder, "tokenizer", None) is not None:
         encoder.tokenizer.padding_side = "left"
     return encoder
 @lru_cache(maxsize=1)
 def load_compatibility_model_from_hub(model_repo_id: str) -> SpaceCompatibilityModel:
+    with _silent_imports():
+        model_dir = snapshot_download(
+            repo_id=model_repo_id,
+            repo_type="model",
+            allow_patterns=["best_model.pt", "training_metadata.json", "README.md"],
+        )
     return load_compatibility_model(model_dir)
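The new `_silent_imports` helper redirects Python-level `stdout` and `stderr` into an in-memory buffer while tokenizers, models, and Hub snapshots load. `contextlib.redirect_stdout` and `contextlib.redirect_stderr` only swap the `sys.stdout` and `sys.stderr` objects. Native code that writes directly to file descriptors 1 and 2 is not captured, and neither is a `logging.StreamHandler` that stored a reference to the original stream before the redirect. That is presumably why the commit also disables Hub progress bars and lowers `transformers` logging verbosity. The sketch below is a hypothetical variant (`silent_capture`, not part of the commit) that yields the buffer so the suppressed text can still be inspected.

```python
import contextlib
import io
import sys
from collections.abc import Iterator


@contextlib.contextmanager
def silent_capture() -> Iterator[io.StringIO]:
    """Hypothetical variant of _silent_imports that exposes its buffer.

    Only Python-level writes through sys.stdout / sys.stderr are captured;
    native code writing straight to file descriptors 1 and 2 is not.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        yield buffer


with silent_capture() as captured:
    print("noisy loader output")                # goes to the buffer
    print("noisy warning", file=sys.stderr)     # sys.stderr is the buffer here too

# Outside the block, sys.stdout is restored, so this line prints normally.
print(repr(captured.getvalue()))                # 'noisy loader output\nnoisy warning\n'
```

Keeping the buffer rather than discarding it makes it possible to surface the suppressed text if a load fails, instead of losing the diagnostic output entirely.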