Commit f1158c7 by lighteternal · verified · 1 Parent(s): 2ad7575

Polish UX, examples, and result explainability

README.md CHANGED
@@ -1,20 +1,20 @@
1
  ---
2
  title: BioAssayAlign Compatibility Explorer
3
  emoji: 🧪
4
- colorFrom: blue
5
- colorTo: gray
6
  sdk: gradio
7
  sdk_version: 6.9.0
8
  python_version: "3.10"
9
  app_file: app.py
10
  pinned: false
11
  license: mit
12
- short_description: Rank candidate molecules for a bioassay.
13
  ---
14
 
15
  # BioAssayAlign Compatibility Explorer
16
 
17
- This Space is a scientist-facing demo for **assay-conditioned compound ranking**.
18
 
19
  You provide:
20
  - a bioassay description and optional metadata
@@ -27,7 +27,7 @@ The model returns:
27
 
28
  ## What It Is
29
 
30
- This is not a chatbot and it is not a potency predictor.
31
 
32
  It is a **ranking model** trained on a frozen public bioassay dataset built from PubChem BioAssay and ChEMBL. It is designed to answer:
33
 
@@ -35,9 +35,27 @@ It is a **ranking model** trained on a frozen public bioassay dataset built from
35
 
36
  ## What The Score Means
37
 
38
- - Higher score = the model believes the molecule is more compatible with the assay than lower-ranked candidates in the same list.
39
- - The score is **not** a probability.
40
- - The score is best used for **ranking**, not absolute decision thresholds.
41
 
42
  ## Recommended Input Style
43
 
@@ -58,6 +76,14 @@ You can paste SMILES directly or upload a CSV with a `smiles` or `canonical_smil
58
  - triaging compounds before a more expensive downstream model or wet-lab step
59
  - testing how sensitive rankings are to assay wording and metadata
60
 
 
 
 
 
 
 
 
 
61
  ## Limits
62
 
63
  - This is a public-data model, not a medicinal chemistry oracle.
@@ -66,7 +92,7 @@ You can paste SMILES directly or upload a CSV with a `smiles` or `canonical_smil
66
 
67
  ## Runtime Notes
68
 
69
- - The first request can be slower because the Space has to load the model.
70
  - Large candidate lists increase runtime. For interactive use, start with a few hundred molecules.
71
 
72
  ## Model
 
1
  ---
2
  title: BioAssayAlign Compatibility Explorer
3
  emoji: 🧪
4
+ colorFrom: green
5
+ colorTo: red
6
  sdk: gradio
7
  sdk_version: 6.9.0
8
  python_version: "3.10"
9
  app_file: app.py
10
  pinned: false
11
  license: mit
12
+ short_description: Rank a candidate molecule list against a bioassay.
13
  ---
14
 
15
  # BioAssayAlign Compatibility Explorer
16
 
17
+ BioAssayAlign is an **assay-conditioned molecule ranking** tool.
18
 
19
  You provide:
20
  - a bioassay description and optional metadata
 
27
 
28
  ## What It Is
29
 
30
+ This is not a chatbot. It is not a potency predictor.
31
 
32
  It is a **ranking model** trained on a frozen public bioassay dataset built from PubChem BioAssay and ChEMBL. It is designed to answer:
33
 
 
35
 
36
  ## What The Score Means
37
 
38
+ - The app shows a **priority band** and a **list-relative score** first.
39
+ - Those values explain the ranking better than the raw model score.
40
+ - The raw score is **not** a probability. Use it only for debugging.
41
+ - The strongest molecule in your submitted list will be near the top of the `0–100` relative scale.
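The list-relative score described in these bullets can be sketched as a min-max rescaling of the raw scores within one submitted list. This is a minimal pure-Python sketch modelled on `_decorate_valid_rows` in this commit's `app.py`. The function name `list_relative_scores` and the `eps` parameter are illustrative assumptions, not part of the app, and the app itself computes the minimum and maximum with NumPy.

```python
from __future__ import annotations


def list_relative_scores(raw_scores: list[float], eps: float = 1e-8) -> list[float]:
    """Rescale raw model scores to a 0-100 scale within one submitted list.

    Assumes raw_scores is already ordered best-first, as the ranked output is.
    """
    if not raw_scores:
        return []
    lowest, highest = min(raw_scores), max(raw_scores)
    spread = highest - lowest
    if spread <= eps:
        # Degenerate list: all scores are (nearly) identical.
        # Same choice as the app: the first row gets 100, the rest get 50.
        return [100.0 if idx == 0 else 50.0 for idx in range(len(raw_scores))]
    # Min-max scaling: best score maps to 100, worst score maps to 0.
    return [round(100.0 * (score - lowest) / spread, 1) for score in raw_scores]


print(list_relative_scores([2.0, 1.0, 0.0]))  # [100.0, 50.0, 0.0]
```

Because the scale is recomputed per list, adding or removing a candidate can change every relative score, which is why the value says nothing about chemistry outside the submitted list.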
42
+
43
+ ## How To Use It
44
+
45
+ 1. Enter the assay title and description in plain scientific language.
46
+ 2. Add metadata if you know it:
47
+ - organism
48
+ - readout
49
+ - assay format
50
+ - assay type
51
+ - target UniProt ID
52
+ 3. Paste one SMILES per line or upload a CSV with a `smiles` column.
53
+ 4. Run ranking.
54
+ 5. Read the output in this order:
55
+ - `priority`
56
+ - `relative score`
57
+ - chemistry context columns (`MolWt`, `logP`, `TPSA`)
58
+ - raw model score only if needed
59
 
60
  ## Recommended Input Style
61
 
 
76
  - triaging compounds before a more expensive downstream model or wet-lab step
77
  - testing how sensitive rankings are to assay wording and metadata
78
 
79
+ ## Example Assays Included In The UI
80
+
81
+ - BTK binding sanity check
82
+ - JAK2 cell assay
83
+ - ALDH1A1 fluorescence assay
84
+
85
+ These examples call the live model. They are not screenshots or mocked outputs.
86
+
87
  ## Limits
88
 
89
  - This is a public-data model, not a medicinal chemistry oracle.
 
92
 
93
  ## Runtime Notes
94
 
95
+ - The first request can be slower because the Space warms the model in the background.
96
  - Large candidate lists increase runtime. For interactive use, start with a few hundred molecules.
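The background warm-up mentioned in the first note can be sketched as a daemon thread that calls a cached loader at startup, which is the pattern `_warm_model_background` follows in this commit's `app.py`. In this sketch `load_model` is a hypothetical stand-in that sleeps briefly instead of downloading a model.

```python
import threading
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model() -> dict:
    """Hypothetical stand-in for an expensive model load."""
    time.sleep(0.1)  # simulate download and initialization cost
    return {"ready": True}


def warm_model_background() -> None:
    """Populate the loader cache without blocking application startup."""
    try:
        load_model()
    except Exception:
        # Keep the app usable if warm-up fails; the request path
        # will call load_model() again and surface the real error.
        return


warmup = threading.Thread(target=warm_model_background, daemon=True)
warmup.start()

# A later request reuses the cached result once warm-up has finished.
warmup.join()
model = load_model()
print(load_model.cache_info().misses)  # 1
```

One caveat: `functools.lru_cache` does not stop a second thread from invoking the wrapped function while the first call is still running, so a request that arrives mid-warm-up may still trigger its own load.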
97
 
98
  ## Model
__pycache__/app.cpython-310.pyc ADDED
Binary file (16.9 kB).
 
__pycache__/space_runtime.cpython-310.pyc ADDED
Binary file (21.3 kB).
 
app.py CHANGED
@@ -3,46 +3,59 @@ from __future__ import annotations
3
  import csv
4
  import os
5
  import tempfile
 
6
  from pathlib import Path
7
  from typing import Any
8
 
9
  import gradio as gr
 
10
  import pandas as pd
11
 
12
- from space_runtime import AssayQuery, load_compatibility_model_from_hub, rank_compounds, serialize_assay_query
 
 
 
 
 
 
13
 
14
  MODEL_REPO_ID = os.getenv("MODEL_REPO_ID", "lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility")
15
  MAX_INPUT_SMILES = int(os.getenv("MAX_INPUT_SMILES", "3000"))
16
  DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "50"))
17
 
18
  CSS = """
19
- @import url('https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Source+Serif+4:wght@500;600;700&display=swap');
20
 
21
  :root {
22
- --paper: #f4efe6;
23
- --ink: #122033;
24
- --ink-soft: #4f6073;
25
- --accent: #0f5fd7;
26
- --accent-soft: #d9e8ff;
27
- --line: #c9d1db;
 
28
  --warning: #8a4b0f;
29
  --good: #0e6b48;
 
30
  }
31
 
32
  .gradio-container {
33
  font-family: "IBM Plex Sans", sans-serif;
34
  background:
35
- radial-gradient(circle at top right, rgba(15,95,215,0.08), transparent 24rem),
36
- linear-gradient(180deg, #faf7f0 0%, var(--paper) 100%);
 
37
  color: var(--ink);
38
  }
39
 
40
  #hero {
41
  border: 1px solid var(--line);
42
- background: linear-gradient(135deg, rgba(255,255,255,0.9), rgba(239,245,255,0.92));
43
- border-radius: 24px;
44
- padding: 1.25rem 1.4rem;
45
- box-shadow: 0 20px 40px rgba(18,32,51,0.08);
 
 
46
  }
47
 
48
  .eyebrow {
@@ -50,12 +63,12 @@ CSS = """
50
  font-size: 0.78rem;
51
  letter-spacing: 0.08em;
52
  text-transform: uppercase;
53
- color: var(--accent);
54
  }
55
 
56
  .hero-title {
57
- font-family: "Source Serif 4", serif;
58
- font-size: 2.2rem;
59
  line-height: 1.05;
60
  margin: 0.2rem 0 0.5rem 0;
61
  }
@@ -68,7 +81,7 @@ CSS = """
68
 
69
  .panel-note {
70
  border-left: 4px solid var(--accent);
71
- background: rgba(15,95,215,0.06);
72
  padding: 0.9rem 1rem;
73
  border-radius: 12px;
74
  }
@@ -81,7 +94,7 @@ CSS = """
81
 
82
  .metric-card {
83
  border: 1px solid var(--line);
84
- background: rgba(255,255,255,0.75);
85
  padding: 0.8rem 0.9rem;
86
  border-radius: 16px;
87
  }
@@ -91,10 +104,28 @@ CSS = """
91
  font-size: 1.15rem;
92
  margin-top: 0.15rem;
93
  }
94
  """
95
 
96
  EXAMPLES = {
97
- "BTK binding": {
98
  "title": "BTK kinase inhibitor binding assay",
99
  "description": "In vitro kinase-domain binding assay for Bruton's tyrosine kinase inhibitor ranking.",
100
  "organism": "Homo sapiens",
@@ -105,13 +136,28 @@ EXAMPLES = {
105
  "smiles": "\n".join(
106
  [
107
  "CC1=NC(=O)N(C)C(=O)N1",
108
- "CCOc1ccc2nc(N3CCN(C)CC3)n(C)c(=O)c2c1",
109
- "CC(=O)Nc1ncc(C#N)c(Nc2ccc(F)c(Cl)c2)n1",
110
  "c1ccccc1",
111
  "CCO",
112
  ]
113
  ),
114
  },
115
  "ALDH1A1 fluorescence": {
116
  "title": "ALDH1A1 inhibition assay",
117
  "description": "Cell-based fluorescence assay measuring ALDH1A1 inhibition in human cells.",
@@ -122,10 +168,9 @@ EXAMPLES = {
122
  "target_uniprot": "P00352",
123
  "smiles": "\n".join(
124
  [
 
125
  "CC1=CC(=O)N(C)C(=O)N1",
126
- "COC1=CC=C(C=C1)C(=O)O",
127
  "CCN(CC)CCOC1=CC=CC=C1",
128
- "CCOC1=CC=CC=C1",
129
  "CCO",
130
  ]
131
  ),
@@ -176,21 +221,78 @@ def _load_model():
176
  return load_compatibility_model_from_hub(MODEL_REPO_ID)
177
 
178
 
179
  def _build_summary(query_text: str, valid_rows: list[dict[str, Any]], invalid_rows: list[dict[str, Any]], warning: str | None) -> str:
180
  best = valid_rows[0] if valid_rows else None
 
 
 
 
181
  chunks = [
182
- "### Run Summary",
183
  f"- Model repo: `{MODEL_REPO_ID}`",
184
- f"- Assay prompt length: `{len(query_text.split())}` tokens-equivalent words",
185
  f"- Valid molecules ranked: `{len(valid_rows)}`",
186
  f"- Invalid molecules rejected: `{len(invalid_rows)}`",
187
  ]
188
  if best is not None:
189
- chunks.append(f"- Top hit: `{best['canonical_smiles']}` with score `{best['score']:.3f}`")
 
 
 
 
190
  if warning:
191
  chunks.append(f"- Warning: {warning}")
192
  chunks.append("")
193
- chunks.append("Higher scores mean the model ranks the molecule as more compatible with this assay than lower-scored candidates in the same list. Scores are ranking signals, not calibrated probabilities.")
 
 
 
194
  return "\n".join(chunks)
195
 
196
 
@@ -199,17 +301,40 @@ def _results_to_csv(valid_rows: list[dict[str, Any]], invalid_rows: list[dict[st
199
  if not rows:
200
  return None
201
  handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
202
- writer = csv.DictWriter(handle, fieldnames=["rank", "input_smiles", "canonical_smiles", "smiles_hash", "score", "valid", "error"])
203
  writer.writeheader()
204
  rank = 1
205
  for row in valid_rows:
206
  writer.writerow(
207
  {
208
  "rank": rank,
 
 
209
  "input_smiles": row["input_smiles"],
210
  "canonical_smiles": row["canonical_smiles"],
211
  "smiles_hash": row["smiles_hash"],
212
- "score": row["score"],
 
 
 
 
213
  "valid": True,
214
  "error": "",
215
  }
@@ -222,7 +347,11 @@ def _results_to_csv(valid_rows: list[dict[str, Any]], invalid_rows: list[dict[st
222
  "input_smiles": row["input_smiles"],
223
  "canonical_smiles": "",
224
  "smiles_hash": "",
225
- "score": "",
 
 
 
 
226
  "valid": False,
227
  "error": row.get("error", "invalid_smiles"),
228
  }
@@ -260,14 +389,19 @@ def run_ranking(
260
  ranked = rank_compounds(model, assay_text=assay_text, smiles_list=smiles_values, top_k=top_k or None)
261
  valid_rows = [row for row in ranked if row["valid"]]
262
  invalid_rows = [row for row in ranked if not row["valid"]]
 
263
 
264
  display_rows = [
265
  {
266
  "rank": idx + 1,
267
- "input_smiles": row["input_smiles"],
 
268
  "canonical_smiles": row["canonical_smiles"],
269
- "smiles_hash": row["smiles_hash"],
270
- "score": round(float(row["score"]), 4),
 
 
 
271
  }
272
  for idx, row in enumerate(valid_rows)
273
  ]
@@ -294,7 +428,7 @@ def load_example(example_name: str):
294
  )
295
 
296
 
297
- with gr.Blocks(title="BioAssayAlign Compatibility Explorer") as demo:
298
  gr.Markdown(
299
  """
300
  <style>
@@ -303,11 +437,11 @@ with gr.Blocks(title="BioAssayAlign Compatibility Explorer") as demo:
303
  + """
304
  </style>
305
  <div id="hero">
306
- <div class="eyebrow">BioAssayAlign · scientist-facing ranking demo</div>
307
- <div class="hero-title">Rank candidate molecules for a bioassay</div>
308
  <div class="hero-copy">
309
- Build an assay query from structured fields, paste or upload a candidate molecule list, and get a ranked output from the current BioAssayAlign compatibility model.
310
- This app is designed for triage and prioritization, not for direct potency claims.
311
  </div>
312
  </div>
313
  """
@@ -318,7 +452,7 @@ with gr.Blocks(title="BioAssayAlign Compatibility Explorer") as demo:
318
  gr.Markdown(
319
  """
320
  <div class="panel-note">
321
- Use the structured fields if you have them. Missing fields are allowed, but species, readout, and target metadata usually help.
322
  </div>
323
  """
324
  )
@@ -327,17 +461,28 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
327
  f"""
328
  <div class="metric-strip">
329
  <div class="metric-card"><span>Default model</span><strong>{MODEL_REPO_ID}</strong></div>
330
- <div class="metric-card"><span>Expected use</span><strong>ranking, not probability</strong></div>
331
- <div class="metric-card"><span>Interactive cap</span><strong>{MAX_INPUT_SMILES} SMILES</strong></div>
332
  </div>
333
  """
334
  )
335
 
 
 
 
 
 
 
 
 
 
 
336
  with gr.Tab("Rank Compounds"):
337
  with gr.Row():
338
  with gr.Column(scale=6):
339
- example_name = gr.Dropdown(choices=list(EXAMPLES.keys()), value="BTK binding", label="Load an example")
340
  load_example_btn = gr.Button("Load Example", variant="secondary")
 
341
  assay_title = gr.Textbox(label="Assay title")
342
  description = gr.Textbox(label="Description", lines=6, placeholder="Describe the assay in practical lab language.")
343
  with gr.Row():
@@ -352,17 +497,17 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
352
  smiles_text = gr.Textbox(
353
  label="Candidate SMILES",
354
  lines=14,
355
- placeholder="Paste one SMILES per line. CSV upload is optional and will be merged.",
356
  )
357
  upload_file = gr.File(label="Upload CSV / TXT / SMI", file_count="single", file_types=[".csv", ".txt", ".smi", ".smiles"])
358
  top_k = gr.Slider(label="Top-K rows to display", minimum=5, maximum=200, step=5, value=DEFAULT_TOP_K)
359
- run_btn = gr.Button("Rank Molecules", variant="primary")
360
  clear_btn = gr.ClearButton(value="Clear", components=[assay_title, description, organism, readout, assay_format, assay_type, target_uniprot, smiles_text, upload_file])
361
 
362
  summary = gr.Markdown()
363
  with gr.Accordion("Serialized assay text used by the model", open=False):
364
  assay_preview = gr.Textbox(lines=12, label="Model-facing assay text")
365
- ranked_df = gr.Dataframe(label="Ranked molecules", interactive=False, wrap=True)
366
  invalid_df = gr.Dataframe(label="Rejected inputs", interactive=False, wrap=True)
367
  download_file = gr.File(label="Download CSV")
368
 
@@ -380,24 +525,30 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
380
  with gr.Tab("How To Use This"):
381
  gr.Markdown(
382
  """
383
- ### Recommended workflow
384
 
385
  1. Describe the assay in plain scientific language.
386
  2. Add metadata if you know it: organism, readout, format, assay type, target UniProt.
387
  3. Paste a candidate list or upload a CSV with a `smiles` column.
388
- 4. Rank the list and inspect the top molecules first.
389
 
390
- ### What the score means
391
 
392
- - The score is a ranking signal.
393
- - Higher means “more compatible than the other molecules in this submitted list”.
394
- - It is **not** a calibrated activity probability and it is **not** an IC50 prediction.
 
 
 
 
 
395
 
396
  ### Good input habits
397
 
398
  - Prefer parent, neutralized, chemically sensible SMILES.
399
  - Keep assay descriptions concrete.
400
  - If the assay is target-defined, add the UniProt ID.
 
401
 
402
  ### What this Space is not
403
 
@@ -409,4 +560,9 @@ Use the structured fields if you have them. Missing fields are allowed, but spec
409
 
410
 
411
  if __name__ == "__main__":
412
- demo.queue(default_concurrency_limit=4).launch(show_error=True)
 
 
 
 
 
 
3
  import csv
4
  import os
5
  import tempfile
6
+ import threading
7
  from pathlib import Path
8
  from typing import Any
9
 
10
  import gradio as gr
11
+ import numpy as np
12
  import pandas as pd
13
 
14
+ from space_runtime import (
15
+ AssayQuery,
16
+ load_compatibility_model_from_hub,
17
+ molecule_ui_metrics,
18
+ rank_compounds,
19
+ serialize_assay_query,
20
+ )
21
 
22
  MODEL_REPO_ID = os.getenv("MODEL_REPO_ID", "lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility")
23
  MAX_INPUT_SMILES = int(os.getenv("MAX_INPUT_SMILES", "3000"))
24
  DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "50"))
25
 
26
  CSS = """
27
+ @import url('https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,600;9..144,700&display=swap');
28
 
29
  :root {
30
+ --paper: #f4efe4;
31
+ --ink: #132128;
32
+ --ink-soft: #56656e;
33
+ --accent: #135a52;
34
+ --accent-soft: #d9ece8;
35
+ --accent-warm: #ab5936;
36
+ --line: #c8cfc7;
37
  --warning: #8a4b0f;
38
  --good: #0e6b48;
39
+ --card: rgba(255,255,255,0.82);
40
  }
41
 
42
  .gradio-container {
43
  font-family: "IBM Plex Sans", sans-serif;
44
  background:
45
+ radial-gradient(circle at top right, rgba(19,90,82,0.12), transparent 24rem),
46
+ radial-gradient(circle at bottom left, rgba(171,89,54,0.10), transparent 22rem),
47
+ linear-gradient(180deg, #faf7ef 0%, var(--paper) 100%);
48
  color: var(--ink);
49
  }
50
 
51
  #hero {
52
  border: 1px solid var(--line);
53
+ background:
54
+ linear-gradient(135deg, rgba(255,255,255,0.95), rgba(240,246,244,0.90)),
55
+ linear-gradient(90deg, rgba(19,90,82,0.03), rgba(171,89,54,0.03));
56
+ border-radius: 28px;
57
+ padding: 1.35rem 1.5rem;
58
+ box-shadow: 0 24px 50px rgba(19,33,40,0.08);
59
  }
60
 
61
  .eyebrow {
 
63
  font-size: 0.78rem;
64
  letter-spacing: 0.08em;
65
  text-transform: uppercase;
66
+ color: var(--accent-warm);
67
  }
68
 
69
  .hero-title {
70
+ font-family: "Fraunces", serif;
71
+ font-size: 2.35rem;
72
  line-height: 1.05;
73
  margin: 0.2rem 0 0.5rem 0;
74
  }
 
81
 
82
  .panel-note {
83
  border-left: 4px solid var(--accent);
84
+ background: rgba(19,90,82,0.06);
85
  padding: 0.9rem 1rem;
86
  border-radius: 12px;
87
  }
 
94
 
95
  .metric-card {
96
  border: 1px solid var(--line);
97
+ background: var(--card);
98
  padding: 0.8rem 0.9rem;
99
  border-radius: 16px;
100
  }
 
104
  font-size: 1.15rem;
105
  margin-top: 0.15rem;
106
  }
107
+
108
+ .guide-grid {
109
+ display: grid;
110
+ grid-template-columns: repeat(3, minmax(0, 1fr));
111
+ gap: 0.8rem;
112
+ }
113
+
114
+ .guide-card {
115
+ border: 1px solid var(--line);
116
+ background: var(--card);
117
+ padding: 0.9rem 1rem;
118
+ border-radius: 16px;
119
+ }
120
+
121
+ .guide-card strong {
122
+ display: block;
123
+ margin-bottom: 0.2rem;
124
+ }
125
  """
126
 
127
  EXAMPLES = {
128
+ "BTK binding sanity check": {
129
  "title": "BTK kinase inhibitor binding assay",
130
  "description": "In vitro kinase-domain binding assay for Bruton's tyrosine kinase inhibitor ranking.",
131
  "organism": "Homo sapiens",
 
136
  "smiles": "\n".join(
137
  [
138
  "CC1=NC(=O)N(C)C(=O)N1",
 
 
139
  "c1ccccc1",
140
  "CCO",
141
  ]
142
  ),
143
  },
144
+ "JAK2 cell assay": {
145
+ "title": "JAK2 inhibition assay",
146
+ "description": "Cell-based luminescence assay measuring JAK2 inhibition in HEK293 cells.",
147
+ "organism": "Homo sapiens",
148
+ "readout": "luminescence",
149
+ "assay_format": "cell-based",
150
+ "assay_type": "inhibition",
151
+ "target_uniprot": "O60674",
152
+ "smiles": "\n".join(
153
+ [
154
+ "CC1=CC(=O)N(C)C(=O)N1",
155
+ "CC(=O)Nc1ncc(C#N)c(Nc2ccc(F)c(Cl)c2)n1",
156
+ "CCOc1ccc2nc(N3CCN(C)CC3)n(C)c(=O)c2c1",
157
+ "CCO",
158
+ ]
159
+ ),
160
+ },
161
  "ALDH1A1 fluorescence": {
162
  "title": "ALDH1A1 inhibition assay",
163
  "description": "Cell-based fluorescence assay measuring ALDH1A1 inhibition in human cells.",
 
168
  "target_uniprot": "P00352",
169
  "smiles": "\n".join(
170
  [
171
+ "CCOC1=CC=CC=C1",
172
  "CC1=CC(=O)N(C)C(=O)N1",
 
173
  "CCN(CC)CCOC1=CC=CC=C1",
 
174
  "CCO",
175
  ]
176
  ),
 
221
  return load_compatibility_model_from_hub(MODEL_REPO_ID)
222
 
223
 
224
+ def _warm_model_background() -> None:
225
+ try:
226
+ _load_model()
227
+ except Exception:
228
+ # Keep the app usable even if warmup fails; the request path will raise the real error.
229
+ return
230
+
231
+
232
+ def _priority_band(relative_score: float, rank: int, total: int) -> str:
233
+ if total <= 3:
234
+ return "Screen first" if rank == 1 else ("Worth a look" if rank == 2 else "Low priority")
235
+ if relative_score >= 85:
236
+ return "Screen first"
237
+ if relative_score >= 60:
238
+ return "Worth a look"
239
+ if relative_score >= 35:
240
+ return "Middle pack"
241
+ return "Low priority"
242
+
243
+
244
+ def _decorate_valid_rows(valid_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
245
+ if not valid_rows:
246
+ return []
247
+ scores = np.array([float(row["score"]) for row in valid_rows], dtype=np.float32)
248
+ minimum = float(scores.min())
249
+ maximum = float(scores.max())
250
+ spread = maximum - minimum
251
+ decorated: list[dict[str, Any]] = []
252
+ for idx, row in enumerate(valid_rows):
253
+ score = float(row["score"])
254
+ relative_score = 100.0 if spread <= 1e-8 and idx == 0 else (50.0 if spread <= 1e-8 else 100.0 * (score - minimum) / spread)
255
+ metrics = molecule_ui_metrics(row["canonical_smiles"])
256
+ decorated.append(
257
+ {
258
+ **row,
259
+ "relative_score": round(relative_score, 1),
260
+ "priority_band": _priority_band(relative_score, idx + 1, len(valid_rows)),
261
+ "mol_wt": round(float(metrics["mol_wt"]), 1),
262
+ "logp": round(float(metrics["logp"]), 2),
263
+ "tpsa": round(float(metrics["tpsa"]), 1),
264
+ "heavy_atoms": int(metrics["heavy_atoms"]),
265
+ }
266
+ )
267
+ return decorated
268
+
269
+
270
  def _build_summary(query_text: str, valid_rows: list[dict[str, Any]], invalid_rows: list[dict[str, Any]], warning: str | None) -> str:
271
  best = valid_rows[0] if valid_rows else None
272
+ score_range = None
273
+ if valid_rows:
274
+ raw_scores = [float(row["score"]) for row in valid_rows]
275
+ score_range = max(raw_scores) - min(raw_scores)
276
  chunks = [
277
+ "### Ranking Summary",
278
  f"- Model repo: `{MODEL_REPO_ID}`",
279
+ f"- Assay fields serialized into `{len(query_text.split())}` words",
280
  f"- Valid molecules ranked: `{len(valid_rows)}`",
281
  f"- Invalid molecules rejected: `{len(invalid_rows)}`",
282
  ]
283
  if best is not None:
284
+ chunks.append(
285
+ f"- Top hit: `{best['canonical_smiles']}` · `{best['priority_band']}` · list-relative score `{best['relative_score']:.1f}/100`"
286
+ )
287
+ if score_range is not None:
288
+ chunks.append(f"- Score spread across this submitted list: `{score_range:.2f}` model-score units")
289
  if warning:
290
  chunks.append(f"- Warning: {warning}")
291
  chunks.append("")
292
+ chunks.append(
293
+ "Use the **priority band** and **list-relative score** first. The raw model score is only a debugging value. "
294
+ "A candidate with `relative score 100` is the strongest item in your submitted list, not in all chemistry."
295
+ )
296
  return "\n".join(chunks)
297
 
298
 
 
301
  if not rows:
302
  return None
303
  handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
304
+ writer = csv.DictWriter(
305
+ handle,
306
+ fieldnames=[
307
+ "rank",
308
+ "priority_band",
309
+ "relative_score_100",
310
+ "input_smiles",
311
+ "canonical_smiles",
312
+ "smiles_hash",
313
+ "mol_wt",
314
+ "logp",
315
+ "tpsa",
316
+ "heavy_atoms",
317
+ "model_score",
318
+ "valid",
319
+ "error",
320
+ ],
321
+ )
322
  writer.writeheader()
323
  rank = 1
324
  for row in valid_rows:
325
  writer.writerow(
326
  {
327
  "rank": rank,
328
+ "priority_band": row["priority_band"],
329
+ "relative_score_100": row["relative_score"],
330
  "input_smiles": row["input_smiles"],
331
  "canonical_smiles": row["canonical_smiles"],
332
  "smiles_hash": row["smiles_hash"],
333
+ "mol_wt": row["mol_wt"],
334
+ "logp": row["logp"],
335
+ "tpsa": row["tpsa"],
336
+ "heavy_atoms": row["heavy_atoms"],
337
+ "model_score": row["score"],
338
  "valid": True,
339
  "error": "",
340
  }
 
347
  "input_smiles": row["input_smiles"],
348
  "canonical_smiles": "",
349
  "smiles_hash": "",
350
+ "mol_wt": "",
351
+ "logp": "",
352
+ "tpsa": "",
353
+ "heavy_atoms": "",
354
+ "model_score": "",
355
  "valid": False,
356
  "error": row.get("error", "invalid_smiles"),
357
  }
 
389
  ranked = rank_compounds(model, assay_text=assay_text, smiles_list=smiles_values, top_k=top_k or None)
390
  valid_rows = [row for row in ranked if row["valid"]]
391
  invalid_rows = [row for row in ranked if not row["valid"]]
392
+ valid_rows = _decorate_valid_rows(valid_rows)
393
 
394
  display_rows = [
395
  {
396
  "rank": idx + 1,
397
+ "priority": row["priority_band"],
398
+ "relative_score_100": row["relative_score"],
399
  "canonical_smiles": row["canonical_smiles"],
400
+ "mol_wt": row["mol_wt"],
401
+ "logp": row["logp"],
402
+ "tpsa": row["tpsa"],
403
+ "heavy_atoms": row["heavy_atoms"],
404
+ "model_score": round(float(row["score"]), 4),
405
  }
406
  for idx, row in enumerate(valid_rows)
407
  ]
 
428
  )
429
 
430
 
431
+ with gr.Blocks(title="BioAssayAlign Compatibility Explorer", analytics_enabled=False) as demo:
432
  gr.Markdown(
433
  """
434
  <style>
 
437
  + """
438
  </style>
439
  <div id="hero">
440
+ <div class="eyebrow">BioAssayAlign · assay-conditioned molecule ranking</div>
441
+ <div class="hero-title">Prioritize a candidate list against an assay</div>
442
  <div class="hero-copy">
443
+ Enter assay context, submit a candidate molecule list, and get a ranked shortlist from the current BioAssayAlign compatibility model.
444
+ The output is designed for triage: which molecules look strongest relative to the other candidates you submitted.
445
  </div>
446
  </div>
447
  """
 
452
  gr.Markdown(
453
  """
454
  <div class="panel-note">
455
+ Use structured assay fields when possible. Missing fields are allowed, but species, readout, format, and target metadata usually improve ranking quality.
456
  </div>
457
  """
458
  )
 
461
  f"""
462
  <div class="metric-strip">
463
  <div class="metric-card"><span>Default model</span><strong>{MODEL_REPO_ID}</strong></div>
464
+ <div class="metric-card"><span>Use the output for</span><strong>ranking, not probability</strong></div>
465
+ <div class="metric-card"><span>Interactive cap</span><strong>{MAX_INPUT_SMILES} molecules</strong></div>
466
  </div>
467
  """
468
  )
469
 
470
+ gr.Markdown(
471
+ """
472
+ <div class="guide-grid">
473
+ <div class="guide-card"><strong>1. Define the assay</strong>Use plain scientific language. Add UniProt, readout, and organism if you know them.</div>
474
+ <div class="guide-card"><strong>2. Submit candidates</strong>Paste one SMILES per line or upload a CSV with a <code>smiles</code> column.</div>
475
+ <div class="guide-card"><strong>3. Read the ranking</strong>Use <em>priority</em> and <em>relative score</em> first. Ignore the raw model score unless you are debugging.</div>
476
+ </div>
477
+ """
478
+ )
479
+
480
  with gr.Tab("Rank Compounds"):
481
  with gr.Row():
482
  with gr.Column(scale=6):
483
+ example_name = gr.Dropdown(choices=list(EXAMPLES.keys()), value="BTK binding sanity check", label="Live example")
484
  load_example_btn = gr.Button("Load Example", variant="secondary")
485
+ gr.Markdown("These example inputs run against the live model. The outputs are not cached screenshots.")
486
  assay_title = gr.Textbox(label="Assay title")
487
  description = gr.Textbox(label="Description", lines=6, placeholder="Describe the assay in practical lab language.")
488
  with gr.Row():
 
497
  smiles_text = gr.Textbox(
498
  label="Candidate SMILES",
499
  lines=14,
500
+ placeholder="Paste one candidate molecule per line. Example: CCO",
501
  )
502
  upload_file = gr.File(label="Upload CSV / TXT / SMI", file_count="single", file_types=[".csv", ".txt", ".smi", ".smiles"])
503
  top_k = gr.Slider(label="Top-K rows to display", minimum=5, maximum=200, step=5, value=DEFAULT_TOP_K)
504
+ run_btn = gr.Button("Run Ranking", variant="primary")
505
  clear_btn = gr.ClearButton(value="Clear", components=[assay_title, description, organism, readout, assay_format, assay_type, target_uniprot, smiles_text, upload_file])
506
 
507
  summary = gr.Markdown()
508
  with gr.Accordion("Serialized assay text used by the model", open=False):
509
  assay_preview = gr.Textbox(lines=12, label="Model-facing assay text")
510
+ ranked_df = gr.Dataframe(label="Ranked candidates", interactive=False, wrap=True)
511
  invalid_df = gr.Dataframe(label="Rejected inputs", interactive=False, wrap=True)
512
  download_file = gr.File(label="Download CSV")
513
 
 
525
  with gr.Tab("How To Use This"):
526
  gr.Markdown(
527
  """
528
+ ### Input recipe
529
 
530
  1. Describe the assay in plain scientific language.
531
  2. Add metadata if you know it: organism, readout, format, assay type, target UniProt.
532
  3. Paste a candidate list or upload a CSV with a `smiles` column.
533
+ 4. Run ranking and inspect the top band first.
534
 
535
+ ### How to read the result table
536
 
537
+ - **priority** is the first thing to read:
538
+ - `Screen first`
539
+ - `Worth a look`
540
+ - `Middle pack`
541
+ - `Low priority`
542
+ - **relative_score_100** rescales the submitted list so the strongest candidate is near `100` and the weakest is near `0`.
543
+ - **model_score** is the raw internal score. It is useful for debugging, not for scientific interpretation.
544
+ - **mol_wt / logp / tpsa** are quick chemistry context columns so you can sanity-check what the model surfaced.
545
 
546
  ### Good input habits
547
 
548
  - Prefer parent, neutralized, chemically sensible SMILES.
549
  - Keep assay descriptions concrete.
550
  - If the assay is target-defined, add the UniProt ID.
551
+ - If you upload a CSV, use one SMILES per row in a column named `smiles` or `canonical_smiles`.
552
 
553
  ### What this Space is not
554
 
 
560
 
561
 
562
  if __name__ == "__main__":
563
+ threading.Thread(target=_warm_model_background, daemon=True).start()
564
+ demo.queue(default_concurrency_limit=4).launch(
565
+ show_error=True,
566
+ quiet=True,
567
+ footer_links=["gradio"],
568
+ )
space_runtime.py CHANGED
@@ -1,7 +1,10 @@
1
  from __future__ import annotations
2
 
 
3
  import hashlib
 
4
  import json
 
5
  import re
6
  from dataclasses import dataclass
7
  from functools import lru_cache
@@ -12,12 +15,19 @@ import numpy as np
12
  import torch
13
  import torch.nn.functional as F
14
  from huggingface_hub import snapshot_download
 
15
  from rdkit import Chem, DataStructs, RDLogger
16
  from rdkit.Chem import AllChem, Crippen, Descriptors, Lipinski, MACCSkeys, rdMolDescriptors
17
  from rdkit.Chem.MolStandardize import rdMolStandardize
18
  from sentence_transformers import SentenceTransformer
19
  from torch import nn
20
  from transformers import AutoModel, AutoTokenizer
 
 
 
 
 
 
21
 
22
  RDLogger.DisableLog("rdApp.*")
23
 
@@ -90,6 +100,13 @@ def smiles_sha256(smiles: str) -> str:
90
  return hashlib.sha256(smiles.encode("utf-8")).hexdigest()
91
 
92
 
 
 
 
 
 
 
 
93
  @lru_cache(maxsize=1_000_000)
94
  def _standardize_smiles_v2_cached(smiles: str) -> str | None:
95
  mol = Chem.MolFromSmiles(smiles)
@@ -251,6 +268,24 @@ def _molecule_descriptor_vector(mol, *, names: tuple[str, ...] = DEFAULT_DESCRIP
251
  return np.array([values[name] for name in names], dtype=np.float32)
252
 
253
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
254
  class CompatibilityHead(nn.Module):
255
  def __init__(self, *, assay_dim: int, molecule_dim: int, projection_dim: int, hidden_dim: int, dropout: float) -> None:
256
  super().__init__()
@@ -349,15 +384,16 @@ class SpaceCompatibilityModel:
          if not self.molecule_transformer_model_name or self._molecule_transformer_model is not None:
              return
          dtype = torch.float16 if self._molecule_transformer_device.type == "cuda" else torch.float32
-         self._molecule_transformer_tokenizer = AutoTokenizer.from_pretrained(
-             self.molecule_transformer_model_name,
-             trust_remote_code=True,
-         )
-         self._molecule_transformer_model = AutoModel.from_pretrained(
-             self.molecule_transformer_model_name,
-             trust_remote_code=True,
-             torch_dtype=dtype,
-         ).to(self._molecule_transformer_device)
          self._molecule_transformer_model.eval()

      def _encode_molecule_transformer_batch(self, smiles_values: list[str]) -> np.ndarray | None:
@@ -413,11 +449,12 @@ class SpaceCompatibilityModel:

  def _load_sentence_transformer(model_name: str) -> SentenceTransformer:
      dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
-     encoder = SentenceTransformer(
-         model_name,
-         trust_remote_code=True,
-         model_kwargs={"torch_dtype": dtype},
-     )
      if getattr(encoder, "tokenizer", None) is not None:
          encoder.tokenizer.padding_side = "left"
      return encoder
@@ -489,11 +526,12 @@ def load_compatibility_model(model_dir: str | Path) -> SpaceCompatibilityModel:

  @lru_cache(maxsize=1)
  def load_compatibility_model_from_hub(model_repo_id: str) -> SpaceCompatibilityModel:
-     model_dir = snapshot_download(
-         repo_id=model_repo_id,
-         repo_type="model",
-         allow_patterns=["best_model.pt", "training_metadata.json", "README.md"],
-     )
      return load_compatibility_model(model_dir)

 
  from __future__ import annotations

+ import contextlib
  import hashlib
+ import io
  import json
+ import os
  import re
  from dataclasses import dataclass
  from functools import lru_cache
 
  import torch
  import torch.nn.functional as F
  from huggingface_hub import snapshot_download
+ from huggingface_hub.utils import disable_progress_bars
  from rdkit import Chem, DataStructs, RDLogger
  from rdkit.Chem import AllChem, Crippen, Descriptors, Lipinski, MACCSkeys, rdMolDescriptors
  from rdkit.Chem.MolStandardize import rdMolStandardize
  from sentence_transformers import SentenceTransformer
  from torch import nn
  from transformers import AutoModel, AutoTokenizer
+ from transformers.utils import logging as transformers_logging
+
+ os.environ.setdefault("HF_HUB_DISABLE_PROGRESS_BARS", "1")
+ os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
+ disable_progress_bars()
+ transformers_logging.set_verbosity_error()

  RDLogger.DisableLog("rdApp.*")

 
      return hashlib.sha256(smiles.encode("utf-8")).hexdigest()


+ @contextlib.contextmanager
+ def _silent_imports():
+     buffer = io.StringIO()
+     with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
+         yield
+
+
  @lru_cache(maxsize=1_000_000)
  def _standardize_smiles_v2_cached(smiles: str) -> str | None:
      mol = Chem.MolFromSmiles(smiles)
 
      return np.array([values[name] for name in names], dtype=np.float32)


+ def molecule_ui_metrics(smiles: str) -> dict[str, float | int]:
+     canonical = standardize_smiles_v2(smiles) or smiles
+     mol = Chem.MolFromSmiles(canonical)
+     if mol is None:
+         return {
+             "mol_wt": 0.0,
+             "logp": 0.0,
+             "tpsa": 0.0,
+             "heavy_atoms": 0,
+         }
+     return {
+         "mol_wt": float(Descriptors.MolWt(mol)),
+         "logp": float(Crippen.MolLogP(mol)),
+         "tpsa": float(rdMolDescriptors.CalcTPSA(mol)),
+         "heavy_atoms": int(mol.GetNumHeavyAtoms()),
+     }
+
+
  class CompatibilityHead(nn.Module):
      def __init__(self, *, assay_dim: int, molecule_dim: int, projection_dim: int, hidden_dim: int, dropout: float) -> None:
          super().__init__()
 
          if not self.molecule_transformer_model_name or self._molecule_transformer_model is not None:
              return
          dtype = torch.float16 if self._molecule_transformer_device.type == "cuda" else torch.float32
+         with _silent_imports():
+             self._molecule_transformer_tokenizer = AutoTokenizer.from_pretrained(
+                 self.molecule_transformer_model_name,
+                 trust_remote_code=True,
+             )
+             self._molecule_transformer_model = AutoModel.from_pretrained(
+                 self.molecule_transformer_model_name,
+                 trust_remote_code=True,
+                 torch_dtype=dtype,
+             ).to(self._molecule_transformer_device)
          self._molecule_transformer_model.eval()

      def _encode_molecule_transformer_batch(self, smiles_values: list[str]) -> np.ndarray | None:
 

  def _load_sentence_transformer(model_name: str) -> SentenceTransformer:
      dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
+     with _silent_imports():
+         encoder = SentenceTransformer(
+             model_name,
+             trust_remote_code=True,
+             model_kwargs={"torch_dtype": dtype},
+         )
      if getattr(encoder, "tokenizer", None) is not None:
          encoder.tokenizer.padding_side = "left"
      return encoder
 

  @lru_cache(maxsize=1)
  def load_compatibility_model_from_hub(model_repo_id: str) -> SpaceCompatibilityModel:
+     with _silent_imports():
+         model_dir = snapshot_download(
+             repo_id=model_repo_id,
+             repo_type="model",
+             allow_patterns=["best_model.pt", "training_metadata.json", "README.md"],
+         )
      return load_compatibility_model(model_dir)
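
Note on the `_silent_imports` helper that this commit wraps around the three loading calls: it relies on `contextlib.redirect_stdout` and `contextlib.redirect_stderr`. The sketch below shows that pattern in isolation. It is not the Space's code: `silent_output` and `noisy_loader` are hypothetical names, and unlike the helper in the diff, this variant yields the buffer so the swallowed text can be inspected.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def silent_output():
    """Capture Python-level stdout/stderr while the with-block runs."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        # Yielding the buffer is an addition for illustration; the helper in
        # the diff yields nothing and simply discards the captured text.
        yield buffer


def noisy_loader() -> str:
    # Hypothetical stand-in for a model load that prints progress messages.
    print("downloading weights...")
    print("warning: something chatty", file=sys.stderr)
    return "model"


with silent_output() as captured:
    model = noisy_loader()

# The return value is unaffected; the messages went to the buffer, not the console.
captured_text = captured.getvalue()
```

These redirects only swap `sys.stdout` and `sys.stderr` at the Python level, so output that native code writes straight to the underlying file descriptors would not be captured this way. That is presumably why the commit also sets `HF_HUB_DISABLE_PROGRESS_BARS` and lowers the `transformers` logging verbosity.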
536
 
537