---
title: BioAssayAlign Compatibility Explorer
emoji: 🧪
colorFrom: green
colorTo: red
sdk: gradio
sdk_version: 6.9.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
short_description: Rank a candidate molecule list against a bioassay.
---

# BioAssayAlign Compatibility Explorer

BioAssayAlign is an **assay-conditioned molecule ranking** tool.

You provide:
- a bioassay description and optional metadata
- a list of candidate SMILES

The model returns:
- a ranked list of molecules
- a compatibility score for each one
- explicit flags for invalid SMILES

## What It Is

This is not a chatbot. It is not a potency predictor.

It is a **ranking model** trained on a frozen public bioassay dataset built from PubChem BioAssay and ChEMBL. It is designed to answer:

> “Given this assay, which molecules should I look at first?”

## What The Score Means

- The app shows a **priority band** and a **list-relative score** first.
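- Read those first: they describe each molecule's standing within your list more directly than the raw model score does.
- The raw score is **not** a probability. It is an uncalibrated ranking value from the scorer head.
- The relative score is computed within the list you submit: the strongest molecule in that list will be near the top of the `0–100` scale, so relative scores are not comparable across different submissions.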
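
This README does not document the exact formula behind the relative score or the priority bands. As a minimal sketch, assuming min-max scaling of the raw scores within one submitted list and hypothetical band cut-offs (75 and 40, labels `high`/`medium`/`low`), a list-relative score could be derived like this:

```python
def relative_scores(raw_scores: list[float]) -> list[float]:
    """Min-max scale raw model scores to 0-100 within one submitted list.

    Assumption: the app's real scaling may differ (e.g. percentile ranks).
    """
    if not raw_scores:
        return []
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All molecules scored identically, so there is no ordering to spread out.
        return [100.0 for _ in raw_scores]
    return [100.0 * (s - lo) / (hi - lo) for s in raw_scores]


def priority_band(rel: float) -> str:
    """Map a relative score to a band. Cut-offs and labels are hypothetical."""
    if rel >= 75:
        return "high"
    if rel >= 40:
        return "medium"
    return "low"


raw = [0.12, -0.40, 0.87, 0.30]  # made-up, uncalibrated scorer-head outputs
for s, r in zip(raw, relative_scores(raw)):
    print(f"raw={s:+.2f}  relative={r:5.1f}  band={priority_band(r)}")
```

Because scaling of this kind depends on the minimum and maximum of the list, adding or removing a single molecule can shift every other molecule's relative score while leaving its raw score unchanged.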

## How To Use It

1. Enter the assay title and description in plain scientific language.
2. Add metadata if you know it:
   - organism
   - readout
   - assay format
   - assay type
   - target UniProt ID
3. Paste one SMILES per line, or upload a CSV with a `smiles` or `canonical_smiles` column.
4. Run the ranking.
5. Read the output in this order:
   - `priority`
   - `relative score`
   - chemistry context columns (`MolWt`, `logP`, `TPSA`)
   - raw model score only if needed

## Recommended Input Style

The model is most reliable when assay information is provided as structured fields:
- title
- description
- organism
- readout
- assay format
- assay type
- target UniProt IDs
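
Before pasting target UniProt IDs, it can help to check that they are well-formed accessions. The sketch below uses the accession pattern published in UniProt's documentation. The helper name `split_uniprot_ids` and the comma/whitespace-separated input convention are assumptions for illustration, not part of this app.

```python
import re

# Accession format documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def split_uniprot_ids(text: str) -> tuple[list[str], list[str]]:
    """Split comma/whitespace-separated IDs into (valid, invalid) lists.

    Hypothetical helper: the separator convention is an assumption.
    """
    tokens = [t.strip().upper() for t in re.split(r"[,\s]+", text) if t.strip()]
    valid = [t for t in tokens if UNIPROT_ACCESSION.fullmatch(t)]
    invalid = [t for t in tokens if not UNIPROT_ACCESSION.fullmatch(t)]
    return valid, invalid


print(split_uniprot_ids("O60674, Q06187 not-an-id"))
```

This only checks the format; it cannot tell whether an accession exists or matches the assay's actual target.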

You can paste SMILES directly or upload a CSV with a `smiles` or `canonical_smiles` column.
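
A minimal sketch of how either input form could be normalized into a plain list of SMILES strings. It assumes blank lines are skipped, column names are matched case-sensitively, and `smiles` wins when both columns are present. The function names are hypothetical and do not reflect the app's internal code.

```python
import csv
import io


def smiles_from_text(text: str) -> list[str]:
    """One SMILES per line; blank lines and surrounding whitespace are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def smiles_from_csv(csv_text: str) -> list[str]:
    """Read SMILES from a CSV with a `smiles` or `canonical_smiles` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    # Assumption: prefer `smiles` when both columns exist.
    column = next((c for c in ("smiles", "canonical_smiles") if c in fields), None)
    if column is None:
        raise ValueError(f"expected a 'smiles' or 'canonical_smiles' column, got {fields}")
    # Skip rows where the chosen column is missing or empty.
    return [row[column].strip() for row in reader if (row[column] or "").strip()]


print(smiles_from_text("CCO\n\nc1ccccc1\n"))
print(smiles_from_csv("id,canonical_smiles\nm1,CCO\nm2,CC(=O)O\n"))
```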

## Good Uses

- ranking a screening shortlist for a new assay concept
- triaging compounds before a more expensive downstream model or wet-lab step
- testing how sensitive rankings are to assay wording and metadata
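
To test sensitivity to wording, one option is to rank the same candidate list twice (for example, with and without metadata) and compare the two orderings. The sketch below uses Spearman's rank correlation and top-k overlap. These are standard choices, but the app does not compute them itself. The sketch also assumes both runs ranked exactly the same set of valid molecules, with no ties.

```python
def spearman_rho(order_a: list[str], order_b: list[str]) -> float:
    """Spearman correlation between two rankings of the same items, assuming no ties.

    rho = 1 - 6 * sum(d_i ** 2) / (n * (n ** 2 - 1)), with d_i the rank difference of item i.
    """
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("both rankings must contain the same unique items")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    rank_b = {item: pos for pos, item in enumerate(order_b)}
    d2 = sum((pos - rank_b[item]) ** 2 for pos, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


def top_k_overlap(order_a: list[str], order_b: list[str], k: int) -> float:
    """Fraction of the top-k molecules shared by both rankings."""
    if k < 1 or k > len(order_a):
        raise ValueError("k must be between 1 and the list length")
    return len(set(order_a[:k]) & set(order_b[:k])) / k


run_plain = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]  # ranking from title/description only
run_meta = ["c1ccccc1", "CCO", "CC(=O)O", "CCN"]   # ranking after adding metadata
print(spearman_rho(run_plain, run_meta), top_k_overlap(run_plain, run_meta, k=2))
```

A rho close to 1 with high top-k overlap suggests the wording change barely moved the shortlist. A large drop points to inputs worth stating more carefully.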

## Example Assays Included In The UI

- JAK2 cell assay
- ALDH1A1 fluorescence assay
- BTK binding quick check

These examples call the live model. They are not screenshots or mocked outputs.

## Limits

- This is a public-data model, not a medicinal chemistry oracle.
- It does not predict IC50 directly.
- It is strongest as a **relative ranking tool** over a candidate list you already care about.

## Runtime Notes

- The first request may be slower while the Space loads (warms up) the model in the background.
- Large candidate lists increase runtime. For interactive use, start with a few hundred molecules.
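
For large libraries, one approach is to split the candidates into batches of a few hundred and submit them one at a time. This is a sketch under assumptions. The batch size of 300 is arbitrary. Relative scores and priority bands are list-relative, so they are only comparable within a batch, and rerunning the top hits from each batch together in one final list keeps the final comparison consistent. `itertools.batched` would do the same job but requires Python 3.12, and this Space targets 3.10.

```python
from collections.abc import Iterator


def batched(items: list[str], size: int = 300) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


library = [f"C{'C' * i}O" for i in range(1000)]  # placeholder SMILES, not real candidates
for batch in batched(library):
    # In practice, submit each batch to the app here; this only prints batch sizes.
    print(len(batch))
```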

## Model

The Space reads the model repo from the `MODEL_REPO_ID` environment variable.

Default:
- `lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility`

If a newer best-performing model (the "champion") is published later, the Space can switch to it by updating `MODEL_REPO_ID`, without changing the UI.
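
A minimal sketch of the environment-variable lookup described above. The default repo ID comes from this README. The function name `resolve_model_repo` is hypothetical and is not taken from `app.py`, and treating an empty value as unset is an assumption.

```python
import os

DEFAULT_MODEL_REPO_ID = "lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility"


def resolve_model_repo() -> str:
    """Return the model repo from MODEL_REPO_ID, falling back to the default."""
    # Assumption: an empty or whitespace-only value behaves like an unset variable.
    value = os.environ.get("MODEL_REPO_ID", "").strip()
    return value or DEFAULT_MODEL_REPO_ID


print(resolve_model_repo())
```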