---
title: BioAssayAlign Compatibility Explorer
emoji: 🧪
colorFrom: green
colorTo: red
sdk: gradio
sdk_version: 6.9.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
short_description: Rank a candidate molecule list against a bioassay.
---

# BioAssayAlign Compatibility Explorer
|
BioAssayAlign is an **assay-conditioned molecule ranking** tool.

You provide:
- a bioassay description and optional metadata
- a list of candidate SMILES

The model returns:
- a ranked list of molecules
- a compatibility score for each one
- explicit flags for invalid SMILES
|
## What It Is

This is not a chatbot. It is not a potency predictor.

It is a **ranking model** trained on a frozen public bioassay dataset built from PubChem BioAssay and ChEMBL. It is designed to answer:

> “Given this assay, which molecules should I look at first?”
|
## What The Score Means

- The app shows a **priority band** and a **list-relative score** first.
- The relative score describes each molecule's standing within your submitted list, which makes the ranking easier to read than the raw model score.
- The raw score is **not** a probability. It is an uncalibrated ranking value from the scorer head.
- Because the scale is relative to your list, the strongest molecule you submit will sit near the top of the `0–100` range even if it is weak in absolute terms.
|
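This README does not spell out how the relative score or the priority bands are derived. As a rough mental model only, the sketch below assumes min-max scaling of raw scores within one submitted list and uses made-up band cutoffs; the function names, cutoffs, and raw scores are all hypothetical and are not taken from the Space's code.

```python
def relative_scores(raw: list[float]) -> list[float]:
    """Min-max scale raw scores onto 0-100 within one list (assumed scheme, not the app's code)."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # All molecules scored identically, so there is no spread to show.
        return [100.0 for _ in raw]
    return [100.0 * (s - lo) / (hi - lo) for s in raw]


def priority_band(rel: float) -> str:
    """Map a relative score to a band; these cutoffs are hypothetical."""
    if rel >= 75:
        return "high"
    if rel >= 40:
        return "medium"
    return "low"


raw = [2.1, -0.4, 0.9, 1.5]  # made-up raw scorer outputs, not real model values
rel = relative_scores(raw)
print([round(r, 1) for r in rel])  # [100.0, 0.0, 52.0, 76.0]
print([priority_band(r) for r in rel])  # ['high', 'low', 'medium', 'high']
```

Under this assumed scheme, adding or removing a molecule can shift every other molecule's relative score even though its raw score is unchanged, so relative scores are best compared within a single run.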
## How To Use It

1. Enter the assay title and description in plain scientific language.
2. Add metadata if you know it:
   - organism
   - readout
   - assay format
   - assay type
   - target UniProt ID(s)
3. Paste one SMILES per line, or upload a CSV with a `smiles` or `canonical_smiles` column.
4. Run the ranking.
5. Read the output in this order:
   - `priority`
   - `relative score`
   - chemistry context columns (`MolWt`, `logP`, `TPSA`)
   - the raw model score, only if needed
|
## Recommended Input Style

The model is most reliable when assay information is provided as structured fields:
- title
- description
- organism
- readout
- assay format
- assay type
- target UniProt ID(s)
|
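If you prepare assays offline, a small container can keep these fields together and show which ones are still empty before you fill in the form. `AssayQuery` and `missing_fields` are hypothetical helpers, not part of the Space, and the values below are illustrative rather than the text of the bundled JAK2 example (O60674 is the UniProt accession for human JAK2).

```python
from dataclasses import dataclass, field, fields


@dataclass
class AssayQuery:
    """Hypothetical container for the structured assay fields listed above."""

    title: str
    description: str
    organism: str = ""
    readout: str = ""
    assay_format: str = ""
    assay_type: str = ""
    target_uniprot_ids: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields still empty, i.e. metadata you could add before ranking."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; not the exact text of the bundled UI example.
query = AssayQuery(
    title="JAK2 cell assay",
    description="Cell-based assay measuring inhibition of JAK2 signalling.",
    organism="Homo sapiens",
    target_uniprot_ids=["O60674"],  # UniProt accession for human JAK2
)
print(query.missing_fields())  # ['readout', 'assay_format', 'assay_type']
```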
You can paste SMILES directly or upload a CSV with a `smiles` or `canonical_smiles` column.
|
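A minimal sketch for preparing candidate lists before pasting or uploading them, assuming blank lines are ignored and that `smiles` takes precedence over `canonical_smiles` when a CSV has both columns. Those rules and the function names are assumptions, not taken from `app.py`. The sketch does not check chemical validity; the Space reports that through its invalid-SMILES flags.

```python
import csv
import io

# Accepted CSV column names, in an assumed order of preference.
SMILES_COLUMNS = ("smiles", "canonical_smiles")


def smiles_from_text(text: str) -> list[str]:
    """Pasted input: one SMILES per line; blank lines are skipped (assumption)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def smiles_from_csv(text: str) -> list[str]:
    """CSV input: return values from the first recognised SMILES column."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    column = next((name for name in SMILES_COLUMNS if name in header), None)
    if column is None:
        raise ValueError(f"expected a 'smiles' or 'canonical_smiles' column, got {header}")
    # Skip rows whose SMILES cell is missing or blank.
    return [row[column].strip() for row in reader if (row[column] or "").strip()]


print(smiles_from_text("CCO\n\nc1ccccc1\n"))  # ['CCO', 'c1ccccc1']
print(smiles_from_csv("id,canonical_smiles\n1,CCO\n2,CC(=O)O\n"))  # ['CCO', 'CC(=O)O']
```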
## Good Uses

- ranking a screening shortlist for a new assay concept
- triaging compounds before a more expensive downstream model or wet-lab step
- testing how sensitive rankings are to assay wording and metadata
|
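To put a number on sensitivity to wording, one simple option is to rank the same candidate list twice with different assay text and compare the top of both rankings. `top_k_overlap` is a hypothetical helper, not a Space feature, and top-k overlap is just one choice; a rank correlation such as Spearman's rho would also work.

```python
def top_k_overlap(ranking_a: list[str], ranking_b: list[str], k: int) -> float:
    """Fraction of molecules shared between the top-k of two rankings (1.0 = same set).

    Hypothetical helper; k is capped at the length of the shorter ranking.
    """
    k = min(k, len(ranking_a), len(ranking_b))
    if k <= 0:
        raise ValueError("k must be positive and both rankings non-empty")
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


# Made-up orderings of the same five candidates under two assay wordings.
run_1 = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCl"]
run_2 = ["c1ccccc1", "CCO", "CCN", "CCCl", "CC(=O)O"]
print(top_k_overlap(run_1, run_2, k=3))  # 2 of the top 3 are shared
```

A low overlap at small `k` suggests the wording or metadata is driving the ranking, which is worth checking before acting on the top hits.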
## Example Assays Included In The UI

- JAK2 cell assay
- ALDH1A1 fluorescence assay
- BTK binding quick check

These examples call the live model. They are not screenshots or mocked outputs.
|
## Limits

- This is a public-data model, not a medicinal chemistry oracle.
- It does not predict IC50 directly.
- It is strongest as a **relative ranking tool** over a candidate list you already care about.
|
## Runtime Notes

- The first request can be slower because the Space warms up the model in the background.
- Large candidate lists increase runtime. For interactive use, start with a few hundred molecules.
|
## Model

The Space reads the model repo from the `MODEL_REPO_ID` environment variable.

Default:
- `lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility`

If the champion (current best-performing) model changes later, set `MODEL_REPO_ID` to the new model repo; the UI does not need to change.
|
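A minimal sketch of the lookup described above, assuming that an unset or empty `MODEL_REPO_ID` falls back to the documented default. The actual `app.py` may handle this differently, and `resolve_model_repo_id` is a hypothetical name.

```python
import os
from collections.abc import Mapping

# Default documented in this README; the constant name is hypothetical.
DEFAULT_MODEL_REPO_ID = "lighteternal/BioAssayAlign-Qwen3-Embedding-0.6B-Compatibility"


def resolve_model_repo_id(env: Mapping[str, str] = os.environ) -> str:
    """Return MODEL_REPO_ID from `env`, falling back to the default when unset or empty."""
    return env.get("MODEL_REPO_ID") or DEFAULT_MODEL_REPO_ID


print(resolve_model_repo_id())
```

Passing the environment as a parameter keeps the lookup easy to test; on a Hugging Face Space you would set `MODEL_REPO_ID` as a variable in the Space settings.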