Upload README.md with huggingface_hub
README.md CHANGED

@@ -119,10 +119,10 @@ Assay:
 - target UniProt: `O60674`
 
 Candidate list ranked by the model:
-1. `CC(=O)Nc1ncc(C#N)c(Nc2ccc(F)c(Cl)c2)n1` → `-…`
-2. `c1ccccc1` → `-…`
-3. `CCO` → `-…`
-4. `CCOc1ccc2nc(N3CCN(C)CC3)n(C)c(=O)c2c1` → `-…`
+1. `CC(=O)Nc1ncc(C#N)c(Nc2ccc(F)c(Cl)c2)n1` → `-8.87`
+2. `c1ccccc1` → `-13.53`
+3. `CCO` → `-21.92`
+4. `CCOc1ccc2nc(N3CCN(C)CC3)n(C)c(=O)c2c1` → `-27.76`
 
 ### Example 2: ALDH1A1 fluorescence assay
 

@@ -136,10 +136,10 @@ Assay:
 - target UniProt: `P00352`
 
 Candidate list ranked by the model:
-1. `CCOc1ccccc1` → `-…`
-2. `…`
-3. `…`
-4. `CCO` → `-…`
+1. `CCOc1ccccc1` → `-26.93`
+2. `Cc1cc(=O)n(C)c(=O)[nH]1` → `-38.51`
+3. `CCN(CC)CCOc1ccccc1` → `-39.18`
+4. `CCO` → `-42.90`
 
 The raw values above are model scores. In practice, read them as list-relative ranking values, not calibrated probabilities.
 

@@ -153,8 +153,8 @@ You can think of it as a **logit-like utility value**:
 - absolute values across unrelated lists are not directly comparable
 
 Example:
-- a top candidate with score `-…`
-- another candidate with score `-…`
+- a top candidate with score `-8.9`
+- another candidate with score `-21.9`
 
 does **not** mean the first compound has negative biological value. It only means the first item scored much better than the second one for that submitted assay-and-list context.
 

@@ -167,7 +167,7 @@ Softmax example for one list:
 ```python
 from bioassayalign_compatibility import list_softmax_scores
 
-scores = [-…]
+scores = [-8.8686, -13.5325, -21.9168]
 relative_probs = list_softmax_scores(scores)
 print(relative_probs)
 ```

@@ -295,14 +295,13 @@ The score is:
 - Public assay data contains label noise and heterogeneous assay protocols.
 - Some assays remain difficult and produce only moderate ranking quality.
 
-## …
-
-Project code:
-- `https://github.com/lighteternal/bioassayalign-private`
+## Files In This Repo
 
 Model files in this repo:
 - `best_model.pt`
 - `training_metadata.json`
 - `training_summary.json`
+- `bioassayalign_compatibility.py`
+- `requirements.txt`
 
-
+You can load and run the published model directly from this repo without cloning any separate project codebase.