| --- |
| license: apache-2.0 |
| library_name: pytorch |
| tags: |
| - antibody |
| - developability |
| - protein-language-model |
| - regression |
| datasets: |
| - ginkgo-datapoints/GDPa1 |
| --- |
| |
| # CrossAbSense — antibody developability oracles (v0.9) |
|
|
| Property-specific neural oracles that predict five biophysical developability assays |
| for therapeutic IgGs from paired VH/VL sequences, combining frozen protein-language-model |
| encoders (ESM-Cambrian, ProtT5) with configurable attention decoders. |
|
|
| Code: https://github.com/SimonCrouzet/CrossAbSense |
| Dataset: [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) (242 IgGs, Ginkgo Bioworks) |
|
|
| Each property folder (`<PROPERTY>_<config-checksum>/`) contains: |
| `final.ckpt` (model trained on all data — used by `predict.py`), `fold0-4.ckpt` |
| (5-fold CV checkpoints), `config.yaml`, and `property.txt`. |
|
|
| ## Performance (5-fold cluster-stratified CV, Spearman ρ) |
|
|
| | Property | This release (v0.9) | Paper (Table 1) | |
| |----------|--------------------:|----------------:| |
| | HIC (hydrophobicity) | 0.685 | 0.644 | |
| | Titer (expression) | 0.425 | 0.428 | |
| | PR_CHO (polyreactivity) | 0.461 | 0.475 | |
| | AC-SINS (self-association)| 0.420 | 0.475 | |
| | Tm2 (thermostability) | 0.442 | 0.387 | |
| |
| ## ⚠️ Important caveat (v0.9) |
| |
| These weights were trained from the published configs but in an environment **without |
| BioPhi (OASis humanness) and ScaLoP** available. Those two antibody-feature sources were |
| substituted with sentinel values during training, so the feature inputs differ slightly |
| from the paper runs. This mainly affects **AC-SINS** (~0.05 below paper); the other four |
| properties match or exceed Table 1. A future **v1.0** will retrain the feature-using |
| properties with BioPhi/ScaLoP restored. Pin `revision="v0.9"` if you need exactly these weights. |
| |
| ## Usage |
| |
| ```bash |
| pip install huggingface_hub |
| python scripts/download_models.py --revision v0.9 # final.ckpt only (add --folds for CV) |
| python src/predict.py --input inputs/my_seqs.csv --model models/HIC_3595cc57 --output preds.csv |
| ``` |
| |
| By default only `final.ckpt` (+ small metadata) is downloaded; the 5 CV fold |
| checkpoints are fetched only when you ask for them (`--folds`, or `predict.py |
| --use-cv`/`--fold`). |
| |
| Or let `predict.py` fetch on demand: |
| |
| ```bash |
| python src/predict.py --input inputs/my_seqs.csv --model HIC_3595cc57 --from-hf --output preds.csv |
| ``` |
| |
| ## License |
| |
| Apache-2.0, matching the CrossAbSense repository. |
| |