	---
	license: apache-2.0
	library_name: pytorch
	tags:
	- antibody
	- developability
	- protein-language-model
	- regression
	datasets:
	- ginkgo-datapoints/GDPa1
	---

	# CrossAbSense — antibody developability oracles (v0.9)

	Property-specific neural oracles that predict five biophysical developability assays
	for therapeutic IgGs from paired VH/VL sequences, combining frozen protein-language-model
	encoders (ESM-Cambrian, ProtT5) with configurable attention decoders.

	Code: https://github.com/SimonCrouzet/CrossAbSense
	Dataset: [GDPa1](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) (242 IgGs, Ginkgo Bioworks)

	Each property folder (`<PROPERTY>_<config-checksum>/`, e.g. `HIC_3595cc57/`) contains
	`final.ckpt` (the model trained on all data, used by `predict.py`), `fold0-4.ckpt`
	(the five 5-fold CV checkpoints), `config.yaml`, and `property.txt`.
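
The folder layout above can be checked with a few lines of Python. This is a minimal sketch, not code from the CrossAbSense repository: the helper names `split_folder_name` and `missing_files` are hypothetical, and the split on the last underscore is an assumption made so that property names containing underscores (such as `PR_CHO`) survive intact.

```python
from pathlib import Path

# File names taken from the folder description above.
# Fold checkpoint names are deliberately not guessed here.
REQUIRED_FILES = ("final.ckpt", "config.yaml", "property.txt")


def split_folder_name(folder: str) -> tuple[str, str]:
    """Split '<PROPERTY>_<config-checksum>' on the LAST underscore (assumption)."""
    name = Path(folder).name
    prop, sep, checksum = name.rpartition("_")
    if not sep or not prop or not checksum:
        raise ValueError(f"not a '<PROPERTY>_<config-checksum>' folder: {name!r}")
    return prop, checksum


def missing_files(folder: str) -> list[str]:
    """Return the required files that are absent from a local property folder."""
    root = Path(folder)
    return [f for f in REQUIRED_FILES if not (root / f).is_file()]


if __name__ == "__main__":
    print(split_folder_name("models/HIC_3595cc57"))
    print(missing_files("models/HIC_3595cc57"))
```

Running `missing_files` before calling `predict.py` gives an early, readable error when a download was incomplete.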

	## Performance (5-fold cluster-stratified CV, Spearman ρ)

	| Property | This release (v0.9) | Paper (Table 1) |
	|----------|--------------------:|----------------:|
	| HIC (hydrophobicity) | 0.685 | 0.644 |
	| Titer (expression) | 0.425 | 0.428 |
	| PR_CHO (polyreactivity) | 0.461 | 0.475 |
	| AC-SINS (self-association) | 0.420 | 0.475 |
	| Tm2 (thermostability) | 0.442 | 0.387 |
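
For readers unfamiliar with the metric, Spearman ρ is the Pearson correlation of the ranks of predictions and measurements. The sketch below is a dependency-free illustration of that definition (with average ranks for ties); it is not the evaluation code used for the table, and how per-fold scores are aggregated in the repository is not specified here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(vx * vy)
```

Because only ranks matter, ρ rewards getting the ordering of antibodies right, even if predicted values are on a different scale than the assay.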

	## ⚠️ Important caveat (v0.9)

	These weights were trained from the published configs, but in an environment where
	**BioPhi (OASis humanness) and ScaLoP were not available**. Features from those two
	sources were replaced with sentinel values during training, so the feature inputs differ
	slightly from the paper runs. The clearest effect is on AC-SINS (0.420 vs. 0.475, about
	0.05 below the paper). HIC and Tm2 exceed Table 1, while Titer and PR_CHO fall within
	about 0.015 of it. A future v1.0 will retrain the feature-using properties with
	BioPhi/ScaLoP restored. Pin `revision="v0.9"` if you need exactly these weights.
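
The size of each gap can be read off the Performance table directly. As a small worked example, the sketch below copies the table values and computes release-minus-paper deltas; the helper names and the `tolerance` parameter are illustrative choices, not part of the repository.

```python
# Spearman rho values copied from the Performance table:
# (this release v0.9, paper Table 1).
SCORES = {
    "HIC": (0.685, 0.644),
    "Titer": (0.425, 0.428),
    "PR_CHO": (0.461, 0.475),
    "AC-SINS": (0.420, 0.475),
    "Tm2": (0.442, 0.387),
}


def deltas(scores):
    """Release minus paper, rounded to the table's three decimals."""
    return {
        name: round(release - paper, 3)
        for name, (release, paper) in scores.items()
    }


def below_paper(scores, tolerance=0.0):
    """Properties more than `tolerance` below the paper, largest gap first."""
    d = deltas(scores)
    return sorted((n for n, v in d.items() if v < -tolerance), key=d.get)


if __name__ == "__main__":
    for name, delta in deltas(SCORES).items():
        print(f"{name:8s} {delta:+.3f}")
```

With a tolerance of 0.02, only AC-SINS is flagged; Titer and PR_CHO show small negative deltas that fall under that threshold.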

	## Usage

	```bash
	pip install huggingface_hub
	python scripts/download_models.py --revision v0.9 # final.ckpt only (add --folds for CV)
	python src/predict.py --input inputs/my_seqs.csv --model models/HIC_3595cc57 --output preds.csv
	```

	By default, only `final.ckpt` and small metadata files are downloaded. The five CV fold
	checkpoints are fetched only on request: pass `--folds` to `download_models.py`, or
	`--use-cv`/`--fold` to `predict.py`.

	Or let `predict.py` fetch on demand:

	```bash
	python src/predict.py --input inputs/my_seqs.csv --model HIC_3595cc57 --from-hf --output preds.csv
	```
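
If you prefer to fetch a pinned copy from Python rather than through the scripts, the same "final checkpoint by default, folds on request" behaviour can be approximated with `huggingface_hub.snapshot_download`. This is a sketch under assumptions, not the implementation of `download_models.py`: the repo id `SimonCrouzet/CrossAbSense`, the property folders sitting at the repository root, and the `fold*.ckpt` glob are all guesses, and the helper names are hypothetical.

```python
REPO_ID = "SimonCrouzet/CrossAbSense"  # assumption: Hub repo id for this model card
METADATA_FILES = ("final.ckpt", "config.yaml", "property.txt")


def allow_patterns(property_folder, include_folds=False):
    """Glob patterns selecting which files of one property folder to fetch."""
    patterns = [f"{property_folder}/{name}" for name in METADATA_FILES]
    if include_folds:
        # Assumption: fold checkpoints match 'fold*.ckpt' inside the folder.
        patterns.append(f"{property_folder}/fold*.ckpt")
    return patterns


def download(property_folder, revision="v0.9", include_folds=False, local_dir="models"):
    """Download one property folder at a pinned revision; returns the local path."""
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=revision,
        allow_patterns=allow_patterns(property_folder, include_folds),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download("HIC_3595cc57"))
```

Passing `revision="v0.9"` explicitly keeps the weights fixed even after a later release is published.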

	## License

	Apache-2.0, matching the CrossAbSense repository.