Commit d294de1 (verified) by super-dainiu · 1 parent: dd7da89

Add LogiCA pretrained backbones (8M/35M/150M/650M) + model card

README.md ADDED
@@ -0,0 +1,65 @@
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - protein-ligand
+ - drug-target-interaction
+ - contrastive-learning
+ - esm2
+ - selformer
+ - logica
+ ---
+
+ # LogiCA: pretrained protein–ligand backbones
+
+ Pretrained checkpoints for **LogiCA** (*Contextualizing Biological Language Models
+ across Modalities via Logit-Space Contrastive Alignment*). Each checkpoint is a
+ bidirectional ESM-2 (protein) + SELFormer (ligand) model with a cross-attention
+ adapter, pretrained on BindingDB with the logit-space contrastive objective
+ (`--mode logicl`, 40-epoch `logicl40_clean` track).
+
+ Code: **https://github.com/Yale-CompBio/logica**
+
+ | Size | Protein backbone | File | Weights | Epoch | Val loss |
+ | --- | --- | --- | --- | --- | --- |
+ | 8M | `esm2_t6_8M_UR50D` | `checkpoints/8m/best.pt` | 19 MB | 100 | 1.8036 |
+ | 35M | `esm2_t12_35M_UR50D` | `checkpoints/35m/best.pt` | 38 MB | 99 | 1.6101 |
+ | 150M | `esm2_t30_150M_UR50D` | `checkpoints/150m/best.pt` | 64 MB | 98 | 1.3963 |
+ | 650M | `esm2_t33_650M_UR50D` | `checkpoints/650m/best.pt` | 97 MB | 92 | 1.0994 |
+
+ Ligand encoder: `HUBioDataLab/SELFormer` for all sizes.
+
+ ## Contents
+
+ Checkpoints are **weights-only** (saved with `trainable_only=True`). Each file is a
+ dict containing `model_state_dict` plus `epoch`, `metrics`, and `rng_state`. Load the
+ state dict into `BiDirectionalDrugProteinModel` with `strict=False`.
+
+ ## Usage
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download the 150M checkpoint; the return value is the local cached file path.
+ ckpt = hf_hub_download("Yale-CompBio/logica", "checkpoints/150m/best.pt")
+ ```
+
+ Then fine-tune with the released code. The entry points take `--pretrained_checkpoint`;
+ the commands below assume the shell variable `ckpt` holds the path returned above:
+
+ ```bash
+ python dti.py --method logica --dataset DAVIS \
+     --esm_model esm2_t30_150M_UR50D --pretrained_checkpoint "$ckpt" --output_dir runs/dti_davis
+ python variant.py --objective pairwise --held_out EGFR --split_strategy lopo \
+     --esm_model esm2_t30_150M_UR50D --pretrained_checkpoint "$ckpt" \
+     --data_csv splits/data.csv --fasta_path data/proteins.fasta --drugs_selfies data/drugs.selfies \
+     --use_lora --output_dir runs/variant_egfr
+ ```
+
+ ## Citation
+
+ ```
+ @inproceedings{logica2026,
+   title = {Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment},
+   author = {anonymous},
+   booktitle = {NeurIPS},
+   year = {2026}
+ }
+ ```
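
The README's Contents section describes the checkpoint layout but does not show a load call, so the following is a minimal sketch under stated assumptions rather than the repository's own loader. `describe_checkpoint` and `load_backbone` are hypothetical helper names, and `model` is assumed to be an already-constructed `BiDirectionalDrugProteinModel` (its constructor arguments are not given here). Because the files are saved with `trainable_only=True`, `strict=False` will likely report missing keys for parameters that were not stored; the sketch returns those lists so they can be inspected.

```python
from typing import Any, Mapping

# Keys the README says a checkpoint dict contains.
EXPECTED_KEYS = ("model_state_dict", "epoch", "metrics", "rng_state")


def describe_checkpoint(ckpt: Mapping[str, Any]) -> dict:
    """Summarise a checkpoint dict without needing any model code."""
    if "model_state_dict" not in ckpt:
        raise KeyError("checkpoint has no 'model_state_dict' entry")
    return {
        "num_tensors": len(ckpt["model_state_dict"]),
        "epoch": ckpt.get("epoch"),
        "extra_keys": sorted(k for k in ckpt if k not in EXPECTED_KEYS),
    }


def load_backbone(model, ckpt_path: str):
    """Hypothetical helper: load stored weights into an existing model."""
    import torch  # imported lazily so describe_checkpoint works without PyTorch

    # Assumption: the dict may hold non-tensor objects (metrics, RNG state),
    # so full unpickling is requested. Only do this for files you trust.
    ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    result = model.load_state_dict(ckpt["model_state_dict"], strict=False)
    return describe_checkpoint(ckpt), result.missing_keys, result.unexpected_keys
```

A non-empty `unexpected_keys` list would suggest a mismatch between the checkpoint and the chosen `--esm_model` size, which is worth checking before fine-tuning.
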
checkpoints/150m/best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6f256484aeb614c1fd0ad1ede84cf05c139ec7d2e8d5a66fc1ece6f4db2e98d
+ size 63838684
checkpoints/35m/best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a09c115bd35f266817e3e34844b6ca03a8ca36830d705304a5845996a486dfe
+ size 37724235
checkpoints/650m/best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e509b6df98da09a28645e94e4b51fb1384a532d8768ea657640ea5be840b28fe
+ size 97234134
checkpoints/8m/best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e20c9c907a55e0c0c2846895c1c15c69f33f99fd0de8047f7a8359a96bfb434c
+ size 18783965
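
Each `best.pt` entry above is a Git LFS pointer that records the SHA-256 digest and byte size of the real file, so a downloaded checkpoint can be checked against it. The sketch below uses only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is copied from the 150M entry with the leading diff `+` markers removed.

```python
import hashlib
import os

# Pointer text for checkpoints/150m/best.pt, as shown in this commit.
POINTER_150M = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b6f256484aeb614c1fd0ad1ede84cf05c139ec7d2e8d5a66fc1ece6f4db2e98d
size 63838684
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of an LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash agree with the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_150M)
# Usage (path is whatever hf_hub_download returned):
# ok = matches_pointer(ckpt, pointer)
```

The recorded sizes also line up with the README table when read as decimal megabytes, for example 63,838,684 bytes for the 150M checkpoint listed as 64 MB.
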