---
library_name: pytorch
license: other
tags:
- glycans
- proteins
- protein-glycan
- affinose
- bertose
- esm-c
- pytorch
---
# AFFINose Interaction Model
This repository contains the AFFINose checkpoint for protein-glycan interaction inference. AFFINose combines BERTose glycan token representations with per-residue ESM-C protein embeddings and returns a scalar interaction score.
## Quick Start
The recommended way to run AFFINose is the companion notebook. For direct Python use, download the checkpoint and vocabulary with `huggingface_hub`:
```python
from huggingface_hub import hf_hub_download

checkpoint = hf_hub_download(
    repo_id="supanthadey1/affinose-interaction-model",
    filename="checkpoints/affinose_interaction_model.pt",
)
vocab = hf_hub_download(
    repo_id="supanthadey1/affinose-interaction-model",
    filename="vocab/bpe_vocabulary.json",
)
```
No Hugging Face token is required to download this AFFINose checkpoint, because the repository is public. ESM-C is distributed separately and, depending on EvolutionaryScale's access requirements, may require the user's own Hugging Face login.
## Files
- `checkpoints/affinose_interaction_model.pt` - AFFINose interaction checkpoint.
- `vocab/bpe_vocabulary.json` - WURCS BPE vocabulary for glycan tokenization.
- `src/affinose_model.py` - AFFINose architecture.
- `src/affinose_inference.py` - standalone inference helper.
- `src/affinose_dataset.py` - tokenizer and data utility helpers.
- `src/bertose_model.py` - BERTose model definition used for glycan encoding.
- `src/bertose_layers.py` - Transformer layers used by BERTose.
- `src/wurcs_bpe_tokenizer.py` - WURCS BPE tokenizer.
## Input
Provide one protein-glycan pair or a CSV batch. Glycans should be WURCS strings. Proteins can be provided as IDs linked to precomputed embeddings, or through the companion notebook as raw sequences that are embedded with ESM-C 300M.
Batch CSVs use `sample_id,protein_id,protein_sequence,glycan_wurcs`. Free-text glycan names, common names, SNFG drawings, and IUPAC-condensed strings are not parsed directly by AFFINose. Convert those inputs to WURCS first, then score the protein-glycan pair.
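Before scoring a batch, it can help to check the CSV for the problems described above. The sketch below uses only the standard library. `validate_batch_csv` is a hypothetical helper and is not part of the AFFINose code. It makes two assumptions: each row needs either a `protein_id` or a `protein_sequence`, and a glycan that does not begin with `WURCS=` has not yet been converted to WURCS.

```python
import csv
import io

# Column names stated in this model card for batch CSVs.
REQUIRED_COLUMNS = ["sample_id", "protein_id", "protein_sequence", "glycan_wurcs"]


def validate_batch_csv(text):
    """Return a list of problems found in a batch CSV (empty list if none).

    Hypothetical pre-flight check, not part of AFFINose. Assumes each row needs
    a protein_id or a protein_sequence, and flags glycans that do not start
    with "WURCS=" as not yet converted to WURCS.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"header is missing columns: {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        protein_id = (row["protein_id"] or "").strip()
        sequence = (row["protein_sequence"] or "").strip()
        glycan = (row["glycan_wurcs"] or "").strip()
        if not (protein_id or sequence):
            problems.append(f"line {line_no}: needs protein_id or protein_sequence")
        if not glycan.startswith("WURCS="):
            problems.append(f"line {line_no}: glycan is not a WURCS string")
    return problems


batch = (
    "sample_id,protein_id,protein_sequence,glycan_wurcs\n"
    "s1,P1,,WURCS=2.0/placeholder\n"  # placeholder, not a real glycan
    "s2,,MKTAYIAKQRQ,Gal(b1-4)GlcNAc\n"  # IUPAC-condensed: must be converted first
)
print(validate_batch_csv(batch))  # ['line 3: glycan is not a WURCS string']
```

The `WURCS=` prefix test is only a coarse filter. It catches names and IUPAC-condensed strings that have not been converted, but it does not check that a WURCS string is well formed.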
## Protein Embedding Requirement
AFFINose expects per-residue ESM-C 300M embeddings with shape `[L, 960]`. Do not mean-pool the protein before passing it into AFFINose.
ESM-C is a separate EvolutionaryScale protein model. The ESM-C weights are not included in this repository. Users should install the `esm` package and let it download ESM-C 300M into their own runtime cache.
```python
from esm.models.esmc import ESMC
from esm.sdk.api import ESMProtein, LogitsConfig

# Downloads ESM-C 300M into the local runtime cache on first use.
esmc = ESMC.from_pretrained("esmc_300m").to("cuda")  # or "cpu"

protein = ESMProtein(sequence="MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
protein_tensor = esmc.encode(protein)
output = esmc.logits(
    protein_tensor,
    LogitsConfig(sequence=True, return_embeddings=True),
)
protein_embeddings = output.embeddings  # per-residue ESM-C 300M embeddings; do not mean-pool
```
If Hugging Face requests authentication for ESM-C, users should log in with their own Hugging Face account or token and accept any required EvolutionaryScale terms. No Hugging Face token is needed to download the BERTose/AFFINose files once those repositories are public.
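Because AFFINose needs per-residue `[L, 960]` embeddings rather than a pooled vector, it is worth checking the shape before scoring. The ESM-C call above typically returns a tensor with a leading batch axis. Its sequence axis may also include start/end special-token positions. This card does not say whether AFFINose expects those positions removed, so compare against `src/affinose_inference.py`. `per_residue_length` is a hypothetical helper that takes any shape tuple, including a PyTorch `torch.Size`.

```python
# Hidden size of ESM-C 300M per-residue embeddings, as stated in this card.
ESMC_300M_DIM = 960


def per_residue_length(shape):
    """Return L for an embedding shape that looks like [L, 960].

    Hypothetical sanity check, not part of AFFINose. A leading batch axis of
    size 1 is dropped; a 1-D [960] shape is rejected as mean-pooled.
    """
    shape = tuple(shape)
    if len(shape) == 3 and shape[0] == 1:
        shape = shape[1:]  # drop a singleton batch dimension
    if shape == (ESMC_300M_DIM,):
        raise ValueError("got [960]: this looks mean-pooled; AFFINose needs [L, 960]")
    if len(shape) != 2 or shape[1] != ESMC_300M_DIM:
        raise ValueError(f"expected [L, {ESMC_300M_DIM}], got {list(shape)}")
    return shape[0]


sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # 33 residues
print(per_residue_length((33, 960)))     # 33
print(per_residue_length((1, 35, 960)))  # 35: two extra positions vs. the sequence
```

When you run the ESM-C snippet, call `per_residue_length(protein_embeddings.shape)` and compare the result with `len(sequence)`. A nonzero difference usually means the tensor still contains special-token positions.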
## Output
A scalar protein-glycan interaction score from the trained AFFINose head.
## Scope
This repository does not perform IUPAC-condensed/name-to-WURCS conversion. For now, provide WURCS directly.
License metadata is currently `other`; update it when the final release license and citation text are chosen.
## References
- EvolutionaryScale ESM package: https://github.com/evolutionaryscale/esm
- ESM-C 300M Hugging Face model: https://huggingface.co/EvolutionaryScale/esmc-300m-2024-12