---
license: mit
library_name: pytorch
tags:
- protein
- structure
- encoder
- pretrained
---
# triprorep-3B
Structure-aware protein encoder with 3B parameters, pre-trained with
ELECTRA-style corrective MLM on 83.6M ATLAS + PDB structures. The encoder
reads three per-residue token streams, sequence (`seq`), backbone (`bb`), and
full-atom (`fa`), and outputs a per-residue embedding of dimension `2560` (fp16).
Architecture: `embed_dim=2560`, `encoder_depth=33`, `encoder_heads=40`.
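As a rough sanity check on these numbers, the sketch below derives the per-head width and a back-of-the-envelope parameter count. It assumes a standard Transformer encoder block with a 4x feed-forward expansion, which this card does not confirm; `FFN_MULT` and the helper names are illustrative, not part of the release.

```python
# Values taken from this model card.
EMBED_DIM = 2560
ENCODER_DEPTH = 33
ENCODER_HEADS = 40

# Assumption (not stated in this card): standard Transformer block with a
# 4x feed-forward expansion; biases, layer norms, and embeddings are ignored.
FFN_MULT = 4


def head_dim(embed_dim: int, heads: int) -> int:
    """Per-head width; embed_dim must split evenly across heads."""
    if embed_dim % heads != 0:
        raise ValueError("embed_dim must be divisible by heads")
    return embed_dim // heads


def approx_block_params(embed_dim: int, ffn_mult: int = FFN_MULT) -> int:
    """Weight-matrix parameters in one block: attention (4 d^2) plus FFN (2 * mult * d^2)."""
    attention = 4 * embed_dim * embed_dim
    feed_forward = 2 * ffn_mult * embed_dim * embed_dim
    return attention + feed_forward


total = ENCODER_DEPTH * approx_block_params(EMBED_DIM)
print(f"head_dim = {head_dim(EMBED_DIM, ENCODER_HEADS)}")
print(f"approx. encoder block parameters = {total / 1e9:.2f}B")
```

Under these assumptions the encoder blocks alone hold about 2.6B weights; components the sketch ignores are not counted.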
Part of the [k-fold-structure release](https://github.com/<github-org>/k-fold-structure-release).
## Files
- `3B.ckpt`: full Lightning checkpoint.
- `3B_encoder.pt`: encoder-only state dict; loads with `strict=True` for inference.
- `config.yaml`: model + data config. The path fields are placeholders; point them at your local data.
- `backbone_tokenizer.pt`, `fullatom_tokenizer.pt`: structure tokenizers (PDB → token IDs). See Acknowledgements.
## Usage
```bash
pip install torch huggingface_hub omegaconf numpy lmdb biotite
git clone https://github.com/<github-org>/k-fold-structure-release.git
cd k-fold-structure-release
```
```python
import sys

# Make the release's inference helpers importable (run from the repo root).
sys.path.insert(0, "code/triprorep")
from inference import load_encoder, embed_pdb

encoder = load_encoder("3B", hf_repo="k-fold-structure/triprorep-3B")
features = embed_pdb(encoder, "your_protein.pdb",
                     hf_repo="k-fold-structure/triprorep-3B")
print(features.shape)  # (L, 2560) fp16
```
`embed_pdb` downloads the bundled tokenizers from this repo on first call,
then runs PDB → (seq, bb, fa) tokens → encoder. If you already have token
IDs (e.g. from `k-fold-structure/repsp-triprorep-tokens`), call
`encode(encoder, seq, bb, fa)` directly. For CPU, pass `device="cpu"` to
`load_encoder`.
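When embedding many proteins, it can help to budget storage for the `(L, 2560)` fp16 output up front. The following is a minimal sketch of that arithmetic; `embedding_bytes` and the example lengths are hypothetical and not part of the release code.

```python
EMBED_DIM = 2560      # per-residue embedding width, from this model card
BYTES_PER_FP16 = 2    # half precision uses 2 bytes per value


def embedding_bytes(num_residues: int, embed_dim: int = EMBED_DIM) -> int:
    """Size in bytes of one (num_residues, embed_dim) fp16 embedding."""
    if num_residues < 0:
        raise ValueError("num_residues must be non-negative")
    return num_residues * embed_dim * BYTES_PER_FP16


# Hypothetical protein lengths, chosen only for illustration.
for length in (100, 500, 1000):
    mib = embedding_bytes(length) / 2**20
    print(f"L={length:>4}: {mib:.2f} MiB")
```

Each residue costs 5,120 bytes, so the total scales linearly with protein length.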
## Acknowledgements
`backbone_tokenizer.pt` (aminoaseed VQ-VAE) is from
[StructTokenBench](https://github.com/KatarinaYuan/StructTokenBench).
## Citation
```bibtex
@misc{kfoldstructure,
title = {K-Fold Structure: Structure-Aware Protein Encoders and a Per-Residue Representation Benchmark},
author = {<authors>},
year = {2026},
url = {https://huggingface.co/k-fold-structure}
}
```
## License
MIT