---
license: mit
library_name: pytorch
tags:
  - protein
  - structure
  - encoder
  - pretrained
---

# triprorep-3B

Structure-aware protein encoder with 3B parameters, pre-trained with
ELECTRA-style corrective masked language modeling (MLM) on 83.6M ATLAS + PDB
structures. The encoder reads three per-residue token streams (sequence `seq`,
backbone `bb`, full-atom `fa`) and outputs one fp16 embedding of dimension
`2560` per residue.
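Because the output is fp16 (2 bytes per value), each residue's embedding takes 2560 × 2 = 5,120 bytes. The sketch below only does that arithmetic. `embedding_bytes` is a hypothetical helper, and the 300-residue chain is an arbitrary example size.

```python
# Storage cost of the per-residue output described on this card.
EMBED_DIM = 2560      # embedding width per residue
BYTES_PER_FP16 = 2    # IEEE 754 half precision


def embedding_bytes(num_residues: int, embed_dim: int = EMBED_DIM) -> int:
    """Bytes needed for an (num_residues, embed_dim) fp16 embedding matrix."""
    return num_residues * embed_dim * BYTES_PER_FP16


print(embedding_bytes(1))            # 5120 bytes per residue
print(embedding_bytes(300) / 1e6)    # 1.536 (MB) for an example 300-residue chain
```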

Architecture: `embed_dim=2560`, `encoder_depth=33`, `encoder_heads=40`.
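Here is a quick sanity check on these numbers. With 40 heads over a 2560-wide model, each head is 64-dimensional. The parameter estimate below assumes a standard Transformer block: Q/K/V/output projections plus a two-layer FFN with 4× expansion. It ignores biases, norms, embeddings and output heads. The card does not state the FFN width, so treat the result as a rough order-of-magnitude figure, not the model's exact count. `approx_block_params` is a hypothetical helper.

```python
# Rough size arithmetic for embed_dim=2560, encoder_depth=33, encoder_heads=40.
EMBED_DIM, DEPTH, HEADS = 2560, 33, 40

assert EMBED_DIM % HEADS == 0, "heads must evenly divide the model width"
head_dim = EMBED_DIM // HEADS


def approx_block_params(d: int, ffn_mult: int = 4) -> int:
    """Weight-matrix parameters in one Transformer block (assumed layout).

    Attention: Q, K, V and output projections, each d x d  -> 4 * d^2.
    FFN: d -> ffn_mult*d -> d                              -> 2 * ffn_mult * d^2.
    Biases and LayerNorm parameters are ignored.
    """
    return 4 * d * d + 2 * ffn_mult * d * d


encoder_block_params = DEPTH * approx_block_params(EMBED_DIM)
print(head_dim)                                # 64
print(round(encoder_block_params / 1e9, 2))    # 2.6 (billion)
```

Under these assumptions the 33 blocks account for roughly 2.6B parameters. This estimate cannot say how the remainder of the nominal 3B is distributed.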

Part of the [k-fold-structure release](https://github.com/<github-org>/k-fold-structure-release).

## Files

- `3B.ckpt`: full Lightning checkpoint.
- `3B_encoder.pt`: encoder-only state dict, loads `strict=True` for inference.
- `config.yaml`: model + data config. The path fields are placeholders; point them at your local data.
- `backbone_tokenizer.pt`, `fullatom_tokenizer.pt`: structure tokenizers (PDB → token IDs). See Acknowledgements.
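If you want the raw weights outside the release's `inference` helpers, here is a minimal sketch. It assumes only what the list above says: `3B_encoder.pt` is a plain state dict. The helper names are hypothetical. `hf_hub_download` (from `huggingface_hub`) caches the file and returns a local path, and `torch.load(..., map_location="cpu")` loads it without a GPU.

```python
# Sketch: fetch and inspect 3B_encoder.pt, assuming it is a plain state dict.
from collections.abc import Mapping

REPO_ID = "k-fold-structure/triprorep-3B"


def download_encoder_weights(repo_id: str = REPO_ID) -> str:
    """Download (or reuse the cached copy of) 3B_encoder.pt; return its local path."""
    from huggingface_hub import hf_hub_download  # imported lazily: optional dependency

    return hf_hub_download(repo_id=repo_id, filename="3B_encoder.pt")


def load_state_dict_cpu(path: str) -> Mapping:
    """Load a saved state dict onto the CPU, even if it was saved from a GPU."""
    import torch

    return torch.load(path, map_location="cpu")


def summarize_state_dict(state_dict: Mapping) -> tuple[int, int]:
    """Return (number of tensors, total number of scalar parameters)."""
    return len(state_dict), sum(t.numel() for t in state_dict.values())
```

For example, `summarize_state_dict(load_state_dict_cpu(download_encoder_weights()))` returns the tensor count and the total parameter count. Loading the weights into a model with `strict=True` still requires the encoder class from the release code.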

## Usage

```bash
pip install torch huggingface_hub omegaconf numpy lmdb biotite
git clone https://github.com/<github-org>/k-fold-structure-release.git
cd k-fold-structure-release
```

```python
import sys

# Make the release's inference helpers importable (run from the repo root).
sys.path.insert(0, "code/triprorep")
from inference import load_encoder, embed_pdb

encoder = load_encoder("3B", hf_repo="k-fold-structure/triprorep-3B")
features = embed_pdb(encoder, "your_protein.pdb",
                     hf_repo="k-fold-structure/triprorep-3B")
print(features.shape)  # (L, 2560) fp16, one row per residue
```

On its first call, `embed_pdb` downloads the bundled tokenizers from this repo.
It then tokenizes the PDB file into the (seq, bb, fa) streams and runs them
through the encoder. If you already have token IDs (e.g. from
`k-fold-structure/repsp-triprorep-tokens`), call `encode(encoder, seq, bb, fa)`
directly. To run on CPU, pass `device="cpu"` to `load_encoder`.
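When you call `encode` yourself, the three streams must describe the same residues. The wrapper below is a hypothetical sketch that checks this before and after the call. It assumes unbatched, 1-D token-ID streams of equal length L, with no extra special tokens. It also assumes an output shaped `(L, 2560)`, matching the `embed_pdb` example above. Pass the release's `encode` function as `encode_fn`.

```python
# Hypothetical wrapper around the release's encode(encoder, seq, bb, fa).
from typing import Any, Callable, Sequence

EMBED_DIM = 2560  # output width stated on this card


def encode_checked(encode_fn: Callable[..., Any], encoder: Any,
                   seq: Sequence[int], bb: Sequence[int], fa: Sequence[int]) -> Any:
    """Check the per-residue contract, call encode_fn, and verify the output shape.

    Assumes unbatched 1-D token streams with one token per residue.
    """
    lengths = {"seq": len(seq), "bb": len(bb), "fa": len(fa)}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"token streams differ in length: {lengths}")
    num_residues = lengths["seq"]

    features = encode_fn(encoder, seq, bb, fa)
    expected = (num_residues, EMBED_DIM)
    if tuple(features.shape) != expected:
        raise ValueError(f"expected output shape {expected}, got {tuple(features.shape)}")
    return features
```

Usage would then be `features = encode_checked(encode, encoder, seq, bb, fa)`. A length mismatch surfaces as a `ValueError` rather than as a silent misalignment.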

## Acknowledgements

`backbone_tokenizer.pt` (aminoaseed VQ-VAE) is from
[StructTokenBench](https://github.com/KatarinaYuan/StructTokenBench).

## Citation

```bibtex
@misc{kfoldstructure,
  title  = {K-Fold Structure: Structure-Aware Protein Encoders and a Per-Residue Representation Benchmark},
  author = {<authors>},
  year   = {2026},
  url    = {https://huggingface.co/k-fold-structure}
}
```

## License

MIT