---
license: mit
library_name: pytorch
tags:
- protein
- structure
- encoder
- pretrained
---

# triprorep-3B

Structure-aware protein encoder with 3B parameters, pre-trained with an
ELECTRA-style corrective MLM objective on 83.6M ATLAS + PDB structures. The
encoder reads three per-residue token streams, sequence (`seq`), backbone
(`bb`), and full-atom (`fa`), and outputs a per-residue embedding of
dimension `2560` (fp16).

Architecture: `embed_dim=2560`, `encoder_depth=33`, `encoder_heads=40`.
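
As a rough sanity check on these hyperparameters, the sketch below derives the per-head width and a back-of-the-envelope parameter count. It assumes a standard Transformer block with a 4x feed-forward expansion (about `12 * embed_dim^2` weights per layer) and ignores embeddings, biases, and norms. The card does not state the feed-forward width, so treat the estimate as illustrative only.

```python
# Hyperparameters as listed on this card.
EMBED_DIM = 2560
ENCODER_DEPTH = 33
ENCODER_HEADS = 40

# Width of each attention head (embed_dim must divide evenly across heads).
assert EMBED_DIM % ENCODER_HEADS == 0
head_dim = EMBED_DIM // ENCODER_HEADS

# ASSUMPTION: standard Transformer block with a 4x feed-forward expansion:
#   attention projections (Q, K, V, output): 4 * d^2
#   feed-forward (d -> 4d -> d):             8 * d^2
params_per_layer = 12 * EMBED_DIM ** 2
approx_params = ENCODER_DEPTH * params_per_layer

print(f"head_dim = {head_dim}")  # head_dim = 64
print(f"approx. encoder parameters = {approx_params / 1e9:.2f}B")  # 2.60B
```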

Part of the [k-fold-structure release](https://github.com/<github-org>/k-fold-structure-release).

## Files

- `3B.ckpt`: full Lightning checkpoint.
- `3B_encoder.pt`: encoder-only state dict; loads with `strict=True` for inference.
- `config.yaml`: model + data config. The path fields are placeholders; point them at your local data.
- `backbone_tokenizer.pt`, `fullatom_tokenizer.pt`: structure tokenizers (PDB → token IDs). See Acknowledgements.
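
To fetch individual files without going through the inference helpers, something like the following should work. It is a minimal sketch that assumes the files sit at the top level of the `k-fold-structure/triprorep-3B` repo on the Hugging Face Hub (the repo id used in the Usage section) and relies on `hf_hub_download` from `huggingface_hub`. The helper name `fetch` is made up for illustration.

```python
from pathlib import Path

REPO_ID = "k-fold-structure/triprorep-3B"  # repo id taken from the Usage section

# File names as listed above; ASSUMPTION: they live at the repo root.
FILES = [
    "3B.ckpt",
    "3B_encoder.pt",
    "config.yaml",
    "backbone_tokenizer.pt",
    "fullatom_tokenizer.pt",
]


def fetch(filename: str, repo_id: str = REPO_ID) -> Path:
    """Download one file into the local Hugging Face cache and return its path."""
    if filename not in FILES:
        raise ValueError(f"unknown file: {filename!r}")
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename))
```

For example, `fetch("3B_encoder.pt")` would return the cached path of the encoder-only weights.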

## Usage

```bash
pip install torch huggingface_hub omegaconf numpy lmdb biotite
git clone https://github.com/<github-org>/k-fold-structure-release.git
cd k-fold-structure-release
```

```python
import sys

# Make the release's inference helpers importable (run from the repo root).
sys.path.insert(0, "code/triprorep")
from inference import load_encoder, embed_pdb

# Load the 3B encoder.
encoder = load_encoder("3B", hf_repo="k-fold-structure/triprorep-3B")

# Tokenize the PDB file and compute per-residue embeddings.
features = embed_pdb(encoder, "your_protein.pdb",
                     hf_repo="k-fold-structure/triprorep-3B")
print(features.shape)  # (L, 2560) fp16
```

`embed_pdb` downloads the bundled tokenizers from this repo on first call,
then runs PDB → (seq, bb, fa) tokens → encoder. If you already have token
IDs (e.g. from `k-fold-structure/repsp-triprorep-tokens`), call
`encode(encoder, seq, bb, fa)` directly. To run on CPU, pass `device="cpu"` to
`load_encoder`.
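
Many downstream tasks need one vector per protein rather than one per residue. A common way to get it, shown here as a sketch rather than anything this release prescribes, is to mean-pool the `(L, 2560)` output over the residue axis. The example uses a random NumPy array as a stand-in for `features` (it assumes the real output can be converted to a NumPy array) and accumulates in float32, since summing many fp16 values loses precision. `mean_pool` is a hypothetical helper name.

```python
import numpy as np

EMBED_DIM = 2560  # per-residue embedding size stated on this card


def mean_pool(features: np.ndarray) -> np.ndarray:
    """Average per-residue embeddings (L, D) into one protein-level vector (D,)."""
    if features.ndim != 2:
        raise ValueError(f"expected shape (L, D), got {features.shape}")
    # Upcast before reducing: fp16 accumulation over many residues is lossy.
    return features.astype(np.float32).mean(axis=0)


# Stand-in for the output of embed_pdb: 300 residues of fp16 embeddings.
rng = np.random.default_rng(0)
features = rng.standard_normal((300, EMBED_DIM)).astype(np.float16)

protein_vec = mean_pool(features)
print(protein_vec.shape, protein_vec.dtype)  # (2560,) float32
print(f"{features.nbytes / 1e6:.2f} MB for the per-residue array")  # 1.54 MB
```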

## Acknowledgements

`backbone_tokenizer.pt` (AminoAseed VQ-VAE) is from
[StructTokenBench](https://github.com/KatarinaYuan/StructTokenBench).

## Citation

```bibtex
@misc{kfoldstructure,
  title  = {K-Fold Structure: Structure-Aware Protein Encoders and a Per-Residue Representation Benchmark},
  author = {<authors>},
  year   = {2026},
  url    = {https://huggingface.co/k-fold-structure}
}
```

## License

MIT