---
license: mit
tags:
- protein-design
- protein-mpnn
- jax
- equinox
- biology
- structure-based-design
library_name: equinox
---

# PrxteinMPNN

A JAX/Equinox implementation of ProteinMPNN for inverse protein folding and sequence design.

## Model Description

PrxteinMPNN is a message-passing neural network that generates amino acid sequences given a protein backbone structure. This implementation uses JAX and Equinox for efficient computation and functional programming patterns.

**Key Features:**
- Fully modular Equinox implementation
- JAX-based for GPU acceleration and automatic differentiation
- Multiple pre-trained model variants (original and soluble)
- Multiple backbone-noise variants (0.02, 0.10, 0.20, 0.30 Å)

## Available Models

All models share the same architecture; the suffix encodes the number of nearest neighbors (48) and the Gaussian backbone noise, in Å, applied during training:

### Original Models
- `original_v_48_002` - 48 neighbors, 0.02 Å backbone noise
- `original_v_48_010` - 48 neighbors, 0.10 Å backbone noise
- `original_v_48_020` - 48 neighbors, 0.20 Å backbone noise (recommended)
- `original_v_48_030` - 48 neighbors, 0.30 Å backbone noise

### Soluble Models
- `soluble_v_48_002` - 0.02 Å backbone noise, trained on soluble proteins
- `soluble_v_48_010` - 0.10 Å backbone noise, trained on soluble proteins
- `soluble_v_48_020` - 0.20 Å backbone noise, trained on soluble proteins (recommended)
- `soluble_v_48_030` - 0.30 Å backbone noise, trained on soluble proteins

## Installation

```bash
pip install jax equinox huggingface_hub
```

## Usage

### Basic Usage

```python
import jax
import jax.numpy as jnp
import equinox as eqx
from huggingface_hub import hf_hub_download

# Download model from HuggingFace
model_path = hf_hub_download(
    repo_id="maraxen/prxteinmpnn",
    filename="eqx/original_v_48_020.eqx",
    repo_type="model",
)

# Create model structure (must match saved architecture)
from prxteinmpnn.eqx_new import PrxteinMPNN

key = jax.random.PRNGKey(0)
model = PrxteinMPNN(
    node_features=128,
    edge_features=128,
    hidden_features=512,
    num_encoder_layers=3,
    num_decoder_layers=3,
    vocab_size=21,
    k_neighbors=48,
    key=key,
)

# Load weights
model = eqx.tree_deserialise_leaves(model_path, model)

# Use model for inference
# ... (see full documentation for inference examples)
```

### Using the High-Level API

```python
from prxteinmpnn.io.weights import load_model

# Automatically downloads and loads the model
model = load_model(
    model_version="v_48_020",
    model_weights="original"
)
```

## Model Architecture

**Hyperparameters:**
- Node features: 128
- Edge features: 128
- Hidden features: 512
- Encoder layers: 3
- Decoder layers: 3
- K-nearest neighbors: 48
- Vocabulary size: 21 (20 amino acids + 1 unknown)
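
The `k_neighbors` hyperparameter determines how the structure graph is built: each residue is connected to its 48 spatially nearest residues. As an illustrative sketch (not the library's actual API), a k-nearest-neighbor graph can be computed from C-alpha coordinates like this:

```python
import jax.numpy as jnp

def knn_edges(coords, k):
    """Return the indices of the k nearest residues for each residue.

    coords: (n_residues, 3) array of C-alpha positions.
    """
    # Pairwise squared distances, shape (n, n)
    d2 = jnp.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    # Indices of the k closest residues per row (self is always included,
    # since its distance is zero)
    return jnp.argsort(d2, axis=-1)[:, :k]

coords = jnp.arange(15.0).reshape(5, 3)  # 5 toy residues
edges = knn_edges(coords, k=3)
print(edges.shape)  # (5, 3)
```

The real model builds edge features (e.g. distance encodings) on top of such a neighbor index array; `knn_edges` here is only a hypothetical stand-in for that step.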

**Architecture:**
- Message-passing encoder for structural features
- Autoregressive decoder for sequence generation
- Attention-based edge updates
- LayerNorm and residual connections

## Training Data

The models were trained on protein structures from the Protein Data Bank (PDB):
- **Original models:** Standard PDB training set
- **Soluble models:** Filtered for soluble, well-expressed proteins

## Performance

These models achieve state-of-the-art performance on:
- Native sequence recovery
- Structural compatibility (predicted structure vs. designed sequence)
- Expressibility and stability (for soluble models)

## Citation

If you use PrxteinMPNN in your research, please cite the original ProteinMPNN paper:

```bibtex
@article{dauparas2022robust,
  title={Robust deep learning--based protein sequence design using ProteinMPNN},
  author={Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
  journal={Science},
  volume={378},
  number={6615},
  pages={49--56},
  year={2022},
  publisher={American Association for the Advancement of Science}
}
```

## License

MIT License - See LICENSE file for details.

## Links

- **GitHub Repository:** [maraxen/PrxteinMPNN](https://github.com/maraxen/PrxteinMPNN)
- **Original ProteinMPNN:** [dauparas/ProteinMPNN](https://github.com/dauparas/ProteinMPNN)
- **Documentation:** [Full documentation](https://github.com/maraxen/PrxteinMPNN/tree/main/docs)

## Technical Details

### File Format

Models are saved using Equinox's `tree_serialise_leaves` format (`.eqx` files), which:
- Preserves PyTree structure
- Ensures bit-perfect reproducibility
- Is compatible with JAX's functional programming paradigm
- Supports efficient serialization/deserialization

### Computational Requirements

- **Memory:** ~30 MB per model
- **Inference:** CPU-compatible, GPU-accelerated
- **Batch processing:** Supported via `jax.vmap`
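
The batching pattern with `jax.vmap` can be sketched as follows. `score_structure` is a hypothetical per-structure function standing in for a single forward pass (the model's real signature differs):

```python
import jax
import jax.numpy as jnp

def score_structure(coords):
    """Toy per-structure function; placeholder for one model forward pass."""
    return jnp.sum(coords ** 2)

# vmap maps the function over the leading (batch) axis
batched_score = jax.vmap(score_structure)

coords_batch = jnp.ones((4, 10, 3))  # 4 structures, 10 residues, xyz
scores = batched_score(coords_batch)
print(scores.shape)  # (4,)
```

The same pattern composes with `jax.jit` for compiled batched inference.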

## Updates

**Latest (v2.0):**
- Migrated to unified Equinox architecture
- All models now in `.eqx` format
- Improved modularity and type safety
- Full JAX compatibility with JIT, vmap, and grad

---

For more information, examples, and tutorials, visit the [GitHub repository](https://github.com/maraxen/PrxteinMPNN).