No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation
Paper: arXiv:2602.01845
Proust is a 309M-parameter causal protein language model (PLM) introduced in the paper No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation.
The model bridges the divide between masked language models (MLMs), which excel at fitness prediction, and causal models, which enable generation. Proust achieves competitive performance on ProteinGym benchmarks while retaining native generative capabilities.
To use this model, please follow the installation instructions in the official GitHub repository.
```python
from proust_inference import load_model

# Downloads the checkpoint from Hugging Face on first call and
# loads it to CUDA in bfloat16.
model = load_model()
```
To score a sequence, compute its mean per-token log-likelihood under the model:

```python
import torch
from proust_inference import load_model, tokenize

model = load_model()
ids = tokenize("MKTLLILAVLCLGFASSALA", device="cuda")

with torch.no_grad():
    logits = model(ids.unsqueeze(0))  # (1, seq_len, vocab_size)

# Per-token log probabilities
log_probs = logits.float().log_softmax(dim=-1)

# Shift by one: the logits at position t predict token t+1
token_log_probs = log_probs[0, :-1].gather(1, ids[1:].unsqueeze(1)).squeeze(1)
print(f"Mean log-likelihood: {token_log_probs.mean().item():.4f}")
```
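Causal PLMs are typically used for zero-shot fitness estimation by comparing sequence log-likelihoods: a variant scoring higher than the wild type is predicted to be at least as fit. The helpers below are an illustrative sketch of that protocol, not part of `proust_inference`; they wrap the scoring recipe above, with the device made a parameter for flexibility.

```python
import torch


def mean_log_likelihood(model, tokenize, seq: str, device: str = "cuda") -> float:
    """Mean per-token log-likelihood of `seq` under a causal PLM.

    Illustrative helper (not part of proust_inference): `model` maps a
    (1, T) id tensor to (1, T, V) logits, and `tokenize` maps a string
    to a (T,) id tensor, as in the scoring example above.
    """
    ids = tokenize(seq, device=device)
    with torch.no_grad():
        logits = model(ids.unsqueeze(0))
    log_probs = logits.float().log_softmax(dim=-1)
    # Shift by one: logits at position t predict token t+1
    token_lp = log_probs[0, :-1].gather(1, ids[1:].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()


def fitness_score(model, tokenize, wildtype: str, mutant: str,
                  device: str = "cuda") -> float:
    """Zero-shot fitness score: positive means the model prefers the mutant."""
    return (mean_log_likelihood(model, tokenize, mutant, device)
            - mean_log_likelihood(model, tokenize, wildtype, device))
```

Ranking variants by this score is what ProteinGym-style benchmarks correlate against experimental fitness measurements.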
To extract per-residue embeddings:

```python
import torch
from proust_inference import load_model, tokenize

model = load_model()
ids = tokenize("MKTLLILAVLCLGFASSALA", device="cuda")

with torch.no_grad():
    hidden = model.get_embeddings(ids.unsqueeze(0))  # (1, seq_len, 1024)

# Mean pooling over residues (excluding <cls> and <eos>)
embedding = hidden[0, 1:-1].mean(dim=0)  # (1024,)
```
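Because Proust is causal, it also supports autoregressive generation from the same `model(ids)` interface used above. The loop below is a generic sampling sketch, not a confirmed `proust_inference` API: the prompt ids and the end-of-sequence id would come from the official tokenizer.

```python
import torch


def sample_sequence(model, start_ids: torch.Tensor, max_new_tokens: int = 50,
                    temperature: float = 1.0, eos_id=None) -> torch.Tensor:
    """Sample from a causal LM that maps (1, T) ids to (1, T, V) logits.

    Generic sketch: `start_ids` is a (T,) prompt tensor; `eos_id`, if given,
    stops generation early. Proust's actual special-token ids are assumptions.
    """
    ids = start_ids.unsqueeze(0)  # (1, T)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            next_logits = model(ids)[0, -1]  # next-token logits, (V,)
        probs = torch.softmax(next_logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, 1)  # sample one token id
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids.squeeze(0)
```

Lowering `temperature` sharpens the distribution toward the model's top predictions; raising it increases sequence diversity.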
If you use this model, please cite:

```bibtex
@article{proust2026,
  title={No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation},
  author={Furkan Eris},
  journal={arXiv preprint arXiv:2602.01845},
  year={2026}
}
```