Add model card and usage information

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +83 -3
README.md CHANGED
@@ -1,3 +1,83 @@
- ---
- license: cc-by-nc-sa-2.0
- ---
---
license: cc-by-nc-sa-2.0
pipeline_tag: text-generation
tags:
- biology
- protein
---

# Proust v0

Proust is a 309M-parameter causal protein language model (PLM) introduced in the paper [No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation](https://huggingface.co/papers/2602.01845).

The model bridges the divide between masked language models (MLMs), which excel at fitness prediction, and causal models, which enable generation. Proust achieves competitive performance on ProteinGym benchmarks while retaining native generative capabilities.

- **Paper:** [No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation](https://huggingface.co/papers/2602.01845)
- **Code:** [Furkan9015/proust-inference](https://github.com/Furkan9015/proust-inference)

## Model Details

- **Architecture:** GQA-S2 Transformer (Grouped-Query Attention with S2 KV-sharing and VO-RoPE)
- **Parameters:** 309M
- **Configuration:** 24 layers, hidden dimension 1024, 16 attention heads, 2 KV heads
- **Vocabulary:** 32 tokens (ESM-style)
- **Innovations:** cross-layer value residuals and depthwise causal convolutions
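For quick reference, the hyperparameters above can be collected in a small Python dataclass. This is purely illustrative: `ProustConfig` and its field names are hypothetical, not the repository's actual config class.

```python
from dataclasses import dataclass

@dataclass
class ProustConfig:
    """Illustrative container for the published Proust v0 hyperparameters.

    NOTE: hypothetical class, not part of proust_inference; it only
    summarizes the numbers listed in the model card.
    """
    n_layers: int = 24
    d_model: int = 1024
    n_heads: int = 16
    n_kv_heads: int = 2
    vocab_size: int = 32  # ESM-style amino-acid vocabulary

cfg = ProustConfig()
# With 16 query heads sharing 2 KV heads, each KV head serves 8 query heads.
print(cfg.n_heads // cfg.n_kv_heads)  # → 8
```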
## Usage

To use this model, follow the installation instructions in the [official GitHub repository](https://github.com/Furkan9015/proust-inference).

### Load Model

```python
from proust_inference import load_model

# Downloads the checkpoint from the Hugging Face Hub on first call,
# then loads it onto the GPU in bfloat16.
model = load_model()
```
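The card mentions the model's native generative capabilities, but this repository's generation API is not shown here. As a sketch only, greedy decoding can be built on any causal LM whose forward pass maps `(1, seq_len)` token ids to `(1, seq_len, vocab_size)` logits; the `dummy` model below is a stand-in used just to show the loop running.

```python
import torch

def greedy_decode(model, ids, max_new_tokens=10):
    """Greedy decoding loop for any causal LM: repeatedly append the
    argmax of the logits at the final position."""
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids)           # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()  # most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return ids

# Stand-in model: always predicts token 3, just to exercise the loop.
dummy = lambda ids: torch.nn.functional.one_hot(
    torch.full((1, ids.shape[1]), 3), num_classes=32
).float()
out = greedy_decode(dummy, torch.tensor([[0, 5, 7]]), max_new_tokens=4)
print(out.shape)  # torch.Size([1, 7])
```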
### Score a protein sequence (log-likelihood)

```python
import torch
from proust_inference import load_model, tokenize

model = load_model()

ids = tokenize("MKTLLILAVLCLGFASSALA", device="cuda")
with torch.no_grad():
    logits = model(ids.unsqueeze(0))  # (1, seq_len, vocab_size)

# Per-token log probabilities
log_probs = logits.float().log_softmax(dim=-1)
# Shift: the logits at position t predict token t+1
token_log_probs = log_probs[0, :-1].gather(1, ids[1:].unsqueeze(1)).squeeze(1)
print(f"Mean log-likelihood: {token_log_probs.mean().item():.4f}")
```
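The shift-and-gather step above can be sanity-checked on CPU with random logits, independent of the model: position `t`'s distribution is scored against the token at `t+1`, yielding one log-probability per predicted token.

```python
import torch

torch.manual_seed(0)
seq_len, vocab_size = 8, 32
ids = torch.randint(vocab_size, (seq_len,))   # fake token ids
logits = torch.randn(1, seq_len, vocab_size)  # fake model output

log_probs = logits.float().log_softmax(dim=-1)
# Drop the last position's prediction; score each next token.
token_log_probs = log_probs[0, :-1].gather(1, ids[1:].unsqueeze(1)).squeeze(1)

# One score per predicted token; log-probabilities are non-positive.
print(token_log_probs.shape)  # torch.Size([7])
print(token_log_probs.mean().item())
```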
### Extract embeddings

```python
import torch
from proust_inference import load_model, tokenize

model = load_model()

ids = tokenize("MKTLLILAVLCLGFASSALA", device="cuda")
with torch.no_grad():
    hidden = model.get_embeddings(ids.unsqueeze(0))  # (1, seq_len, 1024)

# Mean pooling over residue positions (excluding <cls> and <eos>)
embedding = hidden[0, 1:-1].mean(dim=0)  # (1024,)
```
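Mean-pooled embeddings like the one above are commonly compared with cosine similarity, e.g. for retrieval or clustering. The snippet below demonstrates only the shapes and the pooling step, using random tensors in place of real hidden states.

```python
import torch
import torch.nn.functional as F

def mean_pool(hidden):
    """Mean-pool per-residue embeddings, dropping <cls> and <eos>."""
    return hidden[0, 1:-1].mean(dim=0)

# Random stand-ins for two sequences' hidden states, shape (1, seq_len, 1024).
torch.manual_seed(0)
a = mean_pool(torch.randn(1, 22, 1024))
b = mean_pool(torch.randn(1, 18, 1024))

sim = F.cosine_similarity(a, b, dim=0).item()
print(f"cosine similarity: {sim:.3f}")  # always in [-1, 1]
```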
## Citation

```bibtex
@article{proust2025,
  title={No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation},
  author={Nappenstance Authors},
  journal={arXiv preprint arXiv:2602.01845},
  year={2025}
}
```