Add model card with paper link and sample usage

Hi, I'm Niels from the Hugging Face community science team. I'm opening this PR to improve your model card by adding relevant metadata, a link to the paper, the official GitHub repository, and a sample usage snippet to help users get started with the model.

Key changes:
- Added `pipeline_tag: feature-extraction` to the metadata.
- Linked the paper: [Reverse Distillation: Consistently Scaling Protein Language Model Representations](https://huggingface.co/papers/2603.07710).
- Included a "Quick Start" section with code from your GitHub README.
- Added the BibTeX citation for the paper.

Files changed (1)

README.md +70 -9

@@ -1,9 +1,70 @@
----
-license: mit
-base_model:
-  - facebook/esm2_t36_3B_UR50D
-  - facebook/esm2_t33_650M_UR50D
-  - facebook/esm2_t30_150M_UR50D
-  - facebook/esm2_t12_35M_UR50D
-  - facebook/esm2_t6_8M_UR50D
----

+---
+base_model:
+- facebook/esm2_t36_3B_UR50D
+- facebook/esm2_t33_650M_UR50D
+- facebook/esm2_t30_150M_UR50D
+- facebook/esm2_t12_35M_UR50D
+- facebook/esm2_t6_8M_UR50D
+license: mit
+pipeline_tag: feature-extraction
+---
+# PLM Reverse Distillation
+This repository contains the weights for the protein language models presented in the paper [Reverse Distillation: Consistently Scaling Protein Language Model Representations](https://huggingface.co/papers/2603.07710).
+Reverse Distillation is a principled framework that decomposes large Protein Language Model (PLM) representations into orthogonal subspaces guided by smaller models of the same family. The resulting embeddings have a Matryoshka-style nested structure, ensuring that larger reverse-distilled models consistently outperform smaller ones.
+- **GitHub Repository**: [rohitsinghlab/plm_reverse_distillation](https://github.com/rohitsinghlab/plm_reverse_distillation)
+## Quick Start
+Reverse-distilled ESM-2 models are designed as drop-in replacements for ESM-2 in most embedding-generation tasks.
+```python
+import esm
+import torch
+import reverse_distillation
+# Load ESM-2 model and the reverse distillation version
+esm2_model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
+rd_model, alphabet = reverse_distillation.pretrained.esm2_rd_650M()
+batch_converter = alphabet.get_batch_converter()
+esm2_model.eval()  # disables dropout for deterministic results
+rd_model.eval()  # disables dropout for deterministic results
+# Prepare data
+data = [
+    ("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
+    ("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
+]
+batch_labels, batch_strs, batch_tokens = batch_converter(data)
+batch_lens = (batch_tokens != alphabet.padding_idx).sum(1)
+# Extract per-residue representations
+with torch.no_grad():
+    results_esm = esm2_model(batch_tokens, repr_layers=[33], return_contacts=True)
+    results_rd = rd_model(batch_tokens)
+esm_token_representations = results_esm["representations"][33]
+rd_token_representations = results_rd["representations"]["650M"]
+# Report the per-residue representation shape for each sequence, skipping the BOS/EOS tokens
+for i, tokens_len in enumerate(batch_lens):
+    print(f"esm representation size: {esm_token_representations[i, 1 : tokens_len - 1].size()}")
+    print(f"rd representation size: {rd_token_representations[i, 1 : tokens_len - 1].size()}")
+```
+## Citation
+If you use reverse distillation, please cite:
+```bibtex
+@inproceedings{catrina2026reverse,
+  title   = {Reverse Distillation: Consistently Scaling Protein Language Model Representations},
+  author  = {Catrina, Darius and Bepler, Christian and Sledzieski, Samuel and Singh, Rohit},
+  booktitle = {International Conference on Learning Representations},
+  year    = {2026}
+}
+```
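
Reviewer note: the Quick Start loop in the diff slices out per-residue embeddings and prints their shapes, but it does not reduce them to one vector per sequence. A minimal sketch of that reduction by mean pooling follows. It uses NumPy on toy arrays so it runs without the model weights; the helper name `mean_pool` and the toy token ids are hypothetical and not part of the `esm` or `reverse_distillation` packages. It assumes ESM-style tokenization, with one BOS token first, one EOS token after the last residue, and padding after that.

```python
import numpy as np


def mean_pool(token_reprs, token_ids, padding_idx):
    """Average per-residue embeddings into one vector per sequence.

    token_reprs: array of shape (batch, tokens, dim)
    token_ids:   array of shape (batch, tokens)
    Assumes each row is laid out as [BOS, residues..., EOS, padding...].
    """
    # Number of non-padding tokens per row, including BOS and EOS.
    lengths = (token_ids != padding_idx).sum(axis=1)
    pooled = []
    for i, n in enumerate(lengths):
        # Skip BOS at index 0 and EOS at index n - 1, then average over residues.
        pooled.append(token_reprs[i, 1 : n - 1].mean(axis=0))
    return np.stack(pooled)


# Toy batch (illustrative ids): 0 = BOS, 2 = EOS, 1 = padding.
padding_idx = 1
token_ids = np.array([
    [0, 5, 6, 7, 2],  # three residues
    [0, 8, 9, 2, 1],  # two residues plus one padding token
])
# Stand-in for model output: batch of 2, 5 tokens, embedding dimension 4.
token_reprs = np.arange(2 * 5 * 4, dtype=np.float64).reshape(2, 5, 4)

sequence_reprs = mean_pool(token_reprs, token_ids, padding_idx)
print(sequence_reprs.shape)
```

With the tensors from the Quick Start, the same slicing applies directly, for example `rd_token_representations[i, 1 : tokens_len - 1].mean(0)` inside the existing loop.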