Commit 25346d8 (verified) by littleworth · 1 parent: fbc5b61

docs: cite bioRxiv preprint (doi:10.64898/2026.02.17.706304)

Files changed (1): README.md (+12 −5)
README.md CHANGED

@@ -20,7 +20,7 @@ base_model: nferruz/ProtGPT2
 
 A compact protein language model distilled from [ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) using **complementary-regularizer distillation**---a method that combines uncertainty-aware position weighting with calibration-aware label smoothing to achieve 87% better perplexity than standard knowledge distillation at 20x compression.
 
-> **Paper**: *Distilling Protein Language Models with Complementary Regularizers* (Wijaya, 2026)
+> **Preprint**: *Distilling Protein Language Models with Complementary Regularizers* (Wijaya, 2026) — [bioRxiv](https://www.biorxiv.org/content/10.64898/2026.02.17.706304)
 > **Code**: [github.com/ewijaya/protein-lm-distill](https://github.com/ewijaya/protein-lm-distill)
 
 ## Model Summary

@@ -181,10 +181,17 @@ Recommended fine-tuning hyperparameters for this model:
 ## Citation
 
 ```bibtex
-@article{wijaya2026distilling,
- title={Distilling Protein Language Models with Complementary Regularizers},
- author={Wijaya, Edward},
- year={2026}
+@article {Wijaya2026.02.17.706304,
+ author = {Wijaya, Edward},
+ title = {Distilling Protein Language Models with Complementary Regularizers},
+ elocation-id = {2026.02.17.706304},
+ year = {2026},
+ doi = {10.64898/2026.02.17.706304},
+ publisher = {Cold Spring Harbor Laboratory},
+ abstract = {Large autoregressive protein language models generate novel sequences de novo, but their size limits throughput and precludes rapid domain adaptation on scarce proprietary data. We distill a 738M-parameter protein language model into compact students using two protein-specific enhancements, uncertainty-aware position weighting and calibration-aware label smoothing, that individually degrade quality yet combine for substantial improvement. We trace this complementary-regularizer effect to information theory: smoothing denoises teacher distributions while weighting amplifies the cleaned signal at biologically variable positions. Students achieve up to 5x inference speedup, preserve natural amino acid distributions, and require as little as 170 MB of GPU memory, enabling deployment on consumer-grade hardware. When fine-tuned on protein families with as few as 50 sequences, students generate more family-matching sequences than the teacher, achieving higher sample efficiency and Pfam hit rates despite their smaller capacity. These results establish distilled protein language models as superior starting points for domain adaptation on scarce data.Competing Interest StatementThe authors have declared no competing interest.},
+ URL = {https://www.biorxiv.org/content/early/2026/02/25/2026.02.17.706304},
+ eprint = {https://www.biorxiv.org/content/early/2026/02/25/2026.02.17.706304.full.pdf},
+ journal = {bioRxiv}
 }
 ```
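For context on the method the README and abstract describe, here is a minimal, hypothetical sketch of a distillation loss combining the two regularizers named in the preprint: calibration-aware label smoothing of the teacher distribution, and uncertainty-aware per-position weighting of the divergence term. The function name, signature, and hyperparameters are illustrative assumptions, not the API of the linked repository.

```python
# Hypothetical sketch (not the repository's actual implementation) of a
# complementary-regularizer distillation loss: label-smooth the teacher's
# softened distribution, then weight each sequence position by the smoothed
# teacher's entropy so variable positions contribute more.
import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits, alpha=0.1, temperature=2.0):
    """student_logits, teacher_logits: tensors of shape (batch, seq_len, vocab)."""
    vocab = teacher_logits.size(-1)
    # Calibration-aware label smoothing: mix the temperature-softened teacher
    # distribution with a uniform distribution to denoise overconfident targets.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    smoothed = (1.0 - alpha) * teacher_probs + alpha / vocab
    # Uncertainty-aware position weighting: per-position entropy of the
    # smoothed teacher, normalized to mean 1 across the batch.
    log_smoothed = smoothed.clamp_min(1e-9).log()
    entropy = -(smoothed * log_smoothed).sum(dim=-1)
    weights = (entropy / entropy.mean()).detach()
    # Position-wise KL(smoothed teacher || student), scaled by T^2 as in
    # standard temperature-based distillation.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = (smoothed * (log_smoothed - log_student)).sum(dim=-1)
    return (weights * kl).mean() * temperature**2
```

The entropy weighting follows the abstract's claim that the method "amplifies the cleaned signal at biologically variable positions"; how the actual paper computes the weights may differ.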