UCL-CSSB/PlasmidGPT-GRPO

@@ -1,39 +1,23 @@
 ---
-base_model: McClain/plasmidgpt-addgene-gpt2
 library_name: transformers
-model_name: PlasmidGPT-GRPO
 tags:
-- generated_from_trainer
-- grpo
-- trl
 - biology
 - plasmid
 - dna
 - synthetic-biology
-license: mit
-datasets:
-- McClain/plasmids-ncbi-addgene
-pipeline_tag: text-generation
 ---
 # PlasmidGPT-GRPO
-A generative model for plasmid DNA sequences, fine-tuned with Group Relative Policy Optimization (GRPO) reinforcement learning.
-## Model Description
-This model is a fine-tuned version of [PlasmidGPT](https://huggingface.co/McClain/plasmidgpt-addgene-gpt2) optimized using GRPO to generate valid, functional plasmid sequences with:
-- **Origin of replication (ORI)** - Required for plasmid maintenance
-- **Antibiotic resistance marker (AMR)** - Required for selection
-### Performance
-At temperature 1.3, this model achieves:
-- **90% QC pass rate** (valid ORI + AMR)
-- **3 unique ORI types** (ColE1, Col(pHAD28), Col440I)
-- **100% unique sequences** (no duplicates)
-## Quick Start
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -41,66 +25,20 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
 tokenizer = AutoTokenizer.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
-# Generate a plasmid starting with ATG (start codon)
-prompt = "ATG"
-inputs = tokenizer(prompt, return_tensors="pt")
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=2000,
-    temperature=1.3,
-    do_sample=True,
-    pad_token_id=tokenizer.eos_token_id
-)
-sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(sequence)
 ```
-## Training
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50)
-This model was trained with GRPO (Group Relative Policy Optimization), a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
-The reward function optimizes for:
-1. Presence of a valid origin of replication (ORI)
-2. Presence of a valid antibiotic resistance marker (AMR)
-3. Absence of long repetitive sequences
-### Framework Versions
-- TRL: 0.23.1
-- Transformers: 4.57.0
-- PyTorch: 2.8.0
-- Datasets: 4.1.1
-- Tokenizers: 0.22.1
-## Recommended Sampling Parameters
-| Temperature | Pass Rate | ORI Diversity | Notes |
-|-------------|-----------|---------------|-------|
-| 0.8 | 37% | 1 type | Collapsed - avoid |
-| 0.95 | 63% | 2 types | Conservative |
-| 1.15 | 76% | 2 types | Balanced |
-| **1.3** | **90%** | **3 types** | **Recommended** |
 ## Citation
 ```bibtex
-@article{shao2024deepseekmath,
-    title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-    author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-    year=2024,
-    eprint={arXiv:2402.03300},
-}
-@misc{vonwerra2022trl,
-    title={{TRL: Transformer Reinforcement Learning}},
-    author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-    year=2020,
-    journal={GitHub repository},
-    publisher={GitHub},
-    howpublished={\url{https://github.com/huggingface/trl}}
 }
 ```

 ---
+license: mit
 library_name: transformers
+pipeline_tag: text-generation
+base_model: UCL-CSSB/PlasmidGPT
 tags:
 - biology
 - plasmid
 - dna
 - synthetic-biology
+- gpt2
+- grpo
+- reinforcement-learning
 ---
 # PlasmidGPT-GRPO
+GRPO reinforcement-learning fine-tune of [PlasmidGPT](https://huggingface.co/UCL-CSSB/PlasmidGPT), trained against a multi-component biological reward (functional annotations, length prior, repeat penalty, cassette ordering). Camera-ready model for the ICML 2026 paper *Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators*.
+## Quick start
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
 tokenizer = AutoTokenizer.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
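+# Encode a start-codon prompt ("ATG") as token IDs
+input_ids = tokenizer("ATG", return_tensors="pt").input_ids
+# Sample up to 512 new tokens at T=1.0 (the direct-generation setting recommended below)
+outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=1.0)
+# Decode the single generated sequence back to text
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))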
 ```
+Recommended sampling: T=1.0 for direct generation, T=1.15 for rejection sampling (per the paper).
 ## Citation
 ```bibtex
+@inproceedings{thiel2026plasmidrl,
+  title     = {Effects of Structural Reward Shaping on Biophysical Properties in {RL}-Trained Plasmid Generators},
+  author    = {Thiel, McClain and Cunningham, Angus G. and Barnes, Chris P.},
+  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
+  year      = {2026}
 }
 ```
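
The updated card recommends T=1.15 for rejection sampling but does not show what that loop looks like. The following is a minimal sketch of the general pattern only: draw candidates, keep those that pass a filter, stop once enough are collected. Everything named here (`rejection_sample`, `passes_qc`, `has_long_homopolymer`, `toy_generator`) is hypothetical and not taken from the paper or the repository. In particular, `passes_qc` is a stand-in filter (length, alphabet, homopolymer run) and not the ORI/AMR annotation check that the previous card described, and `toy_generator` emits random bases so the example runs without downloading the model.

```python
import random
from typing import Callable, List


def has_long_homopolymer(seq: str, max_run: int = 8) -> bool:
    """Return True if any base repeats more than max_run times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def passes_qc(seq: str, min_len: int = 50) -> bool:
    """Hypothetical stand-in filter; NOT the ORI/AMR check from the card."""
    return (
        len(seq) >= min_len
        and set(seq) <= set("ACGT")
        and not has_long_homopolymer(seq)
    )


def rejection_sample(
    generate_fn: Callable[[], str],
    accept_fn: Callable[[str], bool],
    n_wanted: int,
    max_attempts: int = 1000,
) -> List[str]:
    """Draw candidates until n_wanted are accepted or max_attempts is reached."""
    accepted: List[str] = []
    for _ in range(max_attempts):
        if len(accepted) >= n_wanted:
            break
        candidate = generate_fn()
        if accept_fn(candidate):
            accepted.append(candidate)
    return accepted


def toy_generator(rng: random.Random, length: int = 200) -> str:
    """Assumption: random bases standing in for a model.generate(...) call."""
    return "ATG" + "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)
kept = rejection_sample(lambda: toy_generator(rng), passes_qc, n_wanted=5)
print(f"kept {len(kept)} sequences")
```

To use this with the model, `generate_fn` would wrap the Quick start snippet with `temperature=1.15` and return the decoded string. The `max_attempts` cap keeps the loop from running indefinitely when the filter rejects most samples.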