# PlasmidGPT-GRPO: Reinforcement Learning Fine-tuned Plasmid Generator

A biologically constrained plasmid design model trained with reinforcement learning to generate functional DNA sequences.

This model is a fine-tuned version of McClain/plasmidgpt-addgene-gpt2 (itself based on the original PlasmidGPT by Bin Shao), optimized using Group Relative Policy Optimization (GRPO) to generate plasmids that satisfy biological constraints.
## Key Improvements Over the Base Model

This RL-fine-tuned model has been trained to generate plasmids that:

- Contain the correct numbers of essential genetic elements (ori, promoters, terminators, markers, CDS)
- Avoid repeat regions (repeats >50 bp are penalized)
- Are shorter and more efficient (compactness is rewarded)
- Maintain proper gene cassette organization (promoter → CDS → terminator)
- Achieve up to a 1.0 reward score for optimal plasmid design
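The cassette-organization constraint can be illustrated with a toy check (a hypothetical simplification; the actual reward uses full annotations and strand orientation, and the 300 bp gap here mirrors the proximity rule in the Technical Details section):

```python
def cassette_ordered(features, max_gap=300):
    """Toy check that a promoter, CDS, and terminator occur in order
    and within `max_gap` bp of their neighbors. `features` maps a
    feature kind to its start position -- a simplified stand-in for
    real annotation output."""
    needed = ("promoter", "cds", "terminator")
    if not all(k in features for k in needed):
        return False
    p, c, t = (features[k] for k in needed)
    return p < c < t and (c - p) <= max_gap and (t - c) <= max_gap
```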
## Reward Structure

The model was trained using a custom bioinformatics reward function that scores sequences based on:

| Component | Min | Max | Weight | Description |
|---|---|---|---|---|
| Origin of Replication (ori) | 1 | 1 | 1.5× | Essential for plasmid replication |
| Promoters | 1 | 1 | 1.0× | Drive gene expression |
| Terminators | 0 | 2 | 0.5× | Stop transcription |
| Selectable Markers | 1 | 2 | 1.0× | Antibiotic resistance |
| Coding Sequences (CDS) | 1 | 5 | 1.0× | Functional genes |
Additional scoring:

- Repeat Penalty: -0.1 per repeat region ≥50 bp (including reverse complements)
- Length Bonus: rewards shorter, more compact sequences (up to +0.5)
- Location Awareness: bonuses for correct gene cassette ordering and proximity

Maximum reward: 1.0 (a perfect plasmid with all constraints satisfied)
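As a rough illustration, the table and penalties above imply a reward of the following shape (a hypothetical sketch; the exact normalization used by `plasmidrl` may differ):

```python
def component_score(count, lo, hi, weight):
    """Full weighted credit when the feature count falls in [lo, hi],
    zero otherwise (a simple 'punish mode')."""
    return weight if lo <= count <= hi else 0.0

def plasmid_reward(counts, n_repeats, length_bonus=0.0):
    spec = {  # (min, max, weight), mirroring the table above
        "ori": (1, 1, 1.5), "promoter": (1, 1, 1.0),
        "terminator": (0, 2, 0.5), "marker": (1, 2, 1.0),
        "cds": (1, 5, 1.0),
    }
    total_w = sum(w for _, _, w in spec.values())
    raw = sum(component_score(counts.get(k, 0), lo, hi, w)
              for k, (lo, hi, w) in spec.items())
    score = raw / total_w             # normalize weighted credit to [0, 1]
    score -= 0.1 * n_repeats          # repeat penalty per region >= 50 bp
    score += length_bonus             # up to +0.5 for compact sequences
    return max(0.0, min(1.0, score))  # clamp so the maximum stays 1.0
```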
## Quick Start

### Basic Sequence Generation
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = AutoModelForCausalLM.from_pretrained(
    "McClain/plasmidgpt-grpo-rl",
    trust_remote_code=True
).to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "McClain/plasmidgpt-grpo-rl",
    trust_remote_code=True
)

# Generate optimized plasmid sequences
start_sequence = 'ATGGCTAGCGAATTC'
input_ids = tokenizer.encode(start_sequence, return_tensors='pt').to(device)

outputs = model.generate(
    input_ids,
    max_length=400,
    num_return_sequences=5,
    temperature=0.8,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)

for i, output in enumerate(outputs):
    sequence = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Plasmid {i+1}: {len(sequence)} bp")
```
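Depending on the tokenizer, decoded outputs may contain whitespace or stray characters; a small sanitation step (an assumption about the decode format, not part of the model's documented pipeline) keeps only nucleotide characters before downstream scoring:

```python
import re

def clean_dna(decoded: str) -> str:
    """Strip whitespace and any non-nucleotide characters that the
    tokenizer's decode step may leave behind, uppercasing the rest."""
    return re.sub(r"[^ACGT]", "", decoded.upper())
```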
### Scoring Generated Plasmids

To evaluate plasmids using the same reward function used in training:
```python
# Install plasmidkit for annotation:
#   pip install plasmidkit
from plasmidrl.rewards import Scorer, RewardConfig

# Use the same config as training
reward_config = RewardConfig(
    punish_mode=True,
    length_reward_mode=False,
    repeat_penalty_enabled=True,
    repeat_min_length=50,
    repeat_penalty_per_region=0.1,
    ori_min=1, ori_max=1, ori_weight=1.5,
    promoter_min=1, promoter_max=1, promoter_weight=1.0,
    terminator_min=0, terminator_max=2, terminator_weight=0.5,
    marker_min=1, marker_max=2, marker_weight=1.0,
    cds_min=1, cds_max=5, cds_weight=1.0,
    location_aware=True
)

scorer = Scorer(reward_config)

# generated_sequence: a DNA string decoded from model.generate output
score, components = scorer.score(generated_sequence)
print(f"Reward Score: {score:.3f}")
print(f"Components: {components}")
```
## Training Details

### Training Configuration
- Base Model: McClain/plasmidgpt-addgene-gpt2
- RL Algorithm: GRPO (Group Relative Policy Optimization)
- Training Steps: 2,500 steps
- Training Repository: PlasmidRL
- W&B Run: u3wt9c50
### Model Architecture
| Parameter | Value |
|---|---|
| Architecture | GPT-2 (Decoder-only Transformer) |
| Parameters | 110 million |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Context Length | 2048 tokens |
| Vocabulary Size | 30,002 |
### Framework Versions
- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0
- Datasets: 4.1.1
- Tokenizers: 0.22.1
## Use Cases
- Optimized Plasmid Design: Generate plasmids that satisfy specific biological constraints
- Synthetic Biology: Create novel genetic constructs for molecular cloning
- Gene Cassette Engineering: Design properly organized promoter-CDS-terminator cassettes
- Compact Plasmid Construction: Generate shorter plasmids while maintaining functionality
- Repeat-Free Sequences: Avoid problematic repeat regions in plasmid design
## Related Resources

### Original PlasmidGPT
This model builds upon the original PlasmidGPT work:
- Paper: PlasmidGPT: a generative framework for plasmid design and annotation (bioRxiv 2024.09.30.615762)
- Author: Bin Shao (lingxusb)
- Original Repository: github.com/lingxusb/PlasmidGPT
- Original Model: huggingface.co/lingxusb/PlasmidGPT
### Training Infrastructure
- Training Code: github.com/McClain-Thiel/PlasmidRL
- W&B Project: ucl-cssb/PlasmidRL
- Base Model: McClain/plasmidgpt-addgene-gpt2
## Citations

If you use this model, please cite:

### This RL Model

```bibtex
@misc{thiel2024plasmidgpt_grpo,
  title={PlasmidGPT-GRPO: Reinforcement Learning for Functional Plasmid Design},
  author={Thiel, McClain},
  year={2024},
  howpublished={\url{https://github.com/McClain-Thiel/PlasmidRL}},
  note={Training run: https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50}
}
```

### Original PlasmidGPT

```bibtex
@article{shao2024plasmidgpt,
  title={PlasmidGPT: a generative framework for plasmid design and annotation},
  author={Shao, Bin and others},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.09.30.615762},
  url={https://www.biorxiv.org/content/10.1101/2024.09.30.615762v1}
}
```

### GRPO Algorithm

```bibtex
@article{shao2024deepseekmath,
  title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  journal={arXiv preprint arXiv:2402.03300},
  year={2024}
}
```

### TRL Library

```bibtex
@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year={2020},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}
```
## Technical Details

### Reward Function Components

The bioinformatics reward function (`src/rewards/bioinformatics/scorer.py`) includes:
- Feature Counting: Uses PlasmidKit for automated annotation
- Overlap Merging: Intelligently merges overlapping features (80% threshold)
- CDS Filtering: Removes CDS annotations overlapping with ori/promoter/terminator/marker
- Strand Awareness: Considers strand orientation for gene cassette scoring
- Repeat Detection: Finds direct and reverse complement repeats using k-mer indexing
- Proximity Scoring: Rewards features within 300 bp for proper cassette formation
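The repeat-detection step can be sketched with a simple k-mer index (a minimal illustration under stated assumptions, not the scorer's actual implementation):

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_repeat_kmers(seq: str, k: int = 50):
    """Return start positions of k-mers that occur elsewhere in `seq`,
    either directly or as a reverse complement (inverted repeat)."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    repeats = set()
    for kmer, positions in index.items():
        if len(positions) > 1:              # direct repeat
            repeats.update(positions)
        rc = reverse_complement(kmer)
        if rc != kmer and rc in index:      # inverted repeat
            repeats.update(positions)
    return sorted(repeats)
```

The real scorer additionally merges hits into contiguous regions before applying the -0.1 per-region penalty.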
### Training Hyperparameters
View complete hyperparameters and metrics on W&B.
## Important Notes

- Research Use Only: generated plasmids should be validated before experimental use
- Annotation Dependency: scoring requires `plasmidkit` for feature annotation
- Compute Requirements: GPU recommended for generation (CPU fallback available)
- Sequence Validation: always verify that generated sequences contain the expected features
## License
This model inherits licensing from the original PlasmidGPT repository. Please refer to the original repository for details.
## Acknowledgments
- Bin Shao (lingxusb) for the original PlasmidGPT model and architecture
- Addgene for providing the training data (153k plasmid sequences)
- HuggingFace TRL team for the GRPO implementation
- UCL CSSB for computational resources
Model Version: grpo-production-20251110_132247
Training Date: November 10, 2025
Last Updated: November 13, 2025