
PlasmidGPT-GRPO: Reinforcement Learning Fine-tuned Plasmid Generator

W&B Run: https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50

A biologically constrained plasmid design model, trained with reinforcement learning to generate functional DNA sequences.

This model is a fine-tuned version of McClain/plasmidgpt-addgene-gpt2 (itself based on the original PlasmidGPT by Bin Shao), optimized using Group Relative Policy Optimization (GRPO) to generate plasmids that satisfy biological constraints.

🎯 Key Improvements Over Base Model

This RL-fine-tuned model has been trained to generate plasmids that:

  • ✅ Contain the correct number of essential genetic elements (ori, promoters, terminators, markers, CDS)
  • ✅ Avoid repeat regions (repeats ≥50 bp are penalized)
  • ✅ Generate shorter, more efficient sequences (rewarded for compactness)
  • ✅ Maintain proper gene cassette organization (promoter → CDS → terminator)
  • ✅ Achieve a reward score of up to 1.0 for optimal plasmid design
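
The cassette-organization constraint can be illustrated with a small sketch. This is a hypothetical helper, not the training scorer: it assumes features arrive as (kind, start, end, strand) tuples sorted by start coordinate and looks for a same-strand promoter → CDS → terminator run with small gaps.

```python
# Illustrative check only -- not the actual PlasmidRL scorer.
# Assumes features are (kind, start, end, strand) tuples sorted by start.
def is_valid_cassette(features, max_gap=300):
    """Return True if some consecutive triple forms a same-strand
    promoter -> CDS -> terminator cassette with gaps <= max_gap bp."""
    order = ["promoter", "CDS", "terminator"]
    for i in range(len(features) - 2):
        window = features[i:i + 3]
        kinds = [kind for kind, _, _, _ in window]
        same_strand = len({strand for _, _, _, strand in window}) == 1
        gaps_ok = all(window[j + 1][1] - window[j][2] <= max_gap
                      for j in range(2))
        if kinds == order and same_strand and gaps_ok:
            return True
    return False
```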

Reward Structure

The model was trained using a custom bioinformatics reward function that scores sequences based on:

| Component | Min | Max | Weight | Description |
|---|---|---|---|---|
| Origin of Replication (ori) | 1 | 1 | 1.5× | Essential for plasmid replication |
| Promoters | 1 | 1 | 1.0× | Drive gene expression |
| Terminators | 0 | 2 | 0.5× | Stop transcription |
| Selectable Markers | 1 | 2 | 1.0× | Antibiotic resistance |
| Coding Sequences (CDS) | 1 | 5 | 1.0× | Functional genes |

Additional Scoring:

  • Repeat Penalty: -0.1 per repeat region ≥50 bp (including reverse complements)
  • Length Bonus: Rewards for shorter, more compact sequences (up to +0.5)
  • Location Awareness: Bonuses for correct gene cassette ordering and proximity

Maximum reward: 1.0 (perfect plasmid with all constraints satisfied)
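
As a rough sketch of how the table and bonuses above might combine (assumptions: component scores are all-or-nothing within the min/max band and the weighted sum is normalized before penalties; the actual Scorer may differ, e.g. in its punish mode):

```python
# Hypothetical reward aggregation -- a sketch, not the PlasmidRL scorer.
def component_score(count, lo, hi, weight):
    # Full weight when the count falls inside [lo, hi], zero otherwise.
    return weight if lo <= count <= hi else 0.0

def total_reward(counts, n_repeats, length_bonus=0.0):
    # (min, max, weight) per component, matching the table above.
    spec = {
        "ori": (1, 1, 1.5),
        "promoter": (1, 1, 1.0),
        "terminator": (0, 2, 0.5),
        "marker": (1, 2, 1.0),
        "cds": (1, 5, 1.0),
    }
    raw = sum(component_score(counts.get(k, 0), lo, hi, w)
              for k, (lo, hi, w) in spec.items())
    max_raw = sum(w for _, _, w in spec.values())
    score = raw / max_raw                # normalize to [0, 1]
    score -= 0.1 * n_repeats             # -0.1 per repeat region >= 50 bp
    score += length_bonus                # up to +0.5 for compact sequences
    return max(0.0, min(1.0, score))     # clamp; maximum reward is 1.0
```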

🚀 Quick Start

Basic Sequence Generation

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = AutoModelForCausalLM.from_pretrained(
    "McClain/plasmidgpt-grpo-rl",
    trust_remote_code=True
).to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "McClain/plasmidgpt-grpo-rl",
    trust_remote_code=True
)

# Generate optimized plasmid sequence
start_sequence = 'ATGGCTAGCGAATTC'
input_ids = tokenizer.encode(start_sequence, return_tensors='pt').to(device)

outputs = model.generate(
    input_ids,
    max_length=400,
    num_return_sequences=5,
    temperature=0.8,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,  # GPT-2 tokenizers often lack a pad token
    eos_token_id=tokenizer.eos_token_id
)

for i, output in enumerate(outputs):
    sequence = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Plasmid {i+1}: {len(sequence)} bp")

Scoring Generated Plasmids

To evaluate plasmids using the same reward function from training:

# Annotation requires plasmidkit:
#   pip install plasmidkit
# Scorer and RewardConfig live in the PlasmidRL training repo

from plasmidrl.rewards import Scorer, RewardConfig

# Use the same config as training
reward_config = RewardConfig(
    punish_mode=True,
    length_reward_mode=False,
    repeat_penalty_enabled=True,
    repeat_min_length=50,
    repeat_penalty_per_region=0.1,
    ori_min=1, ori_max=1, ori_weight=1.5,
    promoter_min=1, promoter_max=1, promoter_weight=1.0,
    terminator_min=0, terminator_max=2, terminator_weight=0.5,
    marker_min=1, marker_max=2, marker_weight=1.0,
    cds_min=1, cds_max=5, cds_weight=1.0,
    location_aware=True
)

scorer = Scorer(reward_config)

# `generated_sequence` is a DNA string, e.g. one decoded in the example above
score, components = scorer.score(generated_sequence)

print(f"Reward Score: {score:.3f}")
print(f"Components: {components}")

📊 Training Details

Training Configuration

Model Architecture

| Parameter | Value |
|---|---|
| Architecture | GPT-2 (decoder-only Transformer) |
| Parameters | 110 million |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Context Length | 2048 tokens |
| Vocabulary Size | 30,002 |

Framework Versions

  • TRL: 0.23.1
  • Transformers: 4.57.0
  • PyTorch: 2.8.0
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

🧬 Use Cases

  1. Optimized Plasmid Design: Generate plasmids that satisfy specific biological constraints
  2. Synthetic Biology: Create novel genetic constructs for molecular cloning
  3. Gene Cassette Engineering: Design properly organized promoter-CDS-terminator cassettes
  4. Compact Plasmid Construction: Generate shorter plasmids while maintaining functionality
  5. Repeat-Free Sequences: Avoid problematic repeat regions in plasmid design

🔗 Related Resources

Original PlasmidGPT

This model builds upon the original PlasmidGPT work by Bin Shao (see Citations below).

Training Infrastructure

📚 Citations

If you use this model, please cite:

This RL Model

@misc{thiel2024plasmidgpt_grpo,
  title={PlasmidGPT-GRPO: Reinforcement Learning for Functional Plasmid Design},
  author={Thiel, McClain},
  year={2024},
  howpublished={\url{https://github.com/McClain-Thiel/PlasmidRL}},
  note={Training run: https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50}
}

Original PlasmidGPT

@article{shao2024plasmidgpt,
  title={PlasmidGPT: a generative framework for plasmid design and annotation},
  author={Shao, Bin and others},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.09.30.615762},
  url={https://www.biorxiv.org/content/10.1101/2024.09.30.615762v1}
}

GRPO Algorithm

@article{shao2024deepseekmath,
  title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  journal={arXiv preprint arXiv:2402.03300},
  year={2024}
}

TRL Library

@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year={2020},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}

βš™οΈ Technical Details

Reward Function Components

The bioinformatics reward function (src/rewards/bioinformatics/scorer.py) includes:

  1. Feature Counting: Uses PlasmidKit for automated annotation
  2. Overlap Merging: Intelligently merges overlapping features (80% threshold)
  3. CDS Filtering: Removes CDS annotations overlapping with ori/promoter/terminator/marker
  4. Strand Awareness: Considers strand orientation for gene cassette scoring
  5. Repeat Detection: Finds direct and reverse complement repeats using k-mer indexing
  6. Proximity Scoring: Rewards features within 300 bp for proper cassette formation
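
Step 5 (repeat detection) can be sketched with a strand-insensitive k-mer index. This is illustrative only; `revcomp` and `count_repeat_seeds` are hypothetical helpers, not the scorer's API.

```python
from collections import Counter

# Illustrative sketch of strand-insensitive repeat seeding -- not the
# actual scorer implementation.
def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_repeat_seeds(seq, k=50):
    """Count distinct k-mers that occur more than once, treating a k-mer
    and its reverse complement as the same seed (so both direct and
    reverse-complement repeats are caught)."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[min(kmer, revcomp(kmer))] += 1  # canonical, strand-insensitive key
    return sum(1 for c in counts.values() if c > 1)
```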

Training Hyperparameters

View complete hyperparameters and metrics on W&B.

⚠️ Important Notes

  • Research Use Only: Generated plasmids should be validated before experimental use
  • Annotation Dependency: Scoring requires plasmidkit for feature annotation
  • Compute Requirements: GPU recommended for generation (CPU fallback available)
  • Sequence Validation: Always verify generated sequences contain expected features
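
A minimal first sanity check for the last point (assuming valid output is a plain A/C/G/T string; real validation would also annotate features) might look like:

```python
import re

# Minimal sanity check (assumption: valid output contains only A/C/G/T).
def is_valid_dna(seq: str) -> bool:
    return bool(seq) and re.fullmatch(r"[ACGT]+", seq.upper()) is not None
```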

📄 License

This model inherits licensing from the original PlasmidGPT repository. Please refer to the original repository for details.

πŸ™ Acknowledgments

  • Bin Shao (lingxusb) for the original PlasmidGPT model and architecture
  • Addgene for providing the training data (153k plasmid sequences)
  • HuggingFace TRL team for the GRPO implementation
  • UCL CSSB for computational resources

Model Version: grpo-production-20251110_132247
Training Date: November 10, 2025
Last Updated: November 13, 2025
