# PlasmidGPT-GRPO: Reinforcement Learning Fine-tuned Plasmid Generator

A biologically constrained plasmid design model trained with reinforcement learning to generate functional DNA sequences.

This model is a fine-tuned version of McClain/plasmidgpt-addgene-gpt2 (itself based on the original PlasmidGPT by Bin Shao), optimized using Group Relative Policy Optimization (GRPO) to generate plasmids that satisfy biological constraints.
## Key Improvements Over the Base Model

This RL-fine-tuned model has been trained to generate plasmids that:

- Contain the correct numbers of essential genetic elements (ori, promoters, terminators, markers, CDS)
- Avoid repeat regions (repeats >50 bp are penalized)
- Are shorter and more efficient (compactness is rewarded)
- Maintain proper gene cassette organization (promoter → CDS → terminator)
- Achieve up to a 1.0 reward score for optimal plasmid design
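The cassette-organization constraint can be illustrated with a toy check (a hypothetical simplification; the actual reward uses full annotations and strand orientation, and the 300 bp gap here mirrors the proximity rule in the Technical Details section):

```python
def cassette_ordered(features, max_gap=300):
    """Toy check that a promoter, CDS, and terminator occur in order
    and within `max_gap` bp of their neighbors. `features` maps a
    feature kind to its start position -- a simplified stand-in for
    real annotation output."""
    needed = ("promoter", "cds", "terminator")
    if not all(k in features for k in needed):
        return False
    p, c, t = (features[k] for k in needed)
    return p < c < t and (c - p) <= max_gap and (t - c) <= max_gap
```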
## Reward Structure

The model was trained using a custom bioinformatics reward function that scores sequences based on:

| Component | Min | Max | Weight | Description |
|---|---|---|---|---|
| Origin of Replication (ori) | 1 | 1 | 1.5× | Essential for plasmid replication |
| Promoters | 1 | 1 | 1.0× | Drive gene expression |
| Terminators | 0 | 2 | 0.5× | Stop transcription |
| Selectable Markers | 1 | 2 | 1.0× | Antibiotic resistance |
| Coding Sequences (CDS) | 1 | 5 | 1.0× | Functional genes |
Additional scoring:

- Repeat Penalty: -0.1 per repeat region ≥50 bp (including reverse complements)
- Length Bonus: rewards shorter, more compact sequences (up to +0.5)
- Location Awareness: bonuses for correct gene cassette ordering and proximity

Maximum reward: 1.0 (a perfect plasmid with all constraints satisfied)
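As a rough illustration, the table and penalties above imply a reward of the following shape (a hypothetical sketch; the exact normalization used by `plasmidrl` may differ):

```python
def component_score(count, lo, hi, weight):
    """Full weighted credit when the feature count falls in [lo, hi],
    zero otherwise (a simple 'punish mode')."""
    return weight if lo <= count <= hi else 0.0

def plasmid_reward(counts, n_repeats, length_bonus=0.0):
    spec = {  # (min, max, weight), mirroring the table above
        "ori": (1, 1, 1.5), "promoter": (1, 1, 1.0),
        "terminator": (0, 2, 0.5), "marker": (1, 2, 1.0),
        "cds": (1, 5, 1.0),
    }
    total_w = sum(w for _, _, w in spec.values())
    raw = sum(component_score(counts.get(k, 0), lo, hi, w)
              for k, (lo, hi, w) in spec.items())
    score = raw / total_w             # normalize weighted credit to [0, 1]
    score -= 0.1 * n_repeats          # repeat penalty per region >= 50 bp
    score += length_bonus             # up to +0.5 for compact sequences
    return max(0.0, min(1.0, score))  # clamp so the maximum stays 1.0
```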
## Quick Start

### Basic Sequence Generation
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = AutoModelForCausalLM.from_pretrained(
    "McClain/plasmidgpt-grpo-rl",
    trust_remote_code=True
).to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "McClain/plasmidgpt-grpo-rl",
    trust_remote_code=True
)

# Generate optimized plasmid sequences
start_sequence = 'ATGGCTAGCGAATTC'
input_ids = tokenizer.encode(start_sequence, return_tensors='pt').to(device)

outputs = model.generate(
    input_ids,
    max_length=400,
    num_return_sequences=5,
    temperature=0.8,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)

for i, output in enumerate(outputs):
    sequence = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Plasmid {i+1}: {len(sequence)} bp")
```
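Depending on the tokenizer, decoded outputs may contain whitespace or stray characters; a small sanitation step (an assumption about the decode format, not part of the model's documented pipeline) keeps only nucleotide characters before downstream scoring:

```python
import re

def clean_dna(decoded: str) -> str:
    """Strip whitespace and any non-nucleotide characters that the
    tokenizer's decode step may leave behind, uppercasing the rest."""
    return re.sub(r"[^ACGT]", "", decoded.upper())
```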
### Scoring Generated Plasmids

To evaluate plasmids using the same reward function used in training:
```python
# Install plasmidkit for annotation:
#   pip install plasmidkit
from plasmidrl.rewards import Scorer, RewardConfig

# Use the same config as training
reward_config = RewardConfig(
    punish_mode=True,
    length_reward_mode=False,
    repeat_penalty_enabled=True,
    repeat_min_length=50,
    repeat_penalty_per_region=0.1,
    ori_min=1, ori_max=1, ori_weight=1.5,
    promoter_min=1, promoter_max=1, promoter_weight=1.0,
    terminator_min=0, terminator_max=2, terminator_weight=0.5,
    marker_min=1, marker_max=2, marker_weight=1.0,
    cds_min=1, cds_max=5, cds_weight=1.0,
    location_aware=True
)

scorer = Scorer(reward_config)

# generated_sequence: a DNA string decoded from model.generate output
score, components = scorer.score(generated_sequence)
print(f"Reward Score: {score:.3f}")
print(f"Components: {components}")
```
## Training Details

### Training Configuration
- Base Model: McClain/plasmidgpt-addgene-gpt2
- RL Algorithm: GRPO (Group Relative Policy Optimization)
- Training Steps: 2,500 steps
- Training Repository: PlasmidRL
- W&B Run: u3wt9c50
### Model Architecture
| Parameter | Value |
|---|---|
| Architecture | GPT-2 (Decoder-only Transformer) |
| Parameters | 110 million |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Context Length | 2048 tokens |
| Vocabulary Size | 30,002 |
### Framework Versions
- TRL: 0.23.1
- Transformers: 4.57.0
- PyTorch: 2.8.0
- Datasets: 4.1.1
- Tokenizers: 0.22.1
## Use Cases
- Optimized Plasmid Design: Generate plasmids that satisfy specific biological constraints
- Synthetic Biology: Create novel genetic constructs for molecular cloning
- Gene Cassette Engineering: Design properly organized promoter-CDS-terminator cassettes
- Compact Plasmid Construction: Generate shorter plasmids while maintaining functionality
- Repeat-Free Sequences: Avoid problematic repeat regions in plasmid design
## Related Resources

### Original PlasmidGPT
This model builds upon the original PlasmidGPT work:
- Paper: PlasmidGPT: a generative framework for plasmid design and annotation (bioRxiv 2024.09.30.615762)
- Author: Bin Shao (lingxusb)
- Original Repository: github.com/lingxusb/PlasmidGPT
- Original Model: huggingface.co/lingxusb/PlasmidGPT
### Training Infrastructure
- Training Code: github.com/McClain-Thiel/PlasmidRL
- W&B Project: ucl-cssb/PlasmidRL
- Base Model: McClain/plasmidgpt-addgene-gpt2
## Citations

If you use this model, please cite:

### This RL Model

```bibtex
@misc{thiel2024plasmidgpt_grpo,
  title={PlasmidGPT-GRPO: Reinforcement Learning for Functional Plasmid Design},
  author={Thiel, McClain},
  year={2024},
  howpublished={\url{https://github.com/McClain-Thiel/PlasmidRL}},
  note={Training run: https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50}
}
```

### Original PlasmidGPT

```bibtex
@article{shao2024plasmidgpt,
  title={PlasmidGPT: a generative framework for plasmid design and annotation},
  author={Shao, Bin and others},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.09.30.615762},
  url={https://www.biorxiv.org/content/10.1101/2024.09.30.615762v1}
}
```

### GRPO Algorithm

```bibtex
@article{shao2024deepseekmath,
  title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  journal={arXiv preprint arXiv:2402.03300},
  year={2024}
}
```

### TRL Library

```bibtex
@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year={2020},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}
```
## Technical Details

### Reward Function Components

The bioinformatics reward function (`src/rewards/bioinformatics/scorer.py`) includes:
- Feature Counting: Uses PlasmidKit for automated annotation
- Overlap Merging: Intelligently merges overlapping features (80% threshold)
- CDS Filtering: Removes CDS annotations overlapping with ori/promoter/terminator/marker
- Strand Awareness: Considers strand orientation for gene cassette scoring
- Repeat Detection: Finds direct and reverse complement repeats using k-mer indexing
- Proximity Scoring: Rewards features within 300 bp for proper cassette formation
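The repeat-detection step can be sketched with a simple k-mer index (a minimal illustration under stated assumptions, not the scorer's actual implementation):

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_repeat_kmers(seq: str, k: int = 50):
    """Return start positions of k-mers that occur elsewhere in `seq`,
    either directly or as a reverse complement (inverted repeat)."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    repeats = set()
    for kmer, positions in index.items():
        if len(positions) > 1:              # direct repeat
            repeats.update(positions)
        rc = reverse_complement(kmer)
        if rc != kmer and rc in index:      # inverted repeat
            repeats.update(positions)
    return sorted(repeats)
```

The real scorer additionally merges hits into contiguous regions before applying the -0.1 per-region penalty.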
### Training Hyperparameters
View complete hyperparameters and metrics on W&B.
## Important Notes

- Research Use Only: generated plasmids should be validated before experimental use
- Annotation Dependency: scoring requires `plasmidkit` for feature annotation
- Compute Requirements: GPU recommended for generation (CPU fallback available)
- Sequence Validation: always verify that generated sequences contain the expected features
## License
This model inherits licensing from the original PlasmidGPT repository. Please refer to the original repository for details.
## Acknowledgments
- Bin Shao (lingxusb) for the original PlasmidGPT model and architecture
- Addgene for providing the training data (153k plasmid sequences)
- HuggingFace TRL team for the GRPO implementation
- UCL CSSB for computational resources
Model Version: grpo-production-20251110_132247
Training Date: November 10, 2025
Last Updated: November 13, 2025