Instructions for using UCL-CSSB/PlasmidGPT-GRPO with libraries, inference servers, and local apps.

Libraries

How to use UCL-CSSB/PlasmidGPT-GRPO with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="UCL-CSSB/PlasmidGPT-GRPO")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
# device_map="auto" places the weights on the available devices (requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO", device_map="auto")
```
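The `pipe` object can be called directly on a DNA prompt. A minimal sketch, assuming the standard `text-generation` output format (a list of dicts with a `generated_text` key that includes the prompt) and assuming you only want the A/C/G/T characters of the result. `extract_dna` and `generate_plasmids` are hypothetical helper names; the prompt and sampling values are borrowed from the README's usage example.

```python
import re

# Anything that is not an unambiguous nucleotide (spaces, newlines, N, ...)
# is dropped; this filtering is an assumption about the desired post-processing.
_NON_DNA = re.compile(r"[^ACGT]")


def extract_dna(text: str) -> str:
    """Uppercase the text and keep only A, C, G and T."""
    return _NON_DNA.sub("", text.upper())


def generate_plasmids(prompt: str = "ATG", n: int = 3) -> list[str]:
    """Sample n continuations of a DNA prompt with the text-generation pipeline."""
    # Imported lazily so extract_dna can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="UCL-CSSB/PlasmidGPT-GRPO")
    outputs = pipe(
        prompt,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        num_return_sequences=n,
    )
    # Each dict holds the prompt plus the sampled continuation.
    return [extract_dna(o["generated_text"]) for o in outputs]
```

For example, `generate_plasmids("ATG", n=3)` downloads the model on first use and returns three nucleotide strings.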


vLLM

How to use UCL-CSSB/PlasmidGPT-GRPO with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "UCL-CSSB/PlasmidGPT-GRPO"
# From another shell, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UCL-CSSB/PlasmidGPT-GRPO",
		"prompt": "ATG",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
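For scripted use, the same request as the curl call can be sent from Python. A minimal sketch using only the standard library (`urllib`), assuming the server is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].text` field. `build_completion_request`, `first_completion_text`, and `complete` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed default address of `vllm serve`, matching the curl example.
VLLM_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "UCL-CSSB/PlasmidGPT-GRPO"


def build_completion_request(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    """Build the same JSON body as the curl example."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def first_completion_text(response: dict) -> str:
    """Extract the generated text from an OpenAI-style completions response."""
    return response["choices"][0]["text"]


def complete(prompt: str, url: str = VLLM_URL) -> str:
    """POST a completion request to a running server and return the first result."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return first_completion_text(json.load(resp))
```

With the server running, `complete("ATG")` returns the continuation. Pointing `url` at `http://localhost:30000/v1/completions` should work the same way against the SGLang server described next, since both expose the OpenAI-compatible completions route.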

Use Docker

```bash
docker model run hf.co/UCL-CSSB/PlasmidGPT-GRPO
```

SGLang

How to use UCL-CSSB/PlasmidGPT-GRPO with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "UCL-CSSB/PlasmidGPT-GRPO" \
    --host 0.0.0.0 \
    --port 30000
# From another shell, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UCL-CSSB/PlasmidGPT-GRPO",
		"prompt": "ATG",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
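Loading the weights can take a while after `launch_server` starts, so requests sent immediately may be refused. A minimal sketch that polls the OpenAI-compatible `GET /v1/models` route until the model appears, assuming the server reports it under the same id passed to `--model-path`. `model_is_listed` and `wait_until_ready` are hypothetical helpers, and the timeout values are arbitrary.

```python
import json
import time
import urllib.error
import urllib.request

# Assumes the SGLang server started above (port 30000); vLLM serves the
# same OpenAI-compatible route on port 8000.
MODELS_URL = "http://localhost:30000/v1/models"
MODEL_ID = "UCL-CSSB/PlasmidGPT-GRPO"


def model_is_listed(models_response: dict, model_id: str = MODEL_ID) -> bool:
    """True if model_id appears in an OpenAI-style /v1/models listing."""
    return any(entry.get("id") == model_id for entry in models_response.get("data", []))


def wait_until_ready(url: str = MODELS_URL, timeout_s: float = 300.0,
                     poll_s: float = 5.0) -> bool:
    """Poll the server until it lists the model or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if model_is_listed(json.load(resp)):
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not accepting connections yet; retry after a pause
        time.sleep(poll_s)
    return False
```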

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "UCL-CSSB/PlasmidGPT-GRPO" \
        --host 0.0.0.0 \
        --port 30000
# From another shell, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UCL-CSSB/PlasmidGPT-GRPO",
		"prompt": "ATG",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use UCL-CSSB/PlasmidGPT-GRPO with Docker Model Runner:
```
docker model run hf.co/UCL-CSSB/PlasmidGPT-GRPO
```

Refresh weights and simplify README; add W&B link

by McClain - opened Dec 3, 2025

base: refs/heads/main ← from: refs/pr/2

Files changed (2): +35 -248

README.md +34 -247
model.safetensors +1 -1

README.md CHANGED

@@ -1,258 +1,45 @@
-# PlasmidGPT-GRPO: Reinforcement Learning Fine-tuned Plasmid Generator
-[![W&B Run](https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg)](https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50)
-**A biologically-constrained plasmid design model trained with reinforcement learning to generate functional DNA sequences.**
-This model is a fine-tuned version of [McClain/plasmidgpt-addgene-gpt2](https://huggingface.co/McClain/plasmidgpt-addgene-gpt2) (itself based on the original [PlasmidGPT](https://github.com/lingxusb/PlasmidGPT) by Bin Shao), optimized using **Group Relative Policy Optimization (GRPO)** to generate plasmids that satisfy biological constraints.
-## 🎯 Key Improvements Over Base Model
-This RL-fine-tuned model has been trained to generate plasmids that:
-- ✅ Contain **correct numbers** of essential genetic elements (ori, promoters, terminators, markers, CDS)
-- ✅ Avoid **repeat regions** (>50 bp repeats penalized)
-- ✅ Generate **shorter, more efficient** sequences (rewarded for compactness)
-- ✅ Maintain **proper gene cassette organization** (promoter → CDS → terminator)
-- ✅ Achieve up to **1.0 reward score** for optimal plasmid design
-### Reward Structure
-The model was trained using a custom bioinformatics reward function that scores sequences based on:
-| Component | Min | Max | Weight | Description |
-|-----------|-----|-----|--------|-------------|
-| **Origin of Replication (ori)** | 1 | 1 | 1.5× | Essential for plasmid replication |
-| **Promoters** | 1 | 1 | 1.0× | Drive gene expression |
-| **Terminators** | 0 | 2 | 0.5× | Stop transcription |
-| **Selectable Markers** | 1 | 2 | 1.0× | Antibiotic resistance |
-| **Coding Sequences (CDS)** | 1 | 5 | 1.0× | Functional genes |
-**Additional Scoring:**
-- **Repeat Penalty**: -0.1 per repeat region ≥50 bp (including reverse complements)
-- **Length Bonus**: Rewards for shorter, more compact sequences (up to +0.5)
-- **Location Awareness**: Bonuses for correct gene cassette ordering and proximity
-**Maximum reward:** 1.0 (perfect plasmid with all constraints satisfied)
-## 🚀 Quick Start
-### Basic Sequence Generation
-```python
-import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
-model = AutoModelForCausalLM.from_pretrained(
-    "McClain/plasmidgpt-grpo-rl",
-    trust_remote_code=True
-).to(device)
-model.eval()
-tokenizer = AutoTokenizer.from_pretrained(
-    "McClain/plasmidgpt-grpo-rl",
-    trust_remote_code=True
-)
-# Generate optimized plasmid sequence
-start_sequence = 'ATGGCTAGCGAATTC'
-input_ids = tokenizer.encode(start_sequence, return_tensors='pt').to(device)
-outputs = model.generate(
-    input_ids,
-    max_length=400,
-    num_return_sequences=5,
-    temperature=0.8,
-    do_sample=True,
-    top_k=50,
-    top_p=0.95,
-    pad_token_id=tokenizer.pad_token_id,
-    eos_token_id=tokenizer.eos_token_id
-)
-for i, output in enumerate(outputs):
-    sequence = tokenizer.decode(output, skip_special_tokens=True)
-    print(f"Plasmid {i+1}: {len(sequence)} bp")
 ```
-### Scoring Generated Plasmids
-To evaluate plasmids using the same reward function from training:
-```python
-# Install plasmidkit for annotation
-# pip install plasmidkit
-from plasmidrl.rewards import Scorer, RewardConfig
-# Use the same config as training
-reward_config = RewardConfig(
-    punish_mode=True,
-    length_reward_mode=False,
-    repeat_penalty_enabled=True,
-    repeat_min_length=50,
-    repeat_penalty_per_region=0.1,
-    ori_min=1, ori_max=1, ori_weight=1.5,
-    promoter_min=1, promoter_max=1, promoter_weight=1.0,
-    terminator_min=0, terminator_max=2, terminator_weight=0.5,
-    marker_min=1, marker_max=2, marker_weight=1.0,
-    cds_min=1, cds_max=5, cds_weight=1.0,
-    location_aware=True
-)
-scorer = Scorer(reward_config)
-score, components = scorer.score(generated_sequence)
-print(f"Reward Score: {score:.3f}")
-print(f"Components: {components}")
 ```
-## 📊 Training Details
-### Training Configuration
-- **Base Model**: [McClain/plasmidgpt-addgene-gpt2](https://huggingface.co/McClain/plasmidgpt-addgene-gpt2)
-- **RL Algorithm**: GRPO (Group Relative Policy Optimization)
-- **Training Steps**: 2,500 steps
-- **Training Repository**: [PlasmidRL](https://github.com/McClain-Thiel/PlasmidRL)
-- **W&B Run**: [u3wt9c50](https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50)
-### Model Architecture
-| Parameter | Value |
-|-----------|-------|
-| **Architecture** | GPT-2 (Decoder-only Transformer) |
-| **Parameters** | 110 million |
-| **Layers** | 12 |
-| **Hidden Size** | 768 |
-| **Attention Heads** | 12 |
-| **Context Length** | 2048 tokens |
-| **Vocabulary Size** | 30,002 |
-### Framework Versions
-- **TRL**: 0.23.1
-- **Transformers**: 4.57.0
-- **PyTorch**: 2.8.0
-- **Datasets**: 4.1.1
-- **Tokenizers**: 0.22.1
-## 🧬 Use Cases
-1. **Optimized Plasmid Design**: Generate plasmids that satisfy specific biological constraints
-2. **Synthetic Biology**: Create novel genetic constructs for molecular cloning
-3. **Gene Cassette Engineering**: Design properly organized promoter-CDS-terminator cassettes
-4. **Compact Plasmid Construction**: Generate shorter plasmids while maintaining functionality
-5. **Repeat-Free Sequences**: Avoid problematic repeat regions in plasmid design
-## 🔗 Related Resources
-### Original PlasmidGPT
-This model builds upon the original PlasmidGPT work:
-- **Paper**: [PlasmidGPT: a generative framework for plasmid design and annotation](https://www.biorxiv.org/content/10.1101/2024.09.30.615762v1) (bioRxiv 2024.09.30.615762)
-- **Author**: Bin Shao (lingxusb)
-- **Original Repository**: [github.com/lingxusb/PlasmidGPT](https://github.com/lingxusb/PlasmidGPT)
-- **Original Model**: [huggingface.co/lingxusb/PlasmidGPT](https://huggingface.co/lingxusb/PlasmidGPT)
-### Training Infrastructure
-- **Training Code**: [github.com/McClain-Thiel/PlasmidRL](https://github.com/McClain-Thiel/PlasmidRL)
-- **W&B Project**: [ucl-cssb/PlasmidRL](https://wandb.ai/ucl-cssb/PlasmidRL)
-- **Base Model**: [McClain/plasmidgpt-addgene-gpt2](https://huggingface.co/McClain/plasmidgpt-addgene-gpt2)
-## 📚 Citations
-If you use this model, please cite:
-### This RL Model
-```bibtex
-@misc{thiel2024plasmidgpt_grpo,
-  title={PlasmidGPT-GRPO: Reinforcement Learning for Functional Plasmid Design},
-  author={Thiel, McClain},
-  year={2024},
-  howpublished={\url{https://github.com/McClain-Thiel/PlasmidRL}},
-  note={Training run: https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50}
-}
-```
-### Original PlasmidGPT
-```bibtex
-@article{shao2024plasmidgpt,
-  title={PlasmidGPT: a generative framework for plasmid design and annotation},
-  author={Shao, Bin and others},
-  journal={bioRxiv},
-  year={2024},
-  doi={10.1101/2024.09.30.615762},
-  url={https://www.biorxiv.org/content/10.1101/2024.09.30.615762v1}
-}
 ```
-### GRPO Algorithm
-```bibtex
-@article{shao2024deepseekmath,
-  title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-  author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-  journal={arXiv preprint arXiv:2402.03300},
-  year={2024}
-}
-```
-### TRL Library
-```bibtex
-@misc{vonwerra2022trl,
-  title={{TRL: Transformer Reinforcement Learning}},
-  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-  year={2020},
-  publisher={GitHub},
-  howpublished={\url{https://github.com/huggingface/trl}}
-}
 ```
-## ⚙️ Technical Details
-### Reward Function Components
-The bioinformatics reward function (`src/rewards/bioinformatics/scorer.py`) includes:
-1. **Feature Counting**: Uses [PlasmidKit](https://github.com/jbloomlab/plasmidkit) for automated annotation
-2. **Overlap Merging**: Intelligently merges overlapping features (80% threshold)
-3. **CDS Filtering**: Removes CDS annotations overlapping with ori/promoter/terminator/marker
-4. **Strand Awareness**: Considers strand orientation for gene cassette scoring
-5. **Repeat Detection**: Finds direct and reverse complement repeats using k-mer indexing
-6. **Proximity Scoring**: Rewards features within 300 bp for proper cassette formation
-### Training Hyperparameters
-View complete hyperparameters and metrics on [W&B](https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50).
-## ⚠️ Important Notes
-- **Research Use Only**: Generated plasmids should be validated before experimental use
-- **Annotation Dependency**: Scoring requires `plasmidkit` for feature annotation
-- **Compute Requirements**: GPU recommended for generation (CPU fallback available)
-- **Sequence Validation**: Always verify generated sequences contain expected features
-## 📄 License
-This model inherits licensing from the original PlasmidGPT repository. Please refer to the [original repository](https://github.com/lingxusb/PlasmidGPT) for details.
-## 🙏 Acknowledgments
-- **Bin Shao (lingxusb)** for the original PlasmidGPT model and architecture
-- **Addgene** for providing the training data (153k plasmid sequences)
-- **HuggingFace TRL team** for the GRPO implementation
-- **UCL CSSB** for computational resources
----
-**Model Version**: grpo-production-20251110_132247
-**Training Date**: November 10, 2025
-**Last Updated**: November 13, 2025

+# PlasmidGPT-GRPO
+PlasmidGPT-GRPO is a GRPO-trained causal language model for plasmid/DNA sequence generation.
+This update refreshes the weights (model.safetensors) and streamlines the documentation.
+## Weights
+- `model.safetensors` (updated)
+- All tokenizer/config files remain unchanged.
+## Training Run
+- Weights and metrics: https://wandb.ai/ucl-cssb/PlasmidRL/runs/ty13u43j/overview
+## Usage
+Install:
 ```
+pip install torch transformers safetensors
 ```
+Load and generate:
 ```
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "UCL-CSSB/PlasmidGPT-GRPO"
+tok = AutoTokenizer.from_pretrained(model_id)
+if tok.pad_token is None:
+    tok.pad_token = tok.eos_token
+model = AutoModelForCausalLM.from_pretrained(model_id)
+inputs = tok(["ATG"], return_tensors="pt")
+out = model.generate(
+    **inputs,
+    max_new_tokens=128,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.9,
+    pad_token_id=tok.eos_token_id,
+    eos_token_id=tok.eos_token_id,
+)
+print(tok.decode(out[0], skip_special_tokens=True))
 ```
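+Notes:
+- Use sampling (`do_sample=True` with `temperature`/`top_p`) for diverse sequences; set `do_sample=False` for deterministic (greedy) output.
+- The example runs on CPU by default; move the model and inputs to CUDA or Apple MPS (e.g. with `.to("cuda")`) if your PyTorch build supports them.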

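The README notes on decoding and devices can be made concrete. A minimal sketch with two hypothetical helpers: `generation_kwargs` switches between greedy decoding (`do_sample=False`) and the README's sampling settings, and `pick_device` prefers CUDA, then Apple MPS, then CPU using PyTorch's availability checks.

```python
def generation_kwargs(deterministic: bool = False, max_new_tokens: int = 128) -> dict:
    """Keyword arguments for model.generate(): greedy if deterministic, else sampled."""
    if deterministic:
        # Greedy decoding: the same prompt typically yields the same sequence.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    # Sampling settings copied from the README example.
    return {"max_new_tokens": max_new_tokens, "do_sample": True,
            "temperature": 0.7, "top_p": 0.9}


def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, else CPU (requires torch)."""
    import torch  # imported lazily so generation_kwargs works without torch

    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

Used with the README's `model` and `tok`: call `model.to(pick_device())`, build `inputs = tok(["ATG"], return_tensors="pt").to(model.device)`, then `model.generate(**inputs, **generation_kwargs(deterministic=True), pad_token_id=tok.eos_token_id)`.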
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ba508e7a9d4bfb9c095f95c11fe0e7a1131f6a9076e89852bdd22f67ca00c324
 size 438696576

 version https://git-lfs.github.com/spec/v1
+oid sha256:353de867743e69096257539c5ae44131947d9e41ef8a9a0ffdd863b3cff9eee6
 size 438696576