McClain committed on
Commit 363e4f1 · verified · 1 Parent(s): db2462a

Camera-ready README: simplified to what-it-is / quick start / citation

Files changed (1)
  1. README.md +17 -79
README.md CHANGED
@@ -1,39 +1,23 @@
  ---
- base_model: McClain/plasmidgpt-addgene-gpt2
+ license: mit
  library_name: transformers
- model_name: PlasmidGPT-GRPO
+ pipeline_tag: text-generation
+ base_model: UCL-CSSB/PlasmidGPT
  tags:
- - generated_from_trainer
- - grpo
- - trl
  - biology
  - plasmid
  - dna
  - synthetic-biology
- license: mit
- datasets:
- - McClain/plasmids-ncbi-addgene
- pipeline_tag: text-generation
+ - gpt2
+ - grpo
+ - reinforcement-learning
  ---
 
  # PlasmidGPT-GRPO
 
- A generative model for plasmid DNA sequences, fine-tuned with Group Relative Policy Optimization (GRPO) reinforcement learning.
+ GRPO reinforcement-learning fine-tune of [PlasmidGPT](https://huggingface.co/UCL-CSSB/PlasmidGPT), trained against a multi-component biological reward (functional annotations, length prior, repeat penalty, cassette ordering). Camera-ready model for the ICML 2026 paper *Effects of Structural Reward Shaping on Biophysical Properties in RL-Trained Plasmid Generators*.
 
- ## Model Description
-
- This model is a fine-tuned version of [PlasmidGPT](https://huggingface.co/McClain/plasmidgpt-addgene-gpt2) optimized using GRPO to generate valid, functional plasmid sequences with:
- - **Origin of replication (ORI)** - Required for plasmid maintenance
- - **Antibiotic resistance marker (AMR)** - Required for selection
-
- ### Performance
-
- At temperature 1.3, this model achieves:
- - **90% QC pass rate** (valid ORI + AMR)
- - **3 unique ORI types** (ColE1, Col(pHAD28), Col440I)
- - **100% unique sequences** (no duplicates)
-
- ## Quick Start
+ ## Quick start
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -41,66 +25,20 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
  model = AutoModelForCausalLM.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
  tokenizer = AutoTokenizer.from_pretrained("UCL-CSSB/PlasmidGPT-GRPO")
 
- # Generate a plasmid starting with ATG (start codon)
- prompt = "ATG"
- inputs = tokenizer(prompt, return_tensors="pt")
-
- outputs = model.generate(
- **inputs,
- max_new_tokens=2000,
- temperature=1.3,
- do_sample=True,
- pad_token_id=tokenizer.eos_token_id
- )
-
- sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(sequence)
+ input_ids = tokenizer("ATG", return_tensors="pt").input_ids
+ outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=1.0)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
 
- ## Training
-
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ucl-cssb/PlasmidRL/runs/u3wt9c50)
-
- This model was trained with GRPO (Group Relative Policy Optimization), a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
-
- The reward function optimizes for:
- 1. Presence of a valid origin of replication (ORI)
- 2. Presence of a valid antibiotic resistance marker (AMR)
- 3. Absence of long repetitive sequences
-
- ### Framework Versions
-
- - TRL: 0.23.1
- - Transformers: 4.57.0
- - PyTorch: 2.8.0
- - Datasets: 4.1.1
- - Tokenizers: 0.22.1
-
- ## Recommended Sampling Parameters
-
- | Temperature | Pass Rate | ORI Diversity | Notes |
- |-------------|-----------|---------------|-------|
- | 0.8 | 37% | 1 type | Collapsed - avoid |
- | 0.95 | 63% | 2 types | Conservative |
- | 1.15 | 76% | 2 types | Balanced |
- | **1.3** | **90%** | **3 types** | **Recommended** |
+ Recommended sampling: T=1.0 for direct generation, T=1.15 for rejection sampling (per the paper).
 
  ## Citation
 
  ```bibtex
- @article{shao2024deepseekmath,
- title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
- author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
- year=2024,
- eprint={arXiv:2402.03300},
- }
-
- @misc{vonwerra2022trl,
- title={{TRL: Transformer Reinforcement Learning}},
- author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
- year=2020,
- journal={GitHub repository},
- publisher={GitHub},
- howpublished={\url{https://github.com/huggingface/trl}}
+ @inproceedings{thiel2026plasmidrl,
+ title = {Effects of Structural Reward Shaping on Biophysical Properties in {RL}-Trained Plasmid Generators},
+ author = {Thiel, McClain and Cunningham, Angus G. and Barnes, Chris P.},
+ booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
+ year = {2026}
  }
  ```
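
The new README recommends T=1.15 for rejection sampling but does not show what that loop looks like. The sketch below is one minimal way to structure it, not the authors' pipeline: `rejection_sample`, `generate_fn`, `passes_qc`, and `is_plain_dna` are hypothetical names introduced here, and the QC check is a stand-in. The removed README defined a QC pass as a valid ORI plus AMR, which needs annotation tooling that this diff does not show.

```python
from typing import Callable, List


def rejection_sample(
    generate_fn: Callable[[], str],
    passes_qc: Callable[[str], bool],
    n_wanted: int,
    max_attempts: int = 100,
) -> List[str]:
    """Draw candidates until n_wanted pass QC or max_attempts draws are used."""
    accepted: List[str] = []
    for _ in range(max_attempts):
        if len(accepted) >= n_wanted:
            break
        candidate = generate_fn()
        if passes_qc(candidate):
            accepted.append(candidate)
    return accepted


def is_plain_dna(seq: str) -> bool:
    """Placeholder QC (assumption): non-empty and only A/C/G/T characters."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")


# Hypothetical wiring to the model and tokenizer loaded in the Quick start
# (left as comments because it requires downloading the model):
#
# def generate_fn() -> str:
#     ids = tokenizer("ATG", return_tensors="pt").input_ids
#     out = model.generate(ids, max_new_tokens=512, do_sample=True, temperature=1.15)
#     return tokenizer.decode(out[0], skip_special_tokens=True)
#
# plasmids = rejection_sample(generate_fn, is_plain_dna, n_wanted=10)
```

The `max_attempts` cap keeps the loop bounded if the filter rejects most samples. A real filter would replace `is_plain_dna` with whatever QC criterion the paper uses.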