# PlasmidGPT Model

This is a GPT-2-based model for engineered plasmid sequence generation, converted from PyTorch `.pt` format to the HuggingFace Transformers format. It is a supervised fine-tuned (SFT) version of [PlasmidGPT](https://github.com/lingxusb/PlasmidGPT) for engineered plasmids.

This work was done by **Angus Cunningham** while at **Prof. Chris Barnes' lab at UCL**.

## Model Details

- **Architecture**: GPT-2
- **Vocab Size**: 30,002
- **Hidden Size**: 768
- **Number of Layers**: 12
- **Number of Heads**: 12
- **Max Position Embeddings**: 2048
- **Parameters**: ~124M

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./plasmidgpt-model")
tokenizer = AutoTokenizer.from_pretrained("./plasmidgpt-model")

# Basic (greedy) generation
inputs = tokenizer("ATGC", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
generated_sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_sequence)

# With sampling (for more diverse outputs)
outputs = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.8, top_p=0.9)
generated_sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_sequence)
```

### Example Outputs

**Input:** `ATGCGATCG`

**Generated:** `ATGCGATCGGTGGTAGGCACTGGATGATGGCCCTGCAGTGTAGCCGTAGTTATGAGCCTCGGGATTCTTTGATGATTCAGCCACCCTCATCATCCTCCTCCTCC...`

**Input:** `ATGGCC`

**Generated:** `ATGGCCTACATACCTTCAATTACCGAAACAAGGTGGTTCATCTCTAACGCTGTCCATAAAACCGCCCAGTCTAGCTATCGCCATTTGCGCATCTAACGTGGTAGGCACTCCGGGTCCGCGCC...`

## Compatible With

This model is compatible with the architecture of [McClain/plasmidgpt-addgene-gpt2](https://huggingface.co/McClain/plasmidgpt-addgene-gpt2), but its weights differ from those of the pretrained model (this is the SFT checkpoint).
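Generated sequences like the examples above can be sanity-checked before downstream use. A minimal sketch (the helper names are illustrative, not part of this repository) that verifies a sequence uses only the DNA alphabet and reports its GC content:

```python
def is_dna(seq):
    """True if the sequence is non-empty and contains only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

# Check one of the prompts from the examples above
seq = "ATGCGATCG"
print(is_dna(seq))                # → True
print(round(gc_content(seq), 2))  # → 0.56
```

Checks like these catch decoding artifacts (e.g. stray special tokens) before sequences are passed to downstream tools.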
## Files

- `config.json`: Model configuration
- `generation_config.json`: Generation parameters
- `model.safetensors`: Model weights in SafeTensors format
- `tokenizer.json`: Fast tokenizer data
- `tokenizer_config.json`: Tokenizer configuration
- `special_tokens_map.json`: Special token mappings
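A converted checkpoint is only usable if all of these files are present. A small sketch that checks a model directory against the list above (the path is the local one assumed in the Usage section; the helper is illustrative, not part of the repository):

```python
from pathlib import Path

# Expected files in a converted checkpoint directory (from the Files section)
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

def missing_files(model_dir):
    """Return the expected checkpoint files absent from model_dir."""
    d = Path(model_dir)
    return [f for f in EXPECTED_FILES if not (d / f).is_file()]

# An empty list means the checkpoint directory is complete
print(missing_files("./plasmidgpt-model"))
```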