PlasmidGPT-SFT Model (UCL-CSSB/PlasmidGPT-SFT)

This is a GPT-2 based model for engineered plasmid sequence generation, converted from the original PyTorch .pt checkpoint to the Hugging Face Transformers format.

It is a supervised fine-tuned (SFT) version of PlasmidGPT for engineered plasmids, produced by Angus Cunningham while in Prof. Chris Barnes' lab at UCL.

Model Details

  • Architecture: GPT-2
  • Vocab Size: 30,002
  • Hidden Size: 768
  • Number of Layers: 12
  • Number of Heads: 12
  • Max Position Embeddings: 2048
  • Parameters: ~124M
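As a sanity check on the hyperparameters above, the parameter count can be reproduced arithmetically. A back-of-envelope sketch (pure Python, assuming the standard Hugging Face GPT-2 layout with biases and tied input/output embeddings; the exact checkpoint total depends on weight tying and how the count is rounded):

```python
# Back-of-envelope GPT-2 parameter count from the card's hyperparameters.
V, P, H, L = 30_002, 2_048, 768, 12  # vocab, positions, hidden size, layers

def gpt2_param_count(V: int, P: int, H: int, L: int) -> int:
    emb = V * H + P * H                           # token + position embeddings
    attn = (H * 3 * H + 3 * H) + (H * H + H)      # c_attn + c_proj, with biases
    mlp = (H * 4 * H + 4 * H) + (4 * H * H + H)   # c_fc + c_proj, with biases
    ln = 2 * (2 * H)                              # two LayerNorms per block
    block = attn + mlp + ln
    final_ln = 2 * H
    return emb + L * block + final_ln             # LM head tied to embeddings

print(f"{gpt2_param_count(V, P, H, L):,}")
```

With the reduced 30,002-token vocabulary this tied-embedding count comes out near 110M; the ~124M figure matches the GPT-2 base layout with its full 50K vocabulary.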

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository is gated: log in first (e.g. `huggingface-cli login`).
model = AutoModelForCausalLM.from_pretrained("UCL-CSSB/PlasmidGPT-SFT")
tokenizer = AutoTokenizer.from_pretrained("UCL-CSSB/PlasmidGPT-SFT")

# Basic (greedy) generation from a short DNA prompt
inputs = tokenizer("ATGC", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
generated_sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_sequence)

# With sampling (for more diverse outputs)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8, top_p=0.9)
generated_sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_sequence)
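Depending on the tokenizer, decoded output can contain whitespace or mixed case. A small post-processing helper (hypothetical name, pure Python, not part of this repository) to keep only nucleotide characters before downstream use:

```python
def clean_sequence(text: str) -> str:
    """Keep only A/C/G/T characters from decoded model output."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")

print(clean_sequence("atgc ga\nTT"))  # -> "ATGCGATT"
```

Run this after decoding with skip_special_tokens=True, so special tokens are already removed.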

Example Outputs

Input: ATGCGATCG
Generated: ATGCGATCGGTGGTAGGCACTGGATGATGGCCCTGCAGTGTAGCCGTAGTTATGAGCCTCGGGATTCTTTGATGATTCAGCCACCCTCATCATCCTCCTCCTCC...

Input: ATGGCC
Generated: ATGGCCTACATACCTTCAATTACCGAAACAAGGTGGTTCATCTCTAACGCTGTCCATAAAACCGCCCAGTCTAGCTATCGCCATTTGCGCATCTAACGTGGTAGGCACTCCGGGTCCGCGCC...
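Simple sequence statistics are a quick way to sanity-check generations like the ones above. A sketch (pure Python, not part of this repository):

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# GC content of the example prompt above
print(gc_content("ATGCGATCG"))
```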

Compatible With

This model is architecture-compatible with McClain/plasmidgpt-addgene-gpt2, but its weights differ from that pretrained model.

Files

  • config.json: Model configuration
  • generation_config.json: Generation parameters
  • model.safetensors: Model weights in SafeTensors format
  • tokenizer.json: Fast tokenizer data
  • tokenizer_config.json: Tokenizer configuration
  • special_tokens_map.json: Special token mappings