
PlasmidLM-kmer6

A 19.3M-parameter autoregressive language model for generating plasmid DNA sequences, trained on ~100K plasmid sequences from Addgene.

Model Details

| Property | Value |
|---|---|
| Parameters | 19.3M |
| Architecture | Transformer decoder (dense MLP) |
| Hidden size | 384 |
| Layers | 10 |
| Attention heads | 8 |
| Intermediate (MLP) size | 1,536 |
| Max sequence length | 16,384 tokens |
| Tokenizer | k-mer (k=6, stride=3) |
| Vocab size | 4,208 |
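As a rough consistency check, the 19.3M figure can be reproduced from the table alone. The sketch below assumes a standard decoder block (four 384×384 attention projections plus a two-matrix MLP of width 1,536), tied input/output embeddings, and no learned absolute position table; biases and normalization weights (tens of thousands of parameters) are ignored. None of these choices is stated in the card, so the match is suggestive rather than a confirmation of the implementation.

```python
# Rough parameter count from the Model Details table.
# Assumptions (not stated in the card): Q/K/V/O projections of size d x d,
# a 2-matrix (non-gated) MLP, tied input/output embeddings,
# no learned position embeddings, and biases/norms ignored.
HIDDEN = 384
LAYERS = 10
INTERMEDIATE = 1536
VOCAB = 4208


def approx_param_count(hidden: int, layers: int, intermediate: int, vocab: int) -> int:
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 2 * hidden * intermediate      # up- and down-projection
    per_layer = attention + mlp
    embeddings = vocab * hidden          # assumed shared with the LM head (tied)
    return layers * per_layer + embeddings


total = approx_param_count(HIDDEN, LAYERS, INTERMEDIATE, VOCAB)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# 19,310,592 parameters (~19.3M)
```

Under the same assumptions, untied embeddings would add another 4,208 × 384 ≈ 1.6M parameters, and a learned position table for 16,384 positions would add about 6.3M, either of which would push the total noticeably past 19.3M.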

Training

  • Data: ~100K plasmid sequences from Addgene, tokenized with the k-mer tokenizer (k=6, stride=3, so consecutive tokens overlap by 3 bases)
  • Steps: 65,000
  • Eval loss: 0.129
  • Token accuracy: 97.4%
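With k=6 and stride=3, each token covers six bases and consecutive tokens share three. The sketch below illustrates that windowing and its inverse on a raw DNA string. It is not the repository's tokenizer: how the real tokenizer handles a trailing fragment shorter than six bases, and how it maps k-mers to IDs, is not documented here, so the hypothetical `kmer_tokenize` simply drops any incomplete tail window. Since there are 4^6 = 4,096 possible 6-mers over A/C/G/T, they plausibly account for most of the 4,208-entry vocabulary, with the remainder presumably going to special and annotation tokens.

```python
# Illustrative overlapping k-mer tokenization (k=6, stride=3).
# Hypothetical helpers; the real tokenizer's edge handling and ID mapping may differ.
K = 6
STRIDE = 3


def kmer_tokenize(seq: str, k: int = K, stride: int = STRIDE) -> list[str]:
    """Split a DNA string into k-mers starting every `stride` bases.

    Windows that would run past the end of the sequence are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_detokenize(kmers: list[str], stride: int = STRIDE) -> str:
    """Rebuild the covered sequence: the first k-mer in full, then the last
    `stride` bases of each following overlapping k-mer (valid for stride <= k)."""
    if not kmers:
        return ""
    return kmers[0] + "".join(kmer[-stride:] for kmer in kmers[1:])


tokens = kmer_tokenize("ATGACCATGATTACG")
print(tokens)                   # ['ATGACC', 'ACCATG', 'ATGATT', 'ATTACG']
print(kmer_detokenize(tokens))  # ATGACCATGATTACG
```

Because each additional token contributes three new bases, a generation budget of N new tokens corresponds to roughly 3N bases of DNA under this scheme.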

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True loads the custom model and tokenizer code shipped in the repo
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidLM-kmer6", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidLM-kmer6", trust_remote_code=True)

# Condition on antibiotic resistance + origin of replication
prompt = "<BOS><AMR_KANAMYCIN><ORI_COLE1><SEP>"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling; with stride-3 k-mers, 512 new tokens cover roughly 1.5 kb of DNA
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0].tolist()))
```

The model generates plasmid DNA sequences conditioned on functional annotations (antibiotic resistance markers, origins of replication) provided as special tokens in the prompt.
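The decoded output still contains the prompt's special tokens. A minimal post-processing sketch follows, assuming `tokenizer.decode` returns those tokens followed by a contiguous nucleotide string, optionally terminated by `<EOS>` and padding. The exact decoded format is not documented in this card; if decoding instead yields concatenated overlapping 6-mers, they would need to be merged first. The helper name `extract_dna` is hypothetical.

```python
import re

# Hypothetical post-processing helper. Assumes the decoded text holds the
# prompt's special tokens, then the generated nucleotides, optionally ending in <EOS>.
SPECIAL_TOKEN = re.compile(r"<[^>]+>")


def extract_dna(decoded: str) -> str:
    """Return the nucleotides generated after <SEP>, stopping at <EOS>."""
    _, sep, body = decoded.partition("<SEP>")
    if not sep:
        body = decoded                    # no separator found: scan the whole string
    body = body.split("<EOS>", 1)[0]      # ignore anything after end-of-sequence
    body = SPECIAL_TOKEN.sub("", body)    # drop stray special tokens such as <PAD>
    # keep only A/C/G/T; whitespace and any other characters are discarded
    return "".join(ch for ch in body.upper() if ch in "ACGT")


print(extract_dna("<BOS><AMR_KANAMYCIN><ORI_COLE1><SEP>ATG ACC TTG<EOS><PAD>"))
# ATGACCTTG
```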

Special Tokens

| Token | Purpose |
|---|---|
| `<BOS>` | Beginning of sequence |
| `<EOS>` | End of sequence |
| `<SEP>` | Separator between prompt annotations and DNA sequence |
| `<PAD>` | Padding |
| `<AMR_*>` | Antibiotic resistance markers (e.g., `<AMR_KANAMYCIN>`, `<AMR_AMPICILLIN>`) |
| `<ORI_*>` | Origins of replication (e.g., `<ORI_COLE1>`, `<ORI_P15A>`) |
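Prompts in the usage example follow the pattern `<BOS>`, then one token per annotation, then `<SEP>`. A small helper can assemble them from plain names. The function `build_prompt`, its upper-casing of names, and the `vocab` check are illustrative assumptions: which `<AMR_*>`/`<ORI_*>` names actually exist, and whether annotation order matters, depend on the trained vocabulary, so passing a set or dict of known token strings guards against typos that would otherwise be silently tokenized as something else.

```python
from collections.abc import Container, Sequence
from typing import Optional


def build_prompt(
    amr: Sequence[str] = (),
    ori: Sequence[str] = (),
    vocab: Optional[Container[str]] = None,
) -> str:
    """Assemble a conditioning prompt like <BOS><AMR_X>...<ORI_Y>...<SEP>.

    Hypothetical helper: names are upper-cased and wrapped as <AMR_NAME> /
    <ORI_NAME>. If `vocab` is given, every token must be present in it.
    """
    tokens = ["<BOS>"]
    tokens += [f"<AMR_{name.upper()}>" for name in amr]
    tokens += [f"<ORI_{name.upper()}>" for name in ori]
    tokens.append("<SEP>")
    if vocab is not None:
        missing = [t for t in tokens if t not in vocab]
        if missing:
            raise ValueError(f"tokens not in vocabulary: {missing}")
    return "".join(tokens)


print(build_prompt(amr=["kanamycin"], ori=["ColE1"]))
# <BOS><AMR_KANAMYCIN><ORI_COLE1><SEP>
```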

Citation

If you use this model, please cite:

```bibtex
@misc{thiel2026plasmidlm,
  title={PlasmidLM: Language Models for Plasmid DNA Generation},
  author={Thiel, McClain},
  year={2026}
}
```

Weights are distributed in Safetensors format (F32 tensors, 19.3M parameters).