
PlasmidLM-kmer6-MoE

A Mixture-of-Experts autoregressive language model with 78.3M total parameters (31.1M active per token) for plasmid DNA sequence generation, trained on ~100K plasmid sequences from Addgene.

Model Details

| Property | Value |
|---|---|
| Total parameters | 78.3M |
| Active parameters | 31.1M |
| Architecture | Transformer decoder with MoE MLP |
| Hidden size | 384 |
| Layers | 10 |
| Attention heads | 8 |
| Experts | 6 (top-2 routing) |
| Expert intermediate size | 1,536 |
| Max sequence length | 16,384 tokens |
| Tokenizer | k-mer (k=6, stride=3) |
| Vocab size | 4,208 |

Training

  • Data: ~100K plasmid sequences from Addgene, tokenized with k-mer (k=6, stride=3)
  • Steps: 35,000
  • Eval loss: 0.190
  • Token accuracy: 98.4%
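The k-mer tokenization above can be sketched as follows. This is an illustrative implementation, not the model's actual tokenizer: each token covers 6 bases, and with a stride of 3 consecutive tokens overlap by 3 bases, so the 16,384-token context spans roughly 49 kb of sequence. (The vocab size of 4,208 presumably covers the 4^6 = 4,096 possible 6-mers plus special tokens.)

```python
def kmer_tokenize(seq: str, k: int = 6, stride: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers.

    Illustrative sketch only; the model ships its own tokenizer via
    trust_remote_code. With stride < k, adjacent tokens overlap.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

print(kmer_tokenize("ATGCGTACGTAG"))  # ['ATGCGT', 'CGTACG', 'ACGTAG']
```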

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidLM-kmer6-MoE", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidLM-kmer6-MoE", trust_remote_code=True)

# Condition on antibiotic resistance + origin of replication
prompt = "<BOS><AMR_KANAMYCIN><ORI_COLE1><SEP>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0].tolist()))
```

The model generates plasmid DNA sequences conditioned on functional annotations (antibiotic resistance markers, origins of replication) provided as special tokens in the prompt.

MoE Architecture

Each transformer layer replaces the standard dense MLP with a Mixture-of-Experts layer containing 6 expert MLPs. A learned router selects the top-2 experts per token, so only 31.1M of the 78.3M total parameters are active for any given token. This provides greater model capacity while maintaining efficient inference.
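The routing step above can be sketched in a few lines of numpy. This is a minimal illustration of top-2 routing, not the model's actual implementation (the router details, shapes, and softmax-over-selected-experts mixing are assumptions): only 2 of the 6 expert MLPs run for each token, which is why only 31.1M of the 78.3M parameters are active.

```python
import numpy as np

def moe_forward(x, router_w, experts, top_k=2):
    """Route one token vector x (hidden,) through the top_k experts.

    router_w: (n_experts, hidden) router weights; experts: list of
    callables, one per expert MLP. Illustrative sketch only.
    """
    logits = router_w @ x                      # router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over selected experts
    # Mix only the selected experts' outputs; the other experts never run.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
hidden, n_experts = 8, 6
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(hidden, hidden)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=hidden), rng.normal(size=(n_experts, hidden)), experts)
print(y.shape)  # (8,)
```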

Special Tokens

| Token | Purpose |
|---|---|
| `<BOS>` | Beginning of sequence |
| `<EOS>` | End of sequence |
| `<SEP>` | Separator between prompt annotations and DNA sequence |
| `<PAD>` | Padding |
| `<AMR_*>` | Antibiotic resistance markers (e.g., `<AMR_KANAMYCIN>`, `<AMR_AMPICILLIN>`) |
| `<ORI_*>` | Origins of replication (e.g., `<ORI_COLE1>`, `<ORI_P15A>`) |
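Prompts are composed by concatenating these special tokens before the separator, as in the usage example. A hypothetical helper (the function name is an assumption; only the token spellings listed above are from the model card):

```python
def build_prompt(amr: str, ori: str) -> str:
    """Compose a conditioning prompt from resistance-marker and origin names.

    Hypothetical convenience helper; pass names exactly as they appear in
    the special-token vocabulary, e.g. "KANAMYCIN", "COLE1".
    """
    return f"<BOS><AMR_{amr}><ORI_{ori}><SEP>"

print(build_prompt("AMPICILLIN", "P15A"))  # <BOS><AMR_AMPICILLIN><ORI_P15A><SEP>
```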

Citation

If you use this model, please cite:

```bibtex
@misc{thiel2026plasmidlm,
  title={PlasmidLM: Language Models for Plasmid DNA Generation},
  author={Thiel, McClain},
  year={2026}
}
```