
PlasmidLM-kmer6-MoE

A Mixture-of-Experts autoregressive language model with 78.3M total parameters (31.1M active per token) for plasmid DNA sequence generation, trained on ~100K plasmid sequences from Addgene.

Model Details

| Property | Value |
|---|---|
| Total parameters | 78.3M |
| Active parameters | 31.1M |
| Architecture | Transformer decoder with MoE MLP |
| Hidden size | 384 |
| Layers | 10 |
| Attention heads | 8 |
| Experts | 6 (top-2 routing) |
| Expert intermediate size | 1,536 |
| Max sequence length | 16,384 tokens |
| Tokenizer | k-mer (k=6, stride=3) |
| Vocab size | 4,208 |

Training

  • Data: ~100K plasmid sequences from Addgene, tokenized with k-mer (k=6, stride=3)
  • Steps: 35,000
  • Eval loss: 0.190
  • Token accuracy: 98.4%
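The k-mer tokenization above can be sketched as follows. This is an illustrative implementation, not the model's actual tokenizer: each token covers 6 bases, and with a stride of 3 consecutive tokens overlap by 3 bases, so the 16,384-token context spans roughly 49 kb of sequence. (The vocab size of 4,208 presumably covers the 4^6 = 4,096 possible 6-mers plus special tokens.)

```python
def kmer_tokenize(seq: str, k: int = 6, stride: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers.

    Illustrative sketch only; the model ships its own tokenizer via
    trust_remote_code. With stride < k, adjacent tokens overlap.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

print(kmer_tokenize("ATGCGTACGTAG"))  # ['ATGCGT', 'CGTACG', 'ACGTAG']
```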

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidLM-kmer6-MoE", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidLM-kmer6-MoE", trust_remote_code=True)

# Condition on antibiotic resistance + origin of replication
prompt = "<BOS><AMR_KANAMYCIN><ORI_COLE1><SEP>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0].tolist()))
```

The model generates plasmid DNA sequences conditioned on functional annotations (antibiotic resistance markers, origins of replication) provided as special tokens in the prompt.

MoE Architecture

Each transformer layer replaces the standard dense MLP with a Mixture-of-Experts layer containing 6 expert MLPs. A learned router selects the top-2 experts per token, so only 31.1M of the 78.3M total parameters are active for any given token. This provides greater model capacity while maintaining efficient inference.
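The routing step above can be sketched in a few lines of numpy. This is a minimal illustration of top-2 routing, not the model's actual implementation (the router details, shapes, and softmax-over-selected-experts mixing are assumptions): only 2 of the 6 expert MLPs run for each token, which is why only 31.1M of the 78.3M parameters are active.

```python
import numpy as np

def moe_forward(x, router_w, experts, top_k=2):
    """Route one token vector x (hidden,) through the top_k experts.

    router_w: (n_experts, hidden) router weights; experts: list of
    callables, one per expert MLP. Illustrative sketch only.
    """
    logits = router_w @ x                      # router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over selected experts
    # Mix only the selected experts' outputs; the other experts never run.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
hidden, n_experts = 8, 6
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(hidden, hidden)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=hidden), rng.normal(size=(n_experts, hidden)), experts)
print(y.shape)  # (8,)
```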

Special Tokens

| Token | Purpose |
|---|---|
| `<BOS>` | Beginning of sequence |
| `<EOS>` | End of sequence |
| `<SEP>` | Separator between prompt annotations and DNA sequence |
| `<PAD>` | Padding |
| `<AMR_*>` | Antibiotic resistance markers (e.g., `<AMR_KANAMYCIN>`, `<AMR_AMPICILLIN>`) |
| `<ORI_*>` | Origins of replication (e.g., `<ORI_COLE1>`, `<ORI_P15A>`) |
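Prompts are composed by concatenating these special tokens before the separator, as in the usage example. A hypothetical helper (the function name is an assumption; only the token spellings listed above are from the model card):

```python
def build_prompt(amr: str, ori: str) -> str:
    """Compose a conditioning prompt from resistance-marker and origin names.

    Hypothetical convenience helper; pass names exactly as they appear in
    the special-token vocabulary, e.g. "KANAMYCIN", "COLE1".
    """
    return f"<BOS><AMR_{amr}><ORI_{ori}><SEP>"

print(build_prompt("AMPICILLIN", "P15A"))  # <BOS><AMR_AMPICILLIN><ORI_P15A><SEP>
```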

Citation

If you use this model, please cite:

```bibtex
@misc{thiel2026plasmidlm,
  title={PlasmidLM: Language Models for Plasmid DNA Generation},
  author={Thiel, McClain},
  year={2026}
}
```