metadata
library_name: transformers
tags: []
NOTE
The GitHub repository with the implementation and requirements can be found here.
DPLM2
Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include FastPLMs embedding helpers.
Supported models
model_dict = {
    "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
    "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
    "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
}
Use with transformers
import torch
from transformers import AutoModel, AutoModelForMaskedLM

model_path = "Synthyra/DPLM2-150M"

# Base model: returns per-token hidden states.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
tokenizer = model.tokenizer
batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# Masked-LM head: returns per-token logits over the vocabulary.
mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
with torch.no_grad():
    logits = mlm(**batch).logits
DPLM2 modality types
When type_ids are not provided, DPLM2 infers them automatically from input_ids and attention_mask.
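As a rough illustration of what this inference could look like, the sketch below assigns one modality type per token using only input_ids and attention_mask. The vocabulary boundary AA_VOCAB_SIZE, the three type codes, and the function name infer_type_ids are all assumptions made for illustration and are not confirmed by this card; the repository's modeling code is the authority on the actual rule.

```python
# Illustrative sketch only: the constants below are assumptions, not
# confirmed DPLM2 values.
AA_VOCAB_SIZE = 33   # assumed: ids below this are amino-acid or special tokens
STRUCT_TYPE = 0      # assumed code for structure tokens
AA_TYPE = 1          # assumed code for amino-acid tokens
PAD_TYPE = 2         # assumed code for padded positions


def infer_type_ids(input_ids, attention_mask):
    """Return one modality type per token for a batch of token-id lists."""
    type_ids = []
    for ids, mask in zip(input_ids, attention_mask):
        row = []
        for token_id, keep in zip(ids, mask):
            if not keep:
                row.append(PAD_TYPE)      # masked-out position
            elif token_id < AA_VOCAB_SIZE:
                row.append(AA_TYPE)       # falls in the assumed amino-acid range
            else:
                row.append(STRUCT_TYPE)   # everything above is treated as structure
        type_ids.append(row)
    return type_ids


# Toy batch: a padded amino-acid sequence and a structure-token sequence.
input_ids = [[0, 5, 9, 2, 1], [40, 57, 81, 63, 41]]
attention_mask = [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
print(infer_type_ids(input_ids, attention_mask))
# -> [[1, 1, 1, 1, 2], [0, 0, 0, 0, 0]]
```

The sketch uses plain Python lists to stay dependency-free; a real implementation would presumably apply the same comparisons to tensors.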
Attention backend
sdpa is the default attention backend. To use Flex Attention, set config.attn_backend = "flex" before loading the model.
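A minimal sketch of switching the backend is shown here, assuming the checkpoint's remote code reads attn_backend from the config when the model is constructed. The helper name load_dplm2 and the SUPPORTED_BACKENDS tuple are hypothetical; AutoConfig.from_pretrained and the config= argument of AutoModel.from_pretrained are standard transformers calls.

```python
SUPPORTED_BACKENDS = ("sdpa", "flex")  # the two backends named in this card


def load_dplm2(model_path, attn_backend="sdpa"):
    """Load a DPLM2 checkpoint with the requested attention backend (hypothetical helper)."""
    if attn_backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"attn_backend must be one of {SUPPORTED_BACKENDS}, got {attn_backend!r}"
        )
    # Imported lazily so the check above runs even without transformers installed.
    from transformers import AutoConfig, AutoModel

    config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
    config.attn_backend = attn_backend  # set before the model is built
    return AutoModel.from_pretrained(
        model_path, config=config, trust_remote_code=True
    ).eval()
```

Calling load_dplm2("Synthyra/DPLM2-150M", attn_backend="flex") would then download the checkpoint and build it with Flex Attention, provided the installed PyTorch version supports it.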
Embed datasets
All DPLM2 models inherit EmbeddingMixin, so you can call model.embed_dataset(...) directly.