Can ProtGPT2 be used in "Fill-Mask" tasks?

#10

by likun - opened Dec 6, 2022

Discussion

likun

Dec 6, 2022

Thanks for the excellent work!

I am wondering whether ProtGPT2 can be used in "Fill-Mask" tasks.

To be more specific, say I have a sequence:

"MTYKLIINGKTLKGETTTEAVDA"

Now I'd like to mutate the T2 site, that is, have ProtGPT2 fill in the blank at position 2:

"M ? YKLIINGKTLKGETTTEAVDA"

I tried pipeline('fill-mask', model="nferruz/ProtGPT2")

and got:

""fill-mask", self.model.base_model_prefix, "The tokenizer does not define a `mask_token"

This is my first time using an NLP model, so sorry about the naive question.

Thanks.

nferruz

Owner Dec 6, 2022

Hi Likun,

As it is, ProtGPT2 cannot be used for a fill-mask problem, since it was trained with an autoregressive objective (predicting the next token from the tokens before it). It could be done with some fine-tuning, but I haven’t done this yet.
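To make the difference concrete, here is a minimal sketch of what each kind of model gets to see when predicting position 2 of the sequence from the question. The helper names `causal_context` and `masked_context` are hypothetical, used only for illustration, and the "?" placeholder is just the notation from the original post, not a real mask token.

```python
def causal_context(seq: str, pos: int) -> str:
    """Residues a left-to-right (autoregressive) model sees when
    predicting the residue at 1-based position `pos`: only the prefix."""
    return seq[:pos - 1]


def masked_context(seq: str, pos: int, mask: str = "?") -> str:
    """Residues a masked (fill-mask) model sees for the same position:
    both sides of the sequence, with the target replaced by a placeholder."""
    return seq[:pos - 1] + mask + seq[pos:]


seq = "MTYKLIINGKTLKGETTTEAVDA"
print(causal_context(seq, 2))  # M
print(masked_context(seq, 2))  # M?YKLIINGKTLKGETTTEAVDA
```

An autoregressive model would have to guess T2 from "M" alone, while a masked model can also use everything to the right of the blank.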

For your problem, you could directly use a denoising autoencoding model, like ESM1, ESM2, or ProtT5. There are many more, and they are all publicly available. Let me know if you have questions when you give it a try!
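As a rough sketch of what that could look like with ESM2 and the same `pipeline` call from the question: the checkpoint name `facebook/esm2_t6_8M_UR50D` is an assumption (a small ESM2 checkpoint on the Hub; swap in whichever one you prefer), and the helpers `mask_residue` and `suggest_residues` are hypothetical names made up for this example. It has not been run against the sequence in this thread.

```python
def mask_residue(seq: str, pos: int, mask_token: str) -> str:
    """Replace the residue at 1-based position `pos` with `mask_token`."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"pos must be between 1 and {len(seq)}, got {pos}")
    return seq[:pos - 1] + mask_token + seq[pos:]


def suggest_residues(seq, pos, model_name="facebook/esm2_t6_8M_UR50D", top_k=5):
    """Return (residue, score) pairs proposed by a masked protein language model.

    Assumption: `model_name` is a checkpoint whose tokenizer defines a
    mask token, which is exactly what ProtGPT2's tokenizer lacks.
    """
    # Imported here so mask_residue works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    # Use the tokenizer's own mask token rather than hard-coding it.
    masked = mask_residue(seq, pos, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(masked, top_k=top_k)]
```

Calling `suggest_residues("MTYKLIINGKTLKGETTTEAVDA", 2)` should then give the top candidate residues for the T2 site with their probabilities. The first call downloads the model weights.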

Noelia

likun

Dec 6, 2022

Thanks! This info is really helpful!

Likun

likun changed discussion status to closed Dec 6, 2022
