VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning
Paper • 2503.15438 • Published • 4
How to use AI4Protein/deep_unigram_50 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="AI4Protein/deep_unigram_50")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("AI4Protein/deep_unigram_50")
model = AutoModelForMaskedLM.from_pretrained("AI4Protein/deep_unigram_50")

This repository provides the tokenizer used in the VenusFactory platform, described in "VenusFactory: A Unified Platform for Protein Engineering Data Retrieval and Language Model Fine-Tuning". VenusFactory is a unified platform for protein engineering that integrates data retrieval, standardized task benchmarking, and modular fine-tuning of protein language models (PLMs).
Code and further details are available at: https://github.com/tyang816/VenusFactory
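Since this repository provides a tokenizer, a typical first step is to turn amino-acid sequences into token ids. The following is a minimal sketch, not part of VenusFactory itself: `encode_proteins` and `load_tokenizer` are hypothetical helper names, and the sketch assumes the tokenizer loads through `AutoTokenizer` and accepts plain one-letter amino-acid strings. The exact tokens and ids it produces depend on the vocabulary shipped in the repository.

```python
from typing import List

REPO_ID = "AI4Protein/deep_unigram_50"  # repository name from this model card


def load_tokenizer(repo_id: str = REPO_ID):
    """Download the tokenizer from the Hugging Face Hub (needs network access)."""
    # Imported lazily so encode_proteins can be used and tested
    # without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def encode_proteins(tokenizer, sequences: List[str]) -> List[List[int]]:
    """Hypothetical helper: map amino-acid strings to lists of token ids."""
    # Assumption: surrounding whitespace is not meaningful, so strip it.
    cleaned = [seq.strip() for seq in sequences]
    # Calling a Hugging Face tokenizer on a list of strings returns a mapping
    # whose "input_ids" entry holds one list of ids per input sequence.
    return tokenizer(cleaned)["input_ids"]


# Example usage (requires network access and the transformers package):
#   tok = load_tokenizer()
#   ids = encode_proteins(tok, ["MKTAYIAKQR"])  # illustrative sequence
#   print(tok.convert_ids_to_tokens(ids[0]))
```

Keeping the loading step separate from the encoding step means the same helper can be reused with any tokenizer object that follows the Hugging Face calling convention.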