Pichia-CLM

Pichia–Codon language model (Pichia-CLM) is a deep learning–based language model for codon optimization to enhance recombinant protein production in the industrially relevant host Komagataella phaffii. Unlike conventional approaches that rely on codon usage bias metrics (CUB)—often providing a global score and ignoring sequence context—Pichia-CLM leverages the host genome to unbiasedly learn the amino acid-to-codon mapping. Prior deep learning models have attempted codon optimization but typically evaluated performance using CUB metrics with limited experimental validation. In contrast, we have experimentally validated Pichia-CLM across six diverse protein classes of varying complexity and consistently observe superior expression titers compared to four commercial codon optimization tools.

If you found this model useful, please cite our original PNAS publication:

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Space using love-lab/Pichia-CLM 1