Pichia-CLM

Pichia–Codon language model (Pichia-CLM) is a deep learning-based language model for codon optimization that enhances recombinant protein production in the industrially relevant host Komagataella phaffii. Conventional approaches rely on codon usage bias (CUB) metrics, which typically provide a single global score and ignore sequence context. Pichia-CLM instead learns the amino acid-to-codon mapping directly from the host genome in an unbiased, data-driven manner. Earlier deep learning models have attempted codon optimization, but they were typically evaluated with CUB metrics and had limited experimental validation. In contrast, we experimentally validated Pichia-CLM across six diverse protein classes of varying complexity and consistently observed higher expression titers than with four commercial codon optimization tools.
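To make the contrast with CUB metrics concrete, the sketch below computes the codon adaptation index (CAI), a widely used CUB metric. It is a standard textbook formulation, not part of Pichia-CLM. The reference sequences and the pseudocount of 0.5 are illustrative assumptions. Because CAI is a geometric mean of per-codon weights, reordering the codons of a sequence leaves the score unchanged, which is the sense in which such global scores ignore sequence context.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(seq: str) -> List[str]:
    """Split a coding sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs: Iterable[str], pseudocount: float = 0.5) -> Dict[str, float]:
    """w(codon) = count(codon) / max count among the codons synonymous with it."""
    counts: Counter = Counter()
    for seq in reference_seqs:
        counts.update(split_codons(seq))
    weights: Dict[str, float] = {}
    for aa in set(AMINO_ACIDS):
        synonyms = [c for c in CODONS if CODON_TABLE[c] == aa]
        # Assumed pseudocount keeps unseen codons from getting weight 0 (log(0) is undefined).
        freqs = {c: counts[c] + pseudocount for c in synonyms}
        best = max(freqs.values())
        for c in synonyms:
            weights[c] = freqs[c] / best
    return weights


def cai(seq: str, weights: Dict[str, float]) -> float:
    """Geometric mean of w over codons, skipping stops, Met and Trp (single-codon residues)."""
    logs = [math.log(weights[c]) for c in split_codons(seq)
            if CODON_TABLE.get(c, "*") not in ("*", "M", "W")]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

For example, with `weights = relative_adaptiveness(["GCTGCTGCTGCC"])`, the calls `cai("GCTGCCGCA", weights)` and `cai("GCAGCTGCC", weights)` give the same value: the score depends only on which codons occur, not on their order or neighbors.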

If you find this model useful, please cite our original PNAS publication:

Paper: Pichia-CLM: A language model–based codon optimization pipeline for Komagataella phaffii
Developed by: Harini Narayanan and J. Christopher Love
Repository: GitHub
Funded by: MIT AltHost Research Consortium, Daniel I.C. Wang (1959) Faculty Research Innovation Fund at MIT, Mazumdar-Shaw International Oncology Fellowship, Koch Institute at MIT
Model type: Unsupervised GRU-based encoder-decoder
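Whatever the network architecture, a codon optimizer must return DNA that translates back to the input protein. This card does not describe how Pichia-CLM enforces that, so the sketch below shows one straightforward way under stated assumptions: at each position, only codons synonymous with the input amino acid are eligible, and the highest-scoring one is chosen greedily. `score_fn` and `toy_score` are hypothetical stand-ins for a trained GRU decoder; they are not the Pichia-CLM code or API, and greedy selection is an assumption rather than the model's documented decoding strategy.

```python
from typing import Callable, Dict, List

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group codons by the amino acid (or stop, "*") they encode.
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Hypothetical interface: (protein, position, codons chosen so far) -> score per codon.
ScoreFn = Callable[[str, int, List[str]], Dict[str, float]]


def constrained_greedy_decode(protein: str, score_fn: ScoreFn) -> str:
    """Pick the highest-scoring codon at each position, restricted to synonymous codons."""
    protein = protein.upper()
    chosen: List[str] = []
    for i, aa in enumerate(protein):
        if aa not in SYNONYMS:
            raise ValueError(f"unknown amino acid {aa!r} at position {i}")
        scores = score_fn(protein, i, chosen)
        # Mask: non-synonymous codons are never eligible, so the protein sequence is preserved.
        chosen.append(max(SYNONYMS[aa], key=lambda c: scores.get(c, float("-inf"))))
    return "".join(chosen)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def toy_score(protein: str, i: int, chosen: List[str]) -> Dict[str, float]:
    """Hypothetical scorer: favors low-GC codons and penalizes repeating the previous codon."""
    prev = chosen[-1] if chosen else None
    return {c: -(c.count("G") + c.count("C")) - (1.0 if c == prev else 0.0) for c in CODONS}
```

Because the scorer receives the codons already chosen, its scores can depend on sequence context, unlike the per-codon weights of a global CUB metric. The mask alone guarantees that `translate(constrained_greedy_decode(protein, score_fn)) == protein` for any scorer.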
