Protein2PAM models and training data

Overview

This repo contains Protein2PAM models and training data.

For Python API, see:

Model Summary

PAM prediction models for CRISPR-Cas nucleases.

Main Models

| Model Name | Input Protein/Domain | CRISPR Type | Samples |
| --- | --- | --- | --- |
| cas8 | Cas8 or Cas10d | Type I | 28,410 |
| cas9 | Cas9 PI-domain | Type II | 15,843 |
| cas12 | Cas12 protein | Type V | 1,720 |

Additional Models

| Model Name | Input Protein/Domain | CRISPR Type | Samples |
| --- | --- | --- | --- |
| cas9_full | Cas9 protein | Type II | 15,843 |
| cas9_full_nolit | Cas9 protein | Type II | 15,731 |
| cas9_pid_nolit | Cas9 PI-domain | Type II | 15,731 |
| cas9_pid_nme | Cas9 PI-domain | Type II | 15,843 |
| cas12_no_lit | Cas12 protein | Type V | 1,675 |
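
The tables above distinguish models that take a full protein from models that take only the Cas9 PI-domain. A minimal sketch of how a caller might pick the matching dataset field for each model is shown here. The mapping `MODEL_INPUT_FIELD` and the helper `select_input` are hypothetical names introduced for illustration and are not part of any Protein2PAM API. The sketch also assumes that the Cas8/Cas10d and Cas12 models take the `protein_sequence` field, which the tables imply but do not state.

```python
# Hypothetical mapping from model name to the dataset field it would read.
# "pid_sequence" holds the Cas9 PAM-interacting domain; "protein_sequence"
# holds the full amino acid sequence (see Data Fields).
MODEL_INPUT_FIELD = {
    "cas8": "protein_sequence",
    "cas9": "pid_sequence",
    "cas12": "protein_sequence",
    "cas9_full": "protein_sequence",
    "cas9_full_nolit": "protein_sequence",
    "cas9_pid_nolit": "pid_sequence",
    "cas9_pid_nme": "pid_sequence",
    "cas12_no_lit": "protein_sequence",
}


def select_input(model_name, record):
    """Return the sequence from `record` that `model_name` would take as input.

    Raises KeyError for an unknown model name and ValueError when the
    required field is missing or null (pid_sequence is null for non-Cas9 rows).
    """
    field = MODEL_INPUT_FIELD[model_name]
    sequence = record.get(field)
    if not sequence:
        raise ValueError(f"record has no usable '{field}' for model '{model_name}'")
    return sequence


# Toy record for illustration only; the sequences are not real proteins.
toy_record = {"protein_sequence": "MKRTLV", "pid_sequence": "RTLV"}
print(select_input("cas9", toy_record))
print(select_input("cas9_full", toy_record))
```

Keeping the mapping in one dictionary makes it easy to spot a mismatch, such as passing a full protein to a PI-domain model.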

Dataset Summary

Training data is stored in: Profluent-Bio/protein2pam-training-data
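
As a rough sketch, the records could be loaded with the Hugging Face `datasets` library and then summarized or filtered by the `source` field described under Data Fields. The split name `"train"` is an assumption, and the helpers `count_by_source`, `drop_literature`, and `load_training_records` are hypothetical names used only for illustration. The `nolit` model names and their lower sample counts suggest models trained without literature-derived records, but this README does not say so explicitly, so treat the filter below as one plausible reading.

```python
from collections import Counter


def count_by_source(records):
    """Count records per value of the `source` field."""
    return Counter(record["source"] for record in records)


def drop_literature(records):
    """Keep only records whose `source` is not 'Literature'."""
    return [record for record in records if record["source"] != "Literature"]


def load_training_records(split="train"):
    """Load the training data from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed, network access is
    available, and the dataset exposes a split named "train".
    """
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    return load_dataset("Profluent-Bio/protein2pam-training-data", split=split)


# Toy records for illustration only; these are not real dataset rows.
toy_records = [
    {"cas_family": "Cas9", "source": "CRISPR-Cas Atlas"},
    {"cas_family": "Cas9", "source": "Literature"},
    {"cas_family": "Cas12", "source": "CRISPR-Cas Atlas"},
]
print(count_by_source(toy_records))
print(len(drop_literature(toy_records)))
```

The two helpers accept any iterable of dictionaries, so they should work on the object returned by `load_training_records()` as well as on plain lists.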

Data Fields

| Field name | Description | Data type | Example |
| --- | --- | --- | --- |
| cas_family | CRISPR-Cas family classification | string | Cas8 |
| source | Data source ('CRISPR-Cas Atlas' or 'Literature') | string | CRISPR-Cas Atlas |
| protein_id | Identifier for literature sequences | string or null | null |
| citation | Literature citation(s) | string or null | null |
| doi | DOI for the associated publication | string or null | null |
| protein_sequence | Full amino acid sequence | string | MTFMILQALYRY...NQN |
| pid_sequence | PAM-interacting domain sequence (Cas9 only) | string or null | null |
| pam_consensus | Consensus PAM sequence | string | TTC |
| pam_logo_acgt | ACGT-ordered numerical matrix for PAM logo | array of float arrays | [[0.005, 0.008, 0.007, 0.01], ...] |
| type | CRISPR-Cas type classification | string | Type I |
| subtype | CRISPR-Cas subtype or effector family | string | Cas8 |
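
To make the `pam_logo_acgt` layout concrete, the sketch below reads each inner array as one PAM position with four values in A, C, G, T order and reports the highest-scoring base per position. The helper `top_bases` is a hypothetical name, and the matrix values are made up for illustration. This is not necessarily how `pam_consensus` is derived: a real consensus may use thresholds or degenerate IUPAC codes, and the scale of the matrix values (probabilities, bits, or something else) is not stated here.

```python
BASES = "ACGT"  # column order of each row in pam_logo_acgt


def top_bases(pam_logo_acgt):
    """Return the highest-scoring base at each PAM position as a string.

    Each row must hold exactly four numbers, ordered A, C, G, T.
    Ties resolve to the first base in ACGT order.
    """
    result = []
    for position, row in enumerate(pam_logo_acgt):
        if len(row) != 4:
            raise ValueError(f"position {position} has {len(row)} values, expected 4")
        best_index = max(range(4), key=lambda i: row[i])
        result.append(BASES[best_index])
    return "".join(result)


# Made-up matrix for illustration only; not taken from the dataset.
toy_logo = [
    [0.01, 0.02, 0.01, 0.90],  # T scores highest
    [0.00, 0.05, 0.02, 0.85],  # T scores highest
    [0.03, 0.80, 0.02, 0.05],  # C scores highest
]
print(top_bases(toy_logo))  # prints "TTC"
```

Positions where all four values are small, as in the example row in the table, probably carry little sequence preference, so a per-position argmax is only meaningful alongside the magnitude of the values.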

Licensing

Models and data are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). You may share and adapt them for non-commercial use with appropriate attribution.

Citation

If you use Protein2PAM in your research, please cite the following preprint:
Nayfach, S., Bhatnagar, A., Novichkov, A., et al. (2025). Engineering of CRISPR-Cas PAM recognition using deep learning of vast evolutionary data. bioRxiv.
