pepTrans

Embedding-Based Transformer Framework for Multi-Level Peptide–Protein Interaction Prediction



Overview

pepTrans is a transformer-based deep learning framework designed for comprehensive peptide–protein interaction (PepPI) analysis using only amino acid sequences.

The framework integrates large-scale pretrained protein language model (PLM) embeddings with task-specific convolutional neural networks (CNNs) to perform multiple peptide–protein interaction prediction tasks without requiring structural information, molecular docking, or handcrafted features.

Unlike traditional structure-dependent approaches, pepTrans learns interaction-relevant representations directly from protein and peptide sequences, enabling scalable and high-throughput prediction across diverse biological applications.


Key Features

✅ Sequence-only prediction framework

✅ No requirement for 3D structures

✅ No handcrafted biochemical features

✅ Transformer-based protein language model embeddings

✅ Multi-task peptide–protein interaction prediction

✅ Strong generalization to unseen proteins and peptides

✅ Competitive performance against AlphaFold3-associated evaluation pipelines

✅ Superior performance compared with several structure-based docking methods

✅ Suitable for large-scale peptide therapeutic discovery


Supported Tasks

The released repository contains pretrained models for:

| Task | Description |
| --- | --- |
| Binary PepPI Prediction | Predict whether a peptide interacts with a protein |
| Peptide Binding Residue Prediction | Identify the peptide residues responsible for binding |
| Peptide–Protein Binding Affinity Prediction | Estimate interaction strength |
| Peptide Virtual Screening | Rank candidate peptides at high throughput |
| Peptide–PBD Prediction | Predict peptide interactions with protein binding domains |
| Virtual Alanine Scanning | Assess residue contributions to binding |
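Virtual alanine scanning can be illustrated with a short sketch: each peptide residue is replaced by alanine in turn, the mutant is rescored, and the drop in predicted score is taken as that residue's contribution. The names `alanine_scan` and `toy_score` are hypothetical and are not part of the released code; `toy_score` is a stand-in for a trained interaction model, and the exact procedure used by pepTrans may differ.

```python
from typing import Callable, List, Tuple


def alanine_scan(
    peptide: str,
    protein: str,
    score_fn: Callable[[str, str], float],
) -> List[Tuple[int, str, float]]:
    """Return (position, residue, score_drop) tuples, largest drop first.

    score_fn is a placeholder for any model that maps a
    (peptide, protein) pair to an interaction score.
    """
    baseline = score_fn(peptide, protein)
    results = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # already alanine, so there is nothing to mutate
        mutant = peptide[:i] + "A" + peptide[i + 1:]
        drop = baseline - score_fn(mutant, protein)
        results.append((i + 1, residue, drop))  # 1-based position
    return sorted(results, key=lambda r: r[2], reverse=True)


def toy_score(peptide: str, protein: str) -> float:
    """Toy scorer for illustration only: rewards W and K residues."""
    return 0.4 * peptide.count("W") + 0.1 * peptide.count("K")


for position, residue, drop in alanine_scan("AWKG", "MKTAYIAK", toy_score):
    print(f"{residue}{position}A  score drop = {drop:.2f}")
```

Residues whose mutation causes the largest score drop are the candidates most likely to matter for binding under the model.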

Architecture

pepTrans combines:

1. Protein Language Models

  • ProtT5-XL-U50
  • Transformer encoder representations
  • Context-aware residue embeddings
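As a minimal sketch of the input side, the publicly released ProtT5 checkpoints expect residues separated by spaces, with the rare or ambiguous residues U, Z, O and B mapped to X. The helper name `prepare_for_prott5` is hypothetical, and whether pepTrans applies exactly this preprocessing is an assumption.

```python
import re


def prepare_for_prott5(sequence: str) -> str:
    """Format a raw amino acid sequence for a ProtT5 tokenizer."""
    # Drop whitespace and normalise to upper case.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Map rare or ambiguous residues (U, Z, O, B) to the unknown residue X.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # ProtT5 treats each residue as one token, so residues are space-separated.
    return " ".join(cleaned)


print(prepare_for_prott5("mkUz ac"))
```

The formatted string can then be passed to the tokenizer and encoder, which return one embedding vector per residue.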

2. Convolutional Neural Networks

Task-specific CNN modules are used to capture:

  • Local residue motifs
  • Interaction signatures
  • Spatial sequence patterns
  • Long-range contextual information
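To make the idea of a motif-detecting filter concrete, the sketch below slides a single 1D convolution filter along the sequence axis of a per-residue embedding matrix. It is written in plain Python for readability; the kernel width, zero padding and ReLU activation are illustrative assumptions, not the published pepTrans configuration.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_same(embeddings: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Apply one filter of shape (width, dim) along the sequence axis.

    Zero padding keeps the output the same length as the input, so each
    output value summarises a window centred on one residue.
    """
    width, dim = len(kernel), len(kernel[0])
    if width % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel width")
    pad = width // 2
    zero = [0.0] * dim
    padded = [zero] * pad + list(embeddings) + [zero] * pad
    output = []
    for i in range(len(embeddings)):
        window = padded[i:i + width]
        activation = bias
        for row, kernel_row in zip(window, kernel):
            activation += sum(x * w for x, w in zip(row, kernel_row))
        output.append(max(0.0, activation))  # ReLU
    return output


# Three residues with 2-dimensional toy embeddings and a width-3 filter.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(conv1d_same(toy_embeddings, toy_kernel))
```

A real module would apply many such filters in parallel, typically through a deep learning library, and stack several layers to widen the receptive field.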

3. Multi-Level Prediction Heads

The learned representations are used for:

  • Binary interaction prediction
  • Residue-level binding prediction
  • Affinity estimation
  • Virtual screening
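Each head operates on a joint representation of the protein and peptide branches. One common way to build such a representation, shown here purely as an assumption, is global max pooling over each branch followed by concatenation and a dense layer with a sigmoid output. The function names and the toy weights are hypothetical; the source does not specify the fusion operator.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def global_max_pool(feature_map: Matrix) -> List[float]:
    """Collapse a (length, channels) map to (channels,) by per-channel max."""
    return [max(column) for column in zip(*feature_map)]


def fuse_and_score(
    protein_map: Matrix,
    peptide_map: Matrix,
    weights: Sequence[float],
    bias: float,
) -> float:
    """Concatenate pooled branch features and apply a logistic output unit."""
    fused = global_max_pool(protein_map) + global_max_pool(peptide_map)
    if len(weights) != len(fused):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy CNN outputs: protein has 2 channels, peptide has 1 channel.
protein_features = [[0.1, 0.5], [0.3, 0.2]]
peptide_features = [[1.0], [0.0]]
probability = fuse_and_score(protein_features, peptide_features, [1.0, 1.0, 1.0], -1.8)
print(f"interaction probability = {probability:.2f}")
```

Pooling makes the fused vector independent of sequence length, which is why proteins and peptides of different sizes can share one set of dense layers.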

Model Workflow

Protein Sequence            Peptide Sequence
        │                           │
        ▼                           ▼
 ProtT5 Embedding            ProtT5 Embedding
        │                           │
        ▼                           ▼
Protein CNN Module          Peptide CNN Module
        │                           │
        └─────────────┬─────────────┘
                      ▼
               Feature Fusion
                      │
                      ▼
            Fully Connected Layers
                      │
                      ▼
         ┌─────────────────────────┐
         │ Binary PepPI Prediction │
         └─────────────────────────┘
         ┌─────────────────────────┐
         │ Binding Residue Mapping │
         └─────────────────────────┘

Performance Highlights

pepTrans was evaluated using benchmark datasets and independent external test sets.

Binary Interaction Prediction

  • Consistently outperformed:

    • CAMP
    • DeepDTA
    • PIPR
    • NRLMF
  • Demonstrated strong performance under:

    • Novel proteins
    • Novel peptides
    • Novel peptide–protein pairs
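The three generalization settings can be made explicit by comparing each test pair with the training set. The definitions below are assumptions chosen for illustration (a "novel pair" is taken to mean that both the protein and the peptide are unseen); the benchmark splits used for pepTrans may be defined differently, and `generalization_setting` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (protein_id, peptide_id)


def generalization_setting(test_pairs: Iterable[Pair], train_pairs: Iterable[Pair]) -> List[str]:
    """Label each test pair by which of its components appear in training."""
    train_pairs = list(train_pairs)
    seen_proteins = {protein for protein, _ in train_pairs}
    seen_peptides = {peptide for _, peptide in train_pairs}
    labels = []
    for protein, peptide in test_pairs:
        new_protein = protein not in seen_proteins
        new_peptide = peptide not in seen_peptides
        if new_protein and new_peptide:
            labels.append("novel pair")
        elif new_protein:
            labels.append("novel protein")
        elif new_peptide:
            labels.append("novel peptide")
        else:
            labels.append("seen components")
    return labels


train = [("P1", "pep1"), ("P2", "pep2")]
test = [("P3", "pep1"), ("P1", "pep3"), ("P3", "pep3"), ("P1", "pep2")]
print(generalization_setting(test, train))
```

Reporting metrics separately for each label shows whether performance holds when the model cannot rely on memorised proteins or peptides.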

Binding Residue Prediction

pepTrans achieved:

  • Average MCC ≈ 0.55
  • Average AUC ≈ 0.77

on independent peptide–protein complexes.
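For readers unfamiliar with these metrics, the sketch below computes the Matthews correlation coefficient (MCC) from binary residue labels and the ROC AUC from predicted scores for a single peptide. Averaging the per-complex values is an assumption about how the reported figures were obtained, and the 0.5 decision threshold is illustrative.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# One toy peptide: 1 marks a binding residue, scores are model outputs.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(matthews_corrcoef(labels, predictions), roc_auc(labels, scores))
```

MCC depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, so the two numbers can diverge.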

Comparison Against AlphaFold3-Based Evaluation

pepTrans demonstrated:

  • Higher average MCC
  • Competitive AUC
  • More stable prediction distributions

while requiring only sequence information and significantly lower computational cost.

Virtual Screening

pepTrans outperformed:

  • GalaxyPepDock
  • AutoDock CrankPep
  • CABS-Dock
  • MDockPep
  • CAMP

on benchmark virtual screening datasets.
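In practice, virtual screening reduces to scoring every candidate peptide against a target and sorting by score. The sketch below shows that loop with a placeholder scorer; `rank_candidates` and `toy_score` are hypothetical names, and the toy scorer (fraction of hydrophobic residues) merely stands in for a trained interaction model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rank_candidates(
    protein: str,
    peptides: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score each peptide against the protein and return the best first."""
    scored = [(peptide, score_fn(peptide, protein)) for peptide in peptides]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def toy_score(peptide: str, protein: str) -> float:
    """Toy scorer for illustration only: fraction of hydrophobic residues."""
    return sum(residue in "FILVWY" for residue in peptide) / len(peptide)


library = ["GGSG", "FLWG", "AVKA"]
for peptide, score in rank_candidates("MKTAYIAK", library, toy_score, top_k=2):
    print(peptide, score)
```

Because scoring needs only sequences, the same loop scales to large peptide libraries without docking or structure preparation.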


Repository Structure

pepTrans/
│
├── Binary pepPIs prediction/
├── Binding Affinity/
├── Generalizability/
├── Peptide Binding Residues/
├── Peptide PBD Prediction/
└── Peptide Virtual Screening/

Each directory contains task-specific pretrained weights and checkpoints.


Model Weights

Important Notice

GitHub imposes storage limitations for large deep learning model files.

To ensure long-term availability and reproducibility, all pretrained pepTrans weights are hosted on Hugging Face.

Official Model Repository

👉 https://github.com/SyedKumailHussainNaqvi/pepTrans/tree/main

Researchers should download all model checkpoints directly from this repository.


Scientific Impact

pepTrans advances peptide–protein interaction modeling by:

  • Eliminating dependence on experimental structures
  • Enabling scalable peptide screening
  • Improving residue-level interpretability
  • Supporting peptide therapeutic discovery
  • Facilitating large-scale interaction prediction

The framework provides a practical alternative to computationally intensive structure-based pipelines while maintaining competitive predictive performance.


Citation

If you use pepTrans in your research, please cite:

@article{Naqvi2026pepTrans,
  title={pepTrans: Embedding-Based Transformer Framework for Multi-Level Peptide–Protein Interaction Prediction},
  author={Naqvi, Syed Kumail Hussain and Cho, Hwangeui and Chong, Kil To and Tayara, Hilal},
  journal={Under Review},
  year={2026}
}

Authors

Syed Kumail Hussain Naqvi
Department of Physical-AI Convergence
Jeonbuk National University, Republic of Korea

Hwangeui Cho
School of Pharmacy
Jeonbuk National University, Republic of Korea

Kil To Chong
Jeonbuk National University, Republic of Korea

Hilal Tayara
School of International Engineering and Science
Jeonbuk National University, Republic of Korea


Contact

For questions regarding:

  • pretrained weights
  • model reproduction
  • datasets
  • benchmarking
  • collaborations

please contact Syed Kumail Hussain Naqvi.


pepTrans

Advancing sequence-based peptide–protein interaction modeling through transformer-powered representation learning.
