Revised Peptide LGBM Model

This repository contains a PyTorch deep learning model trained to predict peptide properties from amino acid sequences.

Model Description

The model takes a tokenized amino acid sequence as input and outputs the probability that the peptide belongs to the positive class.

The architecture is defined in model/network.py and initialized from a YAML configuration file (config.yaml).

Input Representation

Sequences are tokenized using the following mapping:

| Token | Description |
| --- | --- |
| PAD | Padding |
| UNK | Unknown |
| CLS | Start token |
| SEP | Separator |
| MASK | Mask token |
| L, A, G, V, E, S, I, K, R, D, T, P, N, Q, F, Y, M, H, C, W | Amino acids |

Sequences are padded to the maximum length within a batch.
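
The tokenization and per-batch padding described above can be sketched in plain Python. This is only an illustration under stated assumptions: the integer IDs simply follow the order in which the tokens are listed (the real IDs are in tokenizer_mapping.json and may differ), and wrapping each sequence in CLS and SEP is a guess based on the token descriptions. The names `VOCAB`, `encode`, and `pad_batch` are hypothetical and not part of this repository.

```python
# Assumed vocabulary: IDs follow the order in which the tokens are listed.
# The real IDs live in tokenizer_mapping.json and may differ.
SPECIAL_TOKENS = ["PAD", "UNK", "CLS", "SEP", "MASK"]
AMINO_ACIDS = list("LAGVESIKRDTPNQFYMHCW")
VOCAB = {token: idx for idx, token in enumerate(SPECIAL_TOKENS + AMINO_ACIDS)}


def encode(sequence):
    """Map one peptide string to token IDs, wrapped in CLS ... SEP (assumed)."""
    ids = [VOCAB["CLS"]]
    # Residues outside the 20 listed amino acids fall back to UNK.
    ids += [VOCAB.get(residue, VOCAB["UNK"]) for residue in sequence.upper()]
    ids.append(VOCAB["SEP"])
    return ids


def pad_batch(sequences):
    """Encode sequences and right-pad each to the longest one in the batch."""
    encoded = [encode(seq) for seq in sequences]
    max_len = max(len(ids) for ids in encoded)
    padded = [ids + [VOCAB["PAD"]] * (max_len - len(ids)) for ids in encoded]
    # 1 marks real tokens, 0 marks padding.
    mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in encoded]
    return padded, mask


batch, mask = pad_batch(["LAGVEST", "KR", "AXG"])
```

Here every row of `batch` has the length of the longest encoded sequence in that batch, and `mask` lets downstream code ignore the padded positions.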

Files

| File | Description |
| --- | --- |
| model.pt | Trained model checkpoint |
| config.yaml | Model configuration |
| tokenizer_mapping.json | Amino acid token mapping |
| inference.py | Example inference script |
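
As a rough sketch of how tokenizer_mapping.json might be read, the snippet below assumes the file is a flat JSON object that maps token strings to integer IDs and that the special tokens are stored under the plain names PAD, UNK, CLS, SEP, and MASK. Neither assumption is confirmed here, so check the file before relying on it. The function name `load_token_mapping` is hypothetical.

```python
import json

# Assumed key names for the special tokens; the file may spell them differently.
REQUIRED_SPECIAL_TOKENS = ("PAD", "UNK", "CLS", "SEP", "MASK")


def load_token_mapping(path="tokenizer_mapping.json"):
    """Read a token -> ID mapping, assuming a flat JSON object of integers."""
    with open(path, encoding="utf-8") as handle:
        mapping = json.load(handle)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object mapping tokens to IDs")
    missing = [tok for tok in REQUIRED_SPECIAL_TOKENS if tok not in mapping]
    if missing:
        raise ValueError(f"mapping is missing special tokens: {missing}")
    return {str(token): int(idx) for token, idx in mapping.items()}
```

Failing early on a missing special token makes a mismatch between the assumed and the actual file layout visible before any sequence is encoded.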

Usage

Example inference:

```python
from inference import predict

sequence = "LAGVEST"
probability = predict(sequence)

print(probability)
```
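
To score several peptides and turn each probability into a label, a small wrapper along these lines could be used. It is a sketch, not part of inference.py: the helper name `classify_sequences` is hypothetical, it assumes `predict` returns a single value convertible to `float` for one sequence, and the 0.5 threshold is an assumed default rather than a documented decision boundary.

```python
def classify_sequences(sequences, predict_fn, threshold=0.5):
    """Score each sequence and attach a binary label.

    predict_fn is any callable mapping one sequence string to a probability;
    the 0.5 default threshold is an assumption, not a documented value.
    """
    results = []
    for sequence in sequences:
        probability = float(predict_fn(sequence))
        results.append(
            {
                "sequence": sequence,
                "probability": probability,
                "positive": probability >= threshold,
            }
        )
    return results
```

With the repository's script on the import path, a call such as `classify_sequences(["LAGVEST", "KRDT"], predict)` would return one dictionary per peptide. Passing the prediction function as an argument keeps the helper usable with a stub when the checkpoint is not available.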