metadata
license: gpl-3.0
tags:
  - protein
  - peptide
  - deep-learning
  - pytorch
  - bioinformatics
library_name: pytorch

Revised Peptide LGBM Model

This repository contains a PyTorch deep learning model trained to predict peptide properties from amino acid sequences.

Model Description

The model uses tokenized amino acid sequences as input and predicts a probability score indicating the likelihood of the peptide belonging to the positive class.

The architecture is defined in model/network.py and initialized using a YAML configuration file.

Input Representation

Sequences are tokenized using the following mapping:

| Token | Description |
|-------|-------------|
| PAD | Padding |
| UNK | Unknown |
| CLS | Start token |
| SEP | Separator |
| MASK | Mask token |
| L, A, G, V, E, S, I, K, R, D, T, P, N, Q, F, Y, M, H, C, W | Amino acids |

Sequences are padded to the maximum length within a batch.
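
The tokenization and batch padding described above can be sketched in plain Python. This is a minimal illustration, not the repository's tokenizer: the integer ids below are an assumption (special tokens first, then residues in the order listed), and the authoritative ids are whatever tokenizer_mapping.json contains. The helper names `encode` and `pad_batch` are hypothetical, padding on the right is assumed, and the sketch does not add CLS or SEP tokens because the README does not say where they are placed.

```python
# Assumed vocabulary: special tokens first, then residues in the listed order.
# The real ids live in tokenizer_mapping.json and may differ.
SPECIAL_TOKENS = ["PAD", "UNK", "CLS", "SEP", "MASK"]
AMINO_ACIDS = list("LAGVESIKRDTPNQFYMHCW")
TOKEN_TO_ID = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + AMINO_ACIDS)}


def encode(sequence):
    """Map each residue to its id; unrecognized characters become UNK."""
    unk = TOKEN_TO_ID["UNK"]
    return [TOKEN_TO_ID.get(residue, unk) for residue in sequence.upper()]


def pad_batch(sequences):
    """Encode sequences and right-pad each to the longest one in the batch."""
    encoded = [encode(seq) for seq in sequences]
    max_len = max((len(ids) for ids in encoded), default=0)
    pad = TOKEN_TO_ID["PAD"]
    return [ids + [pad] * (max_len - len(ids)) for ids in encoded]


batch = pad_batch(["LAGVEST", "KR", "AXG"])
```

Under these assumed ids, every row of `batch` has length 7, the shorter sequences are filled with the PAD id, and the non-standard residue `X` maps to the UNK id.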

Files

| File | Description |
|------|-------------|
| model.pt | Trained model checkpoint |
| config.yaml | Model configuration |
| tokenizer_mapping.json | Amino acid token mapping |
| inference.py | Example inference script |
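
For anyone who wants to inspect the token mapping directly, the sketch below loads it with the standard library and checks that the five special tokens are present. It assumes tokenizer_mapping.json is a flat JSON object of token to id pairs; the actual layout of the file is not documented here and may differ. `load_token_mapping` is a hypothetical helper, not part of the repository.

```python
import json

REQUIRED_TOKENS = ("PAD", "UNK", "CLS", "SEP", "MASK")


def load_token_mapping(path="tokenizer_mapping.json"):
    """Load a token -> id mapping and check the special tokens are present.

    Assumes a flat JSON object such as {"PAD": 0, "L": 5}; the real file
    layout may differ.
    """
    with open(path, encoding="utf-8") as handle:
        mapping = json.load(handle)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object of token -> id pairs")
    missing = [tok for tok in REQUIRED_TOKENS if tok not in mapping]
    if missing:
        raise ValueError(f"mapping is missing special tokens: {missing}")
    return mapping
```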

Usage

Example inference:

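```python
# Run from the repository root so that inference.py is importable.
from inference import predict

# A peptide given as a string of one-letter amino acid codes.
sequence = "LAGVEST"

# Probability that the peptide belongs to the positive class.
probability = predict(sequence)

print(probability)
```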
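
To turn the probability into a class label for several peptides at once, a thin wrapper like the one below may be enough. It is a sketch under stated assumptions: `classify_peptides` is a hypothetical helper, the 0.5 cutoff is an arbitrary default rather than a threshold recommended by the authors, and `predict` is assumed to return a single value convertible to `float`. A stand-in scorer is used so the example runs without model.pt.

```python
def classify_peptides(sequences, predict_fn, threshold=0.5):
    """Score each sequence and label it positive when score >= threshold."""
    results = []
    for seq in sequences:
        probability = float(predict_fn(seq))
        results.append((seq, probability, probability >= threshold))
    return results


# Stand-in scorer so the sketch runs without the checkpoint. With the
# repository checked out, pass inference.predict instead.
def dummy_predict(sequence):
    return 0.9 if "K" in sequence else 0.1


labels = classify_peptides(["LAGVEST", "KRKR"], dummy_predict)
```

Passing the scoring function as an argument keeps the wrapper independent of how `inference.py` loads the model, so the cutoff can be tuned on validation data without touching the inference code.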