Revised Peptide LGBM Model
This repository contains a PyTorch deep learning model trained to predict peptide properties from amino acid sequences.
Model Description
The model takes tokenized amino acid sequences as input and outputs the probability that the peptide belongs to the positive class.
The architecture is defined in model/network.py and initialized using a YAML configuration file.
Input Representation
Sequences are tokenized using the following mapping:
| Token | Description |
|---|---|
| PAD | Padding |
| UNK | Unknown |
| CLS | Start token |
| SEP | Separator |
| MASK | Mask token |
| L,A,G,V,E,S,I,K,R,D,T,P,N,Q,F,Y,M,H,C,W | Amino acids |
Sequences are padded to the maximum length within a batch.
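The tokenization and per-batch padding described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the token ids are assumed to follow the order of the table (so `PAD` is 0), whereas the real ids are defined in `tokenizer_mapping.json` and may differ. The helper names `tokenize` and `pad_batch` are hypothetical, padding on the right is an assumption, and the sketch does not add `CLS` or `SEP` because the source does not say how they are applied.

```python
# Hypothetical vocabulary: ids follow the order of the table above.
# The real ids live in tokenizer_mapping.json and may differ.
SPECIAL_TOKENS = ["PAD", "UNK", "CLS", "SEP", "MASK"]
AMINO_ACIDS = list("LAGVESIKRDTPNQFYMHCW")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + AMINO_ACIDS)}


def tokenize(sequence):
    """Map each residue to its id; unrecognized characters become UNK."""
    return [VOCAB.get(residue, VOCAB["UNK"]) for residue in sequence.upper()]


def pad_batch(sequences):
    """Tokenize sequences and right-pad each to the longest one in the batch."""
    encoded = [tokenize(seq) for seq in sequences]
    max_len = max((len(ids) for ids in encoded), default=0)
    # Right-padding is an assumption; the source does not state the side.
    return [ids + [VOCAB["PAD"]] * (max_len - len(ids)) for ids in encoded]


batch = pad_batch(["LAGVEST", "KR", "AXG"])
for row in batch:
    print(row)
```

Here the second sequence is padded from 2 to 7 tokens, and the `X` in the third sequence falls back to `UNK` because it is not one of the 20 listed amino acids.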
Files
| File | Description |
|---|---|
| model.pt | Trained model checkpoint |
| config.yaml | Model configuration |
| tokenizer_mapping.json | Amino acid token mapping |
| inference.py | Example inference script |
Usage
Example inference:
```python
from inference import predict

sequence = "LAGVEST"
probability = predict(sequence)
print(probability)
```