---
language:
- en
tags:
- protein-design
- antimicrobial-peptides
- flow-matching
- esm-2
- pytorch
license: mit
datasets:
- uniprot
- amp-datasets
metrics:
- mic-prediction
- sequence-validity
- diversity
---
# FlowAMP: Flow-based Antimicrobial Peptide Generation
## Model Description
FlowAMP is a flow-based generative model for designing antimicrobial peptides (AMPs) that combines conditional flow matching with ESM-2 protein language model embeddings. Flow matching drives high-quality peptide generation, while the language model embeddings ground the generated sequences in biologically relevant protein space.
### Architecture
The model consists of several key components:
1. **ESM-2 Encoder**: Uses ESM-2 (esm2_t33_650M_UR50D) to extract 1280-dimensional protein sequence embeddings
2. **Compressor/Decompressor**: Reduces embedding dimensionality by 16x (1280 → 80) for efficient processing
3. **Flow Matcher**: Implements conditional flow matching for generation with time embeddings
4. **CFG Integration**: Classifier-free guidance for controllable generation
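The compressor/decompressor pair can be sketched as below. The card only states the dimensions (1280 → 80, a 16x reduction), so the single-linear-layer design and the class name `EmbeddingCompressor` are illustrative assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn

class EmbeddingCompressor(nn.Module):
    """Sketch of the 16x compressor/decompressor (1280 -> 80 -> 1280).

    Hypothetical architecture: the model card gives only the dimensions,
    so single linear layers stand in for whatever the real module uses.
    """

    def __init__(self, dim_in: int = 1280, dim_latent: int = 80):
        super().__init__()
        self.compress = nn.Linear(dim_in, dim_latent)    # 1280 -> 80
        self.decompress = nn.Linear(dim_latent, dim_in)  # 80 -> 1280

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decompress(self.compress(x))

# ESM-2 emits one 1280-d vector per residue; compress a batch of them.
emb = torch.randn(4, 50, 1280)       # (batch, residues, embedding dim)
model = EmbeddingCompressor()
latent = model.compress(emb)         # (4, 50, 80) -- flow matching runs here
recon = model(emb)                   # (4, 50, 1280)
```

Running the flow matcher in the 80-dimensional latent space keeps generation cheap; sequences are decoded back to the 1280-dimensional ESM-2 space afterward.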
### Key Features
- **Flow-based Generation**: Uses conditional flow matching for high-quality peptide generation
- **ESM-2 Integration**: Leverages ESM-2 protein language model embeddings for sequence understanding
- **CFG Training**: Implements Classifier-Free Guidance for controllable generation
- **Multi-GPU Training**: Optimized for H100 GPUs with mixed precision training
- **Comprehensive Evaluation**: MIC prediction and antimicrobial activity assessment
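At sampling time, classifier-free guidance is conventionally applied by extrapolating between the model's unconditional and conditional predictions; for a flow matcher this acts on the predicted velocity field. The sketch below assumes that standard convention (strength 0 = unconditional, 1 = conditional, >1 = amplified guidance), which matches the `cfg_strength` values in the usage example but is not taken from the FlowAMP code itself.

```python
import torch

def cfg_velocity(v_cond: torch.Tensor, v_uncond: torch.Tensor,
                 strength: float) -> torch.Tensor:
    """Classifier-free guidance for a flow matcher.

    Blends the unconditional and conditional velocity predictions:
    strength=0 -> unconditional, strength=1 -> conditional,
    strength>1 -> extrapolates beyond the conditional prediction.
    """
    return v_uncond + strength * (v_cond - v_uncond)

# Toy velocity predictions to illustrate the blend.
v_c = torch.tensor([1.0, 2.0])   # conditional prediction
v_u = torch.tensor([0.0, 0.0])   # unconditional prediction
assert torch.allclose(cfg_velocity(v_c, v_u, 0.0), v_u)
assert torch.allclose(cfg_velocity(v_c, v_u, 1.0), v_c)
```

The 15% CFG dropout used in training (see below) is what makes the unconditional branch available at inference time.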
## Training
### Training Data
The model was trained on:
- **UniProt Database**: Comprehensive protein sequence database
- **AMP Datasets**: Curated antimicrobial peptide sequences
- **ESM-2 Embeddings**: Pre-computed embeddings for efficient training
### Training Configuration
- **Batch Size**: 96 (optimized for H100)
- **Learning Rate**: 4e-4 with cosine annealing to 2e-4
- **Epochs**: 6000
- **Mixed Precision**: BF16 for H100 optimization
- **CFG Dropout**: 15% for unconditional training
- **Gradient Clipping**: Norm=1.0 for stability
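The configuration above can be sketched as a minimal PyTorch training setup. The tiny stand-in model, the use of `AdamW`, and pairing `CosineAnnealingLR`'s `eta_min` with the 2e-4 floor are assumptions about how the schedule was implemented, not the actual training script.

```python
import torch
import torch.nn as nn

model = nn.Linear(80, 80)  # stand-in for the flow matcher
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.01)
# Anneal 4e-4 -> 2e-4 over the 6000-epoch run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=6000, eta_min=2e-4)

cfg_dropout = 0.15  # probability of dropping the condition per sample

for epoch in range(3):  # illustrative; the real run is 6000 epochs
    x = torch.randn(96, 80)  # batch size 96
    loss = model(x).pow(2).mean()
    loss.backward()
    # Clip gradient norm to 1.0 for stability before stepping.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```

On H100s the real loop would additionally wrap the forward pass in `torch.autocast(device_type="cuda", dtype=torch.bfloat16)` for the BF16 mixed-precision training mentioned above.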
### Training Performance
- **Speed**: 31 steps/second on H100 GPU
- **Memory Efficiency**: Mixed precision training
- **Stability**: Gradient clipping and weight decay (0.01)
## Usage
### Basic Generation
```python
from final_flow_model import AMPFlowMatcherCFGConcat
from generate_amps import generate_amps

# Load trained model
model = AMPFlowMatcherCFGConcat.load_from_checkpoint('path/to/checkpoint.pth')

# Generate AMPs with different CFG strengths
sequences_no_cfg = generate_amps(model, num_samples=100, cfg_strength=0.0)
sequences_weak_cfg = generate_amps(model, num_samples=100, cfg_strength=1.0)
sequences_strong_cfg = generate_amps(model, num_samples=100, cfg_strength=2.0)
sequences_very_strong_cfg = generate_amps(model, num_samples=100, cfg_strength=3.0)
```
### Evaluation
```python
from test_generated_peptides import evaluate_generated_peptides
# Evaluate generated sequences for antimicrobial activity
results = evaluate_generated_peptides(sequences)
```
## Performance
### Generation Quality
- **Sequence Validity**: A high fraction of generated sequences consist only of canonical amino acids
- **Diversity**: Generated sequences remain diverse across the tested CFG strengths
- **Biological Relevance**: ESM-2 embeddings bias generation toward biologically plausible sequences
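The validity and uniqueness checks above can be sketched as simple sequence-level metrics. The exact definitions used by the evaluation scripts are not given in this card, so the functions below are illustrative proxies.

```python
# The 20 canonical amino acid one-letter codes.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")

def validity(sequences: list[str]) -> float:
    """Fraction of sequences composed only of canonical amino acids."""
    valid = [s for s in sequences if s and set(s) <= CANONICAL_AA]
    return len(valid) / len(sequences)

def uniqueness(sequences: list[str]) -> float:
    """Fraction of distinct sequences -- a simple diversity proxy."""
    return len(set(sequences)) / len(sequences)

seqs = ["GIGKFLKK", "GIGKFLKK", "KWKLFKKI", "XXBADSEQ"]
print(validity(seqs))    # 0.75: one sequence contains non-canonical letters
print(uniqueness(seqs))  # 0.75: one duplicate
```

More faithful diversity measures (e.g. average pairwise sequence identity) follow the same pattern but compare sequences against each other rather than against a fixed alphabet.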
### Antimicrobial Activity
- **MIC Prediction**: Integration with Apex model for MIC prediction
- **Activity Assessment**: Comprehensive evaluation of antimicrobial potential
- **CFG Effectiveness**: Measured through controlled generation
## Limitations
- **Sequence Length**: Limited to 50 amino acids maximum
- **Computational Requirements**: Requires GPU for efficient generation
- **Training Data**: Dependent on quality of UniProt and AMP datasets
## Citation
```bibtex
@article{flowamp2024,
  title   = {FlowAMP: Flow-based Antimicrobial Peptide Generation with Conditional Flow Matching},
  author  = {Sun, Edward},
  journal = {arXiv preprint},
  year    = {2024}
}
```
## License
MIT License - see LICENSE file for details.