# FlowAMP: Flow-based Antimicrobial Peptide Generation

## Overview

FlowAMP is a novel flow-based generative model for designing antimicrobial peptides (AMPs) using conditional flow matching and ESM-2 protein language model embeddings. This project implements a state-of-the-art approach to de novo AMP design with improved generation quality and diversity.

## Key Features

- **Flow-based Generation**: Uses conditional flow matching for high-quality peptide generation
- **ESM-2 Integration**: Leverages ESM-2 protein language model embeddings for sequence understanding
- **CFG Training**: Implements classifier-free guidance (CFG) for controllable generation
- **Multi-GPU Training**: Optimized for H100 GPUs with mixed-precision training
- **Comprehensive Evaluation**: MIC prediction and antimicrobial activity assessment

## Project Structure

```
flow/
├── final_flow_model.py                        # Main FlowAMP model architecture
├── final_sequence_encoder.py                  # ESM-2 sequence encoding
├── final_sequence_decoder.py                  # Sequence decoding and generation
├── compressor_with_embeddings.py              # Embedding compression/decompression
├── cfg_dataset.py                             # CFG dataset and dataloader
├── amp_flow_training_single_gpu_full_data.py  # Single-GPU training
├── amp_flow_training_multi_gpu.py             # Multi-GPU training
├── generate_amps.py                           # AMP generation script
├── test_generated_peptides.py                 # Evaluation and testing
├── apex/                                      # Apex model integration
│   ├── trained_models/                        # Pre-trained Apex models
│   └── AMP_DL_model_twohead.py                # Apex model architecture
├── normalization_stats.pt                     # Preprocessing statistics
└── requirements.yaml                          # Dependencies
```

## Model Architecture

The FlowAMP model consists of:

1. **ESM-2 Encoder**: Extracts protein sequence embeddings using ESM-2
2. **Compressor/Decompressor**: Reduces embedding dimensionality for efficiency
3. **Flow Matcher**: Conditional flow matching for generation
4. **CFG Integration**: Classifier-free guidance for controllable generation

## Training

### Single-GPU Training

```bash
python amp_flow_training_single_gpu_full_data.py
```

### Multi-GPU Training

```bash
bash launch_multi_gpu_training.sh
```

### Key Training Parameters

- **Batch Size**: 96 (optimized for H100)
- **Learning Rate**: 4e-4 with cosine annealing
- **Epochs**: 6000
- **Mixed Precision**: BF16 for H100 optimization
- **CFG Dropout**: 15% for unconditional training

## Generation

Generate AMPs with different CFG strengths:

```bash
python generate_amps.py --cfg_strength 0.0  # No CFG
python generate_amps.py --cfg_strength 1.0  # Weak CFG
python generate_amps.py --cfg_strength 2.0  # Strong CFG
python generate_amps.py --cfg_strength 3.0  # Very strong CFG
```

## Evaluation

### MIC Prediction

The model integrates with Apex for MIC (minimum inhibitory concentration) prediction:

```bash
python test_generated_peptides.py
```

### Performance Metrics

- **Generation Quality**: Evaluated using sequence diversity and validity
- **Antimicrobial Activity**: Predicted using the Apex model integration
- **CFG Effectiveness**: Measured through controlled generation

## Results

### Training Performance

- **Optimized for H100**: 31 steps/second with batch size 96
- **Mixed Precision**: BF16 training for memory efficiency
- **Gradient Clipping**: Stable training with norm = 1.0

### Generation Results

- **Sequence Validity**: High percentage of valid peptide sequences
- **Diversity**: Good sequence diversity across different CFG strengths
- **Antimicrobial Potential**: Predicted MIC values for generated sequences

## Dependencies

Key dependencies include:

- PyTorch 2.0+
- Transformers (for ESM-2)
- Wandb (optional logging)
- Apex (for MIC prediction)

See `requirements.yaml` for the complete dependency list.
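Two small pieces of math underlie the training parameters and CFG strengths above: the conditional flow-matching regression target (under the common linear-interpolant formulation, the velocity from noise `x0` to data `x1`) and the classifier-free-guidance combination applied at sampling time. The sketch below illustrates both on plain Python lists; the function names and the exact guidance convention (here, `cfg_strength = 0.0` reduces to the plain conditional prediction) are illustrative assumptions, not the repository's actual API.

```python
import random

def cfm_training_pair(x0, x1, t):
    """Linear-interpolant conditional flow matching: returns the network
    input x_t and its regression target, the constant velocity x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity

def maybe_drop_condition(cond, p_drop=0.15, rng=random):
    """CFG dropout: with probability p_drop, replace the conditioning signal
    with None so the model also learns an unconditional velocity field."""
    return None if rng.random() < p_drop else cond

def cfg_velocity(v_cond, v_uncond, cfg_strength):
    """One common CFG convention: push the conditional prediction away from
    the unconditional one. cfg_strength = 0.0 recovers v_cond unchanged."""
    return [vc + cfg_strength * (vc - vu) for vc, vu in zip(v_cond, v_uncond)]

# Toy demonstration on 3-dimensional "embeddings"
x0 = [0.0, 0.0, 0.0]   # noise sample
x1 = [1.0, -2.0, 0.5]  # data embedding
x_t, v_star = cfm_training_pair(x0, x1, t=0.25)
print(x_t)     # [0.25, -0.5, 0.125]
print(v_star)  # [1.0, -2.0, 0.5]

guided = cfg_velocity([1.0, 0.0, 1.0], [0.5, 0.5, 1.0], cfg_strength=2.0)
print(guided)  # [2.0, -1.0, 1.0]
```

Under this convention, larger `cfg_strength` values amplify whatever the condition changes about the prediction, which is consistent with the 0.0 to 3.0 range used by `generate_amps.py` above.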
## Usage Examples

### Basic AMP Generation

```python
from final_flow_model import AMPFlowMatcherCFGConcat
from generate_amps import generate_amps

# Load trained model
model = AMPFlowMatcherCFGConcat.load_from_checkpoint('path/to/checkpoint.pth')

# Generate AMPs
sequences = generate_amps(model, num_samples=100, cfg_strength=1.0)
```

### Evaluation

```python
from test_generated_peptides import evaluate_generated_peptides

# Evaluate generated sequences
results = evaluate_generated_peptides(sequences)
```

## Research Impact

This work contributes to:

- **Flow-based Protein Design**: Novel application of flow matching to peptide generation
- **Conditional Generation**: CFG integration for controllable AMP design
- **ESM-2 Integration**: Leveraging protein language models for sequence understanding
- **Antimicrobial Discovery**: Automated design of potential therapeutic peptides

## Citation

If you use this code in your research, please cite:

```bibtex
@article{flowamp2024,
  title={FlowAMP: Flow-based Antimicrobial Peptide Generation with Conditional Flow Matching},
  author={Sun, Edward},
  journal={arXiv preprint},
  year={2024}
}
```

## License

MIT License - see the LICENSE file for details.

## Contact

For questions or collaboration, please contact the authors.