# FlowAMP: Flow-based Antimicrobial Peptide Generation
## Overview
FlowAMP is a flow-based generative model for designing antimicrobial peptides (AMPs). It combines conditional flow matching with embeddings from the ESM-2 protein language model for de novo AMP design, with the goal of improving generation quality and diversity.
## Key Features
- **Flow-based Generation**: Uses conditional flow matching for high-quality peptide generation
- **ESM-2 Integration**: Leverages ESM-2 protein language model embeddings for sequence understanding
- **CFG Training**: Implements Classifier-Free Guidance for controllable generation
- **Multi-GPU Training**: Optimized for H100 GPUs with mixed precision training
- **Comprehensive Evaluation**: MIC prediction and antimicrobial activity assessment
## Project Structure
```
flow/
├── final_flow_model.py                        # Main FlowAMP model architecture
├── final_sequence_encoder.py                  # ESM-2 sequence encoding
├── final_sequence_decoder.py                  # Sequence decoding and generation
├── compressor_with_embeddings.py              # Embedding compression/decompression
├── cfg_dataset.py                             # CFG dataset and dataloader
├── amp_flow_training_single_gpu_full_data.py  # Single GPU training
├── amp_flow_training_multi_gpu.py             # Multi-GPU training
├── generate_amps.py                           # AMP generation script
├── test_generated_peptides.py                 # Evaluation and testing
├── apex/                                      # Apex model integration
│   ├── trained_models/                        # Pre-trained Apex models
│   └── AMP_DL_model_twohead.py                # Apex model architecture
├── normalization_stats.pt                     # Preprocessing statistics
└── requirements.yaml                          # Dependencies
```
## Model Architecture
The FlowAMP model consists of:
1. **ESM-2 Encoder**: Extracts protein sequence embeddings using ESM-2
2. **Compressor/Decompressor**: Reduces embedding dimensionality for efficiency
3. **Flow Matcher**: Conditional flow matching for generation
4. **CFG Integration**: Classifier-free guidance for controllable generation
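
The flow-matching component listed above can be illustrated with a minimal sketch of the training objective. It assumes the common straight-line path between Gaussian noise and a data embedding. The function names (`interpolate`, `target_velocity`, `flow_matching_loss`) and the use of plain Python lists are illustrative assumptions, not the repository's implementation, which presumably operates on compressed ESM-2 embeddings as tensors.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x1, rng):
    """Single-sample loss: mean squared error between predicted and target velocity.

    predict_velocity is a hypothetical callable (x_t, t) -> velocity vector
    standing in for the neural network.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                        # time drawn uniformly from [0, 1)
    xt = interpolate(x0, x1, t)
    u = target_velocity(x0, x1)
    v = predict_velocity(xt, t)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u)) / len(x1)
```

In a real training loop the loss would be averaged over a batch and backpropagated through the velocity network; this sketch only shows what the network is regressed against.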
## Training
### Single GPU Training
```bash
python amp_flow_training_single_gpu_full_data.py
```
### Multi-GPU Training
```bash
bash launch_multi_gpu_training.sh
```
### Key Training Parameters
- **Batch Size**: 96 (optimized for H100)
- **Learning Rate**: 4e-4 with cosine annealing
- **Epochs**: 6000
- **Mixed Precision**: BF16 for H100 optimization
- **CFG Dropout**: 15% of conditioning labels are dropped during training so the model also learns the unconditional case
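
Two of the parameters above, the cosine-annealed learning rate and the CFG dropout, can be sketched in a few lines. This is an illustration under assumptions: the schedule is taken to be the standard closed-form cosine annealing applied per epoch with a minimum learning rate of 0, and `NULL_LABEL` is a hypothetical sentinel for "no condition". The training scripts may differ on both points.

```python
import math
import random

BASE_LR = 4e-4       # learning rate from the parameter list
TOTAL_EPOCHS = 6000  # epoch count from the parameter list
CFG_DROPOUT = 0.15   # label dropout rate from the parameter list
NULL_LABEL = -1      # hypothetical sentinel meaning "unconditional"


def cosine_lr(epoch, base_lr=BASE_LR, total=TOTAL_EPOCHS, min_lr=0.0):
    """Closed-form cosine annealing from base_lr down to min_lr (min_lr is assumed)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * epoch / total))


def drop_labels(labels, rng, p=CFG_DROPOUT):
    """Replace each label with NULL_LABEL with probability p (classifier-free guidance training)."""
    return [NULL_LABEL if rng.random() < p else y for y in labels]


if __name__ == "__main__":
    rng = random.Random(0)
    print(cosine_lr(0), cosine_lr(TOTAL_EPOCHS // 2))
    print(drop_labels([1, 0, 1, 1, 0, 1], rng))
```

Dropping labels at training time is what later allows a single network to produce both the conditional and unconditional predictions needed for guidance.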
## Generation
Generate AMPs with different CFG strengths:
```bash
python generate_amps.py --cfg_strength 0.0  # No CFG
python generate_amps.py --cfg_strength 1.0  # Weak CFG
python generate_amps.py --cfg_strength 2.0  # Strong CFG
python generate_amps.py --cfg_strength 3.0  # Very Strong CFG
```
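
To make the role of `--cfg_strength` concrete, here is a hedged sketch of guided sampling. It assumes the widely used classifier-free guidance form `v_cond + w * (v_cond - v_uncond)`, under which a strength of 0 reduces to the plain conditional prediction, and a fixed-step Euler integrator from t=0 to t=1. Both the guidance convention and the helper names (`guided_velocity`, `euler_sample`, `velocity_fn`) are assumptions; `generate_amps.py` may combine the two predictions differently or use another solver.

```python
def guided_velocity(v_cond, v_uncond, cfg_strength):
    """Combine predictions as v_cond + w * (v_cond - v_uncond); w = 0 leaves v_cond unchanged."""
    return [c + cfg_strength * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0, velocity_fn, cfg_strength, num_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.

    velocity_fn is a hypothetical callable (x, t, conditional=bool) -> velocity
    standing in for the trained flow model.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v_c = velocity_fn(x, t, conditional=True)
        v_u = velocity_fn(x, t, conditional=False)
        v = guided_velocity(v_c, v_u, cfg_strength)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Under this convention, larger strengths push samples further along the direction in which the conditional prediction differs from the unconditional one, which typically trades diversity for stronger adherence to the condition.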
## Evaluation
### MIC Prediction
Generated peptides are scored with the integrated Apex model, which predicts MIC (Minimum Inhibitory Concentration):
```bash
python test_generated_peptides.py
```
### Performance Metrics
- **Generation Quality**: Evaluated using sequence diversity and validity
- **Antimicrobial Activity**: Predicted using Apex model integration
- **CFG Effectiveness**: Measured through controlled generation
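
The validity and diversity metrics named above are not defined in this README, so the following is only one plausible way to compute them. It assumes a sequence is valid when it contains only the 20 canonical amino acids and falls within a length range (the bounds are hypothetical), and it measures diversity as the fraction of unique sequences plus the mean pairwise length-normalized edit distance. `test_generated_peptides.py` may use different definitions.

```python
from itertools import combinations

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 canonical amino acids


def is_valid(seq, min_len=1, max_len=50):
    """Valid if within the (hypothetical) length bounds and made only of canonical residues."""
    return min_len <= len(seq) <= max_len and set(seq) <= CANONICAL


def validity(seqs):
    """Fraction of sequences that pass is_valid."""
    return sum(is_valid(s) for s in seqs) / len(seqs)


def uniqueness(seqs):
    """Fraction of distinct sequences in the sample."""
    return len(set(seqs)) / len(seqs)


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def mean_pairwise_distance(seqs):
    """Mean edit distance over all pairs, normalized by the longer sequence's length."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(edit_distance(a, b) / max(len(a), len(b), 1) for a, b in pairs) / len(pairs)
```

The pairwise computation is quadratic in the number of sequences, so for large samples it would usually be run on a random subset.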
## Results
### Training Performance
- **Optimized for H100**: 31 steps/second with batch size 96
- **Mixed Precision**: BF16 training for memory efficiency
- **Gradient Clipping**: Stable training with norm=1.0
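
The gradient clipping mentioned above can be sketched as a rescaling of all gradients by their global L2 norm. This is a plain-Python illustration of the same rule that PyTorch's `torch.nn.utils.clip_grad_norm_` applies. The helper name `clip_by_global_norm` and the nested-list representation of gradients are assumptions for the example, and the training scripts presumably call the PyTorch utility directly.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradient vectors so their combined L2 norm is at most max_norm.

    grads is a list of per-parameter gradient vectors (lists of floats).
    Returns the clipped gradients and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = min(1.0, max_norm / (total + eps))  # never scale gradients up
    return [[g * scale for g in vec] for vec in grads], total
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its magnitude is bounded.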
### Generation Results
- **Sequence Validity**: High percentage of valid peptide sequences
- **Diversity**: Good sequence diversity across different CFG strengths
- **Antimicrobial Potential**: Predicted MIC values for generated sequences
## Dependencies
Key dependencies include:
- PyTorch 2.0+
- Transformers (for ESM-2)
- Wandb (optional logging)
- Apex (the MIC prediction model included under `apex/`)
See `requirements.yaml` for the complete dependency list.
## Usage Examples
### Basic AMP Generation
```python
from final_flow_model import AMPFlowMatcherCFGConcat
from generate_amps import generate_amps

# Load a trained model from a checkpoint file
model = AMPFlowMatcherCFGConcat.load_from_checkpoint('path/to/checkpoint.pth')

# Generate 100 AMP sequences at CFG strength 1.0
sequences = generate_amps(model, num_samples=100, cfg_strength=1.0)
```
### Evaluation
```python
from test_generated_peptides import evaluate_generated_peptides

# Evaluate the sequences produced by the generation example
results = evaluate_generated_peptides(sequences)
```
## Research Impact
This work contributes to:
- **Flow-based Protein Design**: Novel application of flow matching to peptide generation
- **Conditional Generation**: CFG integration for controllable AMP design
- **ESM-2 Integration**: Leveraging protein language models for sequence understanding
- **Antimicrobial Discovery**: Automated design of potential therapeutic peptides
## Citation
If you use this code in your research, please cite:
```bibtex
@article{flowamp2024,
  title={FlowAMP: Flow-based Antimicrobial Peptide Generation with Conditional Flow Matching},
  author={Sun, Edward},
  journal={arXiv preprint},
  year={2024}
}
```
## License
MIT License - see the LICENSE file for details.
## Contact
For questions or collaboration, please contact the authors.