---
license: mit
tags:
- protein-generation
- antimicrobial-peptides
- flow-matching
- protein-design
- esm
- amp
library_name: pytorch
---

# FlowFinal: AMP Flow Matching Model

FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). It uses continuous normalizing flows to generate protein sequences in the ESM-2 embedding space.

## Model Description

- **Model Type**: Flow Matching for Protein Generation
- **Domain**: Antimicrobial Peptide (AMP) Generation
- **Base Model**: ESM-2 (650M parameters)
- **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
- **Training Data**: Curated AMP dataset of 6,983 sequences

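Conceptually, a flow matching model learns a velocity field that transports Gaussian noise to data (here, compressed ESM-2 latents) along an interpolation path. The sketch below shows the standard conditional flow matching objective with a linear path; the function and model interface are assumptions for illustration, not FlowFinal's actual code (which lives in `final_flow_model.py`).

```python
import torch

def flow_matching_loss(model, x1, cond=None):
    """Conditional flow matching loss with a linear interpolation path.

    x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I); the regression target
    for the velocity field is x1 - x0. Hypothetical interface.
    """
    x0 = torch.randn_like(x1)
    # One random timestep per sample, broadcastable over feature dims
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    x_t = (1 - t) * x0 + t * x1
    v_pred = model(x_t, t.flatten(), cond=cond)
    return ((v_pred - (x1 - x0)) ** 2).mean()
```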
## Key Features

- **Classifier-Free Guidance (CFG)**: Enables controlled generation with different conditioning strengths
- **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
- **Compression Architecture**: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
- **Multiple CFG Scales**: Support for no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance

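The 16x compression maps each 1280-dimensional ESM-2 token embedding to an 80-dimensional latent and back. A minimal sketch of such a compressor/decompressor pair is shown below; only the 1280 → 80 bottleneck comes from this card, while the layer sizes, activations, and class names are assumptions (the real modules are in `compressor_with_embeddings.py`).

```python
import torch
import torch.nn as nn

class Compressor(nn.Module):
    """Hypothetical 1280 -> 80 embedding compressor (sizes assumed)."""
    def __init__(self, dim_in=1280, dim_hidden=512, dim_out=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_hidden),
            nn.GELU(),
            nn.Linear(dim_hidden, dim_out),
        )

    def forward(self, x):
        return self.net(x)

class Decompressor(nn.Module):
    """Hypothetical 80 -> 1280 decompressor mirroring the compressor."""
    def __init__(self, dim_in=80, dim_hidden=512, dim_out=1280):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_hidden),
            nn.GELU(),
            nn.Linear(dim_hidden, dim_out),
        )

    def forward(self, z):
        return self.net(z)

x = torch.randn(4, 50, 1280)   # (batch, sequence length, ESM-2 dim)
z = Compressor()(x)            # compressed latents: (4, 50, 80)
x_hat = Decompressor()(z)      # reconstruction: (4, 50, 1280)
```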
## Model Components

### Core Architecture
- `final_flow_model.py`: Main flow matching model implementation
- `compressor_with_embeddings.py`: Embedding compression/decompression modules
- `final_sequence_decoder.py`: ESM-2 embedding-to-sequence decoder

### Trained Weights
- `final_compressor_model.pth`: Trained compressor (315 MB)
- `final_decompressor_model.pth`: Trained decompressor (158 MB)
- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint

### Generated Samples (2025-08-29)
- Generated AMP sequences at each CFG scale
- HMD-AMP validation results showing an 8.8% AMP prediction rate

## Performance Results

### HMD-AMP Validation (80 sequences tested)
- **Total AMPs Predicted**: 7/80 (8.8%)
- **By CFG Configuration**:
  - No CFG: 1/20 (5.0%)
  - Weak CFG: 2/20 (10.0%)
  - Strong CFG: 4/20 (20.0%), the best performance
  - Very Strong CFG: 0/20 (0.0%)

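The CFG scales above correspond to a guidance weight applied at sampling time: the conditional and unconditional velocity predictions are combined, and larger weights push samples harder toward the conditioning signal (with very strong guidance apparently degrading quality here). A sketch of the standard classifier-free guidance combination for a velocity field follows; the `model` call signature is an assumption, not FlowFinal's actual API.

```python
import torch

def cfg_velocity(model, x_t, t, cond, cfg_scale):
    """Classifier-free guidance for a flow matching velocity field.

    v = v_uncond + s * (v_cond - v_uncond): s = 0 is unconditional,
    larger s strengthens conditioning. Hypothetical model interface.
    """
    v_uncond = model(x_t, t, cond=None)
    v_cond = model(x_t, t, cond=cond)
    return v_uncond + cfg_scale * (v_cond - v_uncond)
```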
### Best Performing Sequences

1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)

## Usage

```python
from generate_amps import AMPGenerator

# Initialize generator
generator = AMPGenerator(
    model_path="amp_flow_model_final_optimized.pth",
    device="cuda",
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,
    cfg_scale=7.5,  # strong CFG recommended
)
```

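Under the hood, a `num_steps` flow matching sampler typically integrates the learned velocity field from noise (t = 0) to data (t = 1) with a fixed-step ODE solver, applying CFG at each step. The Euler sketch below illustrates this; the model interface, latent shape, and solver choice are assumptions about what `generate_amps` does, not its verified internals.

```python
import torch

@torch.no_grad()
def sample_flow(model, num_samples, num_steps=25, dim=80,
                cond=None, cfg_scale=7.5, device="cpu"):
    """Minimal Euler sampler for a flow matching model (assumed interface).

    Integrates dx/dt = v(x, t) from t=0 (Gaussian noise) to t=1 over
    num_steps uniform steps, with classifier-free guidance per step.
    Real latents would also carry a sequence-length dimension.
    """
    x = torch.randn(num_samples, dim, device=device)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((num_samples,), i * dt, device=device)
        v_uncond = model(x, t, cond=None)
        v_cond = model(x, t, cond=cond)
        x = x + dt * (v_uncond + cfg_scale * (v_cond - v_uncond))
    return x
```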
## Training Details

- **Optimizer**: AdamW with cosine annealing
- **Learning Rate**: 4e-4 (final)
- **Epochs**: 2000
- **Final Loss**: 1.318
- **Training Time**: 2.3 hours on an H100
- **Dataset Size**: 6,983 samples

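The optimizer setup above can be sketched as follows; only AdamW, the 4e-4 learning rate, cosine annealing, and the 2000-epoch budget come from this card, while the stand-in model and the omitted loss/backward step are placeholders.

```python
import torch

model = torch.nn.Linear(80, 80)  # stand-in for the flow network
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=2000)

for epoch in range(2000):
    # loss = flow_matching_loss(...); loss.backward()  (omitted in sketch)
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # cosine-decays lr from 4e-4 toward 0 over 2000 epochs
```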
## File Structure

```
FlowFinal/
├── models/
│   ├── final_compressor_model.pth
│   ├── final_decompressor_model.pth
│   └── amp_flow_model_final_optimized.pth
├── generated_samples/
│   ├── generated_sequences_20250829.fasta
│   └── hmd_amp_detailed_results.csv
├── src/
│   ├── final_flow_model.py
│   ├── compressor_with_embeddings.py
│   ├── final_sequence_decoder.py
│   └── generate_amps.py
└── README.md
```

## Citation

If you use FlowFinal in your research, please cite:

```bibtex
@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}
```

## License

This model is released under the MIT License.