library_name: pytorch
license: mit
pipeline_tag: text-generation
tags:
- protein-sequence-generation
- flow-matching
- bioinformatics
- protein-language-models
- pfam
LineageFlow RP55 Checkpoint
LineageFlow is a Dirichlet flow-matching model for high-fidelity, family-aware protein sequence generation. It initializes generation from lineage priors derived from ancestral sequence reconstruction (ASR), so that sampling becomes structured mutation of an evolved scaffold.
- Paper: LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation
- Code: GitHub Repository
Model Description
Current discrete generative models for proteins often start from uniform or masked-token noise, which can discard the position-specific constraints induced by evolution. LineageFlow addresses this with phylogeny-informed priors that maintain family validity and structural confidence while still exploring within-family diversity. Across diverse protein families, LineageFlow achieves family validity close to that of natural sequences and improves predicted structural confidence over uniform- or mask-initialized baselines.
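To make the contrast between a uniform start and a lineage-informed start concrete, here is a toy illustration for a single sequence position. It is not the LineageFlow implementation: the posterior values, the `lineage_alphas` helper, and its `strength` and `floor` parameters are hypothetical, chosen only to show how a Dirichlet centered on an ASR posterior differs from a flat one.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_dirichlet(alphas, rng):
    """Draw one point on the simplex by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def lineage_alphas(posterior, strength=50.0, floor=0.05):
    """Hypothetical helper: map an ASR posterior to Dirichlet concentrations.

    `floor` keeps every concentration strictly positive, and `strength`
    controls how tightly samples cluster around the posterior.
    """
    return [floor + strength * p for p in posterior]


rng = random.Random(0)

# Made-up ASR posterior for one position: mostly Leu, some Ile and Val.
posterior = [0.0] * len(AMINO_ACIDS)
posterior[AMINO_ACIDS.index("L")] = 0.80
posterior[AMINO_ACIDS.index("I")] = 0.15
posterior[AMINO_ACIDS.index("V")] = 0.05

# Flat Dirichlet: no residue is preferred at the start of generation.
uniform_start = sample_dirichlet([1.0] * len(AMINO_ACIDS), rng)

# Lineage-informed Dirichlet: mass starts near the reconstructed ancestor.
lineage_start = sample_dirichlet(lineage_alphas(posterior), rng)

top = max(range(len(AMINO_ACIDS)), key=lineage_start.__getitem__)
print("most probable residue under the lineage start:", AMINO_ACIDS[top])
```

In this sketch the uniform start carries no information about the position, while the lineage start already favors the residues that the reconstructed ancestor supports.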
Usage
Download Checkpoint
You can download the checkpoint using the Hugging Face CLI:
pip install -U "huggingface_hub[cli]"
hf download jinxbye/LineageFlow \
  lineageflow-rp55.ckpt \
  --local-dir checkpoints
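If you prefer to fetch the file from Python, the same download can be sketched with `hf_hub_download` from `huggingface_hub`. The repository ID and filename are taken from the CLI command above; the `download_checkpoint` wrapper is a hypothetical helper, not part of the LineageFlow codebase.

```python
REPO_ID = "jinxbye/LineageFlow"
CKPT_NAME = "lineageflow-rp55.ckpt"


def download_checkpoint(local_dir="checkpoints"):
    """Fetch the checkpoint file and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=CKPT_NAME,
        local_dir=local_dir,
    )
```

Calling `download_checkpoint()` with the default `local_dir` should place the file at `checkpoints/lineageflow-rp55.ckpt`, the same location the CLI command uses.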
Batch Generation
To generate a batch of sequences using the official inference script, run:
python inference/batch_generate.py \
  --config config/generation.json \
  --ckpt checkpoints/lineageflow-rp55.ckpt \
  --num-samples 512 \
  --gpus all \
  --out outputs/lineageflow_samples.fasta
For more detailed instructions on installation and single-family generation, please refer to the GitHub repository.
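After a batch run finishes, a quick sanity check on the output file can catch truncated or malformed results. The sketch below assumes the script writes standard FASTA (a `>` header line followed by one or more sequence lines). The `read_fasta` and `summarize` helpers are illustrative and not part of the repository.

```python
from pathlib import Path

# The 20 canonical amino acids; anything else is flagged.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(records):
    """Return basic counts for spotting an incomplete or malformed run."""
    seqs = [seq.upper() for _, seq in records]
    lengths = [len(s) for s in seqs]
    return {
        "num_records": len(seqs),
        "num_unique": len(set(seqs)),
        "min_len": min(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
        "non_canonical": sum(1 for s in seqs if set(s) - CANONICAL),
    }


if __name__ == "__main__":
    path = Path("outputs/lineageflow_samples.fasta")
    if path.exists():
        print(summarize(read_fasta(path.read_text())))
```

With `--num-samples 512`, one would expect `num_records` to be 512 if the script writes one record per sample. A nonzero `non_canonical` count or an unexpectedly low `num_unique` is worth investigating before downstream analysis.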
Citation
@inproceedings{liang2026lineageflow,
title = {LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation},
author = {Liang, Langzhang and Yang, Ming and Feng, Yi and Li, Junfan and Pan, Shirui and Xu, Yinghui and Ying, Tianlei and Zheng, Yizhen and Xu, Zenglin},
booktitle = {International Conference on Machine Learning},
year = {2026}
}