---
library_name: pytorch
license: mit
pipeline_tag: text-generation
tags:
- protein-sequence-generation
- flow-matching
- bioinformatics
- protein-language-models
- pfam
---
# LineageFlow RP55 Checkpoint
LineageFlow is a Dirichlet flow-matching model designed for high-fidelity, family-aware protein sequence generation. It initializes generation from lineage priors derived from ancestral sequence reconstruction (ASR), turning generation into structured mutation from an evolved scaffold.
- **Paper:** [LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation](https://huggingface.co/papers/2605.22252)
- **Code:** [GitHub Repository](https://github.com/Jinx-byebye/LineageFlow)
## Model Description
Current discrete generative models for proteins often start from uniform or masked-token noise, which can discard the position-specific constraints imposed by evolution. LineageFlow instead starts from phylogeny-informed priors, which helps it maintain family validity and structural confidence while still exploring within-family diversity. Across diverse protein families, LineageFlow achieves family validity close to that of natural sequences and improves predicted structural confidence over uniform- or mask-initialized baselines.
## Usage
### Download Checkpoint
You can download the checkpoint using the Hugging Face CLI:
```bash
pip install -U "huggingface_hub[cli]"
hf download jinxbye/LineageFlow \
  lineageflow-rp55.ckpt \
  --local-dir checkpoints
```
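
The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package installed above; the helper name `download_checkpoint` is illustrative and not part of the LineageFlow repository.

```python
# Repository and file names are taken from the CLI command above.
REPO_ID = "jinxbye/LineageFlow"
FILENAME = "lineageflow-rp55.ckpt"


def download_checkpoint(local_dir: str = "checkpoints") -> str:
    """Fetch the checkpoint into local_dir and return the local file path.

    Hypothetical helper: it only wraps huggingface_hub.hf_hub_download.
    """
    # Imported here so this module can be loaded even before the
    # huggingface_hub package is installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=local_dir,
    )


# Example call (requires network access):
# ckpt_path = download_checkpoint()
```

The returned path can then be passed to the `--ckpt` argument of the inference script.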
### Batch Generation
To generate a batch of sequences using the official inference script, run:
```bash
python inference/batch_generate.py \
  --config config/generation.json \
  --ckpt checkpoints/lineageflow-rp55.ckpt \
  --num-samples 512 \
  --gpus all \
  --out outputs/lineageflow_samples.fasta
```
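
Once the command finishes, a quick sanity check of the output can be useful. The sketch below assumes the script writes standard FASTA (a `>` header line followed by one or more sequence lines); the header layout used by LineageFlow is not specified here, so headers are kept as opaque strings. The functions `read_fasta` and `summarize` are illustrative helpers, not part of the official code.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple, Union


def read_fasta(path: Union[str, Path]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record begins: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequences may be wrapped across several lines.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(path: Union[str, Path]) -> Dict[str, int]:
    """Count records and unique sequences, and report the length range."""
    seqs = [seq for _, seq in read_fasta(path)]
    lengths = [len(s) for s in seqs]
    return {
        "count": len(seqs),
        "unique": len(set(seqs)),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
    }


# Example call, using the output path from the command above:
# print(summarize("outputs/lineageflow_samples.fasta"))
```

Comparing `count` against `--num-samples` and checking `unique` gives a first indication of whether the run completed and how many duplicate sequences were produced.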
For detailed instructions on installation and single-family generation, see the [GitHub repository](https://github.com/Jinx-byebye/LineageFlow).
## Citation
```bibtex
@inproceedings{liang2026lineageflow,
title = {LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation},
author = {Liang, Langzhang and Yang, Ming and Feng, Yi and Li, Junfan and Pan, Shirui and Xu, Yinghui and Ying, Tianlei and Zheng, Yizhen and Xu, Zenglin},
booktitle = {International Conference on Machine Learning},
year = {2026}
}
```