---
library_name: pytorch
license: mit
pipeline_tag: text-generation
tags:
- protein-sequence-generation
- flow-matching
- bioinformatics
- protein-language-models
- pfam
---
# LineageFlow RP55 Checkpoint
LineageFlow is a Dirichlet flow-matching model designed for high-fidelity, family-aware protein sequence generation. It initializes generation from lineage priors derived from ancestral sequence reconstruction (ASR), turning generation into structured mutation from an evolved scaffold.
- **Paper:** [LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation](https://huggingface.co/papers/2605.22252)
- **Code:** [GitHub Repository](https://github.com/Jinx-byebye/LineageFlow)
## Model Description
Current discrete generative models for proteins often start from uniform or masked-token noise, which can discard position-specific constraints induced by evolution. LineageFlow addresses this by using phylogeny-informed priors to maintain family validity and structural confidence while exploring within-family diversity. Across diverse protein families, LineageFlow achieves family validity close to natural sequences and improves predicted structural confidence over uniform or mask-initialized baselines.
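To make the contrast between a uniform start and a lineage-informed start concrete, here is a toy sketch that samples per-position starting points on the probability simplex from a Dirichlet distribution. It is not the LineageFlow implementation: the function names, the `concentration` parameter, the smoothing constant, and the hand-made "lineage prior" for three positions are all assumptions chosen for illustration, and no flow-matching dynamics are modeled.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def sample_dirichlet(alpha, rng):
    """Draw one simplex point from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def initial_state(position_probs, concentration, rng):
    """Sample one starting simplex point per sequence position.

    position_probs is a hypothetical stand-in for a per-position prior
    (for example, one derived from ancestral sequence reconstruction).
    concentration controls how tightly samples follow that prior.
    """
    return [
        # Small constant keeps every Dirichlet parameter strictly positive.
        sample_dirichlet([concentration * p + 1e-3 for p in probs], rng)
        for probs in position_probs
    ]


def argmax_residues(state):
    """Decode each position to its highest-probability residue."""
    return "".join(
        AMINO_ACIDS[max(range(len(vec)), key=vec.__getitem__)] for vec in state
    )


rng = random.Random(0)
k = len(AMINO_ACIDS)

# Uniform prior: no position-specific information.
uniform_prior = [[1.0 / k] * k for _ in range(3)]

# Toy lineage prior (assumption): each position strongly favors one residue.
lineage_prior = []
for favored in "MKV":
    probs = [0.01] * k
    probs[AMINO_ACIDS.index(favored)] = 1.0 - 0.01 * (k - 1)
    lineage_prior.append(probs)

uniform_start = initial_state(uniform_prior, concentration=50.0, rng=rng)
lineage_start = initial_state(lineage_prior, concentration=50.0, rng=rng)

print("uniform start decodes to:", argmax_residues(uniform_start))
print("lineage start decodes to:", argmax_residues(lineage_start))
```

In this toy setting the lineage-informed start already sits near a plausible scaffold, while the uniform start carries no positional preference. That mirrors the motivation described above.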
## Usage
### Download Checkpoint
You can download the checkpoint using the Hugging Face CLI:
```bash
pip install -U "huggingface_hub[cli]"
hf download jinxbye/LineageFlow \
  lineageflow-rp55.ckpt \
  --local-dir checkpoints
```
### Batch Generation
To generate a batch of sequences using the official inference script, run:
```bash
python inference/batch_generate.py \
  --config config/generation.json \
  --ckpt checkpoints/lineageflow-rp55.ckpt \
  --num-samples 512 \
  --gpus all \
  --out outputs/lineageflow_samples.fasta
```
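Once generation finishes, a quick sanity check of the output can be useful. The sketch below assumes the file written by `--out` is standard FASTA (as the `.fasta` extension suggests); the helper names `parse_fasta`, `read_fasta`, and `summarize` are hypothetical and not part of the LineageFlow repository.

```python
from pathlib import Path


def parse_fasta(lines):
    """Turn an iterable of FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_fasta(path):
    """Read and parse a FASTA file from disk."""
    return parse_fasta(Path(path).read_text().splitlines())


def summarize(records):
    """Report sample count, number of distinct sequences, and length range."""
    lengths = [len(seq) for _, seq in records]
    return {
        "count": len(records),
        "unique": len({seq for _, seq in records}),
        "min_len": min(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
    }


# Path taken from the batch generation command above.
fasta_path = Path("outputs/lineageflow_samples.fasta")
if fasta_path.exists():
    print(summarize(read_fasta(fasta_path)))
```

A `unique` value well below `count` would indicate many duplicate samples, which is worth checking before any downstream analysis.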
For more detailed instructions on installation and single-family generation, please refer to the [GitHub repository](https://github.com/Jinx-byebye/LineageFlow).
## Citation
```bibtex
@inproceedings{liang2026lineageflow,
  title     = {LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation},
  author    = {Liang, Langzhang and Yang, Ming and Feng, Yi and Li, Junfan and Pan, Shirui and Xu, Yinghui and Ying, Tianlei and Zheng, Yizhen and Xu, Zenglin},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}
```