---
library_name: pytorch
license: mit
pipeline_tag: text-generation
tags:
- protein-sequence-generation
- flow-matching
- bioinformatics
- protein-language-models
- pfam
---

# LineageFlow RP55 Checkpoint

LineageFlow is a Dirichlet flow-matching model designed for high-fidelity, family-aware protein sequence generation. It initializes generation from lineage priors derived from ancestral sequence reconstruction (ASR), turning generation into structured mutation from an evolved scaffold.

- **Paper:** [LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation](https://huggingface.co/papers/2605.22252)
- **Code:** [GitHub Repository](https://github.com/Jinx-byebye/LineageFlow)

## Model Description

Current discrete generative models for proteins often start from uniform or masked-token noise, which can discard the position-specific constraints imposed by evolution. LineageFlow instead starts from phylogeny-informed priors, which helps it maintain family validity and structural confidence while still exploring within-family diversity. Across diverse protein families, LineageFlow achieves family validity close to that of natural sequences and improves predicted structural confidence over uniform- or mask-initialized baselines.
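To make the contrast between a uniform start and a lineage-informed start concrete, here is a toy sketch for a single sequence position. It is not the authors' implementation: the function names, the `strength` parameter, and the rule `alpha = 1 + strength * p` are all assumptions chosen for illustration, and the ASR posterior is a made-up example.

```python
import random
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one point on the simplex by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def init_position(
    posterior: Dict[str, float], strength: float, rng: random.Random
) -> List[float]:
    """Hypothetical lineage-informed start for one position.

    Concentration is 1 + strength * p(residue), where p is an assumed
    per-position ancestral posterior. With strength = 0 this reduces to the
    flat Dirichlet(1, ..., 1), i.e. a start with no positional preference.
    """
    alpha = [1.0 + strength * posterior.get(aa, 0.0) for aa in AMINO_ACIDS]
    return sample_dirichlet(alpha, rng)


# Illustrative posterior: the reconstructed ancestor strongly favors leucine here.
rng = random.Random(0)
informed = init_position({"L": 0.9, "I": 0.1}, strength=50.0, rng=rng)
uniform = init_position({}, strength=0.0, rng=rng)
```

In this toy setup the informed sample concentrates most of its mass on the residues the ancestor favors, while the uniform sample carries no positional information, which is the loss the paragraph above describes.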

## Usage

### Download Checkpoint

You can download the checkpoint using the Hugging Face CLI:

```bash
pip install -U "huggingface_hub[cli]"

hf download jinxbye/LineageFlow \
  lineageflow-rp55.ckpt \
  --local-dir checkpoints
```
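The same download can be done from Python with `huggingface_hub.hf_hub_download`. This is a minimal sketch that reuses the repository ID and filename from the CLI command above; the helper name `download_checkpoint` is an assumption introduced here.

```python
from pathlib import Path


def download_checkpoint(local_dir: str = "checkpoints") -> Path:
    """Fetch lineageflow-rp55.ckpt into local_dir and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="jinxbye/LineageFlow",      # same repository as the CLI example
        filename="lineageflow-rp55.ckpt",   # same file as the CLI example
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_checkpoint()` requires network access and returns the path to pass to `--ckpt`.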

### Batch Generation

To generate a batch of sequences with the official inference script, run the following from a checkout of the code repository (the script and config paths are relative):

```bash
python inference/batch_generate.py \
  --config config/generation.json \
  --ckpt checkpoints/lineageflow-rp55.ckpt \
  --num-samples 512 \
  --gpus all \
  --out outputs/lineageflow_samples.fasta
```

For more detailed instructions on installation and single-family generation, please refer to the [GitHub repository](https://github.com/Jinx-byebye/LineageFlow).
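A quick sanity check on the generated file might look like the sketch below. It assumes the script writes standard FASTA (`>` header lines followed by one or more sequence lines) to the `--out` path, which is not verified here; `read_fasta` and `summarize` are hypothetical helpers, not part of the repository.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Count sequences, distinct sequences, and the mean sequence length."""
    seqs = [seq for _, seq in records]
    return {
        "num_sequences": len(seqs),
        "num_unique": len(set(seqs)),
        "mean_length": sum(map(len, seqs)) / len(seqs) if seqs else 0.0,
    }
```

For example, `summarize(read_fasta(open("outputs/lineageflow_samples.fasta")))` reports how many of the generated samples are distinct, a cheap first check before running family-validity or structure-prediction tools.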

## Citation

```bibtex
@inproceedings{liang2026lineageflow,
  title     = {LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation},
  author    = {Liang, Langzhang and Yang, Ming and Feng, Yi and Li, Junfan and Pan, Shirui and Xu, Yinghui and Ying, Tianlei and Zheng, Yizhen and Xu, Zenglin},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}
```