Improve model card and add metadata

#1
Opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +27 -9
README.md CHANGED
````diff
@@ -1,6 +1,7 @@
 ---
-license: mit
 library_name: pytorch
+license: mit
+pipeline_tag: text-generation
 tags:
 - protein-sequence-generation
 - flow-matching
@@ -11,27 +12,44 @@ tags:
 
 # LineageFlow RP55 Checkpoint
 
-This repository hosts the released LineageFlow checkpoint for family-aware protein sequence generation.
+LineageFlow is a Dirichlet flow-matching model designed for high-fidelity, family-aware protein sequence generation. It initializes generation from lineage priors derived from ancestral sequence reconstruction (ASR), turning generation into structured mutation from an evolved scaffold.
+
+- **Paper:** [LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation](https://huggingface.co/papers/2605.22252)
+- **Code:** [GitHub Repository](https://github.com/Jinx-byebye/LineageFlow)
 
-## Files
+## Model Description
 
-- `lineageflow-rp55.ckpt`: LineageFlow denoiser checkpoint.
-- `SHA256SUMS`: checksum for verifying the checkpoint download.
+Current discrete generative models for proteins often start from uniform or masked-token noise, which can discard position-specific constraints induced by evolution. LineageFlow addresses this by using phylogeny-informed priors to maintain family validity and structural confidence while exploring within-family diversity. Across diverse protein families, LineageFlow achieves family validity close to natural sequences and improves predicted structural confidence over uniform or mask-initialized baselines.
 
 ## Usage
 
+### Download Checkpoint
+
+You can download the checkpoint using the Hugging Face CLI:
+
 ```bash
+pip install -U "huggingface_hub[cli]"
+
 hf download jinxbye/LineageFlow \
 lineageflow-rp55.ckpt \
 --local-dir checkpoints
 ```
 
-See the GitHub repository for inference and evaluation code:
+### Batch Generation
+
+To generate a batch of sequences using the official inference script, run:
 
-```text
-https://github.com/jinxbye/LineageFlow
+```bash
+python inference/batch_generate.py \
+--config config/generation.json \
+--ckpt checkpoints/lineageflow-rp55.ckpt \
+--num-samples 512 \
+--gpus all \
+--out outputs/lineageflow_samples.fasta
 ```
 
+For more detailed instructions on installation and single-family generation, please refer to the [GitHub repository](https://github.com/Jinx-byebye/LineageFlow).
+
 ## Citation
 
 ```bibtex
@@ -41,4 +59,4 @@ https://github.com/jinxbye/LineageFlow
 booktitle = {International Conference on Machine Learning},
 year = {2026}
 }
-```
+```
````
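The new description contrasts starting from uniform or masked-token noise with starting from lineage priors derived from ASR. The card does not say how that prior is constructed, so the sketch below is only a hypothetical illustration of the contrast, not the LineageFlow method. It assumes the uninformed start is the uniform distribution on the probability simplex, Dirichlet(1, ..., 1), at every position (the usual starting point in Dirichlet flow matching), and it models a "lineage prior" as a per-position Dirichlet whose concentration is boosted by a made-up strength `kappa` on the residue of a single ancestral sequence. The function names, `kappa`, and the example sequence are illustrative assumptions.

```python
import random
from typing import List

# 20 standard amino acids; this alphabet order is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one point on the probability simplex from Dirichlet(alpha)."""
    # A Dirichlet sample is a vector of independent Gamma(alpha_i, 1) draws
    # normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def uniform_prior(length: int, rng: random.Random) -> List[List[float]]:
    """Uninformed start: every position ~ Dirichlet(1, ..., 1), uniform on the simplex."""
    return [sample_dirichlet([1.0] * len(AMINO_ACIDS), rng) for _ in range(length)]


def lineage_prior(ancestor: str, kappa: float, rng: random.Random) -> List[List[float]]:
    """Hypothetical lineage-informed start: position i ~ Dirichlet(1 + kappa * onehot(ancestor[i]))."""
    points = []
    for residue in ancestor:
        alpha = [1.0 + (kappa if aa == residue else 0.0) for aa in AMINO_ACIDS]
        points.append(sample_dirichlet(alpha, rng))
    return points


def argmax_sequence(points: List[List[float]]) -> str:
    """Read off the most probable residue at each position."""
    return "".join(AMINO_ACIDS[max(range(len(p)), key=p.__getitem__)] for p in points)


if __name__ == "__main__":
    rng = random.Random(0)
    ancestor = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up placeholder "ancestral" sequence
    start = lineage_prior(ancestor, kappa=50.0, rng=rng)
    noise = uniform_prior(len(ancestor), rng)
    print("ancestor:     ", ancestor)
    print("lineage start:", argmax_sequence(start))
    print("uniform start:", argmax_sequence(noise))
```

With a large `kappa`, the most probable residue at each position of the lineage-informed start already matches the ancestor, so a sampler begins near the scaffold and only has to propose mutations from it, whereas the uniform start carries no position-specific information. Real ASR output typically provides per-site residue probabilities rather than a single sequence; how LineageFlow actually uses them is described in the paper.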
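The previous version of the card listed a `SHA256SUMS` file for verifying the checkpoint download, and the updated card drops that section. If the repository still ships that file, the check can be scripted. The sketch below assumes `SHA256SUMS` follows the common `sha256sum` layout (`<hex digest>  <filename>` per line) and has been downloaded into the same `checkpoints` directory as `lineageflow-rp55.ckpt`; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large checkpoint is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file: Path, base_dir: Path) -> Dict[str, bool]:
    """Check each '<hex>  <filename>' line of a sha256sum-style file; return {filename: ok}."""
    results: Dict[str, bool] = {}
    for line in sums_file.read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        name = name.lstrip("*")  # sha256sum marks binary-mode entries with a leading '*'
        target = base_dir / name
        # A missing file counts as a failed check rather than raising.
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    sums = Path("checkpoints/SHA256SUMS")  # assumed location, next to the checkpoint
    if sums.is_file():
        print(verify_checksums(sums, sums.parent))
    else:
        print(f"{sums} not found; download it alongside the checkpoint first.")
```

On Linux, running `sha256sum -c SHA256SUMS` inside `checkpoints` performs the same check from the shell.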
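The batch command writes its samples to `outputs/lineageflow_samples.fasta`. Assuming that file is standard FASTA (the extension suggests so, but the header format produced by `batch_generate.py` is not documented on the card), a quick sanity check of a generated batch could look like the sketch below. It is a generic, hypothetical helper rather than part of the LineageFlow code; Biopython's `SeqIO.parse` is the usual choice for anything beyond a quick look.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; sequences may span several lines."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Basic sanity statistics for a batch of generated sequences."""
    lengths = [len(seq) for _, seq in records]
    return {
        "num_sequences": len(records),
        "num_unique": len({seq for _, seq in records}),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


if __name__ == "__main__":
    out = Path("outputs/lineageflow_samples.fasta")  # path passed to --out in the batch command
    if out.is_file():
        print(summarize(read_fasta(out.read_text())))
    else:
        print(f"{out} not found; run the batch generation command first.")
```

Comparing `num_unique` with `num_sequences` gives a first, coarse signal of how much diversity a batch of 512 samples contains before running any family-validity or structure-based evaluation.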