nielsr HF Staff committed on
Commit 4515de4 · verified · 1 Parent(s): eb6779c

Add model card and documentation for PT-RAG

Hi! I'm Niels from the community science team at Hugging Face.

This PR adds a comprehensive model card for PT-RAG, a framework for predicting cellular responses to gene perturbations using differentiable retrieval-augmented generation.

The model card includes:
- Metadata for the pipeline tag and relevant biological tags.
- Information about the associated paper: [Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation](https://huggingface.co/papers/2603.07233).
- Links to the official GitHub repository.
- Detailed installation and sample usage instructions for training and inference based on the repository documentation.

This documentation will help researchers understand and use your model more effectively on the Hub.

Files changed (1)
  1. README.md +85 -3
README.md CHANGED
@@ -1,3 +1,85 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: other
+ tags:
+ - biology
+ - genomics
+ - gene-perturbation
+ - RAG
+ ---
+
+ # PT-RAG: Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation
+
+ PT-RAG (Perturbation-aware Two-stage Retrieval-Augmented Generation) is a novel framework that extends Retrieval-Augmented Generation to cellular biology. It is designed to predict how cells respond to genetic perturbations by using a two-stage differentiable retrieval pipeline.
+
+ - **Paper:** [Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation](https://huggingface.co/papers/2603.07233)
+ - **GitHub Repository:** [https://github.com/difra100/PT-RAG_ICLR](https://github.com/difra100/PT-RAG_ICLR)
+ - **Status:** Accepted at ICLR 2026 Workshop (Gen² @ ICLR 2026)
+
+ ## Overview
+
+ PT-RAG addresses the challenge of modeling single-cell perturbation responses by leveraging context-aware retrieval. Unlike standard RAG systems, it uses a differentiable mechanism to learn what constitutes relevant context. The pipeline consists of:
+ 1. **Candidate Retrieval**: Retrieving candidate perturbations using GenePT embeddings.
+ 2. **Adaptive Refinement**: Refining the selection through Gumbel-Softmax discrete sampling conditioned on cell state and input perturbation.
+
+ ## Installation
+
+ To set up the environment and install the necessary dependencies:
+
+ ```bash
+ # Create a new conda environment
+ conda create -n ptrag python=3.11 -y
+ conda activate ptrag
+
+ # Install the base package
+ pip install -e .
+
+ # Install RAG dependencies
+ pip install -r requirements.txt
+ ```
+
+ ## Sample Usage
+
+ ### Training PT-RAG
+ To train a model with differentiable retrieval and sparsity regularization:
+
+ ```bash
+ python -m state.__main__ tx train \
+ data.kwargs.toml_config_path=datasets/repogle_nadig_jurkat.toml \
+ training.rag=true \
+ training.differentiable_rag=true \
+ training.retrieve_than_predict=true \
+ training.gumbel_sparsity_loss=true \
+ training.gumbel_sparsity_weight=0.1 \
+ training.topk_rag=32 \
+ training.use_genept=true \
+ model=state \
+ output_dir=experiments/ptrag_model \
+ name=jurkat_ptrag_sparsity0.1
+ ```
+
+ ### Inference
+ The differentiable RAG index and learned weights are automatically loaded during inference:
+
+ ```bash
+ python -m state.__main__ tx predict \
+ --output-dir experiments/ptrag_model \
+ --checkpoint last.ckpt \
+ --eval-genept-pert
+ ```
+
+ ## Citation
+
+ If you find this work useful, please cite:
+
+ ```bibtex
+ @article{difrancesco2026retrieval,
+ title={Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation},
+ author={Di Francesco, Andrea Giuseppe and Rubbi, Andrea and Liò, Pietro},
+ journal={arXiv preprint arXiv:2603.07233},
+ year={2026}
+ }
+ ```
+
+ ## Acknowledgments
+ This repository builds upon the [State](https://github.com/ArcInstitute/state) model from the Arc Institute. Evaluation metrics are computed using the [GenGeneEval (GGE)](https://github.com/AndreaRubbi/GGE) library.
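
For readers of this diff who want a concrete picture of the two-stage pipeline described in the README's Overview section, the following is a minimal sketch and not the repository's implementation. It assumes toy 3-dimensional vectors in place of GenePT embeddings, cosine similarity as the candidate-retrieval score, and hand-set logits in place of the learned scores conditioned on cell state and input perturbation. The names `cosine`, `retrieve_candidates`, and `gumbel_softmax` are hypothetical and do not come from the PT-RAG codebase.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve_candidates(query, bank, k):
    """Stage 1 (assumed form): top-k perturbations by cosine similarity."""
    ranked = sorted(bank, key=lambda name: cosine(query, bank[name]), reverse=True)
    return ranked[:k]


def gumbel_softmax(logits, tau, rng):
    """Stage 2 (assumed form): relaxed one-hot sample over the candidates.

    Adds Gumbel(0, 1) noise to each logit, then applies a temperature-scaled
    softmax. A lower tau pushes the output closer to a one-hot vector.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embedding bank; real GenePT vectors are much higher-dimensional.
bank = {
    "geneA": [1.0, 0.0, 0.0],
    "geneB": [0.9, 0.1, 0.0],
    "geneC": [0.0, 1.0, 0.0],
    "geneD": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

candidates = retrieve_candidates(query, bank, k=2)

# Placeholder logits, one per candidate; a trained model would predict these.
logits = [2.0, 0.5]
weights = gumbel_softmax(logits, tau=0.5, rng=random.Random(0))

for name, weight in zip(candidates, weights):
    print(f"{name}: {weight:.3f}")
```

The sketch keeps `k=2` for readability, whereas the training command in the README sets `training.topk_rag=32`. In a differentiable setting the same sampling step would be written with an autodiff library so that gradients flow through the relaxed weights; the standard-library version here only illustrates the arithmetic.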