Update README.md
README.md
CHANGED
|
@@ -17,10 +17,9 @@ tags:
|
|
| 17 |
<h2 align="center"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>
|
| 18 |
|
| 19 |
<p align="center">
|
| 20 |
-
<a href="https://
|
| 21 |
<a href="https://huggingface.co/datasets/zjunlp/ChatCell-Instructions">🤗 Dataset</a> •
|
| 22 |
<a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
|
| 23 |
-
<a href="https://arxiv.org/abs/2402.08303">📑 Paper</a> •
|
| 24 |
<a href="#1">🏖️ Overview</a> •
|
| 25 |
<a href="#2">🧬 Single-cell Analysis Tasks</a> •
|
| 26 |
<a href="#3">🛠️ Quickstart</a> •
|
|
@@ -36,37 +35,35 @@ tags:
|
|
| 36 |
|
| 37 |
## 📌 Table of Contents
|
| 38 |
|
| 39 |
-
- [
|
| 40 |
-
- [🧬 Single-cell Analysis Tasks](#
|
| 41 |
-
- [🛠️ Quickstart](#3)
|
| 42 |
- [📝 Cite](#4)
|
| 43 |
|
| 44 |
|
| 45 |
---
|
| 46 |
|
| 47 |
-
<h2 id="1">🏖️ Overview</h2>
|
| 48 |
|
| 49 |
-
|
| 50 |
-
- Single-cell biology examines the intricate functions of cells, from energy production to genetic information transfer, and plays a critical role in unraveling the fundamental principles of life and the mechanisms that influence health and disease.
|
| 51 |
-
- The field has witnessed a surge in single-cell RNA sequencing (scRNA-seq) data, driven by advancements in high-throughput sequencing and reduced costs.
|
| 52 |
-
- Traditional single-cell foundation models leverage extensive scRNA-seq datasets, applying NLP techniques during pre-training to analyze gene expression matrices (structured formats that simplify scRNA-seq data into computationally tractable representations). They are subsequently fine-tuned for distinct single-cell analysis tasks, as shown in Figure (a).
|
| 53 |
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
|
| 63 |
-
- Initially, we convert scRNA-seq data into a single-cell language that LLMs can readily interpret.
|
| 64 |
-
- Subsequently, we employ templates to integrate this single-cell language with task descriptions and target outcomes, creating comprehensive single-cell instructions.
|
| 65 |
-
- To improve the LLM's expertise in the single-cell domain, we conduct vocabulary adaptation, enriching the model with a specialized single-cell lexicon.
|
| 66 |
-
- Following this, we utilize unified sequence generation to empower the model to adeptly execute a range of single-cell tasks.
|
| 67 |
|
| 68 |
|
| 69 |
-
<h2 id="
|
| 70 |
|
| 71 |
We concentrate on the following single-cell tasks:
|
| 72 |
|
|
@@ -101,34 +98,13 @@ The drug sensitivity prediction task aims to predict the response of different c
|
|
| 101 |
<img src="./figures/example4.jpg" alt="image" width=80%>
|
| 102 |
</p>
|
| 103 |
|
| 104 |
-
<h2 id="3">🛠️ Quickstart</h2>
|
| 105 |
-
|
| 106 |
-
```python
|
| 107 |
-
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
|
| 108 |
-
|
| 109 |
-
tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-small")
|
| 110 |
-
model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-small")
|
| 111 |
-
input_text = "Detail the 100 starting genes for a Mix, ranked by expression level: "
|
| 112 |
-
|
| 113 |
-
# Encode the input text and generate a response with specified generation parameters
|
| 114 |
-
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
|
| 115 |
-
output_ids = model.generate(input_ids, max_length=512, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, do_sample=True)
|
| 116 |
-
|
| 117 |
-
# Decode and print the generated output text
|
| 118 |
-
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
|
| 119 |
-
print(output_text)
|
| 120 |
-
```
|
| 121 |
-
|
| 122 |
-
|
| 123 |
|
| 124 |
<h2 id="4">📝 Cite</h2>
|
| 125 |
|
| 126 |
-
If you use our repository, please cite the following related paper:
|
| 127 |
```
|
| 128 |
@article{fang2024chatcell,
|
| 129 |
title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
|
| 130 |
author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
|
| 131 |
-
journal={arXiv preprint arXiv:2402.08303},
|
| 132 |
year={2024},
|
| 133 |
}
|
| 134 |
```
|
|
|
|
| 17 |
<h2 align="center"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>
|
| 18 |
|
| 19 |
<p align="center">
|
| 20 |
+
<a href="https://chat.openai.com/g/g-vUwj222gQ-chatcell">💻GPTStore App</a> •
|
| 21 |
<a href="https://huggingface.co/datasets/zjunlp/ChatCell-Instructions">🤗 Dataset</a> •
|
| 22 |
<a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
|
|
|
|
| 23 |
<a href="#1">🏖️ Overview</a> •
|
| 24 |
<a href="#2">🧬 Single-cell Analysis Tasks</a> •
|
| 25 |
<a href="#3">🛠️ Quickstart</a> •
|
|
|
|
| 35 |
|
| 36 |
## 📌 Table of Contents
|
| 37 |
|
| 38 |
+
- [🛠️ Quickstart](#2)
|
| 39 |
+
- [🧬 Single-cell Analysis Tasks](#3)
|
|
|
|
| 40 |
- [📝 Cite](#4)
|
| 41 |
|
| 42 |
|
| 43 |
---
|
| 44 |
|
|
|
|
| 45 |
|
| 46 |
+
<h2 id="2">🛠️ Quickstart</h2>
|
|
|
|
|
|
|
|
|
|
| 47 |
|
| 48 |
+
```python
|
| 49 |
+
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
|
| 50 |
+
|
| 51 |
+
tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-small")
|
| 52 |
+
model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-small")
|
| 53 |
+
input_text = "Detail the 100 starting genes for a Mix, ranked by expression level: "
|
| 54 |
+
|
| 55 |
+
# Encode the input text and generate a response with specified generation parameters
|
| 56 |
+
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
|
| 57 |
+
output_ids = model.generate(input_ids, max_length=512, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, do_sample=True)
|
| 58 |
+
|
| 59 |
+
# Decode and print the generated output text
|
| 60 |
+
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
|
| 61 |
+
print(output_text)
|
| 62 |
+
```
|
| 63 |
|
| 64 |
|
| 65 |
|
| 66 |
+
<h2 id="3">🧬 Single-cell Analysis Tasks</h2>
|
| 67 |
|
| 68 |
We concentrate on the following single-cell tasks:
|
| 69 |
|
|
|
|
| 98 |
<img src="./figures/example4.jpg" alt="image" width=80%>
|
| 99 |
</p>
|
| 100 |
|
| 101 |
|
| 102 |
<h2 id="4">📝 Cite</h2>
|
| 103 |
|
|
|
|
| 104 |
```
|
| 105 |
@article{fang2024chatcell,
|
| 106 |
title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
|
| 107 |
author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
|
|
|
|
| 108 |
year={2024},
|
| 109 |
}
|
| 110 |
```
|
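The Quickstart prompt in this README asks for genes "ranked by expression level". As a rough illustration of what such a ranked gene list could look like when built from raw values, here is a minimal sketch. The function name `cell_to_sentence`, the gene names, and the expression values are hypothetical and are not taken from the ChatCell repository; the project's actual preprocessing may differ.

```python
def cell_to_sentence(expression, top_k=100):
    """Return gene names ordered by descending expression, space-separated.

    `expression` maps gene name -> expression value for one cell.
    Genes with zero (or negative) expression are dropped before ranking.
    This is an illustrative assumption, not ChatCell's own pipeline.
    """
    # Keep only genes that are actually expressed in this cell
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort from highest to lowest expression; ties keep their input order
    ranked = sorted(expressed, key=lambda item: item[1], reverse=True)
    # Keep the top_k gene names and join them into one string
    return " ".join(gene for gene, _ in ranked[:top_k])


# Hypothetical toy cell with four genes
toy_cell = {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 0.0, "GENE_D": 3.0}
print(cell_to_sentence(toy_cell))
```

A string of this shape is the kind of text a sequence-to-sequence model can consume or produce directly, which is why ranking by expression is a convenient bridge between numeric expression data and natural-language prompts.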