---
license: apache-2.0
---

## 🔬 How to Run Inference

The following example shows how to use `ncbi/Cell-o1` with structured input for reasoning-based cell type annotation. The model expects both a system message and a user prompt containing multiple cells and candidate cell types.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# 1. Load the model and tokenizer from the Hugging Face Hub
model_name = "ncbi/Cell-o1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# 2. A minimal batch example with 3 cells and 3 candidate types
example = {
    "system_msg": (
        "You are an expert assistant specialized in cell type annotation. "
        "You will be given a batch of N cells from the same donor, where each cell represents a unique cell type. "
        "For each cell, the top expressed genes are provided in descending order of expression. "
        "Using both the gene expression data and donor information, determine the correct cell type for each cell. "
        "You will also receive a list of N candidate cell types, and each candidate must be assigned to exactly one cell. "
        "Ensure that you consider all cells and candidate types together, rather than annotating each cell individually. "
        "Include your detailed reasoning within <think> and </think> tags, and provide your final answer within <answer> and </answer> tags. "
        "The final answer should be a single string listing the assigned cell types in order, separated by ' | '."
    ),
    "user_msg": (
        "Context: The cell is from a female at the 73-year-old stage, originating from the lung. The patient has been diagnosed with chronic obstructive pulmonary disease. The patient is a smoker. There is no cancer present. \n\n"
        "Cell 1: MT2A, ACTB, MT1X, MTATP6P29, MYL9, MTND4LP30, CRIP1, DSTN, MTND2P13, MTCO2P22, S100A6, MTCYBP19, MALAT1, VIM, RPLP1, RGS5, TPT1, LGALS1, TPM2, MTND3P6, MTND1P22, PTMA, TMSB4X, STEAP1B, MT1M, LPP, RPL21\n"
        "Cell 2: MALAT1, FTL, MTCO2P22, TMSB4X, B2M, MTND4LP30, IL6ST, RPS19, RBFOX2, CCSER1, RPL41, RPS27, RPL10, ACTB, MTATP6P29, MTND2P13, RPS12, STEAP1B, RPL13A, S100A4, RPL34, TMSB10, RPL28, RPL32, RPL39, RPL13\n"
        "Cell 3: SCGB3A1, SCGB1A1, SLPI, WFDC2, TPT1, MTCO2P22, B2M, RPS18, RPS4X, RPS6, MTND4LP30, RPL34, RPS14, RPL31, STEAP1B, LCN2, RPLP1, IL6ST, S100A6, RPL21, RPL37A, ADGRL3, RPL37, RBFOX2, RPL41, RARRES1, RPL19\n\n"
        "Match the cells above to one of the following cell types:\n"
        "non-classical monocyte\nepithelial cell of lung\nsmooth muscle cell"
    )
}

# 3. Convert to chat-style messages
messages = [
    {"role": "system", "content": example["system_msg"]},
    {"role": "user", "content": example["user_msg"]}
]

# 4. Run inference
response = generator(
    messages,
    max_new_tokens=1000,  # increase if your reasoning chain is longer
    do_sample=False       # deterministic decoding
)[0]["generated_text"]

# 5. Print the model's reply (<think> + <answer>)
assistant_reply = response[-1]["content"] if isinstance(response, list) else response
print(assistant_reply)
```
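Since the system prompt instructs the model to wrap its final assignment in `<answer>` tags, downstream code usually wants the per-cell labels rather than the raw reply. Below is a minimal sketch of such post-processing; `parse_answer` is a hypothetical helper (not part of the model or `transformers`), and it assumes the reply contains a single well-formed `<answer>...</answer>` block with labels separated by `' | '`, as the prompt requests.

```python
import re

def parse_answer(reply: str) -> list[str]:
    """Extract the <answer>...</answer> block from a Cell-o1 style reply
    and split it into an ordered list of per-cell labels.
    Returns an empty list if no answer block is found."""
    match = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    if match is None:
        return []
    return [label.strip() for label in match.group(1).split("|")]

# Example with a reply shaped like the format the system prompt asks for
reply = (
    "<think>Cell 3 expresses SCGB3A1/SCGB1A1, typical of airway epithelium ...</think>"
    "<answer>smooth muscle cell | non-classical monocyte | epithelial cell of lung</answer>"
)
print(parse_answer(reply))
# ['smooth muscle cell', 'non-classical monocyte', 'epithelial cell of lung']
```

The returned list is ordered to match Cell 1, Cell 2, Cell 3 in the prompt, so it can be zipped directly against the input cells.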