# Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning
> [!Note]
> Please refer to our [repository](https://github.com/ncbi-nlp/cell-o1) and [paper](https://www.arxiv.org/abs/2506.02911) for more details.
## 🧠 Overview

Cell type annotation is a key task in analyzing the heterogeneity of single-cell RNA sequencing data. Although recent foundation models automate this process, they typically annotate cells independently, without considering batch-level cellular context or providing explanatory reasoning. In contrast, human experts often annotate distinct cell types for different cell clusters based on their domain knowledge.

To mimic this expert behavior, we introduce ***CellPuzzles***, a benchmark that requires unique cell-type assignments across each batch of cells. Existing LLMs struggle with this task: the best baseline (OpenAI's o1) achieves only 19.0% batch-level accuracy. To address this, we present ***Cell-o1***, a reasoning-enhanced language model trained via supervised fine-tuning (SFT) on distilled expert reasoning traces, followed by reinforcement learning (RL) with batch-level rewards. ***Cell-o1*** outperforms all baselines on both cell-level and batch-level metrics, and exhibits emergent behaviors such as self-reflection and curriculum reasoning, offering insights into its interpretability and generalization.
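To make the distinction between the two metrics concrete, here is a minimal sketch (illustrative only, not the official evaluation code) of how cell-level and batch-level accuracy differ: a batch counts as correct only when every cell in it is assigned the right type.

```python
def cell_accuracy(preds, golds):
    """Fraction of individual cells annotated correctly, pooled over all batches."""
    pairs = [(p, g) for bp, bg in zip(preds, golds) for p, g in zip(bp, bg)]
    return sum(p == g for p, g in pairs) / len(pairs)

def batch_accuracy(preds, golds):
    """Fraction of batches in which every cell is annotated correctly."""
    return sum(bp == bg for bp, bg in zip(preds, golds)) / len(preds)

# Two batches of two cells: the first is fully correct, the second has one wrong cell.
golds = [["a", "b"], ["c", "d"]]
preds = [["a", "b"], ["c", "x"]]
print(cell_accuracy(preds, golds))   # 0.75
print(batch_accuracy(preds, golds))  # 0.5
```

This is why batch accuracy is the harder number: a single mis-assigned cell invalidates the whole batch.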
## 🚀 How to Run Inference

The following example shows how to use `ncbi/Cell-o1` with structured input for reasoning-based cell type annotation.
The model expects both a system message and a user prompt containing multiple cells and candidate cell types.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# 1. Load the model and tokenizer from the Hugging Face Hub
model_name = "ncbi/Cell-o1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# 2. A minimal batch example with 3 cells and 3 candidate types
example = {
    "system_msg": (
        "You are an expert assistant specialized in cell type annotation. "
        "You will be given a batch of N cells from the same donor, where each cell represents a unique cell type. "
        "For each cell, the top expressed genes are provided in descending order of expression. "
        "Using both the gene expression data and donor information, determine the correct cell type for each cell. "
        "You will also receive a list of N candidate cell types, and each candidate must be assigned to exactly one cell. "
        "Ensure that you consider all cells and candidate types together, rather than annotating each cell individually. "
        "Include your detailed reasoning within <think> and </think> tags, and provide your final answer within <answer> and </answer> tags. "
        "The final answer should be a single string listing the assigned cell types in order, separated by ' | '."
    ),

    "user_msg": (
        "Context: The cell is from a female at the 73-year-old stage, originating from the lung. The patient has been diagnosed with chronic obstructive pulmonary disease. The patient is a smoker. There is no cancer present. \n\n"
        "Cell 1: MT2A, ACTB, MT1X, MTATP6P29, MYL9, MTND4LP30, CRIP1, DSTN, MTND2P13, MTCO2P22, S100A6, MTCYBP19, MALAT1, VIM, RPLP1, RGS5, TPT1, LGALS1, TPM2, MTND3P6, MTND1P22, PTMA, TMSB4X, STEAP1B, MT1M, LPP, RPL21\n"
        "Cell 2: MALAT1, FTL, MTCO2P22, TMSB4X, B2M, MTND4LP30, IL6ST, RPS19, RBFOX2, CCSER1, RPL41, RPS27, RPL10, ACTB, MTATP6P29, MTND2P13, RPS12, STEAP1B, RPL13A, S100A4, RPL34, TMSB10, RPL28, RPL32, RPL39, RPL13\n"
        "Cell 3: SCGB3A1, SCGB1A1, SLPI, WFDC2, TPT1, MTCO2P22, B2M, RPS18, RPS4X, RPS6, MTND4LP30, RPL34, RPS14, RPL31, STEAP1B, LCN2, RPLP1, IL6ST, S100A6, RPL21, RPL37A, ADGRL3, RPL37, RBFOX2, RPL41, RARRES1, RPL19\n\n"
        "Match the cells above to one of the following cell types:\n"
        "non-classical monocyte\nepithelial cell of lung\nsmooth muscle cell"
    )
}

# 3. Convert to chat-style messages
messages = [
    {"role": "system", "content": example["system_msg"]},
    {"role": "user", "content": example["user_msg"]}
]

# 4. Run inference
response = generator(
    messages,
    max_new_tokens=1000,  # increase if your reasoning chain is longer
    do_sample=False       # deterministic decoding
)[0]["generated_text"]

# 5. Print the model's reply (<think> + <answer>)
assistant_reply = response[-1]["content"] if isinstance(response, list) else response
print(assistant_reply)
```
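Since the system prompt fixes the output format (a `<think>` block followed by an `<answer>` block whose content is the assigned cell types joined by ' | '), the final assignment can be recovered with a small amount of string parsing. The helper below is a hypothetical convenience, not part of the released code, and the example reply is fabricated for illustration.

```python
import re

def parse_answer(reply: str):
    """Extract the list of assigned cell types from a Cell-o1-style reply.

    Returns None if no <answer>...</answer> block is found.
    """
    match = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    if match is None:
        return None
    return [t.strip() for t in match.group(1).split("|")]

# Illustrative reply (not actual model output)
reply = (
    "<think>Cell 1 expresses MYL9, TPM2, and RGS5 ...</think>"
    "<answer>smooth muscle cell | non-classical monocyte | epithelial cell of lung</answer>"
)
print(parse_answer(reply))
# ['smooth muscle cell', 'non-classical monocyte', 'epithelial cell of lung']
```

Checking the parsed list against the candidate set is also a cheap way to detect malformed generations before downstream use.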
## 🔖 Citation

If you use our repository, please cite the following paper:
```
@article{fang2025cello1,
  title={Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning},
  author={Fang, Yin and Jin, Qiao and Xiong, Guangzhi and Jin, Bowen and Zhong, Xianrui and Ouyang, Siru and Zhang, Aidong and Han, Jiawei and Lu, Zhiyong},
  journal={arXiv preprint arXiv:2506.02911},
  year={2025}
}
```
## 🫱🏻‍🫲 Acknowledgements

This research was supported by the Division of Intramural Research (DIR) of the National Library of Medicine (NLM), National Institutes of Health.