# Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning
> [!Note]
> Please refer to our [repository](https://github.com/ncbi-nlp/cell-o1) and [paper](https://www.arxiv.org/abs/2506.02911) for more details.
## 🧠 Overview

Cell type annotation is a key task in analyzing the heterogeneity of single-cell RNA sequencing data. Although recent foundation models automate this process, they typically annotate cells independently, without considering batch-level cellular context or providing explanatory reasoning. In contrast, human experts often annotate distinct cell types for different cell clusters based on their domain knowledge.

To mimic this expert behavior, we introduce ***CellPuzzles***, a benchmark that requires unique cell-type assignments across each batch of cells. Existing LLMs struggle with this task: the best baseline (OpenAI's o1) achieves only 19.0% batch-level accuracy. To address this, we present ***Cell-o1***, a reasoning-enhanced language model trained via supervised fine-tuning (SFT) on distilled expert reasoning traces, followed by reinforcement learning (RL) with batch-level rewards. ***Cell-o1*** outperforms all baselines on both cell-level and batch-level metrics, and exhibits emergent behaviors such as self-reflection and curriculum reasoning, offering insights into its interpretability and generalization.
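To make the distinction between the two metrics concrete, here is a minimal sketch (illustrative only, not the official evaluation code) of how cell-level and batch-level accuracy differ: a batch counts as correct only when every cell in it is assigned the right type.

```python
def cell_accuracy(preds, golds):
    """Fraction of individual cells annotated correctly, pooled over all batches."""
    pairs = [(p, g) for bp, bg in zip(preds, golds) for p, g in zip(bp, bg)]
    return sum(p == g for p, g in pairs) / len(pairs)

def batch_accuracy(preds, golds):
    """Fraction of batches in which every cell is annotated correctly."""
    return sum(bp == bg for bp, bg in zip(preds, golds)) / len(preds)

# Two batches of two cells: the first is fully correct, the second has one wrong cell.
golds = [["a", "b"], ["c", "d"]]
preds = [["a", "b"], ["c", "x"]]
print(cell_accuracy(preds, golds))   # 0.75
print(batch_accuracy(preds, golds))  # 0.5
```

This is why batch accuracy is the harder number: a single mis-assigned cell invalidates the whole batch.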
## 🚀 How to Run Inference

The following example shows how to use `ncbi/Cell-o1` with structured input for reasoning-based cell type annotation.
The model expects both a system message and a user prompt containing multiple cells and candidate cell types.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# 1. Load the model and tokenizer from the Hugging Face Hub
model_name = "ncbi/Cell-o1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# 2. A minimal batch example with 3 cells and 3 candidate types
example = {
    "system_msg": (
        "You are an expert assistant specialized in cell type annotation. "
        "You will be given a batch of N cells from the same donor, where each cell represents a unique cell type. "
        "For each cell, the top expressed genes are provided in descending order of expression. "
        "Using both the gene expression data and donor information, determine the correct cell type for each cell. "
        "You will also receive a list of N candidate cell types, and each candidate must be assigned to exactly one cell. "
        "Ensure that you consider all cells and candidate types together, rather than annotating each cell individually. "
        "Include your detailed reasoning within <think> and </think> tags, and provide your final answer within <answer> and </answer> tags. "
        "The final answer should be a single string listing the assigned cell types in order, separated by ' | '."
    ),

    "user_msg": (
        "Context: The cell is from a female at the 73-year-old stage, originating from the lung. The patient has been diagnosed with chronic obstructive pulmonary disease. The patient is a smoker. There is no cancer present. \n\n"
        "Cell 1: MT2A, ACTB, MT1X, MTATP6P29, MYL9, MTND4LP30, CRIP1, DSTN, MTND2P13, MTCO2P22, S100A6, MTCYBP19, MALAT1, VIM, RPLP1, RGS5, TPT1, LGALS1, TPM2, MTND3P6, MTND1P22, PTMA, TMSB4X, STEAP1B, MT1M, LPP, RPL21\n"
        "Cell 2: MALAT1, FTL, MTCO2P22, TMSB4X, B2M, MTND4LP30, IL6ST, RPS19, RBFOX2, CCSER1, RPL41, RPS27, RPL10, ACTB, MTATP6P29, MTND2P13, RPS12, STEAP1B, RPL13A, S100A4, RPL34, TMSB10, RPL28, RPL32, RPL39, RPL13\n"
        "Cell 3: SCGB3A1, SCGB1A1, SLPI, WFDC2, TPT1, MTCO2P22, B2M, RPS18, RPS4X, RPS6, MTND4LP30, RPL34, RPS14, RPL31, STEAP1B, LCN2, RPLP1, IL6ST, S100A6, RPL21, RPL37A, ADGRL3, RPL37, RBFOX2, RPL41, RARRES1, RPL19\n\n"
        "Match the cells above to one of the following cell types:\n"
        "non-classical monocyte\nepithelial cell of lung\nsmooth muscle cell"
    )
}

# 3. Convert to chat-style messages
messages = [
    {"role": "system", "content": example["system_msg"]},
    {"role": "user", "content": example["user_msg"]}
]

# 4. Run inference
response = generator(
    messages,
    max_new_tokens=1000,  # increase if your reasoning chain is longer
    do_sample=False       # deterministic decoding
)[0]["generated_text"]

# 5. Print the model's reply (<think> + <answer>)
assistant_reply = response[-1]["content"] if isinstance(response, list) else response
print(assistant_reply)
```
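Since the system prompt fixes the output format (a `<think>` block followed by an `<answer>` block whose content is the assigned cell types joined by ' | '), the final assignment can be recovered with a small amount of string parsing. The helper below is a hypothetical convenience, not part of the released code, and the example reply is fabricated for illustration.

```python
import re

def parse_answer(reply: str):
    """Extract the list of assigned cell types from a Cell-o1-style reply.

    Returns None if no <answer>...</answer> block is found.
    """
    match = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    if match is None:
        return None
    return [t.strip() for t in match.group(1).split("|")]

# Illustrative reply (not actual model output)
reply = (
    "<think>Cell 1 expresses MYL9, TPM2, and RGS5 ...</think>"
    "<answer>smooth muscle cell | non-classical monocyte | epithelial cell of lung</answer>"
)
print(parse_answer(reply))
# ['smooth muscle cell', 'non-classical monocyte', 'epithelial cell of lung']
```

Checking the parsed list against the candidate set is also a cheap way to detect malformed generations before downstream use.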
## 🔖 Citation

If you use our repository, please cite the following paper:
```
@article{fang2025cello1,
  title={Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning},
  author={Fang, Yin and Jin, Qiao and Xiong, Guangzhi and Jin, Bowen and Zhong, Xianrui and Ouyang, Siru and Zhang, Aidong and Han, Jiawei and Lu, Zhiyong},
  journal={arXiv preprint arXiv:2506.02911},
  year={2025}
}
```
## 🫱🏻‍🫲 Acknowledgements

This research was supported by the Division of Intramural Research (DIR) of the National Library of Medicine (NLM), National Institutes of Health.