---
license: apache-2.0
---

# Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning

> [!NOTE]
> Please refer to our [repository](https://github.com/ncbi-nlp/cell-o1) and [paper](https://www.arxiv.org/abs/2506.02911) for more details.

## 🧠 Overview

Cell type annotation is a key task in analyzing the heterogeneity of single-cell RNA sequencing data. Although recent foundation models automate this process, they typically annotate cells independently, without considering batch-level cellular context or providing explanatory reasoning. In contrast, human experts often annotate distinct cell types for different cell clusters based on their domain knowledge.

To mimic this expert behavior, we introduce ***CellPuzzles***, a benchmark requiring unique cell-type assignments across cell batches. Existing LLMs struggle with this task, with the best baseline (OpenAI's o1) achieving only 19.0% batch accuracy. To address this, we present ***Cell-o1***, a reasoning-enhanced language model trained via supervised fine-tuning (SFT) on distilled expert traces, followed by reinforcement learning (RL) with batch-level rewards. ***Cell-o1*** outperforms all baselines on both cell-level and batch-level metrics, and exhibits emergent behaviors such as self-reflection and curriculum reasoning, offering insights into its interpretability and generalization.
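
To make the two evaluation granularities concrete, here is a small illustrative sketch; the metric code and toy data are our own illustration, not taken from the paper:

```python
def cell_level_accuracy(pred_batches, gold_batches):
    """Fraction of individual cells whose predicted type matches the gold type."""
    correct = sum(
        p == g
        for preds, golds in zip(pred_batches, gold_batches)
        for p, g in zip(preds, golds)
    )
    total = sum(len(golds) for golds in gold_batches)
    return correct / total


def batch_level_accuracy(pred_batches, gold_batches):
    """Fraction of batches in which every cell in the batch is annotated correctly."""
    correct = sum(preds == golds for preds, golds in zip(pred_batches, gold_batches))
    return correct / len(gold_batches)


# Two batches of three cells; the unique-assignment constraint means each
# candidate type is used exactly once per batch, so one mistake forces another.
gold = [["T cell", "B cell", "NK cell"], ["T cell", "B cell", "NK cell"]]
pred = [["T cell", "B cell", "NK cell"], ["T cell", "NK cell", "B cell"]]

print(f"{cell_level_accuracy(pred, gold):.2f}")   # a single swap: 4/6 cells correct
print(f"{batch_level_accuracy(pred, gold):.2f}")  # but only 1/2 batches fully correct
```

This is why batch accuracy is the harder metric: swapping just two labels within a batch leaves most cells correct but zeroes out that entire batch.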

## 🚀 How to Run Inference

The following example shows how to use `ncbi/Cell-o1` with structured input for reasoning-based cell type annotation. The model expects both a system message and a user prompt containing multiple cells and candidate cell types.

```python
# ... (steps 1–4: model loading and prompt construction) ...
response = generator(
    # ... generation arguments ...
)

# 5. Print the model’s reply (<think> + <answer>)
assistant_reply = response[-1]["content"] if isinstance(response, list) else response
print(assistant_reply)
```
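
The earlier steps of the example (loading the model and constructing the prompt) are not fully shown above. As a rough, self-contained sketch of what they can look like with the 🤗 Transformers `pipeline` API: the `build_messages` helper, the prompt wording, and the generation settings below are illustrative assumptions, not the card's original code.

```python
def build_messages(cells, candidate_types):
    """Build the structured chat input: a system message describing the task
    and a user message listing each cell's marker genes and the candidate types."""
    system_msg = (
        "You are an expert in single-cell annotation. Assign exactly one "
        "distinct cell type from the candidate list to each cell in the batch. "
        "Reason step by step in <think> tags, then give your final assignment "
        "in <answer> tags."
    )
    cell_block = "\n".join(
        f"Cell {i + 1}: {', '.join(genes)}" for i, genes in enumerate(cells)
    )
    user_msg = (
        f"Cells (top marker genes):\n{cell_block}\n\n"
        f"Candidate cell types: {', '.join(candidate_types)}"
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]


if __name__ == "__main__":
    # Heavyweight import kept inside the guard so the helper above stays
    # importable without transformers installed.
    from transformers import pipeline

    # 1–2. Load the model and assemble the structured input
    generator = pipeline("text-generation", model="ncbi/Cell-o1")
    messages = build_messages(
        cells=[["CD3D", "CD3E", "IL7R"], ["MS4A1", "CD79A", "CD79B"]],
        candidate_types=["T cell", "B cell"],
    )

    # 3–4. Generate; chat-style pipelines return the running message list
    response = generator(messages, max_new_tokens=1024)[0]["generated_text"]

    # 5. Print the model's reply (<think> + <answer>)
    assistant_reply = response[-1]["content"] if isinstance(response, list) else response
    print(assistant_reply)
```

The marker genes and candidates here are toy values; in practice the user prompt would carry one entry per cell in the batch and the full candidate list, matching the batch-level setup described above.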

## 🔖 Citation

If you use our repository, please cite the following related paper:

```bibtex
@article{fang2025cello1,
  title={Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning},
  author={Fang, Yin and Jin, Qiao and Xiong, Guangzhi and Jin, Bowen and Zhong, Xianrui and Ouyang, Siru and Zhang, Aidong and Han, Jiawei and Lu, Zhiyong},
  journal={arXiv preprint arXiv:2506.02911},
  year={2025}
}
```

## 🫱🏻‍🫲 Acknowledgements

This research was supported by the Division of Intramural Research (DIR) of the National Library of Medicine (NLM), National Institutes of Health.