---
license: mit
language:
- en
base_model:
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
tags:
- biomedical
- relation-extraction
- text-classification
---

# cell-cell-BERT

**Configuration: R-pretrained**

This model includes learned embeddings for special tokens (e.g., [CELL0], [CELL1]), acquired through continued pre-training on biomedical text.

## Model Description

This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. It determines whether a sentence describes a direct biological relationship between two target cell types.

For full details, see our paper: **"Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints"** (bioRxiv, 2025).

* **Repository:** [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
* **Paper:** [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)

## Model Configuration

This model corresponds to the following experimental setting in the paper:

* **Entity Indication:** [Replacement (e.g., `[CELL0]`) / Boundary Marking (e.g., `...`)]
* **Architecture:** [Entity-aware (R-BERT style) / CLS-only]
* **Pre-training:** [Continued Pre-training (CPT) / Base (Fine-tuning only)]

*Note: Please ensure your input data preprocessing matches the **Entity Indication** method specified above.*

## How to Get Started

**Preprocessing Requirement:**
Depending on the configuration above, you must insert specific special tokens into your input text before feeding it to the model.

* **For Replacement models:** Replace cell names with `[CELL0]` and `[CELL1]`.
* **For Boundary models:** Wrap cell names with `...` and `...`.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the model
model_name = "mizuno-group/ccbert-[INSERT-CONFIG-NAME]"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2. Prepare Input
# CHANGE THIS LINE based on the Entity Indication method of this model:
# text = "The [CELL0] activate [CELL1]."  # If Replacement
text = "The Macrophages activate T cells ."  # If Boundary Marking

# 3. Inference
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")
```

## Citation

```bibtex
@article{Yoshikawa2025CCBERT,
  title   = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
  author  = {Yoshikawa Mei and Mizuno Tadahaya and Ohto Yohei and Fujimoto Hiromi and Kusuhara Hiroyuki},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.64898/2025.12.01.691726},
  url     = {https://doi.org/10.64898/2025.12.01.691726}
}
```
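
## Preprocessing sketch (Replacement models)

The Replacement preprocessing described under "How to Get Started" can be sketched as below. This is a minimal illustration, not the authors' pipeline: the helper name `replace_cell_mentions` and the character-offset input format are assumptions made for this example, and it presumes you already know where the two cell mentions sit in the sentence (for instance from an upstream NER step).

```python
def replace_cell_mentions(text, span0, span1):
    """Replace two non-overlapping character spans with [CELL0] and [CELL1].

    Hypothetical helper: spans are (start, end) offsets, end exclusive.
    """
    (s0, e0), (s1, e1) = span0, span1
    for start, end in (span0, span1):
        if not 0 <= start < end <= len(text):
            raise ValueError("spans must lie inside the text")
    if s0 < e1 and s1 < e0:
        raise ValueError("spans must not overlap")

    # Apply the right-most replacement first so the earlier offsets stay valid.
    edits = sorted([(s0, e0, "[CELL0]"), (s1, e1, "[CELL1]")], reverse=True)
    for start, end, token in edits:
        text = text[:start] + token + text[end:]
    return text


sentence = "Macrophages activate T cells."
masked = replace_cell_mentions(sentence, (0, 11), (21, 28))
print(masked)
```

The resulting string can be passed to the tokenizer in place of `text` in the inference example. Whether `[CELL0]` should mark the first mention in the sentence or a specific role is not stated here, so check the paper or repository before relying on a particular ordering.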