---
license: mit
language:
- en
base_model:
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
tags:
- biomedical
- relation-extraction
- text-classification
---
# cell-cell-BERT
**Configuration: R-ENT-base**
## Model Description
This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. It determines whether a sentence describes a direct biological relationship between two target cell types.
For full details, see our paper: **"Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints"** (bioRxiv, 2025).
* **Repository:** [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
* **Paper:** [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)
## Model Configuration
This model corresponds to the following experimental setting in the paper:
* **Entity Indication:** [Replacement (e.g., `[CELL0]`) / Boundary Marking (e.g., `<E0>...`)]
* **Architecture:** [Entity-aware (R-BERT style) / CLS-only]
* **Pre-training:** [Continued Pre-training (CPT) / Base (Fine-tuning only)]
*Note: Please ensure your input data preprocessing matches the **Entity Indication** method specified above.*
## How to Get Started
**Preprocessing Requirement:**
Depending on the configuration above, you must insert specific special tokens into your input text before feeding it to the model.
* **For Replacement models:** Replace cell names with `[CELL0]` and `[CELL1]`.
* **For Boundary models:** Wrap cell names with `<E0>...</E0>` and `<E1>...</E1>`.
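The two preprocessing options above can be sketched as a small helper. This is a minimal illustration, not the authors' preprocessing code: `mark_entities` and its parameters are hypothetical names, and it assumes you already know the character offsets of the two cell mentions. The spaces placed inside the boundary tags follow the example sentence in this README; check the repository for the exact format used in training.

```python
def mark_entities(text, span0, span1, method="boundary"):
    """Insert entity indicators for two cell mentions (hypothetical helper).

    span0 and span1 are (start, end) character offsets of the two cell names.
    method is "replacement" ([CELL0]/[CELL1]) or "boundary" (<E0>...</E0>).
    """
    if method == "replacement":
        def render(idx, mention):
            return f"[CELL{idx}]"
    elif method == "boundary":
        def render(idx, mention):
            # Spacing inside the tags mirrors the README example (assumption).
            return f"<E{idx}> {mention} </E{idx}>"
    else:
        raise ValueError(f"unknown method: {method!r}")

    # Reject overlapping mentions, which cannot be marked independently.
    first, second = sorted([span0, span1])
    if first[1] > second[0]:
        raise ValueError("entity spans overlap")

    # Edit right-to-left so the earlier span's offsets stay valid.
    indexed = sorted([(span0, 0), (span1, 1)], key=lambda item: item[0][0], reverse=True)
    for (start, end), idx in indexed:
        text = text[:start] + render(idx, text[start:end]) + text[end:]
    return text


sentence = "The Macrophages activate T cells."
m0 = sentence.index("Macrophages")
m1 = sentence.index("T cells")
spans = ((m0, m0 + len("Macrophages")), (m1, m1 + len("T cells")))

print(mark_entities(sentence, *spans, method="replacement"))
print(mark_entities(sentence, *spans, method="boundary"))
```

Pass the returned string to the tokenizer in place of the hand-written `text` value.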
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the model
# Replace the placeholder with this model's configuration name.
model_name = "mizuno-group/ccbert-[INSERT-CONFIG-NAME]"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2. Prepare input
# Keep the line that matches this model's Entity Indication method:
# text = "The [CELL0] activate [CELL1]."  # Replacement
text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>."  # Boundary Marking

# 3. Inference
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for prediction
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()

# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")
```
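If you want a confidence score alongside the class ID, the logits can be passed through a softmax. The sketch below is dependency-free and illustrative only: `softmax`, `summarize`, and `LABELS` are hypothetical names, and the label mapping is taken from the comment in the snippet above.

```python
import math

# Label mapping as stated in the inference snippet: 0 = No Relation, 1 = Relation Exists.
LABELS = {0: "No Relation", 1: "Relation Exists"}


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits):
    """Return the predicted class ID, its label, and its probability."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return {
        "class_id": class_id,
        "label": LABELS[class_id],
        "probability": probs[class_id],
    }
```

With the variables from the inference snippet, `summarize(logits[0].tolist())` would give the prediction for the single input sentence.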
## Citation
```bibtex
@article{Yoshikawa2025CCBERT,
title = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
author = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
journal = {bioRxiv},
year = {2025},
doi = {10.64898/2025.12.01.691726},
url = {https://doi.org/10.64898/2025.12.01.691726}
}
```