---
license: mit
language:
- en
base_model:
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
tags:
- biomedical
- relation-extraction
- text-classification
---

# cell-cell-BERT

**Configuration: R-CLS-base**

## Model Description

This model is one configuration of cell-cell-BERT, a model for extracting cell-cell interactions from biomedical text. Given a sentence in which two target cell types are marked, it predicts whether the sentence describes a direct biological relationship between them.

For full details, see our paper: **"Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints"** (bioRxiv, 2025).

* **Repository:** [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
* **Paper:** [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)

## Model Configuration

This model (R-CLS-base) corresponds to the following experimental setting in the paper:

* **Entity Indication:** Replacement (e.g., `[CELL0]`)
* **Architecture:** CLS-only
* **Pre-training:** Base (fine-tuning only)

*Note: Please ensure your input data preprocessing matches the **Entity Indication** method specified above.*
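
A preprocessing mismatch does not raise an error at inference time: the model still returns a prediction for a sentence marked in the wrong style. A cheap sanity check can catch this before inference. The sketch below is a hypothetical helper (`detect_entity_indication` is not part of the repository), and it assumes each sentence marks exactly two cells with the tokens listed in this card.

```python
# Hypothetical helper (not part of the cell-cell-BERT repository): reports which
# Entity Indication style a preprocessed sentence uses, or raises ValueError.
REPLACEMENT_TOKENS = ("[CELL0]", "[CELL1]")
BOUNDARY_TAGS = ("<E0>", "</E0>", "<E1>", "</E1>")


def detect_entity_indication(text: str) -> str:
    """Return "replacement" or "boundary" for a correctly marked sentence."""
    rep_counts = [text.count(tok) for tok in REPLACEMENT_TOKENS]
    bnd_counts = [text.count(tag) for tag in BOUNDARY_TAGS]

    # Mixing the two styles in one sentence is always an error.
    if any(rep_counts) and any(bnd_counts):
        raise ValueError("sentence mixes replacement tokens and boundary tags")

    # Replacement: exactly one [CELL0] and one [CELL1].
    if rep_counts == [1, 1]:
        return "replacement"

    # Boundary marking: exactly one <E0>...</E0> and one <E1>...</E1> pair,
    # with each opening tag placed before its closing tag.
    if bnd_counts == [1, 1, 1, 1]:
        for i in (0, 1):
            if text.index(f"<E{i}>") > text.index(f"</E{i}>"):
                raise ValueError(f"<E{i}> must precede </E{i}>")
        return "boundary"

    raise ValueError("expected [CELL0]/[CELL1] or <E0>...</E0>/<E1>...</E1> markers")


print(detect_entity_indication("The [CELL0] activate [CELL1]."))  # replacement
```

Comparing the returned value against the method listed under **Entity Indication** before calling the model turns a silent mismatch into an explicit error.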

## How to Get Started

**Preprocessing Requirement:**

Before feeding text to the model, mark the two target cells with the special tokens that match the model's Entity Indication method:

* **For Replacement models:** Replace the two cell mentions with `[CELL0]` and `[CELL1]`.
* **For Boundary Marking models:** Wrap the two cell mentions with `<E0>...</E0>` and `<E1>...</E1>`.
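
Given character offsets for the two cell mentions (for example, from an upstream NER step), both styles can be produced with a small helper. This is a minimal sketch under stated assumptions: `mark_entities` and its `(start, end)` span convention are hypothetical, and the spaces around the mention in boundary mode simply mirror the example sentence in this card; check the repository's preprocessing code for the exact format used during training.

```python
from typing import Literal, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive


def mark_entities(
    text: str,
    cell0: Span,
    cell1: Span,
    method: Literal["replacement", "boundary"],
) -> str:
    """Hypothetical helper: mark two cell mentions in the style shown above."""
    if method not in ("replacement", "boundary"):
        raise ValueError(f"unknown method: {method!r}")
    for start, end in (cell0, cell1):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span: {(start, end)}")
    if cell0[0] < cell1[1] and cell1[0] < cell0[1]:
        raise ValueError("cell spans overlap")

    # Edit from right to left so the offsets of the earlier span stay valid.
    ordered = sorted(enumerate((cell0, cell1)), key=lambda p: p[1][0], reverse=True)
    for idx, (start, end) in ordered:
        mention = text[start:end]
        if method == "replacement":
            marked = f"[CELL{idx}]"
        else:
            # Spaces around the mention follow the boundary example in this card.
            marked = f"<E{idx}> {mention} </E{idx}>"
        text = text[:start] + marked + text[end:]
    return text


sentence = "The Macrophages activate T cells."
print(mark_entities(sentence, (4, 15), (25, 32), "replacement"))
# The [CELL0] activate [CELL1].
print(mark_entities(sentence, (4, 15), (25, 32), "boundary"))
# The <E0> Macrophages </E0> activate <E1> T cells </E1>.
```

The index in each marker comes from the argument position (`cell0` or `cell1`), not from where the mention appears in the sentence, so the helper also handles pairs where the second cell is mentioned first.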

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the model and tokenizer
model_name = "mizuno-group/ccbert-[INSERT-CONFIG-NAME]"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2. Prepare input
# CHANGE THIS LINE based on the Entity Indication method of this model:
text = "The [CELL0] activate [CELL1]."  # If Replacement
# text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>."  # If Boundary Marking

# 3. Inference
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_labels)

# Take the highest-scoring class for the (single) input sentence
predicted_class_id = logits.argmax(dim=-1).item()

# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")
```
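
The example above prints only the class index. If you also want a probability, or a decision threshold other than argmax, the logits can be post-processed as in the sketch below. It assumes a two-logit output ordered as `[no relation, relation]`, following the `0 = No Relation, 1 = Relation Exists` comment above; `decode`, `softmax`, and the `threshold` parameter are illustrative names, not part of the model's API.

```python
import math
from typing import List, Sequence, Tuple

# Label mapping taken from the inference example (0 = No Relation, 1 = Relation Exists).
LABELS = {0: "No Relation", 1: "Relation Exists"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Map each [no_relation, relation] logit row to (label, P(relation))."""
    results = []
    for row in batch_logits:
        p_relation = softmax(row)[1]
        label = LABELS[1] if p_relation >= threshold else LABELS[0]
        results.append((label, p_relation))
    return results
```

With the inference example, `decode(logits.tolist())` returns one `(label, probability)` pair per input sentence. At the default threshold of 0.5 this reproduces the argmax decision except on exact ties; raising the threshold trades recall for precision on the "Relation Exists" class.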

## Citation

```bibtex
@article{Yoshikawa2025CCBERT,
  title   = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
  author  = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.64898/2025.12.01.691726},
  url     = {https://doi.org/10.64898/2025.12.01.691726}
}
```