---
license: mit
language:
- en
base_model:
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
tags:
- biomedical
- relation-extraction
- text-classification
---

# cell-cell-BERT

**Configuration: R-ENT-CPT**

This model includes learned embeddings for special tokens (e.g., `[CELL0]`, `[CELL1]`), acquired through continued pre-training on biomedical text.

## Model Description

This is one configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. Given a sentence in which two target cell types are marked, it predicts whether the sentence describes a direct biological relationship between them.

For full details, see our paper: **"Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints"** (bioRxiv, 2025).

* **Repository:** [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
* **Paper:** [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)

## Model Configuration

This model corresponds to the following experimental setting in the paper:

* **Entity Indication:** Replacement (e.g., `[CELL0]`)
* **Architecture:** Entity-aware (R-BERT style)
* **Pre-training:** Continued pre-training (CPT)

*Note: Please ensure your input data preprocessing matches the **Entity Indication** method specified above.*

## How to Get Started

**Preprocessing Requirement:**
Before feeding text to the model, you must mark the two target cells with the special tokens that match its Entity Indication method:

* **For Replacement models:** Replace the two cell names with `[CELL0]` and `[CELL1]`.
* **For Boundary models:** Wrap the two cell names with `<E0>...</E0>` and `<E1>...</E1>`.
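
Here is a minimal Python sketch of this preprocessing step. It assumes the two cell mentions have already been located as character offsets, for example by a separate NER step. The helper names `find_span` and `mark_entities` are hypothetical and are not part of the cell-cell-bert repository. The spaces placed inside `<E0> ... </E0>` copy the inference example in this section and may need adjusting to match the training data.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive


def find_span(text: str, mention: str) -> Span:
    """Return the offsets of the first occurrence of `mention` (ValueError if absent)."""
    start = text.index(mention)
    return (start, start + len(mention))


def mark_entities(text: str, span0: Span, span1: Span, scheme: str = "replacement") -> str:
    """Hypothetical helper: mark two cell mentions for cell-cell-BERT-style input.

    scheme="replacement": replace the mentions with [CELL0] / [CELL1].
    scheme="boundary":    wrap them as <E0> ... </E0> / <E1> ... </E1>.
    """
    if scheme not in ("replacement", "boundary"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    for start, end in (span0, span1):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span: {(start, end)}")
    if span0[0] < span1[1] and span1[0] < span0[1]:
        raise ValueError("spans must not overlap")

    # Rewrite from right to left so the left-hand span's offsets stay valid.
    ordered = sorted(enumerate((span0, span1)), key=lambda item: item[1][0], reverse=True)
    out = text
    for index, (start, end) in ordered:
        if scheme == "replacement":
            tag = f"[CELL{index}]"
        else:
            tag = f"<E{index}> {text[start:end]} </E{index}>"
        out = out[:start] + tag + out[end:]
    return out


sentence = "Macrophages activate T cells."
span0, span1 = find_span(sentence, "Macrophages"), find_span(sentence, "T cells")
print(mark_entities(sentence, span0, span1))
# [CELL0] activate [CELL1].
print(mark_entities(sentence, span0, span1, scheme="boundary"))
# <E0> Macrophages </E0> activate <E1> T cells </E1>.
```

The index of each tag follows the argument order, not the position in the sentence. If the first span passed in lies to the right of the second, `[CELL0]` appears after `[CELL1]` in the output.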

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the model (replace the placeholder with this model's repository ID)
model_name = "mizuno-group/ccbert-[INSERT-CONFIG-NAME]"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2. Prepare input
# Use the line that matches this model's Entity Indication method (Replacement for R-ENT-CPT):
text = "The [CELL0] activate [CELL1]."  # Replacement
# text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>."  # Boundary Marking models only

# 3. Inference
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()

# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")
```
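
The inference example above scores a single pair of cells. When a sentence mentions more than two cell types, one option is to build one Replacement-style input per pair of mentions and classify each pair separately. This sketch makes three assumptions. Pairs are unordered. The left-most mention of each pair becomes `[CELL0]`. Mentions are non-overlapping character offsets. This card does not state whether the paper treats relations as directional, so treat the pairing rule as an assumption. `replacement_inputs` is a hypothetical helper.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive


def replacement_inputs(text: str, mentions: List[Span]) -> List[Tuple[Span, Span, str]]:
    """Hypothetical helper: one [CELL0]/[CELL1] input per unordered mention pair.

    Assumes mentions do not overlap; the left-most mention of each pair
    becomes [CELL0] (an assumption, not a documented convention).
    """
    results = []
    for first, second in combinations(sorted(mentions), 2):
        # Rewrite the right-hand mention first so `first`'s offsets stay valid.
        marked = text[: second[0]] + "[CELL1]" + text[second[1]:]
        marked = marked[: first[0]] + "[CELL0]" + marked[first[1]:]
        results.append((first, second, marked))
    return results


sentence = "Macrophages and dendritic cells activate T cells."
mentions = []
for name in ("Macrophages", "dendritic cells", "T cells"):
    start = sentence.index(name)  # first occurrence of each name
    mentions.append((start, start + len(name)))

for first, second, marked in replacement_inputs(sentence, mentions):
    print(marked)
# [CELL0] and [CELL1] activate T cells.
# [CELL0] and dendritic cells activate [CELL1].
# Macrophages and [CELL0] activate [CELL1].
```

You can tokenize the resulting strings as a batch with `tokenizer(texts, padding=True, return_tensors="pt")`. After that, `logits.argmax(dim=-1)` yields one predicted class per pair.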

## Citation

```bibtex
@article{Yoshikawa2025CCBERT,
  title   = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
  author  = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.64898/2025.12.01.691726},
  url     = {https://doi.org/10.64898/2025.12.01.691726}
}
```