mizuno-group/ccbert-R-ENT-CPT

DevWithKaiju committed 25 days ago

Commit bc7102b · verified · 1 Parent(s): a074bb0

Create README.md

Files changed (1)

README.md +83 -0

README.md ADDED

	@@ -0,0 +1,83 @@

+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
+tags:
+- biomedical
+- relation-extraction
+- text-classification
+---
+# cell-cell-BERT
+**Configuration: R-ENT-CPT**
+This model includes learned embeddings for special tokens (e.g., [CELL0], [CELL1]), acquired through continued pre-training on biomedical text.
+## Model Description
+This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. It determines whether a sentence describes a direct biological relationship between two target cell types.
+For full details, see our paper: **"Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints"** (bioRxiv, 2025).
+* **Repository:** [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
+* **Paper:** [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)
+## Model Configuration
+As the configuration name R-ENT-CPT indicates, this model corresponds to the following experimental setting in the paper:
+* **Entity Indication:** Replacement (e.g., `[CELL0]`)
+* **Architecture:** Entity-aware (R-BERT style)
+* **Pre-training:** Continued Pre-training (CPT)
+*Note: Please ensure your input data preprocessing matches the **Entity Indication** method specified above.*
+## How to Get Started
+**Preprocessing Requirement:**
+Depending on the configuration above, you must insert specific special tokens into your input text before feeding it to the model.
+* **For Replacement models:** Replace cell names with `[CELL0]` and `[CELL1]`.
+* **For Boundary models:** Wrap cell names with `<E0>...</E0>` and `<E1>...</E1>`.
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+# 1. Load the model and its tokenizer from the Hub
+model_name = "mizuno-group/ccbert-R-ENT-CPT"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# 2. Prepare input
+# The input form must match the Entity Indication method of the model.
+# This configuration uses the [CELL0]/[CELL1] special tokens (Replacement):
+text = "The [CELL0] activate [CELL1]."
+# For a Boundary Marking model, use this form instead:
+# text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>."
+# 3. Inference
+inputs = tokenizer(text, return_tensors="pt")
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Take the argmax over the class dimension (batch size is 1 here)
+predicted_class_id = logits.argmax(dim=-1).item()
+# 0 = No Relation, 1 = Relation Exists
+print(f"Predicted Class: {predicted_class_id}")
+```
+## Citation
+```bibtex
+@article{Yoshikawa2025CCBERT,
+  title   = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
+  author  = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
+  journal = {bioRxiv},
+  year    = {2025},
+  doi     = {10.64898/2025.12.01.691726},
+  url     = {https://doi.org/10.64898/2025.12.01.691726}
+}
+```
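
The preprocessing requirement in the README (replace cell names with `[CELL0]`/`[CELL1]`, or wrap them in `<E0>...</E0>`/`<E1>...</E1>`) can be sketched as a small pure-Python helper. This is only an illustration under stated assumptions: the function name `mark_cell_pair` is hypothetical and not part of the cell-cell-bert repository, the two cell mentions are assumed to be known in advance, only the first occurrence of each mention is marked, and the spaces inside the boundary tags simply follow the README's own example. The authors' actual preprocessing may differ.

```python
def mark_cell_pair(sentence, cell_a, cell_b, method="replacement"):
    """Mark two cell mentions in a sentence (hypothetical helper).

    method="replacement": mentions become [CELL0] and [CELL1].
    method="boundary":    mentions are wrapped as <E0> ... </E0> and <E1> ... </E1>.
    """
    if method not in ("replacement", "boundary"):
        raise ValueError(f"unknown method: {method!r}")

    # Locate both mentions in the ORIGINAL sentence before editing anything,
    # so that marking one mention cannot disturb the search for the other.
    spans = []
    for idx, mention in enumerate((cell_a, cell_b)):
        start = sentence.find(mention)  # first occurrence only (assumption)
        if start == -1:
            raise ValueError(f"mention not found: {mention!r}")
        spans.append((start, start + len(mention), idx))

    # Reject overlapping mentions, which cannot be marked independently.
    (_, first_end, _), (second_start, _, _) = sorted(spans)
    if second_start < first_end:
        raise ValueError("cell mentions overlap")

    # Edit from right to left so earlier character offsets stay valid.
    marked = sentence
    for start, end, idx in sorted(spans, reverse=True):
        if method == "replacement":
            token = f"[CELL{idx}]"
        else:
            token = f"<E{idx}> {sentence[start:end]} </E{idx}>"
        marked = marked[:start] + token + marked[end:]
    return marked


sentence = "The Macrophages activate T cells."
print(mark_cell_pair(sentence, "Macrophages", "T cells"))
# The [CELL0] activate [CELL1].
print(mark_cell_pair(sentence, "Macrophages", "T cells", method="boundary"))
# The <E0> Macrophages </E0> activate <E1> T cells </E1>.
```

The returned string can be passed to the tokenizer as `text` in the README snippet. Which of the two forms to use depends on the Entity Indication method of the model being loaded.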