---
license: mit
language:
- en
base_model:
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
tags:
- biomedical
- relation-extraction
- text-classification
---

# cell-cell-BERT

**Configuration: R-ENT-CPT**

This model includes learned embeddings for special tokens (e.g., `[CELL0]`, `[CELL1]`), acquired through continued pre-training on biomedical text.

## Model Description

This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. Given a sentence that mentions two target cell types, it classifies whether the sentence describes a direct biological relationship between them.

For full details, see our paper: **"Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints"** (bioRxiv, 2025).

* **Repository:** [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
* **Paper:** [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)

## Model Configuration

This model corresponds to the following experimental setting in the paper:

* **Entity Indication:** Replacement (e.g., `[CELL0]`)
* **Architecture:** Entity-aware (R-BERT style)
* **Pre-training:** Continued Pre-training (CPT)

*Note: Please ensure your input data preprocessing matches the **Entity Indication** method specified above.*
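
A preprocessing mismatch is easy to miss, because the model will most likely still return a prediction rather than raise an error. A minimal sketch of an input check is shown below. It is a hypothetical helper, not part of the cell-cell-BERT repository, and it assumes that a correctly marked sentence contains exactly one `[CELL0]` and one `[CELL1]` (Replacement) or exactly one non-empty `<E0>...</E0>` and one non-empty `<E1>...</E1>` span (Boundary Marking), with no markers from the other method.

```python
import re


# Hypothetical helper (not from the cell-cell-BERT repository): checks that a
# sentence is marked up for the given Entity Indication method.
def check_entity_markup(text: str, method: str) -> bool:
    if method == "replacement":
        # Exactly one [CELL0] and one [CELL1] placeholder, and no boundary tags.
        return (
            text.count("[CELL0]") == 1
            and text.count("[CELL1]") == 1
            and re.search(r"</?E[01]>", text) is None
        )
    if method == "boundary":
        for tag in ("E0", "E1"):
            # Each opening and closing tag appears exactly once ...
            if text.count(f"<{tag}>") != 1 or text.count(f"</{tag}>") != 1:
                return False
            # ... and the opening tag precedes a non-empty span closed by its partner.
            if re.search(rf"<{tag}>\s*\S.*?</{tag}>", text) is None:
                return False
        # No Replacement placeholders mixed in.
        return "[CELL0]" not in text and "[CELL1]" not in text
    raise ValueError(f"unknown Entity Indication method: {method!r}")
```

For example, `check_entity_markup("The [CELL0] activate [CELL1].", "replacement")` returns `True`, while the same sentence checked with `"boundary"` returns `False`.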

## How to Get Started

**Preprocessing Requirement:**
Depending on the configuration above, you must mark the two target cell types in your input text with special tokens before passing it to the model.

* **For Replacement models:** Replace the two target cell names with `[CELL0]` and `[CELL1]`.
* **For Boundary models:** Wrap the two target cell names with `<E0>...</E0>` and `<E1>...</E1>`.
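
The two preprocessing styles can be illustrated with a small helper that marks the first exact, case-sensitive occurrence of each target cell name. This is a hypothetical sketch, not code from the cell-cell-BERT repository: the name `mark_cells` is ours, the spacing around the boundary tags copies the example sentence in this card (`<E0> Macrophages </E0>`), and which cell is passed as `cell0` is left to the caller. Check the repository for the exact preprocessing and entity-ordering convention used during training.

```python
# Hypothetical helper (not from the cell-cell-BERT repository): marks the first
# exact occurrence of each target cell name for the chosen Entity Indication method.
def mark_cells(text: str, cell0: str, cell1: str, method: str) -> str:
    if method not in ("replacement", "boundary"):
        raise ValueError(f"unknown Entity Indication method: {method!r}")

    start0, start1 = text.find(cell0), text.find(cell1)
    if start0 < 0 or start1 < 0:
        raise ValueError("both cell names must occur in the text")

    # (start, end, entity index), ordered by position in the sentence
    spans = sorted([
        (start0, start0 + len(cell0), 0),
        (start1, start1 + len(cell1), 1),
    ])
    if spans[0][1] > spans[1][0]:
        raise ValueError("the two cell mentions overlap")

    pieces, cursor = [], 0
    for start, end, idx in spans:
        pieces.append(text[cursor:start])  # text before this mention
        if method == "replacement":
            pieces.append(f"[CELL{idx}]")  # replace the name entirely
        else:
            # wrap the name; spacing follows the example sentence in this card
            pieces.append(f"<E{idx}> {text[start:end]} </E{idx}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

For instance, `mark_cells("The Macrophages activate T cells.", "Macrophages", "T cells", "replacement")` produces `"The [CELL0] activate [CELL1]."`. Building the output from character spans, rather than calling `str.replace` twice, keeps the second substitution from accidentally matching text inside the first marker.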

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the model and tokenizer
model_name = "mizuno-group/ccbert-[INSERT-CONFIG-NAME]"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2. Prepare input
# Use the line that matches the Entity Indication method of this model:
text = "The [CELL0] activate [CELL1]."  # If Replacement
# text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>."  # If Boundary Marking

# 3. Inference
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, 2)
predicted_class_id = logits.argmax(dim=-1).item()

# 0 = No Relation, 1 = Relation Exists
print(f"Predicted Class: {predicted_class_id}")
```
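
The example above reports only the argmax class. When a confidence score is useful (for example, to rank candidate sentences), the two logits can be turned into probabilities with a softmax; `torch.softmax(logits, dim=-1)` does this directly on the model output. The dependency-free sketch below shows the same computation on plain Python floats. The function names and the `threshold` parameter are illustrative assumptions, not part of the released code; the label names follow the comment in the example above.

```python
import math

# Label names taken from the comment in the inference example above.
LABELS = {0: "No Relation", 1: "Relation Exists"}


def relation_probability(logits: list[float]) -> float:
    """Softmax probability of class 1 ("Relation Exists") from the two logits."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def predict_relation(logits: list[float], threshold: float = 0.5) -> tuple[str, float]:
    """Return (label, probability of a relation); the default 0.5 mirrors the argmax decision."""
    p = relation_probability(logits)
    return (LABELS[1] if p > threshold else LABELS[0]), p
```

With the model output above, this would be called as `predict_relation(logits[0].tolist())`. Raising `threshold` above 0.5 trades recall for precision; any specific value should be chosen on held-out data.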

## Citation

```bibtex
@article{Yoshikawa2025CCBERT,
  title   = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
  author  = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.64898/2025.12.01.691726},
  url     = {https://doi.org/10.64898/2025.12.01.691726}
}
```