---
language: en
license: mit
tags:
- biomedical
- relation-extraction
- pubmedbert
- named-entity-recognition
datasets:
- chemprot
- bc5cdr
- gad
- biored
- ddi
metrics:
- f1
- precision
- recall
model-index:
- name: PubMedBERT Relation Extraction
  results:
  - task:
      type: relation-extraction
      name: Biomedical Relation Extraction
    metrics:
    - type: f1
      value: 0.7347
      name: F1 Macro
---
# PubMedBERT for Biomedical Relation Extraction
Fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract) for multi-class relation extraction in biomedical text.
## Model Description
This model extracts semantic relations between biomedical entities (chemicals, diseases, genes, proteins) from scientific literature.

**Base Model:** `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`

**Training Data:** ChemProt, BC5CDR, GAD, BioRED, DDI

**Relation Types (9):**
- `activates`
- `inhibits`
- `converts`
- `causes`
- `treats`
- `associated_with`
- `interacts_with`
- `located_in`
- `NO_RELATION`
## Performance
| Metric | Value |
|--------|------:|
| F1 Macro | 0.7347 |
| Accuracy | 75.3% |
### Per-Class F1 Scores
| Relation | F1 | Support |
|----------|---:|--------:|
| interacts_with | 0.85 | 1,304 |
| inhibits | 0.84 | 2,704 |
| activates | 0.83 | 3,412 |
| converts | 0.82 | 884 |
| associated_with | 0.81 | 1,769 |
| causes | 0.81 | 6,760 |
| NO_RELATION | 0.63 | 6,760 |
| treats | 0.28 | 678 |
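
Macro F1 is the unweighted mean of the per-class F1 scores, so the rare, poorly performing `treats` class pulls it down as much as any frequent class. The short sketch below recomputes it from the rounded values in the table above (which lists eight classes; `located_in` has no row) and, for contrast, a support-weighted average. Because the inputs are rounded to two decimals, the recomputed macro value (about 0.734) only approximately matches the reported 0.7347.

```python
# Per-class (F1, support) pairs copied from the Per-Class F1 table.
per_class = {
    "interacts_with": (0.85, 1304),
    "inhibits": (0.84, 2704),
    "activates": (0.83, 3412),
    "converts": (0.82, 884),
    "associated_with": (0.81, 1769),
    "causes": (0.81, 6760),
    "NO_RELATION": (0.63, 6760),
    "treats": (0.28, 678),
}


def macro_f1(scores: dict[str, tuple[float, int]]) -> float:
    """Unweighted mean: every class counts equally regardless of support."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores: dict[str, tuple[float, int]]) -> float:
    """Support-weighted mean: frequent classes dominate."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


print(f"macro F1:    {macro_f1(per_class):.4f}")
print(f"weighted F1: {weighted_f1(per_class):.4f}")
```

The support-weighted figure (about 0.754) is higher mainly because `treats`, the weakest class, also has the smallest support.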
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer (placeholder repository id)
model_name = "your-username/pubmedbert-relation-extraction"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Register the entity markers as special tokens so they are never split into subwords.
# If the saved tokenizer already contains them, nothing is added and the resize
# is expected to leave the fine-tuned embeddings unchanged.
special_tokens = {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
tokenizer.add_special_tokens(special_tokens)
model.resize_token_embeddings(len(tokenizer))

# Example: extract the relation between aspirin and pain
text = "[E1]Aspirin[/E1] reduces [E2]pain[/E2] in patients."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Inference only: disable gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
print(f"Predicted relation: {model.config.id2label[predicted_class]}")
print(f"Confidence: {probs[0][predicted_class].item():.3f}")
```
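
The snippet above reports only the single most probable label. Given the low F1 for `treats`, it can help to inspect the runner-up labels as well. A minimal sketch follows; `top_k_relations` is a hypothetical helper, and in practice you would pass it `outputs.logits` and `model.config.id2label` from the snippet above. The toy label mapping below is made up and does not reflect the model's actual label order.

```python
import torch


def top_k_relations(
    logits: torch.Tensor, id2label: dict[int, str], k: int = 3
) -> list[list[tuple[str, float]]]:
    """Return the k most probable (label, probability) pairs for each row of logits."""
    probs = torch.softmax(logits, dim=-1)
    k = min(k, probs.shape[-1])  # cannot ask for more labels than there are classes
    values, indices = torch.topk(probs, k, dim=-1)
    return [
        [(id2label[int(i)], float(p)) for p, i in zip(row_p, row_i)]
        for row_p, row_i in zip(values, indices)
    ]


# Toy logits and a made-up label order, for illustration only
toy_logits = torch.tensor([[2.0, 0.0, 1.0]])
toy_labels = {0: "inhibits", 1: "treats", 2: "NO_RELATION"}
for label, prob in top_k_relations(toy_logits, toy_labels, k=2)[0]:
    print(f"{label}: {prob:.3f}")
```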
## Input Format
Text must contain entity markers `[E1]`, `[/E1]`, `[E2]`, `[/E2]` around the two entities:
```
[E1]Entity1[/E1] ... context ... [E2]Entity2[/E2]
```
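
If entity mentions come as character offsets (for example from an upstream NER step), a small helper can insert the markers. This is a hypothetical helper, not part of this model or of `transformers`. It assumes half-open `[start, end)` spans that do not overlap and writes the markers with no surrounding spaces, matching the Usage example. The card does not say whether `[E2]` ever preceded `[E1]` during training, so the reversed-order case below is handled mechanically but may be out of distribution.

```python
def mark_entities(text: str, e1: tuple[int, int], e2: tuple[int, int]) -> str:
    """Wrap two non-overlapping [start, end) character spans in [E1]/[E2] markers."""
    for start, end in (e1, e2):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is out of range")
    if e1[0] < e2[1] and e2[0] < e1[1]:
        raise ValueError("entity spans overlap")

    # Order the spans by position so the slicing below works whichever entity comes first.
    (a_start, a_end, a_tag), (b_start, b_end, b_tag) = sorted(
        [(e1[0], e1[1], "E1"), (e2[0], e2[1], "E2")]
    )
    return (
        text[:a_start]
        + f"[{a_tag}]" + text[a_start:a_end] + f"[/{a_tag}]"
        + text[a_end:b_start]
        + f"[{b_tag}]" + text[b_start:b_end] + f"[/{b_tag}]"
        + text[b_end:]
    )


print(mark_entities("Aspirin reduces pain in patients.", (0, 7), (16, 20)))
```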
## Training Details
- **Optimizer:** AdamW
- **Learning Rate:** 2e-5
- **Batch Size:** 16
- **Epochs:** Up to 15 (with early stopping)
- **Max Length:** 256 tokens
- **Loss:** Weighted cross-entropy
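
The card does not say how the class weights were chosen or what early-stopping criterion was used. The sketch below assumes inverse-frequency weights computed from the training labels, the AdamW learning rate and batch size listed above, and a patience-based stop on a higher-is-better validation metric such as macro F1. The helper names, the patience value, and the toy linear head standing in for PubMedBERT are all hypothetical.

```python
import torch
from torch import nn


def inverse_frequency_weights(labels: list[int], num_classes: int) -> torch.Tensor:
    """Assumed weighting scheme: weight_c = N / (num_classes * count_c)."""
    counts = torch.bincount(torch.tensor(labels), minlength=num_classes).float()
    # clamp avoids division by zero for classes absent from `labels`
    return counts.sum() / (num_classes * counts.clamp(min=1.0))


class EarlyStopper:
    """Signal a stop once the metric has not improved for `patience` evaluations."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, metric: float) -> bool:
        if metric > self.best:
            self.best = metric
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


# Toy setup: a linear head stands in for the PubMedBERT classifier.
num_classes = 9
head = nn.Linear(768, num_classes)
train_labels = [0, 0, 0, 1, 2, 2, 8, 8, 8, 8]  # hypothetical label ids
loss_fn = nn.CrossEntropyLoss(weight=inverse_frequency_weights(train_labels, num_classes))
optimizer = torch.optim.AdamW(head.parameters(), lr=2e-5)

# One optimization step on a random batch of 16 (the card's batch size)
features = torch.randn(16, 768)
targets = torch.randint(0, num_classes, (16,))
loss = loss_fn(head(features), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

With these weights, a class seen three times as often as another contributes one third as much per example to the loss, which is one common way to keep frequent classes from dominating training.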
## Limitations
- The `treats` relation has low F1 (0.28), attributed to limited training data; it also has the smallest support (678) in the per-class table
- Best performance on Chemical↔Gene/Protein and Disease relations
- Requires entity markers in input text
- Trained on English biomedical abstracts
## Citation
```bibtex
@misc{pubmedbert-relation-extraction,
author = {Your Name},
title = {PubMedBERT for Biomedical Relation Extraction},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/your-username/pubmedbert-relation-extraction}}
}
```
## Acknowledgments
- Base model: [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract)
- Datasets: ChemProt, BC5CDR, GAD, BioRED, DDI Corpus