wesin/pubmedbert-relation-extraction

	---
	language: en
	license: mit
	tags:
	- biomedical
	- relation-extraction
	- pubmedbert
	- named-entity-recognition
	datasets:
	- chemprot
	- bc5cdr
	- gad
	- biored
	- ddi
	metrics:
	- f1
	- precision
	- recall
	model-index:
	- name: PubMedBERT Relation Extraction
	  results:
	  - task:
	      type: relation-extraction
	      name: Biomedical Relation Extraction
	    metrics:
	    - type: f1
	      value: 0.7347
	      name: F1 Macro
	---

	# PubMedBERT for Biomedical Relation Extraction

	Fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract) for multi-class relation extraction in biomedical text.

	## Model Description

	This model extracts semantic relations between biomedical entities (chemicals, diseases, genes, proteins) from scientific literature.

	Base Model: `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`

	Training Data: ChemProt, BC5CDR, GAD, BioRED, DDI Corpus

	Relation Types (9):
	- `activates`
	- `inhibits`
	- `converts`
	- `causes`
	- `treats`
	- `associated_with`
	- `interacts_with`
	- `located_in`
	- `NO_RELATION`

	## Performance

	| Metric | Value |
	|--------|------:|
	| F1 Macro | 0.7347 |
	| Accuracy | 75.3% |

	### Per-Class F1 Scores

	| Relation | F1 | Support |
	|----------|---:|--------:|
	| interacts_with | 0.85 | 1,304 |
	| inhibits | 0.84 | 2,704 |
	| activates | 0.83 | 3,412 |
	| converts | 0.82 | 884 |
	| associated_with | 0.81 | 1,769 |
	| causes | 0.81 | 6,760 |
	| NO_RELATION | 0.63 | 6,760 |
	| treats | 0.28 | 678 |
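
As a consistency check, the headline macro F1 can be approximately reproduced from the per-class table. The sketch below averages the eight rounded per-class values with equal weight. It assumes the reported 0.7347 is an unweighted mean over exactly these eight classes (`located_in` has no row in the table), which the card does not state explicitly.

```python
# Per-class F1 values copied from the table above (rounded to two decimals).
per_class_f1 = {
    "interacts_with": 0.85,
    "inhibits": 0.84,
    "activates": 0.83,
    "converts": 0.82,
    "associated_with": 0.81,
    "causes": 0.81,
    "NO_RELATION": 0.63,
    "treats": 0.28,
}

# Macro averaging gives every class the same weight, regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"Macro F1 from rounded per-class scores: {macro_f1:.3f}")
```

The result is about 0.734, within rounding of the reported 0.7347. Because every class counts equally, the low `treats` score pulls the average down even though its support is small.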

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer
	model_name = "wesin/pubmedbert-relation-extraction"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	model.eval()  # disable dropout for inference

	# Add entity markers. If the saved tokenizer already contains them,
	# no tokens are added and the resize below leaves the embeddings unchanged.
	special_tokens = {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
	tokenizer.add_special_tokens(special_tokens)
	model.resize_token_embeddings(len(tokenizer))

	# Example: Extract relation between aspirin and pain
	text = "[E1]Aspirin[/E1] reduces [E2]pain[/E2] in patients."

	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
	with torch.no_grad():  # gradients are not needed for prediction
	    outputs = model(**inputs)
	probs = torch.softmax(outputs.logits, dim=-1)
	predicted_class = torch.argmax(probs, dim=-1).item()

	print(f"Predicted relation: {model.config.id2label[predicted_class]}")
	print(f"Confidence: {probs[0][predicted_class].item():.3f}")
	```

	## Input Format

	Text must contain entity markers `[E1]`, `[/E1]`, `[E2]`, `[/E2]` around the two entities:

	```
	[E1]Entity1[/E1] ... context ... [E2]Entity2[/E2]
	```
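
If the entities are available as character offsets (for example from an upstream NER step), the markers can be inserted programmatically. The helper below is a minimal sketch. `mark_entities` and its `(start, end)` span convention are illustrative assumptions, not part of this model's released code.

```python
def mark_entities(text, e1_span, e2_span):
    """Wrap two non-overlapping (start, end) character spans in entity markers."""
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (t1 <= s2 or t2 <= s1):
        raise ValueError("entity spans must not overlap")
    # Insert around the rightmost span first so the earlier offsets stay valid.
    pieces = sorted(
        [(s1, t1, "[E1]", "[/E1]"), (s2, t2, "[E2]", "[/E2]")],
        key=lambda piece: piece[0],
        reverse=True,
    )
    for start, end, open_tag, close_tag in pieces:
        text = text[:start] + open_tag + text[start:end] + close_tag + text[end:]
    return text


sentence = "Aspirin reduces pain in patients."
marked = mark_entities(sentence, (0, 7), (16, 20))
print(marked)
```

The spans use Python slice semantics (end exclusive), and `[E2]` may appear before `[E1]` in the text.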

	## Training Details

	- Optimizer: AdamW
	- Learning Rate: 2e-5
	- Batch Size: 16
	- Epochs: up to 15 (with early stopping)
	- Max Length: 256 tokens
	- Loss: Weighted CrossEntropy
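
The card does not say how the class weights were chosen. The sketch below shows one common option, inverse-frequency weighting, applied to a per-example cross-entropy in plain Python. The weighting scheme, the function names, and the class counts are all assumptions made for illustration and are not taken from the actual training run.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count)."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {label: total / (num_classes * count) for label, count in class_counts.items()}


def weighted_cross_entropy(logits, target, weights):
    """Weighted negative log-likelihood of `target` for a single example."""
    peak = max(logits.values())
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return weights[target] * (log_norm - logits[target])


# Hypothetical training counts, chosen only to illustrate class imbalance.
counts = {"causes": 800, "treats": 100, "NO_RELATION": 100}
weights = inverse_frequency_weights(counts)

# Identical logits, so the unweighted loss would be log(3) for any target.
logits = {"causes": 0.0, "treats": 0.0, "NO_RELATION": 0.0}
rare_loss = weighted_cross_entropy(logits, "treats", weights)
common_loss = weighted_cross_entropy(logits, "causes", weights)
print(rare_loss > common_loss)
```

With equal logits, a mistake on the rare class costs more than one on the frequent class, which is the intended effect of weighting. In PyTorch, per-class weights can be passed through the `weight` argument of `torch.nn.CrossEntropyLoss`.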

	## Limitations

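	- The `treats` relation has a low F1 (0.28), attributed to limited training data; it also has the smallest support (678) in the per-class table
	- Best performance is on Chemical↔Gene/Protein and Disease relations
	- Input text must already contain the `[E1]`/`[/E1]` and `[E2]`/`[/E2]` entity markers
	- Trained on English biomedical abstracts
	- `located_in` is listed as a relation type, but no per-class F1 is reported for it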

	## Citation

	```bibtex
	@misc{pubmedbert-relation-extraction,
	author = {Your Name},
	title = {PubMedBERT for Biomedical Relation Extraction},
	year = {2026},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/wesin/pubmedbert-relation-extraction}}
	}
	```

	## Acknowledgments

	- Base model: [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract)
	- Datasets: ChemProt, BC5CDR, GAD, BioRED, DDI Corpus