language: en
license: mit
tags:
  - biomedical
  - relation-extraction
  - pubmedbert
  - named-entity-recognition
datasets:
  - chemprot
  - bc5cdr
  - gad
  - biored
  - ddi
metrics:
  - f1
  - precision
  - recall
model-index:
  - name: PubMedBERT Relation Extraction
    results:
      - task:
          type: relation-extraction
          name: Biomedical Relation Extraction
        metrics:
          - type: f1
            value: 0.7347
            name: F1 Macro

PubMedBERT for Biomedical Relation Extraction

Fine-tuned PubMedBERT for multi-class relation extraction in biomedical text.

Model Description

This model extracts semantic relations between biomedical entities (chemicals, diseases, genes, proteins) from scientific literature.

Base Model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract

Training Data: ChemProt, BC5CDR, GAD, BioRED, DDI Corpus

Relation Types (9):

activates
inhibits
converts
causes
treats
associated_with
interacts_with
located_in
NO_RELATION

Performance

Metric	Value
F1 Macro	0.7347
Accuracy	75.3%

Per-Class F1 Scores

Relation	F1	Support
interacts_with	0.85	1,304
inhibits	0.84	2,704
activates	0.83	3,412
converts	0.82	884
associated_with	0.81	1,769
causes	0.81	6,760
NO_RELATION	0.63	6,760
treats	0.28	678
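
As a rough cross-check of the headline number, the macro average can be recomputed from the per-class table. This is only a sketch: the table lists 8 of the 9 relation types (located_in is absent) and its values are rounded to two decimals, so the result (about 0.734) approximates the reported 0.7347 rather than reproducing it exactly. The support-weighted average is included only for contrast and is not a metric reported by this card.

```python
# (F1, support) pairs copied from the per-class table above.
per_class = {
    "interacts_with": (0.85, 1304),
    "inhibits": (0.84, 2704),
    "activates": (0.83, 3412),
    "converts": (0.82, 884),
    "associated_with": (0.81, 1769),
    "causes": (0.81, 6760),
    "NO_RELATION": (0.63, 6760),
    "treats": (0.28, 678),
}

# Macro F1: every class counts equally, so the weak "treats" class
# pulls the average down as much as any large class would.
f1_values = [f1 for f1, _ in per_class.values()]
macro_f1 = sum(f1_values) / len(f1_values)

# Support-weighted F1 (for contrast): classes count in proportion to size.
total_support = sum(n for _, n in per_class.values())
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"macro F1 over listed classes: {macro_f1:.3f}")
print(f"support-weighted F1:          {weighted_f1:.3f}")
```

The gap between the two averages comes mostly from treats, which has the lowest F1 and the smallest support.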

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

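# Load the fine-tuned model and its tokenizer
model_name = "your-username/pubmedbert-relation-extraction"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # switch off dropout for inference

# Register the entity markers as special tokens so they are never split.
# If the saved tokenizer and checkpoint already include them, these calls
# leave both unchanged.
special_tokens = {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
tokenizer.add_special_tokens(special_tokens)
model.resize_token_embeddings(len(tokenizer))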

# Example: Extract relation between aspirin and pain
text = "[E1]Aspirin[/E1] reduces [E2]pain[/E2] in patients."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()

print(f"Predicted relation: {model.config.id2label[predicted_class]}")
print(f"Confidence: {probs[0][predicted_class].item():.3f}")

Input Format

Input text must wrap the two candidate entities in marker pairs, [E1]...[/E1] around one entity and [E2]...[/E2] around the other:

[E1]Entity1[/E1] ... context ... [E2]Entity2[/E2]
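
When entity positions come from an upstream NER step, the markers can be inserted programmatically. The helper below is a minimal sketch, not part of this model's code: `mark_entities` is a hypothetical name, and it assumes entities are given as non-overlapping (start, end) character offsets.

```python
def mark_entities(text, e1_span, e2_span):
    """Wrap two non-overlapping character spans in [E1]/[E2] markers.

    Hypothetical helper; spans are (start, end) character offsets, end exclusive.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    if not (0 <= s1 < t1 <= len(text) and 0 <= s2 < t2 <= len(text)):
        raise ValueError("spans must be non-empty and inside the text")
    if s1 < t2 and s2 < t1:
        raise ValueError("entity spans must not overlap")

    # Insert around the right-most entity first so the offsets of the
    # left-most entity are still valid when it is processed.
    tagged = [(s1, t1, "[E1]", "[/E1]"), (s2, t2, "[E2]", "[/E2]")]
    for start, end, open_tag, close_tag in sorted(tagged, reverse=True):
        text = text[:start] + open_tag + text[start:end] + close_tag + text[end:]
    return text


sentence = "Aspirin reduces pain in patients."
marked = mark_entities(sentence, (0, 7), (16, 20))
print(marked)  # [E1]Aspirin[/E1] reduces [E2]pain[/E2] in patients.
```

The spans may be passed in either textual order, so [E1] can mark whichever entity should play the first role.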

Training Details

Optimizer: AdamW
Learning Rate: 2e-5
Batch Size: 16
Epochs: up to 15 (with early stopping)
Max Length: 256 tokens
Loss: Weighted CrossEntropy
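
To illustrate what a weighted cross-entropy loss does, here is a dependency-free sketch for a single example. The weighting scheme is an assumption ("balanced" inverse frequency); the card does not say how the actual class weights were chosen. The counts are borrowed from the Support column of the per-class table purely for illustration, since the training-set distribution is not stated. In PyTorch, the same idea is expressed by passing a `weight` tensor to `torch.nn.CrossEntropyLoss`.

```python
import math

# Illustrative counts only (taken from the Support column, three classes).
class_counts = {"causes": 6760, "inhibits": 2704, "treats": 678}

# Assumed scheme: weight = total / (num_classes * count), so rare classes
# get weights above 1 and frequent classes get weights below 1.
total = sum(class_counts.values())
class_weights = {
    label: total / (len(class_counts) * count)
    for label, count in class_counts.items()
}


def weighted_cross_entropy(logits, target, weights):
    """Loss for one example: -weights[target] * log_softmax(logits)[target]."""
    max_logit = max(logits.values())  # subtract the max for numerical stability
    log_norm = max_logit + math.log(
        sum(math.exp(v - max_logit) for v in logits.values())
    )
    return -weights[target] * (logits[target] - log_norm)


# With identical (uninformative) logits, only the class weight differs.
logits = {"causes": 0.0, "inhibits": 0.0, "treats": 0.0}
loss_rare = weighted_cross_entropy(logits, "treats", class_weights)
loss_common = weighted_cross_entropy(logits, "causes", class_weights)
print(f"treats: {loss_rare:.3f}  causes: {loss_common:.3f}")
```

Under this scheme, a mistake on a rare class such as treats costs roughly ten times as much as the same mistake on causes, which pushes the optimizer to pay more attention to under-represented relations.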

Limitations

The treats relation has a low F1 score (0.28) due to limited training data
Performance is best on Chemical↔Gene/Protein and Disease relations
Input text must already contain the entity markers
Trained on English biomedical abstracts

Citation

@misc{pubmedbert-relation-extraction,
  author = {Your Name},
  title = {PubMedBERT for Biomedical Relation Extraction},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/pubmedbert-relation-extraction}}
}

Acknowledgments

Base model: PubMedBERT
Datasets: ChemProt, BC5CDR, GAD, BioRED, DDI Corpus