---
license: apache-2.0
datasets:
- uzw/PlainFact
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- biology
- medical
- classification
---

> This plain language summary classification model is part of the [PlainQAFact](https://github.com/zhiwenyou103/PlainQAFact) factuality evaluation framework.


## Classify the Input into Either Elaborative Explanation or Simplification
We fine-tuned the [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model on our curated sentence-level [PlainFact](https://huggingface.co/datasets/uzw/PlainFact) dataset.

## Model Overview
[PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) is a BERT model pre-trained from scratch on PubMed abstracts and full-text articles. It's optimized for biomedical text understanding and can be fine-tuned for various classification tasks such as:

- Medical document classification
- Disease/symptom categorization
- Clinical note classification
- Biomedical relation extraction


## How to use
Here is how to use this model in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
model_name = "uzw/plainqafact-pls-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)

num_labels = 2  # binary: elaborative explanation vs. simplification
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels
)

# Example text
text = "Patient presents with acute myocardial infarction and elevated troponin levels."

inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

# Get predictions
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)

print(f"Predicted class: {predicted_class.item()}")
print(f"Confidence scores: {predictions}")
```
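The `softmax` call in the snippet above is what turns the model's raw logits into the confidence scores that are printed. As a minimal pure-Python sketch of that step (the logit values below are made up for illustration and are not actual model outputs):

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the max logit before exponentiating for numerical stability
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two classes
logits = [2.0, -1.0]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(predicted_class)     # 0
print(round(probs[0], 3))  # 0.953
```

The larger logit always wins the `argmax`, but the softmax scores additionally indicate how confident the model is in that choice.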


## Citation
If you use this classification model in your research, please cite the following BibTeX entry:
```
@misc{you2025plainqafactretrievalaugmentedfactualconsistency,
      title={PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization}, 
      author={Zhiwen You and Yue Guo},
      year={2025},
      eprint={2503.08890},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.08890}, 
}
```

> Code: https://github.com/zhiwenyou103/PlainQAFact