---
license: apache-2.0
datasets:
- uzw/PlainFact
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- biology
- medical
- classification
---

> This plain language summary classification model is part of the [PlainQAFact](https://github.com/zhiwenyou103/PlainQAFact) factuality evaluation framework.
|
|
|
|
## Classify the Input into Either Elaborative Explanation or Simplification
We fine-tuned the [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model (formerly released as PubMedBERT) on our curated sentence-level [PlainFact](https://huggingface.co/datasets/uzw/PlainFact) dataset.
|
|
## Model Overview
[PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) is a BERT model pre-trained from scratch on PubMed abstracts and full-text articles. It is optimized for biomedical text understanding and can be fine-tuned for various classification tasks, such as:
|
|
- Medical document classification
- Disease/symptom categorization
- Clinical note classification
- Biomedical relation extraction
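
All of these fine-tuning setups share the same mechanics: a linear classification head is attached on top of BERT's pooled output and trained on labeled examples. The sketch below illustrates that head's input/output shape with a tiny, randomly initialized BERT config so it runs without downloading any weights; the layer sizes are illustrative only and much smaller than the real model's.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialized config: illustrative sizes only,
# not the real BiomedBERT dimensions.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,  # binary classification head
)
model = BertForSequenceClassification(config)

# One dummy sequence of 4 token ids
input_ids = torch.tensor([[1, 5, 7, 2]])
with torch.no_grad():
    out = model(input_ids=input_ids)

# One row of logits per sequence, one column per label
print(out.logits.shape)  # torch.Size([1, 2])
```

Loading a fine-tuned checkpoint with `from_pretrained` (as shown below) replaces the random weights with trained ones; the head's shape is the same.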
|
|
|
|
## How to use
Here is how to use this model in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
model_name = "uzw/plainqafact-pls-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # elaborative explanation vs. simplification
)

# Example sentence from a plain language summary
text = "In other words, the treatment helps the body's immune system fight the infection."

inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

# Get predictions
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1)

print(f"Predicted class: {predicted_class.item()}")
print(f"Confidence scores: {predictions}")
```
Check `model.config.id2label` to map the predicted class index back to its label name.
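
The "confidence scores" printed above are just the softmax of the logits: each logit is exponentiated and normalized so the two scores sum to 1. A minimal sketch with hypothetical logit values (the real values come from `outputs.logits`):

```python
import torch

# Hypothetical logits for the two classes
logits = torch.tensor([[2.0, -1.0]])

# Softmax turns them into probabilities that sum to 1
probs = torch.nn.functional.softmax(logits, dim=-1)
print(probs.sum().item())     # 1.0
print(probs.argmax(-1).item())  # 0, the higher-logit class
```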
|
|
|
|
## Citation
If you use this classification model in your research, please cite it with the following BibTeX entry:
```
@misc{you2025plainqafactretrievalaugmentedfactualconsistency,
      title={PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization},
      author={Zhiwen You and Yue Guo},
      year={2025},
      eprint={2503.08890},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.08890},
}
```
|
|
> Code: https://github.com/zhiwenyou103/PlainQAFact