---
license: apache-2.0
datasets:
- uzw/PlainFact
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- biology
- medical
- classification
---

> This plain language summary classification model is part of the [PlainQAFact](https://github.com/zhiwenyou103/PlainQAFact) factuality evaluation framework.
|
|
|
|
## Classify the Input into Either Elaborative Explanation or Simplification
We fine-tuned the [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model (formerly released as PubMedBERT) on our curated sentence-level [PlainFact](https://huggingface.co/datasets/uzw/PlainFact) dataset.
|
|
## Model Overview
[PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) is a BERT model pre-trained from scratch on PubMed abstracts and full-text articles. It is optimized for biomedical text understanding and can be fine-tuned for various classification tasks, such as:
|
|
- Medical document classification
- Disease/symptom categorization
- Clinical note classification
- Biomedical relation extraction
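
All of these fine-tuning setups share the same mechanics: a linear classification head is attached on top of BERT's pooled output and trained on labeled examples. The sketch below illustrates that head's input/output shape with a tiny, randomly initialized BERT config so it runs without downloading any weights; the layer sizes are illustrative only and much smaller than the real model's.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialized config: illustrative sizes only,
# not the real BiomedBERT dimensions.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,  # binary classification head
)
model = BertForSequenceClassification(config)

# One dummy sequence of 4 token ids
input_ids = torch.tensor([[1, 5, 7, 2]])
with torch.no_grad():
    out = model(input_ids=input_ids)

# One row of logits per sequence, one column per label
print(out.logits.shape)  # torch.Size([1, 2])
```

Loading a fine-tuned checkpoint with `from_pretrained` (as shown below) replaces the random weights with trained ones; the head's shape is the same.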
|
|
|
|
## How to use
Here is how to use this model in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
model_name = "uzw/plainqafact-pls-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # elaborative explanation vs. simplification
)

# Example sentence from a plain language summary
text = "In other words, the treatment helps the body's immune system fight the infection."

inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

# Get predictions
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1)

print(f"Predicted class: {predicted_class.item()}")
print(f"Confidence scores: {predictions}")
```
Check `model.config.id2label` to map the predicted class index back to its label name.
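
The "confidence scores" printed above are just the softmax of the logits: each logit is exponentiated and normalized so the two scores sum to 1. A minimal sketch with hypothetical logit values (the real values come from `outputs.logits`):

```python
import torch

# Hypothetical logits for the two classes
logits = torch.tensor([[2.0, -1.0]])

# Softmax turns them into probabilities that sum to 1
probs = torch.nn.functional.softmax(logits, dim=-1)
print(probs.sum().item())     # 1.0
print(probs.argmax(-1).item())  # 0, the higher-logit class
```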
|
|
|
|
## Citation
If you use this classification model in your research, please cite it with the following BibTeX entry:
```
@misc{you2025plainqafactretrievalaugmentedfactualconsistency,
      title={PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization},
      author={Zhiwen You and Yue Guo},
      year={2025},
      eprint={2503.08890},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.08890},
}
```
|
|
> Code: https://github.com/zhiwenyou103/PlainQAFact