uzw committed · Commit 480fb2c · verified · 1 Parent(s): 6284ae7

Update README.md

README.md CHANGED
---
license: apache-2.0
datasets:
- uzw/PlainFact
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- biology
- medical
- classification
---

> This plain language summary classification model is part of the [PlainQAFact](https://github.com/zhiwenyou103/PlainQAFact) factuality evaluation framework.

## Classify the Input into Either Elaborative Explanation or Simplification
We fine-tuned the [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model on our curated sentence-level [PlainFact](https://huggingface.co/datasets/uzw/PlainFact) dataset.
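The two classes named in the heading can be sketched as a simple label map. Note that the index-to-label order below is an assumption for illustration only; the actual mapping should be checked via `model.config.id2label` after loading the fine-tuned model.

```python
# Hypothetical index-to-label mapping for the binary classifier.
# The true order is not documented in this card; verify it with
# model.config.id2label after loading the model.
ID2LABEL = {0: "simplification", 1: "elaborative explanation"}

def label_for(class_index: int) -> str:
    """Map a predicted class index to its human-readable label."""
    return ID2LABEL[class_index]

print(label_for(1))  # elaborative explanation
```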

## Model Overview
[PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) is a BERT model pre-trained from scratch on PubMed abstracts and full-text articles. It is optimized for biomedical text understanding and can be fine-tuned for various classification tasks, such as:

- Medical document classification
- Disease/symptom categorization
- Clinical note classification
- Biomedical relation extraction

## How to use
Here is how to use this model in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
model_name = "uzw/plainqafact-pls-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)

num_labels = 2  # binary classification: elaborative explanation vs. simplification
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels
)

# Example text
text = "Patient presents with acute myocardial infarction and elevated troponin levels."

inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

# Get predictions
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)

print(f"Predicted class: {predicted_class.item()}")
print(f"Confidence scores: {predictions}")
```
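For readers who want to see what the softmax/argmax step above does without downloading the model, here is a minimal pure-Python sketch; the logits are made up for illustration and are not real model output.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for the two classes (not real model output).
logits = [-1.2, 2.3]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
# probs sum to 1.0, and the class with the higher logit wins.
```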

## Citation
If you use this classification model in your research, please cite it with the following BibTeX entry:
```
@misc{you2025plainqafactretrievalaugmentedfactualconsistency,
      title={PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization},
      author={Zhiwen You and Yue Guo},
      year={2025},
      eprint={2503.08890},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.08890},
}
```

> Code: https://github.com/zhiwenyou103/PlainQAFact