---
library_name: transformers
base_model: mistralai/Mistral-7B-Instruct-v0.2
language:
- en
license: apache-2.0
tags:
- peft
- lora
- qlora
- medical
- clinical-nlp
- text-simplification
- fine-tuned
pipeline_tag: text-generation
datasets:
- armanc/pubmed-rct20k
metrics:
- rouge
- bertscore
---
# mistral-clinical-simplifier

A fine-tuned version of Mistral-7B-Instruct-v0.2 trained to convert complex clinical and biomedical text into plain language that patients can understand.

Demo: https://huggingface.co/spaces/prabhal/mistral-clinical-simplifier
## Model Description

This model was trained with Supervised Fine-Tuning (SFT) and QLoRA on a custom dataset derived from PubMed RCT abstracts. The task is clinical text simplification: given a sentence or paragraph written for clinicians, the model produces a rewritten version that a patient with no medical background can read and understand.
- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuning method: QLoRA (4-bit NF4 quantization, LoRA rank 16, alpha 32)
- Target modules: q_proj, k_proj, v_proj, o_proj
- Training hardware: T4 GPU (Google Colab)
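The LoRA hyperparameters above correspond to a `peft` configuration along these lines. This is a sketch, not the exact training script: `lora_dropout` and `task_type` are illustrative assumptions not taken from the original setup.

```python
from peft import LoraConfig

# Adapter configuration implied by the hyperparameters above.
# lora_dropout is an assumed value, not from the original training script.
lora_config = LoraConfig(
    r=16,                    # LoRA rank
    lora_alpha=32,           # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,       # assumption
    bias="none",
    task_type="CAUSAL_LM",
)
```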
## Training Data

Sourced from the PubMed RCT 20k dataset (armanc/pubmed-rct20k). Sentences longer than 80 characters were extracted and cleaned to remove annotation artifacts. Sentence-level and paragraph-level inputs were combined to give the model exposure to both short and multi-sentence clinical contexts. The final dataset contained approximately 400 training examples with a 90/10 train-validation split.
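The length filter and 90/10 split described above can be sketched as follows. The function name, threshold parameter, and seed are illustrative; the actual artifact-cleaning rules used for this model are not published.

```python
import random

def prepare_examples(sentences, min_len=80, val_frac=0.1, seed=42):
    """Keep sentences above a length threshold and split 90/10.

    Minimal sketch of the preparation described above; the real
    artifact-cleaning rules for this model are not published.
    """
    # Drop short sentences and strip surrounding whitespace.
    kept = [s.strip() for s in sentences if len(s.strip()) > min_len]
    random.Random(seed).shuffle(kept)
    n_val = max(1, int(len(kept) * val_frac))
    return kept[n_val:], kept[:n_val]  # (train, validation)
```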
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
lora_model_id = "prabhal/mistral-clinical-simplifier"

# 4-bit NF4 quantization, matching the QLoRA training setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(base_model, lora_model_id)
model.eval()

def simplify(text):
    prompt = f"""### Instruction:
Simplify the following clinical text into patient-friendly explanation.

### Input:
{text}

### Response:
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            do_sample=True,
            temperature=0.3,
            top_p=0.9,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example:
print(simplify("Patients exhibited statistically significant reductions in HbA1c."))
```
## Evaluation Results

Evaluated on 20 held-out samples, comparing the fine-tuned model against the base Mistral-7B-Instruct-v0.2 without any fine-tuning.
**Readability (Flesch-Kincaid Grade Level):** The original clinical text averaged a grade level of 15.58, i.e., upper-college reading level. The base model brought this down to 12.51. The fine-tuned model reduced it further to 7.47, roughly middle-school level, which aligns with the broadly recommended target for patient-facing health communication.
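For reference, the Flesch-Kincaid grade level is 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The evaluation here presumably used a standard library such as textstat; the sketch below uses a crude vowel-group syllable heuristic and is only illustrative.

```python
import re

def count_syllables(word):
    """Rough vowel-group heuristic; real tools use pronunciation dictionaries."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:  # discount a silent final 'e'
        n -= 1
    return max(1, n)

def fk_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sents) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```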
**ROUGE:** ROUGE-1 improved from 0.3773 (base) to 0.5274 (fine-tuned), and ROUGE-L from 0.2520 to 0.3872. This indicates the fine-tuned model produces outputs substantially closer in word overlap and sequence structure to the reference simplifications.
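ROUGE-1 measures unigram overlap between a candidate and a reference. The reported scores were presumably computed with a standard package such as rouge_score; the minimal F1 variant below is only an illustration of what the metric captures.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between candidate and reference (illustrative sketch)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```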
**BERTScore F1:** The base model scored 0.8878 and the fine-tuned model 0.9034. The gap is smaller here because BERTScore measures meaning-level alignment rather than surface overlap, and the base model is already a strong language model. The improvement confirms the fine-tuned outputs are semantically closer to the references without introducing drift.
**LLM-as-Judge (GPT, 1-10 scale):** The fine-tuned model scored 8.60 on simplicity, confirming it reliably produces patient-friendly language. Accuracy scored 6.55 and faithfulness 6.45, reflecting the inherent difficulty of preserving exact medical meaning while simplifying vocabulary. These mid-range accuracy and faithfulness scores point to hallucination and meaning drift as the primary areas for future improvement.
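An LLM-as-judge setup of this kind typically sends each (original, simplification) pair to the judge with a scoring rubric. The exact prompt used for this card is not published; the template and function below are purely illustrative.

```python
# Illustrative rubric template; NOT the prompt used for this model card.
JUDGE_TEMPLATE = """You are evaluating a simplified version of a clinical text.

Original: {original}
Simplification: {simplified}

Rate the simplification from 1 to 10 on each criterion:
- simplicity: is it readable by a patient with no medical background?
- accuracy: are the medical facts stated correctly?
- faithfulness: is the original meaning preserved without additions?

Answer as JSON: {{"simplicity": n, "accuracy": n, "faithfulness": n}}"""

def build_judge_prompt(original, simplified):
    """Fill the rubric template for one evaluation pair (illustrative only)."""
    return JUDGE_TEMPLATE.format(original=original, simplified=simplified)
```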
## Limitations

This model is intended for educational and research purposes only. It should not be used to provide medical advice or replace clinical communication. Faithfulness scores indicate that meaning drift and hallucination remain real risks, and outputs should always be reviewed by a qualified professional before reaching patients.