# Robust-Clinical-Risk-Detector

## Overview

`Robust-Clinical-Risk-Detector` is a sequence classification model, based on DistilBERT, designed to process electronic health record (EHR) text (clinical notes) and classify patient risk (HIGH or LOW). Crucially, the model is trained with **adversarial training techniques** and a **focal loss function** to remain resilient against minor but impactful perturbations (e.g., synonym swapping, negation flipping, or numerical errors) that could otherwise lead to clinical misclassification.

The model is a three-class classifier: it predicts `RISK: LOW`, `RISK: HIGH`, or flags the input as `ATTACK: DETECTED`, indicating low confidence due to input inconsistency.

## Model Architecture

* **Base Model:** DistilBERT.
* **Task:** Sequence classification (`DistilBertForSequenceClassification`).
* **Output Classes:** Three classes: `RISK: LOW`, `RISK: HIGH`, and `ATTACK: DETECTED`.
* **Robustness:** Trained on a mix of clean clinical text and synthetically generated adversarial examples targeting critical medical terms and numerical values (an illustrative perturbation sketch appears after the example code).
* **Loss Function:** Focal loss was used during training to place greater emphasis on misclassified hard examples, i.e., the subtle adversarial inputs; a minimal sketch follows this list.
* **Domain:** Clinical notes, patient discharge summaries, initial assessments.
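The training loss implementation itself is not shipped with this card. For reference, focal loss (Lin et al., 2017) is defined as $\mathrm{FL}(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\,\log p_t$, which down-weights examples the model already classifies confidently so that gradient signal concentrates on hard inputs. The following is a minimal PyTorch sketch of the standard multi-class formulation; `gamma=2.0` is the common default from the paper, not a confirmed training setting for this model.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss (sketch).

    logits:  (batch, num_classes) raw model outputs
    targets: (batch,) integer class labels
    gamma:   focusing parameter; gamma=0 recovers plain cross-entropy
    alpha:   optional (num_classes,) tensor of per-class weights
    """
    log_probs = F.log_softmax(logits, dim=-1)                      # (batch, num_classes)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t of the true class
    pt = log_pt.exp()                                              # p_t
    loss = -((1.0 - pt) ** gamma) * log_pt                         # small when p_t is high (easy example)
    if alpha is not None:
        loss = alpha[targets] * loss                               # per-class re-weighting
    return loss.mean()

# A confident correct prediction contributes almost nothing to the loss:
logits = torch.tensor([[2.0, 0.5, -1.0]])
targets = torch.tensor([0])
print(focal_loss(logits, targets))  # ~0.011 vs. ~0.241 for plain cross-entropy
```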
## Intended Use

* **Critical Risk Classification:** Accurately determining patient risk (e.g., readmission, complication) from free-text clinical notes.
* **Adversarial Defense:** Serving as a primary defense layer against data poisoning or integrity attacks on automated clinical decision support systems.
* **Input Validation:** Flagging notes that have been subtly altered or contain contradictory language (`ATTACK: DETECTED`) for human review.
* **Benchmarking:** Evaluating the robustness of different adversarial attack methods in the medical domain.

## Limitations

* **`ATTACK: DETECTED` Ambiguity:** The flag indicates low internal confidence and potential perturbation; it is not definitive proof of malicious intent. Flagged text still requires human review to confirm its nature.
* **Vocabulary:** Although trained on a clinical corpus, the model may struggle with highly specialized jargon or acronyms not covered in its vocabulary.
* **Numerical Parsing:** The model is robust to simple numerical errors (e.g., $15.5 \rightarrow 5.5$), but complex numerical reasoning remains a challenge.

## Example Code (PyTorch)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "Health/Robust-Clinical-Risk-Detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example 1: clean, low-risk text
text_clean = "History of controlled Type 2 diabetes. No acute issues today. Routine check-up. Risk: LOW."

# Example 2: adversarially perturbed text (semantic shift: 'HIGH' -> 'LOW')
text_attack = "Severe headache, photophobia, and neck stiffness. Suspected Meningitis. Risk: LOW."

# Tokenize both notes as a single padded batch
inputs = tokenizer([text_clean, text_attack], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and take the top class per note
probabilities = F.softmax(outputs.logits, dim=1)
predictions = torch.argmax(probabilities, dim=1)
labels = model.config.id2label

print(f"Text 1 (Clean): Prediction: {labels[predictions[0].item()]}, "
      f"Confidence: {probabilities[0].max().item():.3f}")
# Expected output (approx.): Prediction: RISK: LOW, Confidence: 0.985

print(f"Text 2 (Attack): Prediction: {labels[predictions[1].item()]}, "
      f"Confidence: {probabilities[1].max().item():.3f}")
# Expected output (approx.): Prediction: ATTACK: DETECTED, Confidence: 0.550
# (or a high-confidence RISK: HIGH if the model ignores the injected "Risk: LOW")
```
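The synthetic adversarial examples used in training are not distributed with the model. As a rough illustration of the perturbation families described above (synonym swaps on critical terms, negation flips, and simple numerical errors), here is a hypothetical generator sketch; `perturb_note`, `SYNONYM_SWAPS`, and `NEGATION_FLIPS` are invented for this example and are not part of the model's actual data pipeline.

```python
import random
import re

# Hypothetical perturbation tables (illustrative, not the real training lists)
SYNONYM_SWAPS = {"severe": "mild", "acute": "chronic", "elevated": "normal"}
NEGATION_FLIPS = {r"\bno acute\b": "acute", r"\bdenies\b": "reports"}

def perturb_note(note: str, seed: int = 0) -> str:
    """Apply one small, label-relevant perturbation to a clinical note."""
    rng = random.Random(seed)
    kind = rng.choice(["synonym", "negation", "numeric"])
    if kind == "synonym":
        for src, dst in SYNONYM_SWAPS.items():
            if re.search(src, note, flags=re.IGNORECASE):
                return re.sub(src, dst, note, count=1, flags=re.IGNORECASE)
    if kind == "negation":
        for pat, repl in NEGATION_FLIPS.items():
            if re.search(pat, note, flags=re.IGNORECASE):
                return re.sub(pat, repl, note, count=1, flags=re.IGNORECASE)
    # Fallback / numeric error: drop the leading digit of the first
    # multi-digit value, e.g. 15.5 -> 5.5 (the error class noted in Limitations)
    return re.sub(r"\b\d(\d+(?:\.\d+)?)\b", r"\1", note, count=1)

print(perturb_note("Severe chest pain; troponin 15.5 ng/mL.", seed=3))
```

In the general adversarial-training recipe, pairs of original and perturbed notes are mixed into the training set with their original gold labels, teaching the model to keep its prediction stable under such edits.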