SortMed Intent Classifier

Model Description

This repository contains the intent classifier used by the SortMed input guard. Before any text is sent to the medical triage models, the classifier decides whether the user input is a valid symptom description or belongs to an unsafe or out-of-scope intent category.

The classifier is a lightweight scikit-learn pipeline based on:

TF-IDF text vectorization;
Logistic Regression multi-class classification.

It is intentionally separate from the triage models. The triage models predict urgency only after this intent classifier and the deterministic input-guard rules accept the input.
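As a rough illustration of the architecture described above, the following sketch builds a TF-IDF + Logistic Regression pipeline with scikit-learn. The toy sentences, the step names, and every hyperparameter here are assumptions made for the example; they are not the SortMed training data or the published configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy examples for illustration only; NOT the SortMed training data.
texts = [
    "I have a fever and a sore throat",
    "My stomach hurts and I feel dizzy",
    "Which painkiller should I take for a headache",
    "Can you prescribe antibiotics for me",
    "What is the weather like today",
    "Tell me a joke about cats",
]
labels = [
    "symptom_description",
    "symptom_description",
    "medication_request",
    "medication_request",
    "non_medical",
    "non_medical",
]

# Hyperparameters below are assumptions, not the published settings.
toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("logreg", LogisticRegression(max_iter=1000)),
])
toy_pipeline.fit(texts, labels)

# One probability per class, in the order given by toy_pipeline.classes_.
probabilities = toy_pipeline.predict_proba(["I feel dizzy and my head hurts"])[0]
for name, probability in zip(toy_pipeline.classes_, probabilities):
    print(f"{name}: {probability:.3f}")
```

Keeping both steps in a single pipeline object means the vectorizer vocabulary and the classifier weights are serialized and loaded together, which is what makes a single joblib artifact sufficient.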

This model was developed as part of the SortMed academic project, a medical pre-triage assistant prototype built for a bachelor's thesis by Cristian Untaru at the West University of Timisoara, Faculty of Informatics.

Role in SortMed

The final SortMed input validation flow is:

User input
  |
  v
Deterministic input-guard rules
  |
  v
Intent classifier
  |
  v
Triage model, only if intent = symptom_description

The classifier is used as a semantic safety layer. It blocks prompts that may contain medical words but are not suitable symptom descriptions, such as medication requests, diagnosis requests, general medical questions, or non-medical input.
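The ordering of the stages in the flow above can be sketched as a small orchestration function. This is a minimal sketch, not the SortMed backend: `validate_input`, `demo_rule_check`, and `demo_classify_intent` are hypothetical names, the two demo stages are trivial stand-ins, and the confidence threshold is left out to keep the control flow visible.

```python
from typing import Callable, Optional, Tuple

VALID_INTENT = "symptom_description"


def validate_input(
    text: str,
    rule_check: Callable[[str], Optional[str]],
    classify_intent: Callable[[str], str],
) -> Tuple[bool, str]:
    """Run the guard stages in order and stop at the first rejection."""
    # Stage 1: deterministic rules return a reason string, or None if the text passes.
    rule_error = rule_check(text)
    if rule_error is not None:
        return False, f"rejected by rules: {rule_error}"

    # Stage 2: the intent classifier runs only on text that passed the rules.
    intent = classify_intent(text)
    if intent != VALID_INTENT:
        return False, f"rejected by intent classifier: {intent}"

    # Stage 3: only now would the text be forwarded to a triage model.
    return True, "accepted for triage"


# Hypothetical stand-in stages, used only to exercise the control flow.
def demo_rule_check(text: str) -> Optional[str]:
    return "empty input" if not text.strip() else None


def demo_classify_intent(text: str) -> str:
    return "medication_request" if "painkiller" in text.lower() else VALID_INTENT


print(validate_input("", demo_rule_check, demo_classify_intent))
print(validate_input("Can you recommend a painkiller for my headache?",
                     demo_rule_check, demo_classify_intent))
print(validate_input("I have chest pain and I feel short of breath.",
                     demo_rule_check, demo_classify_intent))
```

Passing the stages in as callables keeps the ordering logic testable without loading any model.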

Intended Use

This model is intended to be used in the SortMed academic prototype for:

classifying user input intent before medical pre-triage;
rejecting unsafe or out-of-scope requests;
allowing only English symptom descriptions to reach the triage classifier;
supporting a hybrid input-guard architecture based on rules plus intent classification.

Example accepted input:

I have chest pain and I feel short of breath.

Expected intent:

symptom_description

Example rejected input:

Can you recommend a painkiller for my headache?

Expected intent:

medication_request

Out-of-Scope Use

This model must not be used as:

a medical triage classifier;
a diagnostic model;
a medication recommendation system;
a replacement for deterministic safety rules;
a standalone medical safety system;
a general-purpose moderation classifier;
a multilingual intent classifier without additional validation.

The model only classifies intent. It does not assess symptom severity and does not provide medical advice.

Intent Classes

Intent class	Meaning	SortMed behavior
`symptom_description`	The user describes symptoms or how they feel.	Accepted for triage only if the confidence meets the input guard's valid-intent threshold.
`medication_request`	The user asks for medication, drugs, treatment, or dosage advice.	Rejected with a medication-specific safety message.
`diagnosis_request`	The user asks directly for a diagnosis or condition identification.	Rejected with a diagnosis-specific safety message.
`general_medical_question`	The user asks a general medical question instead of describing symptoms.	Rejected with a message asking for a symptom description.
`non_medical`	The input is unrelated to medical symptoms.	Rejected as out of scope.

Only symptom_description is considered a valid intent for continuing to the triage models.
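One way to express the class-to-behavior mapping from the table is a lookup from intent label to a rejection message. The message wording, `FALLBACK_MESSAGE`, and the function name below are hypothetical; the actual user-facing messages used by SortMed are not reproduced in this card.

```python
from typing import Optional

VALID_INTENT = "symptom_description"

# Hypothetical message wording; only the intent labels come from the table above.
REJECTION_MESSAGES = {
    "medication_request": "This assistant cannot recommend medication or dosages.",
    "diagnosis_request": "This assistant cannot provide a diagnosis.",
    "general_medical_question": "Please describe the symptoms you are experiencing.",
    "non_medical": "This request is outside the scope of the assistant.",
}

# Assumption: labels outside the known set are rejected rather than accepted.
FALLBACK_MESSAGE = "This input cannot be processed."


def rejection_message(intent: str) -> Optional[str]:
    """Return None for the valid intent, otherwise a rejection message."""
    if intent == VALID_INTENT:
        return None
    return REJECTION_MESSAGES.get(intent, FALLBACK_MESSAGE)


print(rejection_message("medication_request"))
print(rejection_message("symptom_description"))
```

Falling back to a rejection for unknown labels is a conservative choice for this sketch: an unexpected class name never opens a path to triage.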

Configuration

The published configuration is stored in intent_config.json.

Field	Value
Model type	`tfidf+logreg`
Number of classes	5
Valid triage intent	`symptom_description`
General confidence threshold	0.5
TF-IDF feature count	1936
Training examples	382
Test examples	96
scikit-learn version	`1.6.1`

In the SortMed backend, symptom_description is accepted only when it passes the valid-intent confidence threshold used by the input guard. Other intents are rejected with class-specific user-facing messages.
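The acceptance rule described here can be sketched as a small decision function over the class probabilities. This sketch assumes that the 0.5 general threshold from the table is the value applied to `symptom_description`; the backend may use a different valid-intent threshold. The function name `decide` and the example probability vectors are hypothetical.

```python
from typing import Sequence, Tuple

# Class labels from the Intent Classes table; the order here is illustrative.
CLASSES = [
    "diagnosis_request",
    "general_medical_question",
    "medication_request",
    "non_medical",
    "symptom_description",
]
VALID_INTENT = "symptom_description"
THRESHOLD = 0.5  # Assumption: reuses the general confidence threshold from the table.


def decide(
    probabilities: Sequence[float],
    classes: Sequence[str] = CLASSES,
    valid_intent: str = VALID_INTENT,
    threshold: float = THRESHOLD,
) -> Tuple[str, float, bool]:
    """Return (top intent, its confidence, whether the input may reach triage)."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    intent = classes[best_index]
    confidence = float(probabilities[best_index])
    # Accept only when the top class is the valid intent AND it is confident enough.
    accepted = intent == valid_intent and confidence >= threshold
    return intent, confidence, accepted


print(decide([0.05, 0.05, 0.10, 0.10, 0.70]))  # confident symptom description
print(decide([0.15, 0.15, 0.15, 0.10, 0.45]))  # top class is valid but below threshold
print(decide([0.02, 0.03, 0.90, 0.03, 0.02]))  # confident, but not the valid intent
```

Note that a low-confidence `symptom_description` is rejected just like any other intent, so uncertainty never routes text to the triage models.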

Evaluation Results

The classifier was evaluated on the held-out test split of 96 examples.

Metric	Test Score
Accuracy	0.8750
Macro Precision	0.8734
Macro Recall	0.8634
Macro F1	0.8558

The confusion matrix is available in intent_confusion_matrix.png.
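For readers unfamiliar with macro averaging, the sketch below computes accuracy and macro precision, recall, and F1 from scratch on a made-up set of labels. The labels are invented for illustration and are unrelated to the SortMed test split; scikit-learn's `precision_recall_fscore_support` with `average="macro"` computes the same quantities. The example also shows that macro F1 is the mean of per-class F1 scores, so it can fall below both macro precision and macro recall, as it does in the table above.

```python
from collections import Counter
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Compute accuracy and unweighted per-class averages of precision, recall, F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, prediction in zip(y_true, y_pred):
        if truth == prediction:
            tp[truth] += 1
        else:
            fp[prediction] += 1  # predicted this class, but it was wrong
            fn[truth] += 1       # missed an example of the true class

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    count = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / count,
        "macro_recall": sum(recalls) / count,
        "macro_f1": sum(f1_scores) / count,
    }


# Invented labels for illustration only.
y_true = ["symptom_description"] * 3 + ["medication_request"] * 2 + ["non_medical"]
y_pred = ["symptom_description"] * 2 + ["medication_request"] * 3 + ["non_medical"]

for name, value in macro_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```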

How to Use

import json

import joblib
from huggingface_hub import hf_hub_download

repo_id = "cristian-untaru/sortmed-intent-classifier"

# Download the serialized pipeline and its configuration from the Hub.
model_path = hf_hub_download(repo_id=repo_id, filename="intent_pipeline.joblib")
config_path = hf_hub_download(repo_id=repo_id, filename="intent_config.json")

# joblib.load unpickles the file, so only load artifacts you trust.
pipeline = joblib.load(model_path)

with open(config_path, "r", encoding="utf-8") as file:
    config = json.load(file)

text = "I have chest pain and I feel short of breath."

# predict_proba returns one row per input; take the row for the single text.
probabilities = pipeline.predict_proba([text])[0]
classes = list(pipeline.classes_)
best_index = probabilities.argmax()

# The predicted intent is the class with the highest probability.
intent = classes[best_index]
confidence = float(probabilities[best_index])

print("Intent:", intent)
print("Confidence:", round(confidence, 4))
print("Valid intent:", config["valid_intent"])

Security note: joblib files rely on Python pickle serialization. Load this artifact only from trusted sources.
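One possible precaution, sketched below, is to compare the downloaded file against a SHA-256 digest recorded from a copy you already trust before passing it to `joblib.load`. The helper names and the idea of a pinned digest are assumptions of this sketch; the repository does not publish an official checksum in this card. A matching digest only confirms that the file is the one you expected; it does not make unpickling untrusted files safe.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256: str) -> Path:
    """Return the path if its digest matches; raise ValueError otherwise."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"Digest mismatch for {path}: got {actual}")
    return Path(path)


# Hypothetical usage: EXPECTED_SHA256 is a digest you recorded yourself
# from a download you trust; it is not provided by this repository.
# pipeline = joblib.load(verify_artifact(model_path, EXPECTED_SHA256))
```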

Repository Files

File	Description
`intent_pipeline.joblib`	Serialized scikit-learn TF-IDF + Logistic Regression pipeline.
`intent_config.json`	Intent classes, accepted intent, thresholds, feature count, split sizes, and scikit-learn version.
`intent_test_metrics.json`	Held-out test metrics for the intent classifier.
`intent_confusion_matrix.png`	Confusion matrix image for the intent classification task.
`README.md`	Model card documentation.
`.gitattributes`	Git LFS configuration for large files.

Limitations

This model has several important limitations:

It is a TF-IDF + Logistic Regression classifier, not a contextual transformer or LLM.
It may be sensitive to wording, spelling, unusual phrasing, or adversarial inputs.
It was trained for English input only.
It does not perform medical triage or diagnosis.
It does not detect emergency severity.
It should be used together with deterministic input validation rules.
It should not be treated as a standalone safety system.

Ethical and Safety Considerations

Input guards for medical applications must be conservative. This classifier is designed to reduce unsafe routing into the triage models, but it cannot guarantee perfect rejection of every invalid prompt.

For this reason, SortMed uses a hybrid validation design:

deterministic rules for empty, repetitive, non-English, malformed, or adversarial input;
this intent classifier for semantic request type detection;
triage models only after the input is accepted as a symptom description.
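The deterministic layer in the design above could look roughly like the following. This is a sketch under stated assumptions, not the SortMed rule set: the function name, the reason strings, and every numeric limit are hypothetical, and the non-ASCII ratio is only a crude stand-in for real language detection.

```python
import re
from typing import Optional


def rule_check(text: str, min_letters: int = 3, max_length: int = 1000) -> Optional[str]:
    """Return a rejection reason, or None if the text passes all rules."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    if len(stripped) > max_length:
        return "too_long"

    letters = [ch for ch in stripped if ch.isalpha()]
    if len(letters) < min_letters:
        return "malformed"  # mostly digits, punctuation, or symbols

    # Crude proxy for non-English text: a high share of non-ASCII letters.
    non_ascii = sum(1 for ch in letters if not ch.isascii())
    if non_ascii / len(letters) > 0.3:
        return "non_english"

    # Repetition: the same character six or more times in a row,
    # or a multi-word input made up mostly of repeated words.
    if re.search(r"(.)\1{5,}", stripped):
        return "repetitive"
    words = re.findall(r"[a-z']+", stripped.lower())
    if len(words) >= 4 and len(set(words)) / len(words) < 0.5:
        return "repetitive"

    return None


for sample in ["", "12345 !!!", "pain pain pain pain pain",
               "I have chest pain and I feel short of breath."]:
    print(repr(sample), "->", rule_check(sample))
```

Because these checks are cheap and deterministic, they can run before the classifier and reject inputs the statistical model was never trained to handle.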

Any production medical system would require clinical review, broader safety testing, ongoing monitoring, and a stronger risk-management process.

Medical Disclaimer

This model is part of an academic prototype. It does not provide medical advice, diagnosis, treatment, or emergency triage.

If symptoms are severe, sudden, worsening, or potentially life-threatening, users should contact emergency services or a qualified healthcare professional immediately.
