LLM Prompt Intent Classifier

Classifies user prompts sent to LLMs into four intent categories using all-MiniLM-L6-v2 sentence embeddings and a Logistic Regression classification head.

Labels

ID	Label	Description
0	`creative`	Fiction, brainstorming, roleplay, poetry
1	`informational`	Factual questions, explanations, definitions
2	`task`	Code, translation, summarisation, editing
3	`adversarial`	Jailbreaks, prompt injection, manipulation

Classifier comparison

Classifier	Accuracy	F1 macro	F1 weighted
Logistic Regression	0.8218	0.8222	0.8209
Linear SVM	0.7816	0.7824	0.7816
MLP	0.8103	0.8090	0.8100
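The card does not include the training script, so the following is only a sketch of how three heads like these could be compared on a shared split with scikit-learn. The data is synthetic (`make_classification`), the helper name `compare_classifiers` and every hyperparameter are assumptions chosen for illustration, and the printed numbers will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC


def compare_classifiers(X, y, seed=0):
    """Fit three classification heads on one split and return test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Hyperparameters are illustrative assumptions, not the card's settings.
    heads = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Linear SVM": LinearSVC(max_iter=5000, random_state=seed),
        "MLP": MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=seed),
    }
    results = {}
    for name, head in heads.items():
        pred = head.fit(X_train, y_train).predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1_macro": f1_score(y_test, pred, average="macro"),
            "f1_weighted": f1_score(y_test, pred, average="weighted"),
        }
    return results


# Synthetic stand-in for real prompt embeddings: 384 features matches the
# output size of all-MiniLM-L6-v2, but the values carry no meaning.
X, y = make_classification(
    n_samples=600, n_features=384, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
results = compare_classifiers(X, y)
for name, m in results.items():
    print(f"{name:20s} {m['accuracy']:.4f} {m['f1_macro']:.4f} {m['f1_weighted']:.4f}")
```

With real data, `X` would be the matrix returned by `embedder.encode(prompts)` and `y` the integer label IDs from the Labels table.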

Best model: Logistic Regression

               precision    recall  f1-score   support

     creative       0.78      0.89      0.83        45
informational       0.84      0.77      0.80        48
         task       0.80      0.89      0.85        37
  adversarial       0.87      0.75      0.80        44

     accuracy                           0.82       174
    macro avg       0.82      0.83      0.82       174
 weighted avg       0.83      0.82      0.82       174

Confusion matrix (best model)

                 Predicted →
Actual ↓         creative  info  task  adversarial
creative             40     2     3            0
informational         3    37     5            3
task                  0     2    33            2
adversarial           8     3     0           33
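The reported accuracy and F1 scores can be recomputed from this matrix alone. The sketch below assumes rows are actual classes and columns are predicted classes, which is consistent with each row summing to the support column of the classification report. The function names are illustrative.

```python
LABELS = ["creative", "informational", "task", "adversarial"]

# Rows: actual class. Columns: predicted class. Copied from the matrix above.
CONFUSION = [
    [40, 2, 3, 0],
    [3, 37, 5, 3],
    [0, 2, 33, 2],
    [8, 3, 0, 33],
]


def per_class_metrics(matrix):
    """Return precision, recall, F1 and support for each class."""
    n = len(matrix)
    rows = []
    for k in range(n):
        tp = matrix[k][k]
        support = sum(matrix[k])                         # all actual k
        predicted = sum(matrix[r][k] for r in range(n))  # all predicted k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"precision": precision, "recall": recall, "f1": f1, "support": support})
    return rows


def summarize(matrix):
    """Return (accuracy, macro F1, support-weighted F1)."""
    rows = per_class_metrics(matrix)
    total = sum(r["support"] for r in rows)
    accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total
    f1_macro = sum(r["f1"] for r in rows) / len(rows)
    f1_weighted = sum(r["f1"] * r["support"] for r in rows) / total
    return accuracy, f1_macro, f1_weighted


accuracy, f1_macro, f1_weighted = summarize(CONFUSION)
print(f"accuracy={accuracy:.4f} f1_macro={f1_macro:.4f} f1_weighted={f1_weighted:.4f}")
# → accuracy=0.8218 f1_macro=0.8222 f1_weighted=0.8209
```

These values agree with the Logistic Regression row of the classifier comparison.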

Inference

import joblib
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

# Label order must match the IDs in the Labels table (0-3).
labels = ["creative", "informational", "task", "adversarial"]

# Load the embedding model and download the trained classification head.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
clf_path = hf_hub_download(repo_id="belrem/llm-prompt-intent-classifier", filename="classifier.joblib")
clf = joblib.load(clf_path)

# encode() takes a list of prompts and returns one embedding per prompt.
prompt = "Write a poem about the ocean."
vec = embedder.encode([prompt])
label_id = int(clf.predict(vec)[0])
print(labels[label_id])  # → creative
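For more than one prompt, the same three steps (encode, predict, map ID to label) can be wrapped in a small helper. This is a minimal sketch: `classify_prompts` is a hypothetical name, not part of the repository, and it takes the embedder and classifier as arguments so that it works with any objects exposing `encode` and `predict`.

```python
from typing import List, Sequence, Tuple

LABELS = ["creative", "informational", "task", "adversarial"]


def classify_prompts(prompts: Sequence[str], embedder, clf) -> List[Tuple[str, str]]:
    """Return (prompt, label) pairs for a batch of prompts.

    `embedder` needs an encode(list_of_str) method and `clf` a
    predict(vectors) method that returns integer label IDs.
    """
    if not prompts:
        return []
    vectors = embedder.encode(list(prompts))  # one embedding per prompt
    label_ids = clf.predict(vectors)          # one integer ID per prompt
    return [(p, LABELS[int(i)]) for p, i in zip(prompts, label_ids)]


# Usage with the objects loaded in the snippet above (needs network access):
#   for prompt, label in classify_prompts(["Write a haiku.", "What is DNS?"], embedder, clf):
#       print(label, "<-", prompt)
```

Encoding the whole batch in one `encode` call avoids re-entering the model once per prompt.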

Limitations

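Adversarial prompts have the lowest recall (0.75): sophisticated jailbreaks that use creative or hypothetical framing may be misclassified, most often as creative (8 of 44 adversarial prompts in the confusion matrix above, against 3 as informational and none as task).
Intent is inherently ambiguous: a prompt can be both creative and a task at the same time. The model outputs a single label, the dominant intent.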
Dataset skew: adversarial examples from AdvBench may not reflect real-world jailbreak distributions.
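One way to soften the first two limitations is to look at class probabilities instead of the single predicted label. The sketch below is an assumption-laden illustration, not part of the released model: `interpret` is a hypothetical helper, both thresholds are placeholder values that would need tuning on held-out data, and it assumes the loaded classifier exposes `predict_proba`, as scikit-learn's `LogisticRegression` does.

```python
from typing import Dict, Sequence

LABELS = ["creative", "informational", "task", "adversarial"]
ADVERSARIAL_ID = 3


def interpret(
    probs: Sequence[float],
    adversarial_threshold: float = 0.30,  # placeholder, tune on held-out data
    margin: float = 0.15,                 # placeholder, tune on held-out data
) -> Dict[str, object]:
    """Summarize one row of class probabilities.

    Reports the top label, the runner-up, whether the two are close enough
    to call the prompt ambiguous, and whether the adversarial probability
    alone is high enough to warrant a second look.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return {
        "label": LABELS[top],
        "runner_up": LABELS[runner_up],
        "ambiguous": probs[top] - probs[runner_up] < margin,
        "flag_adversarial": probs[ADVERSARIAL_ID] >= adversarial_threshold,
    }


# With the real model, the input would be: probs = clf.predict_proba(vec)[0]
print(interpret([0.45, 0.05, 0.15, 0.35]))
# → {'label': 'creative', 'runner_up': 'adversarial', 'ambiguous': True, 'flag_adversarial': True}
```

In this example the prompt is still labeled creative, but the flag surfaces a sizable adversarial probability that a plain `predict` call would hide.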
