Dataset: llm-semantic-router/feedback-detector-dataset
How to use llm-semantic-router/mmbert-feedback-detector-merged with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="llm-semantic-router/mmbert-feedback-detector-merged")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert-feedback-detector-merged")
model = AutoModelForSequenceClassification.from_pretrained("llm-semantic-router/mmbert-feedback-detector-merged")
```

A multilingual 4-class user feedback classifier built on jhu-clsp/mmBERT-base. This model classifies user responses into satisfaction categories to help understand user intent in conversational AI systems.
This is the merged model (LoRA weights merged into the base model), intended for direct inference without PEFT. For the LoRA adapter version, see llm-semantic-router/mmbert-feedback-detector-lora.
| Label | ID | Description |
|---|---|---|
| SAT | 0 | User is satisfied with the response |
| NEED_CLARIFICATION | 1 | User needs more explanation or clarification |
| WRONG_ANSWER | 2 | User indicates the response is incorrect |
| WANT_DIFFERENT | 3 | User wants alternative options or different response |
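The label scheme in the table can be written down as a pair of lookup tables. The following is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, `DESCRIPTIONS`, and `describe` are illustrative and are not shipped with the model. If the model's own `config.id2label` differs, treat that as authoritative.

```python
# Illustrative constants mirroring the label table; not part of the model repository.
ID2LABEL = {
    0: "SAT",
    1: "NEED_CLARIFICATION",
    2: "WRONG_ANSWER",
    3: "WANT_DIFFERENT",
}

# Reverse mapping, built from ID2LABEL so the two cannot drift apart.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

DESCRIPTIONS = {
    "SAT": "User is satisfied with the response",
    "NEED_CLARIFICATION": "User needs more explanation or clarification",
    "WRONG_ANSWER": "User indicates the response is incorrect",
    "WANT_DIFFERENT": "User wants alternative options or different response",
}


def describe(class_id: int) -> str:
    """Return 'LABEL: description' for a class ID; raise on unknown IDs."""
    if class_id not in ID2LABEL:
        raise ValueError(f"unknown class id: {class_id}")
    label = ID2LABEL[class_id]
    return f"{label}: {DESCRIPTIONS[label]}"


print(describe(2))
```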
| Metric | Score |
|---|---|
| Accuracy | 96.89% |
| F1 Macro | 96.88% |
| F1 Weighted | 96.88% |
| Class | F1 Score |
|---|---|
| SAT | 100.0% |
| NEED_CLARIFICATION | 99.7% |
| WRONG_ANSWER | 94.0% |
| WANT_DIFFERENT | 93.8% |
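Macro F1 is the unweighted mean of the per-class F1 scores, so the two tables can be cross-checked against each other. The sketch below uses the rounded per-class values from the table, so it can only agree with the reported figure to about the second decimal place. Weighted F1 is not recomputed here because it needs per-class sample counts, which are not listed.

```python
# Per-class F1 scores (in percent), copied from the table.
per_class_f1 = {
    "SAT": 100.0,
    "NEED_CLARIFICATION": 99.7,
    "WRONG_ANSWER": 94.0,
    "WANT_DIFFERENT": 93.8,
}

# Macro average: every class counts equally, regardless of how many samples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

print(f"Macro F1 = {macro_f1:.3f}%")
```

The result is about 96.875%, consistent with the reported F1 Macro of 96.88%.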
Thanks to mmBERT's multilingual pretraining (256k-token vocabulary, 100+ languages), this model transfers well across languages. Reported accuracy by language:
| Language | Accuracy |
|---|---|
| 🇺🇸 English | 100% |
| 🇪🇸 Spanish | 100% |
| 🇫🇷 French | 100% |
| 🇩🇪 German | 100% |
| 🇨🇳 Chinese | 100% |
| 🇯🇵 Japanese | 100% |
| 🇰🇷 Korean | 100% |
| 🇸🇦 Arabic | 100% |
| 🇵🇹 Portuguese | 100% |
| 🇷🇺 Russian | 100% |
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "llm-semantic-router/mmbert-feedback-detector-merged"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example: Classify user feedback
text = "Thanks, that's exactly what I needed!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = probs.argmax().item()

labels = ["SAT", "NEED_CLARIFICATION", "WRONG_ANSWER", "WANT_DIFFERENT"]
print(f"Prediction: {labels[pred]} ({probs[0][pred]:.1%})")
# Output: Prediction: SAT (100.0%)
```
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert-feedback-detector-merged"
)

# English
result = classifier("Thanks, that's helpful!")
print(result)  # [{'label': 'SAT', 'score': 0.999...}]

# Spanish (cross-lingual transfer)
result = classifier("¡Gracias, eso es muy útil!")
print(result)  # [{'label': 'SAT', 'score': 0.999...}]

# Chinese
result = classifier("谢谢,这很有帮助!")
print(result)  # [{'label': 'SAT', 'score': 0.98...}]
```
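The pipeline returns a list with one `{'label', 'score'}` dict per input, as in the printed results above. One possible way to act on that output is sketched below. The `interpret` helper, the `min_score` threshold of 0.8, and the `UNCERTAIN` fallback are assumptions made for illustration; none of them is part of the model or of Transformers.

```python
from typing import Dict, List, Union

# The four labels documented for this model.
VALID_LABELS = {"SAT", "NEED_CLARIFICATION", "WRONG_ANSWER", "WANT_DIFFERENT"}


def interpret(
    result: List[Dict[str, Union[str, float]]],
    min_score: float = 0.8,  # hypothetical threshold; tune on your own data
) -> str:
    """Return the top label, or 'UNCERTAIN' when its score is below min_score."""
    if not result:
        raise ValueError("empty pipeline result")
    # Pick the highest-scoring entry in case several labels are returned.
    top = max(result, key=lambda r: r["score"])
    if top["label"] not in VALID_LABELS:
        raise ValueError(f"unexpected label: {top['label']}")
    return top["label"] if top["score"] >= min_score else "UNCERTAIN"


# Hand-written dicts in the pipeline's output shape, so this runs without the model.
print(interpret([{"label": "SAT", "score": 0.999}]))
print(interpret([{"label": "WRONG_ANSWER", "score": 0.55}]))
```

In practice `result` would come from `classifier(text)`. Falling back to a neutral outcome on low-confidence predictions is a design choice for callers that would rather ask a follow-up question than act on a shaky label.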
```bibtex
@misc{mmbert-feedback-detector,
  title={mmBERT Feedback Detector},
  author={vLLM Semantic Router Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/llm-semantic-router/mmbert-feedback-detector-merged}
}
```
License: Apache 2.0