metadata
license: apache-2.0
title: ' SpeechIntentEval'
sdk: gradio
emoji: 🌖
colorFrom: indigo
colorTo: blue
pinned: true
SpeechIntentEval — Benchmark for Social & Indirect Speech Understanding
Large language models perform well on direct, literal instructions, but they often fail when people communicate through hints, emotions, politeness, hedging, or sarcasm.
Everyday speech is not explicit:
- “It’s freezing in here.” → change the temperature
- “I probably should finish that paper…” → plan or offer help
- “Sure, whatever.” → sarcasm, dismissal, frustration
SpeechIntentEval evaluates whether a model understands these socially loaded signals and responds to what the speaker actually means.
What this demo does
Paste:
- a User Utterance (how a real human would speak)
- a Model Response
The system will:
- infer the intent category (e.g., indirect request, emotional complaint)
- provide a human-readable explanation
- judge whether the response understood and acted on the implied meaning
This focuses on pragmatics, not keyword toxicity or raw accuracy.
How to use the demo
- Paste a natural sentence: “It’s freezing in here.”
- Paste your model’s reply: “Let me turn up the heat for you.”
- Click Evaluate.
You’ll see:
- inferred intent
- explanation
- verdict on whether the reply handled it correctly
Current version
This v1 uses lightweight heuristics rather than a trained model; it is meant to illustrate the concept through the UI.
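To make "lightweight heuristics" concrete, here is a minimal sketch of how a rule-based pass over an utterance and a reply could work. It is not the demo's actual code: the cue patterns, the intent labels, and the names `INTENT_CUES`, `ACTION_CUES`, `Verdict`, and `evaluate` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns; the demo's real rules are not documented here.
INTENT_CUES = {
    "indirect_request": [r"\bfreezing\b", r"\bcold in here\b"],
    "hedged_plan": [r"\bprobably should\b", r"\bi guess i (should|could)\b"],
    "sarcasm_or_dismissal": [r"\bsure,? whatever\b", r"\byeah,? right\b"],
}

# Hypothetical signs that a reply acted on the implied meaning.
ACTION_CUES = {
    "indirect_request": [r"\bturn (up|down|on|off)\b", r"\blet me\b", r"\bi('ll| will)\b"],
    "hedged_plan": [r"\bplan\b", r"\bhelp\b", r"\bschedule\b"],
    "sarcasm_or_dismissal": [r"\bsorry\b", r"\bfrustrat", r"\bsounds like\b"],
}


@dataclass
class Verdict:
    intent: str        # inferred intent category
    explanation: str   # short human-readable reason
    handled: bool      # did the reply act on the implied meaning?


def _normalize(text: str) -> str:
    # Lowercase and map curly apostrophes to straight ones before matching.
    return text.replace("\u2019", "'").lower()


def _matches(patterns, text: str) -> bool:
    return any(re.search(p, text) for p in patterns)


def evaluate(utterance: str, response: str) -> Verdict:
    u, r = _normalize(utterance), _normalize(response)
    for intent, patterns in INTENT_CUES.items():
        if _matches(patterns, u):
            handled = _matches(ACTION_CUES[intent], r)
            explanation = (
                f"Utterance matched a cue for '{intent}'; "
                f"reply {'shows' if handled else 'does not show'} a matching action cue."
            )
            return Verdict(intent, explanation, handled)
    return Verdict("literal_or_unknown", "No indirect-speech cue matched.", False)


result = evaluate("It\u2019s freezing in here.", "Let me turn up the heat for you.")
print(result.intent, result.handled)
```

Keyword rules like these are brittle (they miss paraphrases and cannot weigh context), which is why a sketch of this kind only serves to illustrate the idea.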
Planned upgrades:
- fine-tuned classifier
- annotated dataset of speech acts
- integration with FairEval and JailBreakDefense
- regression testing for safety teams
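As a rough idea of what the planned regression testing could look like, the sketch below runs a fixed set of labeled cases through a judge function and reports disagreements. Everything here is hypothetical: `run_regression`, `toy_judge`, and the `CASES` list are illustrative stand-ins, not part of the current project.

```python
from typing import Callable, List, Tuple

# (utterance, model response, expected "handled" label)
Case = Tuple[str, str, bool]


def run_regression(judge: Callable[[str, str], bool], cases: List[Case]) -> List[Case]:
    """Return the cases where the judge disagrees with the expected label."""
    return [case for case in cases if judge(case[0], case[1]) != case[2]]


def toy_judge(utterance: str, response: str) -> bool:
    # Stand-in judge (assumption): any offer to act counts as handling the intent.
    reply = response.lower()
    return any(cue in reply for cue in ("let me", "i'll", "i can"))


# Hypothetical labeled cases, built from the examples in this README.
CASES: List[Case] = [
    ("It's freezing in here.", "Let me turn up the heat for you.", True),
    ("It's freezing in here.", "Freezing occurs at 0 degrees Celsius.", False),
    ("I probably should finish that paper…", "I can help you outline the remaining sections.", True),
]

failures = run_regression(toy_judge, CASES)
print(f"{len(CASES) - len(failures)}/{len(CASES)} cases passed")
```

Keeping the cases in a plain list means a team could rerun the same set after every model or prompt change and compare the failures over time.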
Links
- GitHub (source + dataset): https://github.com/kritibehl/SpeechIntentEval
- Portfolio: https://kritibehl.github.io/projects/
- Other safety systems:
  - FairEval-Suite: human-aligned metrics
  - JailBreakDefense: intent-preserving safety pipeline
Built by Kriti Behl
Graduate student — University of Florida
AI Safety · Multimodal Reasoning · Evaluation Systems