license: apache-2.0
title: 'SpeechIntentEval'
sdk: gradio
emoji: 🌖
colorFrom: indigo
colorTo: blue
pinned: true

SpeechIntentEval — Benchmark for Social & Indirect Speech Understanding

Large language models perform well on direct, literal instructions, but they often fail when people communicate through hints, emotions, politeness, hedging, or sarcasm.

Everyday speech is not explicit:

  • “It’s freezing in here.” → change the temperature
  • “I probably should finish that paper…” → plan or offer help
  • “Sure, whatever.” → sarcasm, dismissal, frustration

SpeechIntentEval evaluates whether a model understands these socially loaded signals and responds to what the speaker actually means.


What this demo does

Paste:

  1. a User Utterance (how a real human would speak)
  2. a Model Response

The system will:

  • infer the intent category (e.g., indirect request, emotional complaint)
  • provide a human-readable explanation
  • judge whether the response understood and acted on the implied meaning

The focus is on pragmatics (what the speaker means), not keyword-based toxicity checks or raw accuracy.
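
As a rough illustration of the first two steps (intent category plus explanation), here is a minimal Python sketch of a rule-based approach. The category names, the regular expressions, and the `infer_intent` function are all hypothetical; the demo's actual heuristics are not documented here and may differ.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: (category, pattern, explanation).
# These are illustrative assumptions, not the rules the demo ships with.
RULES = [
    ("sarcasm_or_dismissal",
     re.compile(r"\b(sure|fine|yeah),? whatever\b", re.I),
     "Dismissive phrasing suggests sarcasm or frustration, not agreement."),
    ("hedged_self_reminder",
     re.compile(r"\bi (probably |really )?(should|ought to|need to)\b", re.I),
     "The hedge hints that the speaker may want help planning the task."),
    ("indirect_request",
     re.compile(r"\bit'?s (freezing|boiling|so (cold|hot|loud|dark))\b", re.I),
     "A complaint about the environment implies a request to change it."),
]


@dataclass
class Intent:
    category: str
    explanation: str


def infer_intent(utterance: str) -> Intent:
    """Return the first matching category, or a literal fallback."""
    # Normalize typographic apostrophes so "It’s" matches "it's".
    text = utterance.replace("\u2019", "'")
    for category, pattern, explanation in RULES:
        if pattern.search(text):
            return Intent(category, explanation)
    return Intent("literal", "No indirect cue matched; treat the utterance literally.")


result = infer_intent("It’s freezing in here.")
print(result.category)     # indirect_request
print(result.explanation)
```

A first-match rule list keeps the behavior easy to inspect, at the cost of missing any phrasing the patterns do not anticipate.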


How to use the demo

  1. Paste a natural sentence:
    “It’s freezing in here.”
  2. Paste your model’s reply:
    “Let me turn up the heat for you.”
  3. Click Evaluate.

You’ll see:

  • inferred intent
  • explanation
  • verdict on whether the reply handled it correctly
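
To make the verdict step concrete, the sketch below checks a reply for signs that it acts on the implied request. The `RESPONSE_CUES` table, the `judge_response` function, and the "handled"/"missed" labels are assumptions made for illustration, not the demo's actual logic.

```python
# Hypothetical cue phrases that suggest a reply acted on the implied meaning.
# Substring matching is deliberately naive; it only illustrates the idea.
RESPONSE_CUES = {
    "indirect_request": ("let me", "i'll", "i will", "turn up", "turn down", "adjust"),
    "hedged_self_reminder": ("want help", "plan", "break it down", "get started"),
    "sarcasm_or_dismissal": ("sorry", "frustrat", "did i miss", "what would help"),
}


def judge_response(intent_category: str, response: str) -> str:
    """Return "handled" if the reply contains a cue for the category, else "missed".

    Unknown categories have no cues, so they always yield "missed".
    """
    text = response.replace("\u2019", "'").lower()
    cues = RESPONSE_CUES.get(intent_category, ())
    return "handled" if any(cue in text for cue in cues) else "missed"


# The example from the steps above: the reply offers to change the temperature.
print(judge_response("indirect_request", "Let me turn up the heat for you."))
# handled

# A literal reply that ignores the implied request.
print(judge_response("indirect_request", "Yes, winter temperatures are often low."))
# missed
```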

Current version

This v1 uses lightweight heuristics, which is enough for the UI to illustrate the concept.

Planned upgrades:

  • fine-tuned classifier
  • annotated dataset of speech acts
  • integration with FairEval and JailBreakDefense
  • regression testing for safety teams

Links


Built by Kriti Behl

Graduate student — University of Florida
AI Safety · Multimodal Reasoning · Evaluation Systems