### Case 1 — Source Ambiguity
**Domain:** Medicine
**Task:** Explain the structure and functions of the integumentary system.
**Claim generated by the model:**
"The integration of dermatology with psychology and psychiatry represents a growing field that could lead to more holistic treatment approaches."
**Verification result:**
EPISTEMIC FAILURE
**Reason:**
The retrieved sources discuss psychological aspects of skin diseases but do not explicitly describe the integration of dermatology, psychology, and psychiatry as a formal interdisciplinary field.
The model inferred a structured integration that is not directly present in the sources.
**Failure Category:**
Source Ambiguity
---
### Case 2 — Source Ambiguity
**Domain:** Law
**Task:** Information Society — description of an evolving legal landscape.
**Claim generated by the model:**
"The information society represents a fundamental concept for understanding contemporary legal dynamics."
**Verification result:**
EPISTEMIC FAILURE
**Reason:**
The provided document describes the evolution of legal informatics and the expansion of topics related to digital technologies, but it **does not explicitly state** that the “information society” is a fundamental concept for understanding contemporary legal dynamics.
The model produced a plausible generalization that is **not supported** by any source in the corpus.
**Failure Category:**
Source Ambiguity
---
### Case 3 — Unauthorized Inference
**Domain:** Linguistics
**Task:** Explain the cognitive substrate of Specific Language Needs (Bisogni Linguistici Specifici).
**Claim generated by the model:**
"The use of teaching strategies focused on the mental representation of language may be more effective than traditional methods."
**Verification result:**
EPISTEMIC FAILURE
**Reason:**
The analyzed document discusses the glottodidactic potential of Cognitive Linguistics and mentions instructional applications that may support learners with Specific Language Needs. However, it **does not provide empirical evidence** or experimental studies demonstrating that mental-representation-based strategies are more effective than traditional methods.
The model converted a *theoretical proposal* into a *claim of proven effectiveness*, which is **not supported** by the corpus.
**Failure Category:**
Unauthorized Inference
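---
The three cases above share one record shape: domain, claim, verification result, and failure category. The following is a minimal sketch of how such records could be stored and tallied per category. The names `FailureCase`, `CASES`, and `count_by_category` are hypothetical and introduced here for illustration only; they are not part of the benchmark's code, and the tasks and reasons are omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCase:
    """One entry of the failure analysis (hypothetical record type)."""
    case_id: int
    domain: str
    claim: str
    verification_result: str
    failure_category: str


# The three cases documented above, copied verbatim from their fields.
CASES = [
    FailureCase(
        case_id=1,
        domain="Medicine",
        claim=(
            "The integration of dermatology with psychology and psychiatry "
            "represents a growing field that could lead to more holistic "
            "treatment approaches."
        ),
        verification_result="EPISTEMIC FAILURE",
        failure_category="Source Ambiguity",
    ),
    FailureCase(
        case_id=2,
        domain="Law",
        claim=(
            "The information society represents a fundamental concept for "
            "understanding contemporary legal dynamics."
        ),
        verification_result="EPISTEMIC FAILURE",
        failure_category="Source Ambiguity",
    ),
    FailureCase(
        case_id=3,
        domain="Linguistics",
        claim=(
            "The use of teaching strategies focused on the mental "
            "representation of language may be more effective than "
            "traditional methods."
        ),
        verification_result="EPISTEMIC FAILURE",
        failure_category="Unauthorized Inference",
    ),
]


def count_by_category(cases):
    """Return a Counter mapping each failure category to its number of cases."""
    return Counter(case.failure_category for case in cases)


if __name__ == "__main__":
    for category, n in count_by_category(CASES).items():
        print(f"{category}: {n}")
```

Keeping the category as a plain string mirrors the labels used in this log. A stricter variant could use an enumeration so that a misspelled category fails early.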