elly99 committed on
Commit
6505da3
·
verified ·
1 Parent(s): d1bfeca

Update benchmark/failure_analysis

  1. benchmark/failure_analysis +205 -0
benchmark/failure_analysis CHANGED
@@ -54,3 +54,208 @@ The model converted a *theoretical proposal* into a *claim of proven effectivene
54
 
55
  **Failure Category:**
56
  Unauthorized Inference
57
+
58
+ failure_analysis: llama-4-scout-17b-16e-instruct
59
+
60
+ ## Case — Source Ambiguity
61
+
62
+ **Domain:** Medicine
63
+ **Task:** Explain the concept of *Apparato Circolatorio* (the circulatory system) using the provided anatomy compendium.
64
+
65
+ **Claim generated by the model:**
66
+ "The circulatory system includes both blood circulation and lymph circulation, and its regulation depends on sympathetic and parasympathetic nervous control, chemical mediators such as adrenaline and acetylcholine, and baroreceptors located in blood vessels."
67
+
68
+ **Verification result:**
69
+ **EPISTEMIC FAILURE**
70
+
71
+ **Reason:**
72
+ The attached document (*Compendio di Anatomia Umana – Apparato Circolatorio*) provides an anatomical overview of the circulatory system, including structural descriptions of the heart, vessels, and general organization.
73
+ However, the text **does not contain**:
74
+
75
+ - any discussion of autonomic nervous system regulation (sympathetic/parasympathetic),
76
+ - any mention of chemical mediators such as adrenaline or acetylcholine,
77
+ - any reference to baroreceptors or mechanoreceptors,
78
+ - any physiological explanation of blood pressure regulation.
79
+
80
+ These concepts belong to cardiovascular physiology, not to the anatomical content present in the provided compendium.
81
+
82
+ The model therefore produced a statement that is **scientifically correct**, but **not supported by the source**, filling informational gaps with general medical knowledge.
83
+ This creates the illusion that the claim is grounded in the document, while it is not.
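The kind of check described above (looking for the claim's key terms in the source) can be approximated with a simple lexical filter. The following is only a minimal sketch, assuming case-insensitive whole-word matching; the procedure actually used to verify this case is not described in this file, and `missing_terms`, `toy_source`, and `claim_terms` are hypothetical names introduced for illustration. The toy source is invented English text, not the compendium itself; because the real document is in Italian, a real check would also need the terms in the source language.

```python
import re


def missing_terms(source_text: str, terms: list[str]) -> list[str]:
    """Return the terms that never occur in source_text (case-insensitive, whole word)."""
    lowered = source_text.lower()
    absent = []
    for term in terms:
        # re.escape keeps any punctuation in the term from being read as regex syntax.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered) is None:
            absent.append(term)
    return absent


# Invented stand-in for a purely anatomical source (assumption, not the real text).
toy_source = "The heart has four chambers. Arteries carry blood away from the heart."
claim_terms = ["heart", "adrenaline", "acetylcholine", "baroreceptors"]

print(missing_terms(toy_source, claim_terms))
# ['adrenaline', 'acetylcholine', 'baroreceptors']
```

A non-empty result only flags terms worth checking by hand: a lexical match cannot confirm that a claim is supported, and synonyms or paraphrases in the source would be missed.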
84
+
85
+ **Failure Category:**
86
+ **Source Ambiguity**
87
+
88
+ ---
89
+
90
+ ## Case — Source Ambiguity
91
+
92
+ **Domain:** Law
93
+ **Task:** Explain the concept of *Real Rights and Obligatory Rights* using the provided legal text.
94
+
95
+ **Claim generated by the model:**
96
+ "Real rights and obligatory rights are clearly defined in the provided text, which explains that real rights grant an absolute power over a thing, while obligatory rights create a relative relationship between creditor and debtor."
97
+
98
+ **Verification result:**
99
+ **EPISTEMIC FAILURE**
100
+
101
+ **Reason:**
102
+ The attached document (*Parte prima. Il diritto in generale e le nozioni fondamentali del diritto privato*) contains an introductory discussion of:
103
+
104
+ - the notion of law,
105
+ - legal norms,
106
+ - their characteristics (generality and abstractness: *generalità*, *astrattezza*),
107
+ - the distinction between law and morality,
108
+ - the concept of legal order (*ordinamento giuridico*),
109
+ - the difference between written law and customary law.
110
+
111
+ However, the text **does not contain**:
112
+
113
+ - any definition of *diritti reali* (real rights),
114
+ - any definition of *diritti di obbligazione* (obligatory rights),
115
+ - any classification of private-law rights,
116
+ - any reference to property, usufruct, servitudes, or credit obligations,
117
+ - any doctrinal or codified distinction between absolute and relative rights.
118
+
119
+ The model therefore produced a conceptually correct distinction according to general private-law doctrine, but **none of these notions appear in the provided corpus**.
120
+ The generated claim *sounds* grounded in the text but is actually **unsupported**, because the source does not address real rights or obligations at all.
121
+
122
+ This is a typical case where the model fills a conceptual gap with external legal knowledge, generating a plausible explanation that **cannot be verified** against the document.
123
+
124
+ **Failure Category:**
125
+ **Source Ambiguity**
126
+
127
+ ---
128
+
129
+ ## Case — Source Ambiguity
130
+
131
+ **Domain:** Biology
132
+ **Task:** Explain the concept of *Cytoplasm and Cellular Organelles* using the provided biology compendium.
133
+
134
+ **Claim generated by the model:**
135
+ "The cytoplasm is a gelatinous aqueous solution containing proteins, carbohydrates, lipids, nucleic acids, and mineral salts, and includes the cytosol, the cytoskeleton, and all cellular organelles such as mitochondria, lysosomes, peroxisomes, and the Golgi apparatus."
136
+
137
+ **Verification result:**
138
+ **EPISTEMIC FAILURE**
139
+
140
+ **Reason:**
141
+ In the attached document (*Dialogare: compendio di biologia*), Unità 3 (*Citoplasma e organuli cellulari*) is listed in the index, but **the actual content of Unit 3 is not included** in the provided text.
142
+ As a result, the corpus **does not contain**:
143
+
144
+ - any definition of cytoplasm,
145
+ - any description of cytosol,
146
+ - any list of organelles,
147
+ - any explanation of their structure or function,
148
+ - any mention of mitochondria, Golgi apparatus, lysosomes, peroxisomes, or cytoskeleton.
149
+
150
+ The model therefore produced a scientifically correct explanation based on general cell biology, but **none of these details appear in the provided source**.
151
+ The generated claim *sounds* grounded in the compendium, but is actually **not verifiable** because the relevant chapter is absent.
152
+
153
+ This is a classic case where the model fills an informational gap with external biological knowledge, generating a plausible explanation that **cannot be supported** by the document.
154
+
155
+ **Failure Category:**
156
+ **Source Ambiguity**
157
+
158
+
159
+ failure_analysis: DeepSeek-R1-Distill-Qwen-1.5B
160
+
161
+ ## Case — Source Ambiguity
162
+
163
+ **Domain:** Biology
164
+ **Task:** Explain the concept of *Cytoplasm and Cellular Organelles* using the provided biology compendium.
165
+
166
+ **Claim generated by the model:**
167
+ "The cytoplasm is a gelatinous aqueous matrix composed of proteins, lipids, carbohydrates, nucleic acids, and mineral salts, and contains the cytosol, the cytoskeleton, and all major organelles such as mitochondria, lysosomes, peroxisomes, the Golgi apparatus, and the endoplasmic reticulum."
168
+
169
+ **Verification result:**
170
+ **EPISTEMIC FAILURE**
171
+
172
+ **Reason:**
173
+ In the attached document (*Dialogare: compendio di biologia*), Unità 3 (*Citoplasma e organuli cellulari*) is listed in the index, but **the actual content of Unit 3 is not included** in the provided text.
174
+ Therefore, the corpus **does not contain**:
175
+
176
+ - any definition of cytoplasm,
177
+ - any description of cytosol,
178
+ - any list of organelles,
179
+ - any explanation of organelle structure or function,
180
+ - any mention of mitochondria, Golgi apparatus, lysosomes, peroxisomes, ribosomes, or cytoskeleton.
181
+
182
+ The model produced a scientifically accurate explanation based on general cell biology, but **none of these details appear in the provided source**.
183
+ The generated claim *sounds* grounded in the compendium, but is actually **not verifiable**, because the relevant chapter is absent from the uploaded text.
184
+
185
+ This is a classic case where the model fills an informational gap with external biological knowledge, generating a plausible explanation that **cannot be supported** by the document.
186
+
187
+ **Failure Category:**
188
+ **Source Ambiguity**
189
+
190
+ ---
191
+
192
+ ## Case — Source Ambiguity
193
+
194
+ **Domain:** Statistics
195
+ **Task:** Explain the concept of *Samples, Bar Charts, and Histograms* using the provided course notes.
196
+
197
+ **Claim generated by the model:**
198
+ "A bar chart represents the frequency of numerical data grouped into intervals, while a histogram is used to compare categorical variables. In statistical analysis, histograms are preferred when dealing with qualitative data."
199
+
200
+ **Verification result:**
201
+ **EPISTEMIC FAILURE**
202
+
203
+ **Reason:**
204
+ The attached document (*Introduzione alla Statistica Matematica*, Section 1.1 — *Campioni, grafici a barre, istogramma*) introduces:
205
+
206
+ - the notion of a **sample**,
207
+ - the idea of repeated measurements on a population,
208
+ - the distinction between population and sample,
209
+ - the role of descriptive statistics in organizing data.
210
+
211
+ However, the provided text **does not include**:
212
+
213
+ - definitions of bar charts,
214
+ - definitions of histograms,
215
+ - distinctions between qualitative and quantitative graphical methods,
216
+ - any explanation of when histograms or bar charts should be used,
217
+ - any discussion of frequency intervals or binning.
218
+
219
+ The model therefore produced statements that are **scientifically incorrect** (it inverted the roles of bar charts and histograms) *and* **not verifiable** against the provided source.
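Factual correctness and support in the source are independent properties, which is why this case differs from the earlier ones (correct but unsupported). A minimal sketch of that two-axis view follows; the class `ClaimCheck`, the function `describe`, and the four label strings are hypothetical and are not the benchmark's own taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimCheck:
    factually_correct: bool    # agrees with general domain knowledge
    supported_by_source: bool  # can be traced to the provided document


def describe(check: ClaimCheck) -> str:
    """Map the two independent flags to one of four illustrative labels."""
    if check.supported_by_source:
        if check.factually_correct:
            return "grounded"
        return "source-faithful but incorrect"
    if check.factually_correct:
        return "correct but ungrounded"
    return "incorrect and ungrounded"


# The statistics claim above: wrong on the facts and absent from the source.
print(describe(ClaimCheck(factually_correct=False, supported_by_source=False)))
# incorrect and ungrounded
```

Under this reading, every case in this file falls in one of the two "ungrounded" cells, regardless of whether the claim happens to be true.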
220
+
221
+ Specifically:
222
+
223
+ - A **bar chart** is used for *categorical* (qualitative) data.
224
+ - A **histogram** is used for *quantitative* data grouped into intervals.
225
+
226
+ The generated claim reverses these definitions and attributes them to the document, even though the document does not contain the relevant explanations.
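The distinction between the two chart types comes down to what is being counted. The sketch below illustrates it without any plotting library: a bar chart needs one count per category label, while a histogram needs counts per numeric interval. The function names and the sample data are invented for illustration and do not come from the course notes.

```python
from collections import Counter


def bar_chart_counts(categories):
    """Count how often each category label occurs (one bar per label)."""
    return dict(Counter(categories))


def histogram_counts(values, start, width, n_bins):
    """Count numeric values in n_bins half-open intervals [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts


blood_types = ["A", "O", "O", "B", "A", "O"]  # invented categorical data
heights_cm = [158, 162, 171, 174, 169, 181]   # invented numerical data

print(bar_chart_counts(blood_types))             # {'A': 2, 'O': 3, 'B': 1}
print(histogram_counts(heights_cm, 150, 10, 4))  # [1, 2, 2, 1]
```

The bar heights depend only on the labels present, whereas the histogram changes with the choice of `start` and `width`, which is why binning applies to quantitative data only.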
227
+
228
+ This is a clear case where the model fills missing information with material from outside the source and, in doing so, introduces both **incorrect content** and **unsupported assertions**.
229
+
230
+ **Failure Category:**
231
+ **Source Ambiguity**
232
+
233
+ ---
234
+ ## Case — Source Ambiguity
235
+
236
+ **Domain:** Medicine
237
+ **Task:** Explain the concept of *Apparato Digerente* (the digestive system) using the provided anatomy compendium.
238
+
239
+ **Claim generated by the model:**
240
+ "The digestive system includes not only the gastrointestinal tract but also complex neuroendocrine regulatory circuits that modulate motility, secretion, and absorption through hormones such as gastrin, secretin, and cholecystokinin."
241
+
242
+ **Verification result:**
243
+ **EPISTEMIC FAILURE**
244
+
245
+ **Reason:**
246
+ In the attached document (*Compendio di Anatomia Umana*), Section 6 (*Apparato Digerente*) is listed in the index, but **the actual content of the digestive system chapter is not included** in the provided text.
247
+ As a result, the corpus **does not contain**:
248
+
249
+ - any anatomical description of the digestive system,
250
+ - any mention of gastrointestinal motility,
251
+ - any reference to neuroendocrine regulation,
252
+ - any discussion of hormones such as gastrin, secretin, or cholecystokinin (CCK),
253
+ - any physiological explanation of digestion, absorption, or secretion.
254
+
255
+ The model therefore produced a scientifically correct explanation based on general human physiology, but **none of these details appear in the provided source**.
256
+ The generated claim *sounds* grounded in the compendium, but is actually **not verifiable**, because the relevant chapter is absent from the uploaded text.
257
+
258
+ This is a classic case where the model fills an informational gap with external medical knowledge, generating a plausible explanation that **cannot be supported** by the document.
259
+
260
+ **Failure Category:**
261
+ **Source Ambiguity**