Adherence Metric Fix - Summary
Problem
The adherence metric was returning decimal values (e.g., 0.333, 0.667, 0.8) instead of boolean values (0.0 or 1.0) as defined in the RAGBench paper.
Root Cause
The _compute_adherence() method in advanced_rag_evaluator.py was computing adherence as:
```python
# WRONG: returns a fraction
return fully_supported / total_sentences  # e.g., 2/3 = 0.667
```
This treats adherence as a "proportion of supported sentences" metric, which is incorrect.
Solution
Updated the method to compute adherence as a boolean according to RAGBench definition:
```python
# CORRECT: returns 1.0 or 0.0
return 1.0 if fully_supported_count == total_sentences else 0.0
```
RAGBench Definition of Adherence
Per the RAGBench paper, adherence is a boolean metric that indicates whether the response is fully grounded in the context:
- 1.0: Fully grounded - ALL sentences in the response are fully supported by the retrieved documents
- 0.0: Contains hallucination - ANY sentence in the response is not fully supported (hallucinated)
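The all-or-nothing definition above can be sketched as a small standalone function. This is an illustrative sketch, not the evaluator's actual code: the name `compute_adherence` and the list-of-booleans input are assumptions, and treating an empty response as non-adherent is a choice made here for the sketch.

```python
def compute_adherence(sentence_support: list[bool]) -> float:
    """Boolean adherence per the RAGBench definition.

    `sentence_support[i]` is True iff response sentence i is fully
    supported by the retrieved documents.
    """
    if not sentence_support:
        return 0.0  # assumption: an empty response counts as not grounded
    # 1.0 only if EVERY sentence is supported; any hallucination -> 0.0
    return 1.0 if all(sentence_support) else 0.0
```

Using `all()` makes the intent explicit: a single unsupported sentence flips the metric to 0.0, rather than merely lowering a fraction.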
Examples
Scenario 1: All sentences fully supported
- Sentence A: Supported ✓
- Sentence B: Supported ✓
- Sentence C: Supported ✓
- Adherence = 1.0 (fully grounded)
Scenario 2: One sentence not fully supported (hallucination)
- Sentence A: Supported ✓
- Sentence B: Supported ✓
- Sentence C: NOT Supported ✗ (hallucinated)
- Adherence = 0.0 (contains hallucination)
Scenario 3: No sentences fully supported
- Sentence A: NOT Supported ✗
- Sentence B: NOT Supported ✗
- Sentence C: NOT Supported ✗
- Adherence = 0.0 (completely hallucinated)
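Scenario 2 is the case where the old and new behavior diverge most clearly. A minimal illustration (not the evaluator's actual code, just inline expressions over a list of support flags):

```python
# Scenario 2: sentences A and B supported, sentence C hallucinated
supported = [True, True, False]

# Old (buggy) behavior: proportion of supported sentences
old_adherence = sum(supported) / len(supported)   # 2/3 = 0.666...

# New (fixed) behavior: boolean, all-or-nothing
new_adherence = 1.0 if all(supported) else 0.0    # 0.0

print(round(old_adherence, 3), new_adherence)  # 0.667 0.0
```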
File Changed
- advanced_rag_evaluator.py (Lines 600-617): updated the `_compute_adherence()` method
Impact
- Adherence metric now returns only 0.0 or 1.0
- Aligns with RAGBench paper specification
- Better represents "grounded vs hallucinated" classification
- More intuitive interpretation: 0 = has hallucinations, 1 = fully grounded
Testing
Verified with multiple scenarios:
- All supported → 1.0 ✓
- Partial support → 0.0 ✓
- No support → 0.0 ✓
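The three checks above can be expressed as plain assertions. This sketch assumes a callable equivalent to the fixed `_compute_adherence()` that takes per-sentence support flags; the name `adherence` is hypothetical.

```python
def adherence(supported):
    # All-or-nothing, per the fixed method
    return 1.0 if supported and all(supported) else 0.0

assert adherence([True, True, True]) == 1.0    # all supported
assert adherence([True, True, False]) == 0.0   # partial support
assert adherence([False, False, False]) == 0.0 # no support
```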
Related Metrics
For reference, the other metrics in the GPT Labeling method:
- Context Relevance: Fraction (0-1) - proportion of relevant sentences
- Context Utilization: Fraction (0-1) - proportion of relevant sentences used
- Completeness: Fraction (0-1) - proportion of answer info covered
- Adherence: Boolean (0.0 or 1.0) - whether response is fully grounded ✓ FIXED
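The contrast between the fraction-style metrics and boolean adherence can be shown side by side. The variable names and judgment flags below are illustrative assumptions, not the evaluator's API:

```python
# Hypothetical per-sentence judgments
relevant_ctx = [True, True, False, True]  # context sentences judged relevant
supported_resp = [True, True, True]       # response sentences judged supported

# Fraction-style metric: stays a proportion in [0, 1]
context_relevance = sum(relevant_ctx) / len(relevant_ctx)  # 0.75

# Boolean metric: collapses to exactly 0.0 or 1.0
adherence = 1.0 if all(supported_resp) else 0.0            # 1.0
```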