Update pipeline tag, add library name, and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +19 -11
README.md CHANGED
@@ -1,14 +1,15 @@
  ---
- license: mit
- language:
- - en
  base_model:
  - answerdotai/ModernBERT-base
- pipeline_tag: token-classification
  tags:
- - token classification
- - hallucination detection
  - transformers
  ---
 
  # LettuceDetect: Hallucination Detection Model
@@ -23,13 +24,20 @@ tags:
 
  ## Overview
 
- LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for Retrieval-Augmented Generation (RAG) applications. This model is built on **ModernBERT**, which has been specifically chosen and trained becasue of its extended context support (up to **8192 tokens**). This long-context capability is critical for tasks where detailed and extensive documents need to be processed to accurately determine if an answer is supported by the provided context.
 
- **This is our Large model based on ModernBERT-large**
 
  ## Model Details
 
- - **Architecture:** ModernBERT (Large) with extended context support (up to 8192 tokens)
  - **Task:** Token Classification / Hallucination Detection
  - **Training Dataset:** RagTruth
  - **Language:** English
@@ -74,7 +82,7 @@ print("Predictions:", predictions)
 
  **Example level results**
 
- We evaluate our model on the test set of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. Our large model, **lettucedetect-large-v1**, achieves an overall F1 score of 79.22%, outperforming prompt-based methods like GPT-4 (63.4%) and encoder-based models like [Luna](https://aclanthology.org/2025.coling-industry.34.pdf) (65.4%). It also surpasses fine-tuned LLAMA-2-13B (78.7%) (presented in [RAGTruth](https://aclanthology.org/2024.acl-long.585/)) and is competitive with the SOTA fine-tuned LLAMA-3-8B (83.9%) (presented in the [RAG-HAT paper](https://aclanthology.org/2024.emnlp-industry.113.pdf)). Overall, **lettucedetect-large-v1** and **lettucedect-base-v1** are very performant models, while being very effective in inference settings.
 
  The results on the example-level can be seen in the table below.
 
@@ -84,7 +92,7 @@ The results on the example-level can be seen in the table below.
 
  **Span-level results**
 
- At the span level, our model achieves the best scores across all data types, significantly outperforming previous models. The results can be seen in the table below. Note that here we don't compare to models, like [RAG-HAT](https://aclanthology.org/2024.emnlp-industry.113.pdf), since they have no span-level evaluation presented.
 
  <p align="center">
  <img src="https://github.com/KRLabsOrg/LettuceDetect/blob/main/assets/span_level_lettucedetect.png?raw=true" alt="Span-level Results" width="800"/>
 
  ---
  base_model:
  - answerdotai/ModernBERT-base
+ language:
+ - en
+ license: mit
+ pipeline_tag: question-answering
  tags:
+ - token-classification
+ - hallucination-detection
  - transformers
+ library_name: transformers
  ---
 
  # LettuceDetect: Hallucination Detection Model
 
 
  ## Overview
 
+ LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for Retrieval-Augmented Generation (RAG) applications. This model is built on **ModernBERT**, which has been specifically chosen and trained because of its extended context support (up to **8192 tokens**). This long-context capability is critical for tasks where detailed and extensive documents need to be processed to accurately determine if an answer is supported by the provided context.
+
+
+ ## Paper
+
+ [LettuceDetect: A Hallucination Detection Framework for RAG Applications](https://hf.co/papers/2502.17125)
+
+ **Abstract:**
 
+ Retrieval Augmented Generation (RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect a framework that addresses two critical limitations in existing hallucination detection methods: (1) the context window constraints of traditional encoder-based methods, and (2) the computational inefficiency of LLM based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models, while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, allowing for the identification of unsupported claims at the token level. Evaluations on the RAGTruth corpus demonstrate an F1 score of 79.22% for example-level detection, which is a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it more practical for real-world RAG applications.
 
  ## Model Details
 
+ - **Architecture:** ModernBERT (base) with extended context support (up to 8192 tokens)
  - **Task:** Token Classification / Hallucination Detection
  - **Training Dataset:** RagTruth
  - **Language:** English
 
 
  **Example level results**
 
+ The model is evaluated on the test set of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. The large version of this model, lettucedect-large-v1, achieves an overall F1 score of 79.22%, outperforming prompt-based methods like GPT-4 (63.4%) and encoder-based models like [Luna](https://aclanthology.org/2025.coling-industry.34.pdf) (65.4%). It also surpasses fine-tuned LLAMA-2-13B (78.7%) and is competitive with the SOTA fine-tuned LLAMA-3-8B (83.9%).
 
  The results on the example-level can be seen in the table below.
 
 
 
  **Span-level results**
 
+ At the span level, the large version of this model achieves the best scores across all data types, significantly outperforming previous models. The results can be seen in the table below.
 
  <p align="center">
  <img src="https://github.com/KRLabsOrg/LettuceDetect/blob/main/assets/span_level_lettucedetect.png?raw=true" alt="Span-level Results" width="800"/>
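
Since this PR mainly touches the YAML front matter of the model card, a quick sanity check of the new metadata can be sketched as below. This is a minimal illustration and not part of the PR: `parse_front_matter` and `NEW_CARD` are hypothetical names, and the hand-written parser only understands flat `key: value` pairs and simple `- item` lists, which is all the metadata in this diff uses. A real check would more likely rely on a proper YAML library.

```python
# Minimal sketch (assumption: front matter uses only flat keys and simple lists).
# parse_front_matter and NEW_CARD are illustrative names, not part of the PR.

def parse_front_matter(readme_text):
    """Return the metadata between the leading '---' fences as a dict."""
    lines = readme_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing fence ends the metadata block
        if not stripped:
            continue
        is_item = stripped.startswith("- ")
        if is_item and isinstance(metadata.get(current_key), list):
            # List item belonging to the most recent list-valued key.
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped and not is_item:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list follows on the next lines.
            metadata[current_key] = value if value else []
    return metadata


# Front matter copied from the "+" side of the diff above.
NEW_CARD = """---
base_model:
- answerdotai/ModernBERT-base
language:
- en
license: mit
pipeline_tag: question-answering
tags:
- token-classification
- hallucination-detection
- transformers
library_name: transformers
---

# LettuceDetect: Hallucination Detection Model
"""

metadata = parse_front_matter(NEW_CARD)
for key in ("pipeline_tag", "library_name", "license", "tags"):
    print(key, "=", metadata.get(key))
```

Running the same function on the old front matter would show the values this PR replaces, for example the previous `pipeline_tag` and the absence of `library_name`.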