---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
- en
license: mit
pipeline_tag: text-classification
task_ids:
- fact-checking
tags:
- edge-rag
- semantic-filtering
- hallucination-reduction
- cross-encoder
metrics:
- accuracy
- precision
- recall
- roc_auc
model-index:
- name: LF_BERT_v1
  results:
  - task:
      type: fact-checking
      name: Semantic Evidence Filtering
    dataset:
      name: Project Sentinel (HotpotQA-derived)
      type: hotpotqa/hotpot_qa
    metrics:
    - type: accuracy
      value: 0.8167
    - type: precision
      value: 0.5907
    - type: recall
      value: 0.8674
    - type: roc_auc
      value: 0.9064
---

# LF_BERT_v1

**LF_BERT_v1** is a lightweight **TinyBERT-based cross-encoder** fine-tuned for **semantic evidence filtering** in **Retrieval-Augmented Generation (RAG)** pipelines.

The model acts as a *semantic gatekeeper*, scoring `(query, candidate_sentence)` pairs to determine whether the sentence is **factually useful evidence** or a **semantic distractor**.  
It is designed for **CPU-only, edge, and offline deployments**, with millisecond-level inference latency.

This model is the core filtering component of **Project Sentinel**.

---

## Model Description

- **Architecture:** TinyBERT (4 layers, hidden size 312)
- **Type:** Cross-encoder (joint encoding of query and sentence)
- **Task:** Binary fact-checking / evidence verification
- **Base Model:** `huawei-noah/TinyBERT_General_4L_312D`
- **Inference Latency:** ~5.3 ms (CPU)

### Input Format

```
[CLS] query [SEP] candidate_sentence [SEP]
```

- Maximum sequence length: 512 tokens

### Output

- Probability score ∈ [0,1] representing **factual utility**
- Typical deployment threshold: **0.85** (Strict Guard configuration)
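
The input/output contract above can be sketched as follows. This is a minimal illustration, assuming the checkpoint exposes a standard two-class sequence-classification head; the repo id in the usage comment is a placeholder.

```python
# Minimal scoring sketch for one (query, candidate_sentence) pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

THRESHOLD = 0.85  # "Strict Guard" operating point


def keep_sentence(prob: float, threshold: float = THRESHOLD) -> bool:
    """Gatekeeper decision: admit a sentence only above the threshold."""
    return prob >= threshold


def score_pair(model, tokenizer, query: str, sentence: str) -> float:
    """Return P(factually useful) for one (query, candidate_sentence) pair."""
    inputs = tokenizer(
        query, sentence, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 1 is assumed to be the "supporting fact" class.
    return torch.softmax(logits, dim=-1)[0, 1].item()


# Usage (downloads the checkpoint; repo id is a placeholder):
# tok = AutoTokenizer.from_pretrained("LF_BERT_v1")
# mdl = AutoModelForSequenceClassification.from_pretrained("LF_BERT_v1").eval()
# p = score_pair(mdl, tok, "Who wrote Hamlet?", "Hamlet was written by Shakespeare.")
# keep_sentence(p)
```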

---

## Intended Use

✔ Semantic filtering for RAG pipelines  
✔ Hallucination reduction  
✔ Early-exit decision systems  
✔ Edge / offline LLM deployments  

This model is especially suited for:
- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)
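
As a sketch of the gatekeeper stage inside a RAG pipeline, the filter below scores retrieved candidates and early-exits (abstains) when no evidence survives the threshold. The scorer is passed in as a plain callable so the control flow stays self-contained; in practice it would wrap the cross-encoder.

```python
# Hypothetical gatekeeper stage: score candidates, drop distractors,
# and signal an early exit when nothing passes the threshold.
from typing import Callable, List, Optional


def filter_evidence(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    threshold: float = 0.85,
) -> Optional[List[str]]:
    """Return surviving evidence, or None to signal an early exit (abstain)."""
    kept = [s for s in candidates if score(query, s) >= threshold]
    return kept if kept else None  # None => abstain instead of hallucinating


# Toy usage with a stand-in scorer:
# filter_evidence("q", ["a", "b"], lambda q, s: 0.9 if s == "a" else 0.1)
# -> ["a"]
```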

---

## Limitations

- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance (not passage-level reasoning)
- Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.

---

## Training Data

The model was trained on a **binary dataset derived from HotpotQA (Distractor setting)**.

### Labels

- **1 – Supporting Fact:** Ground-truth evidence sentences
- **0 – Distractor:** Topically similar but factually insufficient sentences
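
A pair-construction step consistent with this labeling can be sketched as below. The field layout follows the Hugging Face `hotpot_qa` distractor-setting schema; treat the exact keys as an assumption, not a guarantee of the training script used here.

```python
# Derive (query, sentence, label) pairs from one HotpotQA distractor example.
from typing import Dict, List, Tuple


def example_to_pairs(ex: Dict) -> List[Tuple[str, str, int]]:
    """Label each context sentence 1 if it is a supporting fact, else 0."""
    gold = set(
        zip(ex["supporting_facts"]["title"], ex["supporting_facts"]["sent_id"])
    )
    pairs = []
    for title, sentences in zip(ex["context"]["title"], ex["context"]["sentences"]):
        for i, sent in enumerate(sentences):
            label = 1 if (title, i) in gold else 0
            pairs.append((ex["question"], sent, label))
    return pairs
```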

### Dataset Statistics

| Split | Samples |
|------|--------|
| Train | 69,101 |
| Validation | 7,006 |

The dataset is intentionally **imbalanced**, reflecting real retrieval scenarios.

---

## Training Procedure

### Hyperparameters

- Learning rate: `1e-5`
- Batch size: `16`
- Epochs: `2`
- Optimizer: AdamW
- Scheduler: Linear
- Seed: `42`
- Loss: Weighted cross-entropy
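
The weighted cross-entropy loss can be plugged into a `Trainer` roughly as follows. The class weights below are hypothetical placeholders; the card states only that the loss is weighted to counter the label imbalance.

```python
# Sketch of a Trainer subclass applying weighted cross-entropy.
import torch
from torch import nn
from transformers import Trainer

CLASS_WEIGHTS = torch.tensor([1.0, 3.0])  # hypothetical [distractor, supporting]


class WeightedCETrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.CrossEntropyLoss(
            weight=CLASS_WEIGHTS.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss


# Sanity check of the loss itself on uniform logits (= -log 0.5):
demo_loss = nn.CrossEntropyLoss(weight=CLASS_WEIGHTS)(
    torch.zeros(1, 2), torch.tensor([1])
)
```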

### Training Results

| Epoch | Validation Loss | F1 | Accuracy | Precision | Recall | ROC-AUC |
|------|-----------------|----|----------|-----------|--------|--------|
| 1 | 0.4003 | 0.7119 | 0.8290 | 0.6146 | 0.8457 | 0.9038 |
| 2 | 0.4042 | 0.7028 | 0.8167 | 0.5907 | 0.8674 | 0.9064 |

---

## Thresholded Performance (Strict Guard)

- **Decision threshold:** 0.85
- **Hallucination rate:** 5.92%
- **Fact retention:** 60.34%
- **Average latency:** 5.30 ms (CPU)

This configuration prioritizes **trustworthiness over recall**.
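
One plausible reading of these two metrics is sketched below; the precise definitions are an assumption, since the card reports the numbers but not the formulas.

```python
# Hedged sketch: hallucination rate = fraction of admitted sentences that
# are distractors; fact retention = fraction of true supporting facts that
# survive the filter.
from typing import List, Tuple


def strict_guard_metrics(
    probs: List[float], labels: List[int], threshold: float = 0.85
) -> Tuple[float, float]:
    """Return (hallucination_rate, fact_retention) at the given threshold."""
    kept = [(p, y) for p, y in zip(probs, labels) if p >= threshold]
    positives = sum(labels)
    tp = sum(y for _, y in kept)
    hallucination_rate = (len(kept) - tp) / len(kept) if kept else 0.0
    fact_retention = tp / positives if positives else 0.0
    return hallucination_rate, fact_retention
```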

---

## Citation

If you use this model, please cite:

```bibtex
@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
```

---

## Contact

**El Mehdi Salih**  
Mohammed V University – Rabat  
Email: elmehdi_salih@um5.ac.ma