---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
- en
license: mit
pipeline_tag: text-classification
task_ids:
- fact-checking
tags:
- edge-rag
- semantic-filtering
- hallucination-reduction
- cross-encoder
metrics:
- accuracy
- precision
- recall
- roc_auc
model-index:
- name: LF_BERT_v1
results:
- task:
type: fact-checking
name: Semantic Evidence Filtering
dataset:
name: Project Sentinel (HotpotQA-derived)
type: hotpotqa/hotpot_qa
metrics:
- type: accuracy
value: 0.8167
- type: precision
value: 0.5907
- type: recall
value: 0.8674
- type: roc_auc
value: 0.9064
---
# LF_BERT_v1
**LF_BERT_v1** is a lightweight **TinyBERT-based cross-encoder** fine-tuned for **semantic evidence filtering** in **Retrieval-Augmented Generation (RAG)** pipelines.
The model acts as a *semantic gatekeeper*, scoring `(query, candidate_sentence)` pairs to determine whether the sentence is **factually useful evidence** or a **semantic distractor**.
It is designed for **CPU-only, edge, and offline deployments**, with millisecond-level inference latency.
This model is the core filtering component of **Project Sentinel**.
---
## Model Description
- **Architecture:** TinyBERT (4 layers, 312 hidden size)
- **Type:** Cross-encoder (joint encoding of query and sentence)
- **Task:** Binary fact-checking / evidence verification
- **Base Model:** `huawei-noah/TinyBERT_General_4L_312D`
- **Inference Latency:** ~5.3 ms (CPU)
### Input Format
```
[CLS] query [SEP] candidate_sentence [SEP]
```
- Maximum sequence length: 512 tokens
### Output
- Probability score ∈ [0,1] representing **factual utility**
- Typical deployment threshold: **0.85** (Strict Guard configuration)
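A minimal scoring sketch is shown below. The checkpoint id `LF_BERT_v1` is a placeholder for the published repo id, and the softmax helper is kept in pure Python so the decision logic can be inspected without loading the model:

```python
import math

def evidence_probability(logits):
    """Map the two classifier logits [distractor, supporting_fact]
    to P(supporting fact) with a numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def score_pair(query, sentence, model_id="LF_BERT_v1"):
    """Score one (query, candidate_sentence) pair with the cross-encoder.
    `model_id` is a placeholder for the published checkpoint."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # Cross-encoder input: [CLS] query [SEP] candidate_sentence [SEP], max 512 tokens
    enc = tok(query, sentence, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0].tolist()
    return evidence_probability(logits)
```

Under the Strict Guard configuration, a sentence is retained only when `score_pair(query, sentence) >= 0.85`.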
---
## Intended Use
✔ Semantic filtering for RAG pipelines
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments
This model is especially suited for:
- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)
---
## Limitations
- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance (not passage-level reasoning)
- Not a factual verifier for open-world claims
Performance may degrade on highly domain-specific or non-factual corpora.
---
## Training Data
The model was trained on a **binary dataset derived from HotpotQA (Distractor setting)**.
### Labels
- **1 – Supporting Fact:** Ground-truth evidence sentences
- **0 – Distractor:** Topically similar but factually insufficient sentences
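The labeling scheme can be sketched as follows, assuming the raw HotpotQA distractor-setting format (`context` as `[title, [sentences]]` pairs, `supporting_facts` as `[title, sentence_index]` pairs); the exact preprocessing used for this model may differ:

```python
def make_pairs(example):
    """Flatten one HotpotQA example into (question, sentence, label) triples:
    label 1 for ground-truth supporting facts, 0 for distractors."""
    support = {(title, idx) for title, idx in example["supporting_facts"]}
    pairs = []
    for title, sentences in example["context"]:
        for idx, sentence in enumerate(sentences):
            label = 1 if (title, idx) in support else 0
            pairs.append((example["question"], sentence, label))
    return pairs
```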
### Dataset Statistics
| Split | Samples |
|------|--------|
| Train | 69,101 |
| Validation | 7,006 |
The dataset is intentionally **imbalanced**, reflecting real retrieval scenarios.
---
## Training Procedure
### Hyperparameters
- Learning rate: `1e-5`
- Batch size: `16`
- Epochs: `2`
- Optimizer: AdamW
- Scheduler: Linear
- Seed: `42`
- Loss: Weighted cross-entropy
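For reference, per-example weighted cross-entropy reduces to the expression below. The card does not state the class weights used, so `class_weights` here is purely illustrative:

```python
import math

def weighted_cross_entropy(logits, label, class_weights):
    """Weighted CE for one example: -w[y] * log softmax(logits)[y].
    Up-weighting the minority class (supporting facts) counteracts
    the label imbalance in the training data."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -class_weights[label] * (logits[label] - log_z)
```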
### Training Results
| Epoch | Validation Loss | F1 | Accuracy | Precision | Recall | ROC-AUC |
|------|-----------------|----|----------|-----------|--------|--------|
| 1 | 0.4003 | 0.7119 | 0.8290 | 0.6146 | 0.8457 | 0.9038 |
| 2 | 0.4042 | 0.7028 | 0.8167 | 0.5907 | 0.8674 | 0.9064 |
---
## Thresholded Performance (Strict Guard)
- **Decision threshold:** 0.85
- **Hallucination rate:** 5.92%
- **Fact retention:** 60.34%
- **Average latency:** 5.30 ms (CPU)
This configuration prioritizes **trustworthiness over recall**.
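In deployment, the Strict Guard amounts to a simple filter over scored candidates (a sketch; `scored` is assumed to pair each retrieved sentence with its model probability):

```python
def strict_guard(scored, threshold=0.85):
    """Keep only sentences whose evidence probability clears the threshold,
    preserving retrieval order. Trades recall for precision."""
    return [sentence for sentence, prob in scored if prob >= threshold]
```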
---
## Citation
If you use this model, please cite:
```
@article{salih2026sentinel,
title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
year={2026}
}
```
---
## Contact
**El Mehdi Salih**
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma