# LF_BERT_v1
LF_BERT_v1 is a lightweight TinyBERT-based cross-encoder fine-tuned for semantic evidence filtering in Retrieval-Augmented Generation (RAG) pipelines.
The model acts as a semantic gatekeeper, scoring (query, candidate_sentence) pairs to determine whether the sentence is factually useful evidence or a semantic distractor.
It is designed for CPU-only, edge, and offline deployments, with millisecond-level inference latency.
This model is the core filtering component of Project Sentinel.
## Model Description

- Architecture: TinyBERT (4 layers, 312 hidden size)
- Type: Cross-encoder (joint encoding of query and sentence)
- Task: Binary fact-checking / evidence verification
- Base model: `huawei-noah/TinyBERT_General_4L_312D`
- Inference latency: ~5.3 ms (CPU)
## Input Format

`[CLS] query [SEP] candidate_sentence [SEP]`

- Maximum sequence length: 512 tokens

## Output

- Probability score ∈ [0, 1] representing factual utility
- Typical deployment threshold: 0.85 (Strict Guard configuration)
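A minimal inference sketch follows, assuming the checkpoint is published under the repo id `Mehd1SLH/LF_BERT_v1` as a standard two-label sequence-classification model whose index 1 corresponds to the "supporting fact" class (both are assumptions about the released artifact, not confirmed by this card):

```python
# Sketch: scoring one (query, sentence) pair with the cross-encoder.
# Assumes a standard two-label sequence-classification checkpoint
# under the repo id Mehd1SLH/LF_BERT_v1 (assumption).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def score_pair(query, sentence, tokenizer, model):
    """Return P(supporting fact) for a single (query, sentence) pair."""
    # The tokenizer builds the [CLS] query [SEP] candidate_sentence [SEP] input.
    inputs = tokenizer(query, sentence, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 1 is assumed to be the "supporting fact" class.
    return torch.softmax(logits, dim=-1)[0, 1].item()

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("Mehd1SLH/LF_BERT_v1")
    mdl = AutoModelForSequenceClassification.from_pretrained("Mehd1SLH/LF_BERT_v1")
    p = score_pair("Who wrote Hamlet?",
                   "Hamlet is a tragedy written by William Shakespeare.",
                   tok, mdl)
    print(f"score = {p:.3f}")
```

Candidates scoring above the 0.85 threshold would then be forwarded to the generator; the rest are discarded as distractors.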
## Intended Use

- Semantic filtering for RAG pipelines
- Hallucination reduction
- Early-exit decision systems
- Edge / offline LLM deployments

This model is especially suited for:

- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)
## Limitations

- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance (not passage-level reasoning)
- Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.
## Training Data

The model was trained on a binary dataset derived from HotpotQA (distractor setting).

### Labels

- 1 → Supporting fact: ground-truth evidence sentences
- 0 → Distractor: topically similar but factually insufficient sentences

### Dataset Statistics

| Split | Samples |
|---|---|
| Train | 69,101 |
| Validation | 7,006 |

The dataset is intentionally imbalanced, reflecting real retrieval scenarios.
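The exact pair-construction pipeline is not published; the sketch below follows the label scheme above and the HotpotQA distractor-setting schema as distributed on the Hugging Face Hub (`supporting_facts` as title/sent_id lists, `context` as parallel title/sentences lists), so treat the field names as assumptions:

```python
# Sketch: building (query, sentence, label) pairs from one HotpotQA
# example in the distractor setting. The exact pipeline used for
# LF_BERT_v1 is not published; this follows the label scheme above.
def make_pairs(example):
    """Label 1 for supporting-fact sentences, 0 for distractors."""
    supporting = set(zip(example["supporting_facts"]["title"],
                         example["supporting_facts"]["sent_id"]))
    pairs = []
    for title, sentences in zip(example["context"]["title"],
                                example["context"]["sentences"]):
        for i, sentence in enumerate(sentences):
            label = 1 if (title, i) in supporting else 0
            pairs.append((example["question"], sentence, label))
    return pairs

# Minimal synthetic example in the assumed HotpotQA distractor schema.
example = {
    "question": "Who wrote Hamlet?",
    "context": {
        "title": ["Hamlet", "Macbeth"],
        "sentences": [
            ["Hamlet is a tragedy by William Shakespeare.",
             "It is set in Denmark."],
            ["Macbeth is another play by Shakespeare."],
        ],
    },
    "supporting_facts": {"title": ["Hamlet"], "sent_id": [0]},
}
print([label for _, _, label in make_pairs(example)])  # [1, 0, 0]
```

Since most context sentences are not supporting facts, this construction naturally yields the label imbalance noted above.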
## Training Procedure

### Hyperparameters

- Learning rate: 1e-5
- Batch size: 16
- Epochs: 2
- Optimizer: AdamW
- Scheduler: Linear
- Seed: 42
- Loss: Weighted cross-entropy
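The weighted cross-entropy compensates for the label imbalance described above. The actual class weights are not stated in this card, so the 1:3 weighting below is purely illustrative:

```python
# Sketch: weighted cross-entropy on imbalanced supporting-fact labels.
# The exact class weights used for LF_BERT_v1 are not published;
# the 1:3 weighting below is purely illustrative.
import torch
import torch.nn as nn

class_weights = torch.tensor([1.0, 3.0])  # upweight the minority class (label 1)
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.tensor([[2.0, 0.5],   # predicted logits for 2 examples
                       [0.2, 1.5]])
labels = torch.tensor([0, 1])        # one distractor, one supporting fact
loss = loss_fn(logits, labels)
print(f"weighted CE loss = {loss.item():.4f}")
```

With `weight` set, PyTorch averages the per-example losses by the weights of the true classes, so errors on the rarer supporting-fact class cost proportionally more.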
## Training Results
| Epoch | Validation Loss | F1 | Accuracy | Precision | Recall | ROC-AUC |
|---|---|---|---|---|---|---|
| 1 | 0.4003 | 0.7119 | 0.8290 | 0.6146 | 0.8457 | 0.9038 |
| 2 | 0.4042 | 0.7028 | 0.8167 | 0.5907 | 0.8674 | 0.9064 |
### Thresholded Performance (Strict Guard)
- Decision threshold: 0.85
- Hallucination rate: 5.92%
- Fact retention: 60.34%
- Average latency: 5.30 ms (CPU)
This configuration prioritizes trustworthiness over recall.
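The Strict Guard decision itself is a simple threshold over the model's scores; here is a minimal sketch, where `filter_evidence` and the example scores are illustrative:

```python
# Sketch: applying the Strict Guard threshold (0.85) to scored candidates.
# `scored` pairs each retrieved sentence with the model's probability score.
STRICT_GUARD_THRESHOLD = 0.85

def filter_evidence(scored, threshold=STRICT_GUARD_THRESHOLD):
    """Keep only sentences whose factual-utility score clears the threshold."""
    return [sentence for sentence, score in scored if score >= threshold]

scored = [
    ("Paris is the capital of France.", 0.97),    # kept
    ("France borders several countries.", 0.62),  # dropped: distractor
    ("The Eiffel Tower is in Paris.", 0.88),      # kept
]
print(filter_evidence(scored))
# ['Paris is the capital of France.', 'The Eiffel Tower is in Paris.']
```

The 60.34% fact-retention figure above reflects the same trade-off: a high threshold drops most distractors but also discards some genuine evidence.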
## Citation

If you use this model, please cite:

@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
## Contact

El Mehdi Salih
Mohammed V University, Rabat
Email: elmehdi_salih@um5.ac.ma