LF_BERT_v1

LF_BERT_v1 is a lightweight TinyBERT-based cross-encoder fine-tuned for semantic evidence filtering in Retrieval-Augmented Generation (RAG) pipelines.

The model acts as a semantic gatekeeper, scoring (query, candidate_sentence) pairs to determine whether the sentence is factually useful evidence or a semantic distractor.
It is designed for CPU-only, edge, and offline deployments, with millisecond-level inference latency.

This model is the core filtering component of Project Sentinel.


Model Description

  • Architecture: TinyBERT (4 layers, hidden size 312)
  • Parameters: ~14.4M (F32)
  • Type: Cross-encoder (joint encoding of query and sentence)
  • Task: Binary fact-checking / evidence verification
  • Base Model: huawei-noah/TinyBERT_General_4L_312D
  • Inference Latency: ~5.3 ms (CPU)

Input Format

[CLS] query [SEP] candidate_sentence [SEP]
  • Maximum sequence length: 512 tokens

Output

  • Probability score ∈ [0, 1] representing factual utility
  • Typical deployment threshold: 0.85 (Strict Guard configuration; see the scoring sketch below)
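
The card ships without a usage snippet, so here is a minimal scoring sketch using the standard transformers sequence-classification API. It assumes the checkpoint loads under the repo id shown on this page (Mehd1SLH/LF_BERT_v1) and that class index 1 is the "supporting fact" class; verify both against model.config.id2label before relying on it.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "Mehd1SLH/LF_BERT_v1"  # repo id from this page

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    def score(query: str, sentence: str) -> float:
        # The tokenizer emits the [CLS] query [SEP] candidate_sentence [SEP]
        # layout described under Input Format, truncated to 512 tokens.
        inputs = tokenizer(query, sentence, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumption: index 1 = "supporting fact"; check model.config.id2label.
        return torch.softmax(logits, dim=-1)[0, 1].item()

    p = score("Who wrote Hamlet?",
              "Hamlet is a tragedy written by William Shakespeare.")
    print(p, "keep" if p >= 0.85 else "drop")  # 0.85 = Strict Guard threshold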

Intended Use

✔ Semantic filtering for RAG pipelines (see the batch-filtering sketch at the end of this section)
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments

This model is especially suited for:

  • Local document QA systems
  • Privacy-sensitive environments
  • Resource-constrained hardware (≤ 8 GB RAM)
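
The batch-filtering sketch referenced above: a hypothetical filter_evidence helper (not part of any published Project Sentinel API) that scores retrieved sentences against the query in batches and keeps only those at or above the Strict Guard threshold, which is how the gatekeeper slots between retriever and generator.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "Mehd1SLH/LF_BERT_v1"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    def filter_evidence(query, sentences, threshold=0.85, batch_size=32):
        """Keep only sentences the model scores at or above the threshold."""
        kept = []
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            inputs = tokenizer([query] * len(batch), batch, truncation=True,
                               max_length=512, padding=True, return_tensors="pt")
            with torch.no_grad():
                probs = torch.softmax(model(**inputs).logits, dim=-1)[:, 1]
            kept.extend(s for s, p in zip(batch, probs.tolist()) if p >= threshold)
        return kept

    retrieved = [
        "Hamlet is a tragedy written by William Shakespeare.",
        "Shakespeare's birthplace attracts many tourists.",  # likely distractor
    ]
    print(filter_evidence("Who wrote Hamlet?", retrieved))

Sentences that pass are handed to the generator as context; everything below the threshold is dropped before it can mislead the LLM.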

Limitations

  • Trained on Wikipedia-based QA (HotpotQA)
  • English-only
  • Sentence-level relevance (not passage-level reasoning)
  • Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.


Training Data

The model was trained on a binary dataset derived from HotpotQA (Distractor setting); a sketch of one possible pair-construction procedure follows the label definitions below.

Labels

  • 1 – Supporting Fact: Ground-truth evidence sentences
  • 0 – Distractor: Topically similar but factually insufficient sentences
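
The extraction script itself is not published on this card. The sketch below shows one plausible derivation from the public HotpotQA distractor JSON, whose examples carry question, context (a list of [title, sentences] pairs), and supporting_facts (a list of [title, sentence_index] pairs); the actual sampling and cleaning behind the splits reported next may differ.

    import json

    def make_pairs(example):
        """Turn one HotpotQA (distractor) example into (query, sentence, label) triples."""
        query = example["question"]
        gold = {(title, idx) for title, idx in example["supporting_facts"]}
        pairs = []
        for title, sentences in example["context"]:
            for idx, sentence in enumerate(sentences):
                label = 1 if (title, idx) in gold else 0  # supporting fact vs. distractor
                pairs.append((query, sentence.strip(), label))
        return pairs

    with open("hotpot_train_v1.1.json") as f:  # public HotpotQA training file
        data = json.load(f)

    pairs = [p for example in data for p in make_pairs(example)]
    positives = sum(label for _, _, label in pairs)
    print(f"{len(pairs)} pairs, {positives} positives")  # heavily imbalanced, as noted below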

Dataset Statistics

Split        Samples
Train        69,101
Validation    7,006

The dataset is intentionally imbalanced, reflecting real retrieval scenarios.


Training Procedure

Hyperparameters

  • Learning rate: 1e-5
  • Batch size: 16
  • Epochs: 2
  • Optimizer: AdamW
  • Scheduler: Linear
  • Seed: 42
  • Loss: Weighted cross-entropy
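
For concreteness, the list above translates into roughly the following PyTorch/transformers setup. The class weights behind the weighted cross-entropy are not published, so the 1:4 ratio here is a placeholder assumption; the rest mirrors the stated hyperparameters.

    import torch
    from torch.nn import CrossEntropyLoss
    from transformers import (AutoModelForSequenceClassification,
                              get_linear_schedule_with_warmup)

    torch.manual_seed(42)  # seed from this card

    model = AutoModelForSequenceClassification.from_pretrained(
        "huawei-noah/TinyBERT_General_4L_312D", num_labels=2)

    # Placeholder class weights to counter the label imbalance; the actual
    # weights used for LF_BERT_v1 are not published on this card.
    loss_fn = CrossEntropyLoss(weight=torch.tensor([1.0, 4.0]))

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    num_steps = (69_101 // 16) * 2  # train size / batch size * 2 epochs
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=num_steps)

    def training_step(batch):
        # batch: tokenized (query, sentence) pairs with integer labels.
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        loss = loss_fn(logits, batch["labels"])
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        return loss.item()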

Training Results

Epoch   Validation Loss   F1       Accuracy   Precision   Recall   ROC-AUC
1       0.4003            0.7119   0.8290     0.6146      0.8457   0.9038
2       0.4042            0.7028   0.8167     0.5907      0.8674   0.9064

Thresholded Performance (Strict Guard)

  • Decision threshold: 0.85
  • Hallucination rate: 5.92%
  • Fact retention: 60.34%
  • Average latency: 5.30 ms (CPU)

This configuration prioritizes trustworthiness over recall.
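
The two headline metrics are not formally defined on this card. Under a common reading (hallucination rate = share of accepted sentences that are actually distractors; fact retention = share of supporting facts that survive the filter, i.e. recall at the threshold), they could be computed from validation scores as in this sketch; it is not the authors' evaluation code.

    import numpy as np

    def strict_guard_metrics(probs, labels, threshold=0.85):
        """probs: model scores in [0, 1]; labels: 1 = supporting fact, 0 = distractor."""
        probs, labels = np.asarray(probs), np.asarray(labels)
        accepted = probs >= threshold
        # Assumed definition: accepted distractors / all accepted sentences.
        hallucination_rate = (accepted & (labels == 0)).sum() / max(accepted.sum(), 1)
        # Assumed definition: accepted supporting facts / all supporting facts.
        fact_retention = (accepted & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        return hallucination_rate, fact_retention

Raising the threshold trades fact retention for a lower hallucination rate, which is exactly the Strict Guard trade-off described above.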


Citation

If you use this model, please cite:

@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}

Contact

El Mehdi Salih
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma
