LF_BERT_v1

LF_BERT_v1 is a lightweight TinyBERT-based cross-encoder fine-tuned for semantic evidence filtering in Retrieval-Augmented Generation (RAG) pipelines.

The model acts as a semantic gatekeeper, scoring (query, candidate_sentence) pairs to determine whether the sentence is factually useful evidence or a semantic distractor.
It is designed for CPU-only, edge, and offline deployments, with millisecond-level inference latency.

This model is the core filtering component of Project Sentinel.


Model Description

  • Architecture: TinyBERT (4 layers, hidden size 312)
  • Parameters: ~14.4M (F32)
  • Type: Cross-encoder (joint encoding of query and sentence)
  • Task: Binary fact-checking / evidence verification
  • Base Model: huawei-noah/TinyBERT_General_4L_312D
  • Inference Latency: ~5.3 ms (CPU)

Input Format

[CLS] query [SEP] candidate_sentence [SEP]
  • Maximum sequence length: 512 tokens

Output

  • Probability score ∈ [0, 1] representing factual utility
  • Typical deployment threshold: 0.85 (Strict Guard configuration; see the scoring sketch below)
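
The card ships without a usage snippet, so here is a minimal scoring sketch using the standard transformers sequence-classification API. It assumes the checkpoint loads under the repo id shown on this page (Mehd1SLH/LF_BERT_v1) and that class index 1 is the "supporting fact" class; verify both against model.config.id2label before relying on it.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "Mehd1SLH/LF_BERT_v1"  # repo id from this page

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    def score(query: str, sentence: str) -> float:
        # The tokenizer emits the [CLS] query [SEP] candidate_sentence [SEP]
        # layout described under Input Format, truncated to 512 tokens.
        inputs = tokenizer(query, sentence, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumption: index 1 = "supporting fact"; check model.config.id2label.
        return torch.softmax(logits, dim=-1)[0, 1].item()

    p = score("Who wrote Hamlet?",
              "Hamlet is a tragedy written by William Shakespeare.")
    print(p, "keep" if p >= 0.85 else "drop")  # 0.85 = Strict Guard threshold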

Intended Use

✔ Semantic filtering for RAG pipelines (see the batch-filtering sketch at the end of this section)
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments

This model is especially suited for:

  • Local document QA systems
  • Privacy-sensitive environments
  • Resource-constrained hardware (≤ 8 GB RAM)
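
The batch-filtering sketch referenced above: a hypothetical filter_evidence helper (not part of any published Project Sentinel API) that scores retrieved sentences against the query in batches and keeps only those at or above the Strict Guard threshold, which is how the gatekeeper slots between retriever and generator.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "Mehd1SLH/LF_BERT_v1"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    def filter_evidence(query, sentences, threshold=0.85, batch_size=32):
        """Keep only sentences the model scores at or above the threshold."""
        kept = []
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            inputs = tokenizer([query] * len(batch), batch, truncation=True,
                               max_length=512, padding=True, return_tensors="pt")
            with torch.no_grad():
                probs = torch.softmax(model(**inputs).logits, dim=-1)[:, 1]
            kept.extend(s for s, p in zip(batch, probs.tolist()) if p >= threshold)
        return kept

    retrieved = [
        "Hamlet is a tragedy written by William Shakespeare.",
        "Shakespeare's birthplace attracts many tourists.",  # likely distractor
    ]
    print(filter_evidence("Who wrote Hamlet?", retrieved))

Sentences that pass are handed to the generator as context; everything below the threshold is dropped before it can mislead the LLM.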

Limitations

  • Trained on Wikipedia-based QA (HotpotQA)
  • English-only
  • Sentence-level relevance (not passage-level reasoning)
  • Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.


Training Data

The model was trained on a binary dataset derived from HotpotQA (Distractor setting); a sketch of one possible pair-construction procedure follows the label definitions below.

Labels

  • 1 – Supporting Fact: Ground-truth evidence sentences
  • 0 – Distractor: Topically similar but factually insufficient sentences
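
The extraction script itself is not published on this card. The sketch below shows one plausible derivation from the public HotpotQA distractor JSON, whose examples carry question, context (a list of [title, sentences] pairs), and supporting_facts (a list of [title, sentence_index] pairs); the actual sampling and cleaning behind the splits reported next may differ.

    import json

    def make_pairs(example):
        """Turn one HotpotQA (distractor) example into (query, sentence, label) triples."""
        query = example["question"]
        gold = {(title, idx) for title, idx in example["supporting_facts"]}
        pairs = []
        for title, sentences in example["context"]:
            for idx, sentence in enumerate(sentences):
                label = 1 if (title, idx) in gold else 0  # supporting fact vs. distractor
                pairs.append((query, sentence.strip(), label))
        return pairs

    with open("hotpot_train_v1.1.json") as f:  # public HotpotQA training file
        data = json.load(f)

    pairs = [p for example in data for p in make_pairs(example)]
    positives = sum(label for _, _, label in pairs)
    print(f"{len(pairs)} pairs, {positives} positives")  # heavily imbalanced, as noted below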

Dataset Statistics

Split        Samples
Train        69,101
Validation    7,006

The dataset is intentionally imbalanced, reflecting real retrieval scenarios.


Training Procedure

Hyperparameters

  • Learning rate: 1e-5
  • Batch size: 16
  • Epochs: 2
  • Optimizer: AdamW
  • Scheduler: Linear
  • Seed: 42
  • Loss: Weighted cross-entropy
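
For concreteness, the list above translates into roughly the following PyTorch/transformers setup. The class weights behind the weighted cross-entropy are not published, so the 1:4 ratio here is a placeholder assumption; the rest mirrors the stated hyperparameters.

    import torch
    from torch.nn import CrossEntropyLoss
    from transformers import (AutoModelForSequenceClassification,
                              get_linear_schedule_with_warmup)

    torch.manual_seed(42)  # seed from this card

    model = AutoModelForSequenceClassification.from_pretrained(
        "huawei-noah/TinyBERT_General_4L_312D", num_labels=2)

    # Placeholder class weights to counter the label imbalance; the actual
    # weights used for LF_BERT_v1 are not published on this card.
    loss_fn = CrossEntropyLoss(weight=torch.tensor([1.0, 4.0]))

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    num_steps = (69_101 // 16) * 2  # train size / batch size * 2 epochs
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=num_steps)

    def training_step(batch):
        # batch: tokenized (query, sentence) pairs with integer labels.
        logits = model(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"]).logits
        loss = loss_fn(logits, batch["labels"])
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        return loss.item()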

Training Results

Epoch   Validation Loss   F1       Accuracy   Precision   Recall   ROC-AUC
1       0.4003            0.7119   0.8290     0.6146      0.8457   0.9038
2       0.4042            0.7028   0.8167     0.5907      0.8674   0.9064

Thresholded Performance (Strict Guard)

  • Decision threshold: 0.85
  • Hallucination rate: 5.92%
  • Fact retention: 60.34%
  • Average latency: 5.30 ms (CPU)

This configuration prioritizes trustworthiness over recall.
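
The two headline metrics are not formally defined on this card. Under a common reading (hallucination rate = share of accepted sentences that are actually distractors; fact retention = share of supporting facts that survive the filter, i.e. recall at the threshold), they could be computed from validation scores as in this sketch; it is not the authors' evaluation code.

    import numpy as np

    def strict_guard_metrics(probs, labels, threshold=0.85):
        """probs: model scores in [0, 1]; labels: 1 = supporting fact, 0 = distractor."""
        probs, labels = np.asarray(probs), np.asarray(labels)
        accepted = probs >= threshold
        # Assumed definition: accepted distractors / all accepted sentences.
        hallucination_rate = (accepted & (labels == 0)).sum() / max(accepted.sum(), 1)
        # Assumed definition: accepted supporting facts / all supporting facts.
        fact_retention = (accepted & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        return hallucination_rate, fact_retention

Raising the threshold trades fact retention for a lower hallucination rate, which is exactly the Strict Guard trade-off described above.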


Citation

If you use this model, please cite:

@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}

Contact

El Mehdi Salih
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma
