---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
  - en
license: mit
pipeline_tag: text-classification
task_ids:
  - fact-checking
tags:
  - edge-rag
  - semantic-filtering
  - hallucination-reduction
  - cross-encoder
metrics:
  - accuracy
  - precision
  - recall
  - roc_auc
model-index:
  - name: LF_BERT_v1
    results:
      - task:
          type: fact-checking
          name: Semantic Evidence Filtering
        dataset:
          name: Project Sentinel (HotpotQA-derived)
          type: hotpotqa/hotpot_qa
        metrics:
          - type: accuracy
            value: 0.8167
          - type: precision
            value: 0.5907
          - type: recall
            value: 0.8674
          - type: roc_auc
            value: 0.9064
---

# LF_BERT_v1

LF_BERT_v1 is a lightweight TinyBERT-based cross-encoder fine-tuned for semantic evidence filtering in Retrieval-Augmented Generation (RAG) pipelines.

The model acts as a semantic gatekeeper, scoring (query, candidate_sentence) pairs to determine whether the sentence is factually useful evidence or a semantic distractor.
It is designed for CPU-only, edge, and offline deployments, with millisecond-level inference latency.

This model is the core filtering component of Project Sentinel.


## Model Description

- Architecture: TinyBERT (4 layers, 312 hidden size)
- Type: Cross-encoder (joint encoding of query and sentence)
- Task: Binary fact-checking / evidence verification
- Base model: huawei-noah/TinyBERT_General_4L_312D
- Inference latency: ~5.3 ms (CPU)

### Input Format

```
[CLS] query [SEP] candidate_sentence [SEP]
```

- Maximum sequence length: 512 tokens
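The pair layout above can be sketched as follows. This is an illustrative stand-in: in practice the pair is assembled by the TinyBERT WordPiece tokenizer (e.g. passing the query and sentence as a text pair to a tokenizer from `transformers`), and the whitespace split here only makes the structure visible.

```python
MAX_LEN = 512  # the model's maximum sequence length

def build_pair(query: str, sentence: str, max_len: int = MAX_LEN) -> list[str]:
    """Lay out [CLS] query [SEP] sentence [SEP], truncating the candidate
    sentence first so the special tokens and the query always survive."""
    q = query.split()
    s = sentence.split()
    budget = max_len - len(q) - 3  # room for [CLS] and the two [SEP] tokens
    return ["[CLS]", *q, "[SEP]", *s[:budget], "[SEP]"]

tokens = build_pair(
    "Who wrote Hamlet?",
    "Hamlet is a tragedy written by William Shakespeare.",
)
print(tokens[0], tokens[-1])  # [CLS] [SEP]
```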

### Output

- Probability score ∈ [0, 1] representing factual utility
- Typical deployment threshold: 0.85 (Strict Guard configuration)
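A minimal sketch of the gating decision, assuming the fine-tuned head emits two class logits (distractor, supporting fact) as in a standard sequence-classification setup, e.g. `model(**inputs).logits` for one pair. The softmax and threshold logic is shown in plain Python; the logit values below are invented for illustration.

```python
import math

STRICT_GUARD_THRESHOLD = 0.85  # threshold from the Strict Guard configuration

def evidence_probability(logits: tuple[float, float]) -> float:
    """Softmax over the two class logits (distractor, supporting fact);
    returns P(supporting fact) in [0, 1]."""
    neg, pos = logits
    m = max(neg, pos)  # subtract the max for numerical stability
    e_neg, e_pos = math.exp(neg - m), math.exp(pos - m)
    return e_pos / (e_neg + e_pos)

def keep_sentence(logits, threshold: float = STRICT_GUARD_THRESHOLD) -> bool:
    """Strict Guard gate: keep the sentence only above the threshold."""
    return evidence_probability(logits) >= threshold

# Hypothetical logits for two candidate sentences:
print(keep_sentence((-2.0, 2.0)))  # p ≈ 0.982 → True
print(keep_sentence((0.5, 1.0)))   # p ≈ 0.622 → False
```

Raising the threshold trades recall (fact retention) for a lower hallucination rate, which is exactly the trade the Strict Guard numbers below quantify.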

## Intended Use

- ✔ Semantic filtering for RAG pipelines
- ✔ Hallucination reduction
- ✔ Early-exit decision systems
- ✔ Edge / offline LLM deployments

This model is especially suited for:

- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)

## Limitations

- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance (not passage-level reasoning)
- Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.


## Training Data

The model was trained on a binary dataset derived from HotpotQA (Distractor setting).

### Labels

- `1` – Supporting fact: ground-truth evidence sentences
- `0` – Distractor: topically similar but factually insufficient sentences

### Dataset Statistics

| Split      | Samples |
|------------|---------|
| Train      | 69,101  |
| Validation | 7,006   |

The dataset is intentionally imbalanced, reflecting real retrieval scenarios.
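Pair construction from the distractor setting can be sketched as below. The field names (`question`, `context`, `supporting_facts`) follow the public HotpotQA schema; the tiny example itself is invented for illustration and is not from the actual training set.

```python
# Minimal example in the HotpotQA distractor format: `context` is a list of
# (title, sentences) paragraphs, and `supporting_facts` lists
# (title, sentence index) pairs that constitute the gold evidence.
example = {
    "question": "Who wrote Hamlet?",
    "context": [
        ["Hamlet", ["Hamlet is a tragedy by William Shakespeare.",
                    "It is set in Denmark."]],
        ["Macbeth", ["Macbeth is another Shakespeare tragedy."]],
    ],
    "supporting_facts": [["Hamlet", 0]],
}

def make_pairs(ex):
    """Yield (query, sentence, label): 1 for supporting facts, 0 for distractors."""
    gold = {tuple(sf) for sf in ex["supporting_facts"]}
    for title, sentences in ex["context"]:
        for i, sent in enumerate(sentences):
            yield ex["question"], sent, int((title, i) in gold)

pairs = list(make_pairs(example))
print(sum(label for _, _, label in pairs), "positive of", len(pairs))  # 1 positive of 3
```

Because every non-supporting sentence in the retrieved context becomes a negative, positives are naturally the minority class, which is the imbalance the weighted loss below compensates for.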


## Training Procedure

### Hyperparameters

- Learning rate: 1e-5
- Batch size: 16
- Epochs: 2
- Optimizer: AdamW
- Scheduler: Linear
- Seed: 42
- Loss: Weighted cross-entropy
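The weighted cross-entropy handles the class imbalance by making errors on the minority (supporting-fact) class cost more. A per-example sketch, with purely illustrative weights since the card does not state the actual values:

```python
import math

def weighted_ce(p_pos: float, label: int, w_pos: float, w_neg: float) -> float:
    """Per-example weighted cross-entropy for binary classification.
    p_pos is the predicted probability of the positive (supporting-fact) class."""
    if label == 1:
        return -w_pos * math.log(p_pos)
    return -w_neg * math.log(1.0 - p_pos)

# Illustrative weights only (positive class up-weighted 2:1):
w_pos, w_neg = 2.0, 1.0

# The same confident mistake is penalized twice as hard when it misses a
# supporting fact as when it admits a distractor:
print(weighted_ce(0.1, 1, w_pos, w_neg))  # ≈ 4.605
print(weighted_ce(0.9, 0, w_pos, w_neg))  # ≈ 2.303
```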

### Training Results

| Epoch | Validation Loss | F1     | Accuracy | Precision | Recall | ROC-AUC |
|-------|-----------------|--------|----------|-----------|--------|---------|
| 1     | 0.4003          | 0.7119 | 0.8290   | 0.6146    | 0.8457 | 0.9038  |
| 2     | 0.4042          | 0.7028 | 0.8167   | 0.5907    | 0.8674 | 0.9064  |

### Thresholded Performance (Strict Guard)

- Decision threshold: 0.85
- Hallucination rate: 5.92%
- Fact retention: 60.34%
- Average latency: 5.30 ms (CPU)

This configuration prioritizes trustworthiness over recall.


## Citation

If you use this model, please cite:

```bibtex
@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
```

## Contact

El Mehdi Salih
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma