---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
- en
license: mit
pipeline_tag: text-classification
task_ids:
- fact-checking
tags:
- edge-rag
- semantic-filtering
- hallucination-reduction
- cross-encoder
metrics:
- accuracy
- precision
- recall
- roc_auc
model-index:
- name: LF_BERT_v1
  results:
  - task:
      type: fact-checking
      name: Semantic Evidence Filtering
    dataset:
      name: Project Sentinel (HotpotQA-derived)
      type: hotpotqa/hotpot_qa
    metrics:
    - type: accuracy
      value: 0.8167
    - type: precision
      value: 0.5907
    - type: recall
      value: 0.8674
    - type: roc_auc
      value: 0.9064
---

# LF_BERT_v1
LF_BERT_v1 is a lightweight TinyBERT-based cross-encoder fine-tuned for semantic evidence filtering in Retrieval-Augmented Generation (RAG) pipelines.
The model acts as a semantic gatekeeper, scoring (query, candidate_sentence) pairs to determine whether the sentence is factually useful evidence or a semantic distractor.
It is designed for CPU-only, edge, and offline deployments, with millisecond-level inference latency.
This model is the core filtering component of Project Sentinel.
## Model Description

- Architecture: TinyBERT (4 layers, hidden size 312)
- Type: Cross-encoder (joint encoding of query and sentence)
- Task: Binary fact-checking / evidence verification
- Base Model: `huawei-noah/TinyBERT_General_4L_312D`
- Inference Latency: ~5.3 ms (CPU)
### Input Format

`[CLS] query [SEP] candidate_sentence [SEP]`
- Maximum sequence length: 512 tokens
### Output
- Probability score ∈ [0,1] representing factual utility
- Typical deployment threshold: 0.85 (Strict Guard configuration)
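The scoring step above can be sketched with the standard `transformers` sequence-classification API. This is a minimal sketch, not a confirmed reference implementation: the assumption that class index 1 corresponds to "supporting fact" is not stated in this card, and you would load the published LF_BERT_v1 checkpoint rather than the base model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

STRICT_GUARD_THRESHOLD = 0.85  # "Strict Guard" deployment configuration

def evidence_probability(model, tokenizer, query: str, sentence: str) -> float:
    """Score one (query, candidate_sentence) pair with the cross-encoder."""
    # Paired inputs are encoded as [CLS] query [SEP] candidate_sentence [SEP]
    inputs = tokenizer(query, sentence, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumption: class index 1 = "supporting fact", index 0 = "distractor"
    return torch.softmax(logits, dim=-1)[0, 1].item()

def is_evidence(prob: float, threshold: float = STRICT_GUARD_THRESHOLD) -> bool:
    """Apply the Strict Guard decision threshold to a probability score."""
    return prob >= threshold
```

To use it, load the checkpoint once with `AutoModelForSequenceClassification.from_pretrained(...)` and `AutoTokenizer.from_pretrained(...)`, then call `evidence_probability(model, tokenizer, query, sentence)` per candidate.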
## Intended Use
✔ Semantic filtering for RAG pipelines
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments
This model is especially suited for:
- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)
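In a RAG pipeline the gatekeeper sits between retrieval and generation. A minimal sketch of that filtering step, with `score_pair` as a hypothetical callable standing in for the model's forward pass:

```python
from typing import Callable, List

STRICT_GUARD_THRESHOLD = 0.85

def filter_evidence(query: str,
                    candidates: List[str],
                    score_pair: Callable[[str, str], float],
                    threshold: float = STRICT_GUARD_THRESHOLD) -> List[str]:
    """Keep only sentences the cross-encoder scores as factually useful.

    score_pair(query, sentence) must return a probability in [0, 1];
    in deployment it would wrap the LF_BERT_v1 forward pass.
    """
    return [s for s in candidates if score_pair(query, s) >= threshold]
```

An empty result can drive an early-exit decision: rather than generating from weak context, the system can answer "no supporting evidence found", which is where the hallucination reduction comes from.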
## Limitations
- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance (not passage-level reasoning)
- Not a factual verifier for open-world claims
Performance may degrade on highly domain-specific or non-factual corpora.
## Training Data
The model was trained on a binary dataset derived from HotpotQA (Distractor setting).
### Labels
- 1 – Supporting Fact: Ground-truth evidence sentences
- 0 – Distractor: Topically similar but factually insufficient sentences
### Dataset Statistics
| Split | Samples |
|---|---|
| Train | 69,101 |
| Validation | 7,006 |
The dataset is intentionally imbalanced, reflecting real retrieval scenarios.
## Training Procedure

### Hyperparameters
- Learning rate: 1e-5
- Batch size: 16
- Epochs: 2
- Optimizer: AdamW
- Scheduler: Linear
- Seed: 42
- Loss: Weighted cross-entropy
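The weighted cross-entropy loss can be set up as below. The inverse-frequency weighting scheme and the class counts are illustrative assumptions; the card does not state the actual weights or per-class counts.

```python
import torch

def inverse_frequency_weights(n_distractor: int, n_supporting: int) -> torch.Tensor:
    """Per-class weights so the minority (supporting-fact) class is not swamped."""
    total = n_distractor + n_supporting
    return torch.tensor([total / (2.0 * n_distractor),
                         total / (2.0 * n_supporting)])

# Hypothetical per-class counts for the imbalanced training split (not from the card)
weights = inverse_frequency_weights(n_distractor=55_000, n_supporting=14_101)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```

Passing `weight=` to `torch.nn.CrossEntropyLoss` scales each class's contribution to the loss, so the rarer supporting-fact class is not drowned out by distractors during training.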
### Training Results
| Epoch | Validation Loss | F1 | Accuracy | Precision | Recall | ROC-AUC |
|---|---|---|---|---|---|---|
| 1 | 0.4003 | 0.7119 | 0.8290 | 0.6146 | 0.8457 | 0.9038 |
| 2 | 0.4042 | 0.7028 | 0.8167 | 0.5907 | 0.8674 | 0.9064 |
### Thresholded Performance (Strict Guard)
- Decision threshold: 0.85
- Hallucination rate: 5.92%
- Fact retention: 60.34%
- Average latency: 5.30 ms (CPU)
This configuration prioritizes trustworthiness over recall.
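One way to reproduce these two thresholded metrics from validation scores, under the assumption (not spelled out in the card) that hallucination rate is the fraction of accepted sentences that are distractors, and fact retention is the fraction of supporting facts that survive the filter:

```python
from typing import List, Tuple

def strict_guard_metrics(probs: List[float],
                         labels: List[int],
                         threshold: float = 0.85) -> Tuple[float, float]:
    """Return (hallucination_rate, fact_retention) at a decision threshold.

    labels: 1 = supporting fact, 0 = distractor (as in the training data).
    """
    accepted = [(p, y) for p, y in zip(probs, labels) if p >= threshold]
    n_pos = sum(labels)
    # Fraction of accepted sentences that are actually distractors
    halluc = (sum(1 for _, y in accepted if y == 0) / len(accepted)
              if accepted else 0.0)
    # Fraction of all supporting facts that pass the filter (recall at threshold)
    retention = (sum(1 for _, y in accepted if y == 1) / n_pos
                 if n_pos else 0.0)
    return halluc, retention
```

Raising the threshold lowers the hallucination rate at the cost of fact retention, which is the trade-off the Strict Guard configuration makes explicit.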
## Citation
If you use this model, please cite:
```bibtex
@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
```
## Contact
El Mehdi Salih
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma