---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
- en
license: mit
pipeline_tag: text-classification
task_ids:
- fact-checking
tags:
- edge-rag
- semantic-filtering
- hallucination-reduction
- cross-encoder
metrics:
- accuracy
- precision
- recall
- roc_auc
model-index:
- name: LF_BERT_v1
  results:
  - task:
      type: fact-checking
      name: Semantic Evidence Filtering
    dataset:
      name: Project Sentinel (HotpotQA-derived)
      type: hotpotqa/hotpot_qa
    metrics:
    - type: accuracy
      value: 0.8167
    - type: precision
      value: 0.5907
    - type: recall
      value: 0.8674
    - type: roc_auc
      value: 0.9064
---

# LF_BERT_v1

**LF_BERT_v1** is a lightweight **TinyBERT-based cross-encoder** fine-tuned for **semantic evidence filtering** in **Retrieval-Augmented Generation (RAG)** pipelines. The model acts as a *semantic gatekeeper*, scoring `(query, candidate_sentence)` pairs to determine whether the sentence is **factually useful evidence** or a **semantic distractor**.

It is designed for **CPU-only, edge, and offline deployments**, with millisecond-level inference latency. This model is the core filtering component of **Project Sentinel**.
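---

## Usage

A minimal usage sketch, assuming the standard `transformers` sequence-classification interface. The repo id `"LF_BERT_v1"` below is a placeholder until the checkpoint is published on the Hub; the helper names (`load_scorer`, `score_pair`, `is_evidence`) are illustrative, not part of any library.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

STRICT_GUARD_THRESHOLD = 0.85  # deployment threshold from this card

def load_scorer(model_id: str):
    """Load the cross-encoder. `model_id` is a placeholder repo id."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()
    return tokenizer, model

def score_pair(tokenizer, model, query: str, sentence: str) -> float:
    """Probability that `sentence` is useful evidence for `query`.

    The tokenizer builds the [CLS] query [SEP] sentence [SEP] input;
    class 1 is assumed to be "supporting fact" per this card's labels.
    """
    inputs = tokenizer(query, sentence, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def is_evidence(prob: float, threshold: float = STRICT_GUARD_THRESHOLD) -> bool:
    """Strict Guard decision: keep a sentence only above the threshold."""
    return prob >= threshold

# Example call (requires the published checkpoint):
# tokenizer, model = load_scorer("LF_BERT_v1")  # placeholder repo id
# p = score_pair(tokenizer, model, "Who wrote Hamlet?",
#                "Hamlet was written by William Shakespeare.")
# keep = is_evidence(p)
```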
---

## Model Description

- **Architecture:** TinyBERT (4 layers, hidden size 312)
- **Type:** Cross-encoder (joint encoding of query and sentence)
- **Task:** Binary fact-checking / evidence verification
- **Base Model:** `huawei-noah/TinyBERT_General_4L_312D`
- **Inference Latency:** ~5.3 ms (CPU)

### Input Format

```
[CLS] query [SEP] candidate_sentence [SEP]
```

- Maximum sequence length: 512 tokens

### Output

- Probability score ∈ [0, 1] representing **factual utility**
- Typical deployment threshold: **0.85** (Strict Guard configuration)

---

## Intended Use

✔ Semantic filtering for RAG pipelines
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments

This model is especially suited for:

- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)

---

## Limitations

- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance scoring (not passage-level reasoning)
- Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.

---

## Training Data

The model was trained on a **binary dataset derived from HotpotQA (distractor setting)**.

### Labels

- **1 – Supporting Fact:** ground-truth evidence sentences
- **0 – Distractor:** topically similar but factually insufficient sentences

### Dataset Statistics

| Split      | Samples |
|------------|---------|
| Train      | 69,101  |
| Validation | 7,006   |

The dataset is intentionally **imbalanced**, reflecting real retrieval scenarios.
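### Label Derivation (Sketch)

The labeling scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact preprocessing code; the field layout mirrors the `hotpotqa/hotpot_qa` dataset schema on the Hub, and `make_pairs` is a hypothetical helper.

```python
def make_pairs(example):
    """Label each context sentence 1 (supporting fact) or 0 (distractor)."""
    # Gold evidence is identified by (paragraph title, sentence index).
    gold = set(zip(example["supporting_facts"]["title"],
                   example["supporting_facts"]["sent_id"]))
    pairs = []
    for title, sentences in zip(example["context"]["title"],
                                example["context"]["sentences"]):
        for sent_id, sentence in enumerate(sentences):
            label = 1 if (title, sent_id) in gold else 0
            pairs.append((example["question"], sentence, label))
    return pairs

# Tiny synthetic example mimicking the HotpotQA distractor schema:
example = {
    "question": "Where was the poet born?",
    "context": {
        "title": ["Poet", "River"],
        "sentences": [["The poet was born in Rabat.", "He wrote ten books."],
                      ["The river flows north."]],
    },
    "supporting_facts": {"title": ["Poet"], "sent_id": [0]},
}
pairs = make_pairs(example)
# -> 3 pairs; label 1 only for "The poet was born in Rabat."
```

Because only the annotated supporting facts receive label 1, the resulting pair set is naturally imbalanced toward distractors, matching the retrieval scenario described above.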
---

## Training Procedure

### Hyperparameters

- Learning rate: `1e-5`
- Batch size: `16`
- Epochs: `2`
- Optimizer: AdamW
- Scheduler: Linear
- Seed: `42`
- Loss: Weighted cross-entropy

### Training Results

| Epoch | Validation Loss | F1     | Accuracy | Precision | Recall | ROC-AUC |
|-------|-----------------|--------|----------|-----------|--------|---------|
| 1     | 0.4003          | 0.7119 | 0.8290   | 0.6146    | 0.8457 | 0.9038  |
| 2     | 0.4042          | 0.7028 | 0.8167   | 0.5907    | 0.8674 | 0.9064  |

---

## Thresholded Performance (Strict Guard)

- **Decision threshold:** 0.85
- **Hallucination rate:** 5.92%
- **Fact retention:** 60.34%
- **Average latency:** 5.30 ms (CPU)

This configuration prioritizes **trustworthiness over recall**.

---

## Citation

If you use this model, please cite:

```bibtex
@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
```

---

## Contact

**El Mehdi Salih**
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma
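---

## Example: Strict Guard Filtering (Sketch)

The Strict Guard configuration described in this card reduces to a simple thresholded filter over retrieved sentences. The sketch below uses a stub scorer for illustration; in a real pipeline, `scorer` would call the cross-encoder to obtain the `(query, sentence)` evidence probability. `strict_guard` and `stub_scorer` are hypothetical names.

```python
STRICT_GUARD_THRESHOLD = 0.85  # deployment threshold from this card

def strict_guard(query, sentences, scorer, threshold=STRICT_GUARD_THRESHOLD):
    """Keep only sentences whose evidence score clears the threshold."""
    return [s for s in sentences if scorer(query, s) >= threshold]

def stub_scorer(query, sentence):
    """Stand-in for the model; returns a fixed score for illustration."""
    return 0.95 if "1998" in sentence else 0.30

kept = strict_guard("When was Mozilla founded?",
                    ["Mozilla was founded in 1998.",
                     "Firefox is a web browser."],
                    stub_scorer)
# -> ["Mozilla was founded in 1998."]
```

The high threshold trades recall for precision, consistent with the card's reported fact retention (60.34%) versus hallucination rate (5.92%).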