---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
- en
license: mit
pipeline_tag: text-classification
task_ids:
- fact-checking
tags:
- edge-rag
- semantic-filtering
- hallucination-reduction
- cross-encoder
metrics:
- accuracy
- precision
- recall
- roc_auc
model-index:
- name: LF_BERT_v1
  results:
  - task:
      type: fact-checking
      name: Semantic Evidence Filtering
    dataset:
      name: Project Sentinel (HotpotQA-derived)
      type: hotpotqa/hotpot_qa
    metrics:
    - type: accuracy
      value: 0.8167
    - type: precision
      value: 0.5907
    - type: recall
      value: 0.8674
    - type: roc_auc
      value: 0.9064
---

# LF_BERT_v1

**LF_BERT_v1** is a lightweight **TinyBERT-based cross-encoder** fine-tuned for **semantic evidence filtering** in **Retrieval-Augmented Generation (RAG)** pipelines. The model acts as a *semantic gatekeeper*, scoring `(query, candidate_sentence)` pairs to determine whether the sentence is **factually useful evidence** or a **semantic distractor**.

It is designed for **CPU-only, edge, and offline deployments**, with millisecond-level inference latency. This model is the core filtering component of **Project Sentinel**.
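---

## Usage

A minimal usage sketch, assuming the standard `transformers` sequence-classification interface. The repo id `"LF_BERT_v1"` below is a placeholder until the checkpoint is published on the Hub; the helper names (`load_scorer`, `score_pair`, `is_evidence`) are illustrative, not part of any library.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

STRICT_GUARD_THRESHOLD = 0.85  # deployment threshold from this card

def load_scorer(model_id: str):
    """Load the cross-encoder. `model_id` is a placeholder repo id."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()
    return tokenizer, model

def score_pair(tokenizer, model, query: str, sentence: str) -> float:
    """Probability that `sentence` is useful evidence for `query`.

    The tokenizer builds the [CLS] query [SEP] sentence [SEP] input;
    class 1 is assumed to be "supporting fact" per this card's labels.
    """
    inputs = tokenizer(query, sentence, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def is_evidence(prob: float, threshold: float = STRICT_GUARD_THRESHOLD) -> bool:
    """Strict Guard decision: keep a sentence only above the threshold."""
    return prob >= threshold

# Example call (requires the published checkpoint):
# tokenizer, model = load_scorer("LF_BERT_v1")  # placeholder repo id
# p = score_pair(tokenizer, model, "Who wrote Hamlet?",
#                "Hamlet was written by William Shakespeare.")
# keep = is_evidence(p)
```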
---

## Model Description

- **Architecture:** TinyBERT (4 layers, hidden size 312)
- **Type:** Cross-encoder (joint encoding of query and sentence)
- **Task:** Binary fact-checking / evidence verification
- **Base Model:** `huawei-noah/TinyBERT_General_4L_312D`
- **Inference Latency:** ~5.3 ms (CPU)

### Input Format

```
[CLS] query [SEP] candidate_sentence [SEP]
```

- Maximum sequence length: 512 tokens

### Output

- Probability score ∈ [0, 1] representing **factual utility**
- Typical deployment threshold: **0.85** (Strict Guard configuration)

---

## Intended Use

✔ Semantic filtering for RAG pipelines
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments

This model is especially suited for:

- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)

---

## Limitations

- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance scoring (not passage-level reasoning)
- Not a factual verifier for open-world claims

Performance may degrade on highly domain-specific or non-factual corpora.

---

## Training Data

The model was trained on a **binary dataset derived from HotpotQA (distractor setting)**.

### Labels

- **1 – Supporting Fact:** ground-truth evidence sentences
- **0 – Distractor:** topically similar but factually insufficient sentences

### Dataset Statistics

| Split      | Samples |
|------------|---------|
| Train      | 69,101  |
| Validation | 7,006   |

The dataset is intentionally **imbalanced**, reflecting real retrieval scenarios.
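### Label Derivation (Sketch)

The labeling scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact preprocessing code; the field layout mirrors the `hotpotqa/hotpot_qa` dataset schema on the Hub, and `make_pairs` is a hypothetical helper.

```python
def make_pairs(example):
    """Label each context sentence 1 (supporting fact) or 0 (distractor)."""
    # Gold evidence is identified by (paragraph title, sentence index).
    gold = set(zip(example["supporting_facts"]["title"],
                   example["supporting_facts"]["sent_id"]))
    pairs = []
    for title, sentences in zip(example["context"]["title"],
                                example["context"]["sentences"]):
        for sent_id, sentence in enumerate(sentences):
            label = 1 if (title, sent_id) in gold else 0
            pairs.append((example["question"], sentence, label))
    return pairs

# Tiny synthetic example mimicking the HotpotQA distractor schema:
example = {
    "question": "Where was the poet born?",
    "context": {
        "title": ["Poet", "River"],
        "sentences": [["The poet was born in Rabat.", "He wrote ten books."],
                      ["The river flows north."]],
    },
    "supporting_facts": {"title": ["Poet"], "sent_id": [0]},
}
pairs = make_pairs(example)
# -> 3 pairs; label 1 only for "The poet was born in Rabat."
```

Because only the annotated supporting facts receive label 1, the resulting pair set is naturally imbalanced toward distractors, matching the retrieval scenario described above.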
---

## Training Procedure

### Hyperparameters

- Learning rate: `1e-5`
- Batch size: `16`
- Epochs: `2`
- Optimizer: AdamW
- Scheduler: Linear
- Seed: `42`
- Loss: Weighted cross-entropy

### Training Results

| Epoch | Validation Loss | F1     | Accuracy | Precision | Recall | ROC-AUC |
|-------|-----------------|--------|----------|-----------|--------|---------|
| 1     | 0.4003          | 0.7119 | 0.8290   | 0.6146    | 0.8457 | 0.9038  |
| 2     | 0.4042          | 0.7028 | 0.8167   | 0.5907    | 0.8674 | 0.9064  |

---

## Thresholded Performance (Strict Guard)

- **Decision threshold:** 0.85
- **Hallucination rate:** 5.92%
- **Fact retention:** 60.34%
- **Average latency:** 5.30 ms (CPU)

This configuration prioritizes **trustworthiness over recall**.

---

## Citation

If you use this model, please cite:

```bibtex
@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
```

---

## Contact

**El Mehdi Salih**
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma
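---

## Example: Strict Guard Filtering (Sketch)

The Strict Guard configuration described in this card reduces to a simple thresholded filter over retrieved sentences. The sketch below uses a stub scorer for illustration; in a real pipeline, `scorer` would call the cross-encoder to obtain the `(query, sentence)` evidence probability. `strict_guard` and `stub_scorer` are hypothetical names.

```python
STRICT_GUARD_THRESHOLD = 0.85  # deployment threshold from this card

def strict_guard(query, sentences, scorer, threshold=STRICT_GUARD_THRESHOLD):
    """Keep only sentences whose evidence score clears the threshold."""
    return [s for s in sentences if scorer(query, s) >= threshold]

def stub_scorer(query, sentence):
    """Stand-in for the model; returns a fixed score for illustration."""
    return 0.95 if "1998" in sentence else 0.30

kept = strict_guard("When was Mozilla founded?",
                    ["Mozilla was founded in 1998.",
                     "Firefox is a web browser."],
                    stub_scorer)
# -> ["Mozilla was founded in 1998."]
```

The high threshold trades recall for precision, consistent with the card's reported fact retention (60.34%) versus hallucination rate (5.92%).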