---
library_name: transformers
base_model: huawei-noah/TinyBERT_General_4L_312D
language:
- en
license: mit
pipeline_tag: text-classification
task_ids:
- fact-checking
tags:
- edge-rag
- semantic-filtering
- hallucination-reduction
- cross-encoder
metrics:
- accuracy
- precision
- recall
- roc_auc
model-index:
- name: LF_BERT_v1
results:
- task:
type: fact-checking
name: Semantic Evidence Filtering
dataset:
name: Project Sentinel (HotpotQA-derived)
type: hotpotqa/hotpot_qa
metrics:
- type: accuracy
value: 0.8167
- type: precision
value: 0.5907
- type: recall
value: 0.8674
- type: roc_auc
value: 0.9064
---
# LF_BERT_v1
**LF_BERT_v1** is a lightweight **TinyBERT-based cross-encoder** fine-tuned for **semantic evidence filtering** in **Retrieval-Augmented Generation (RAG)** pipelines.
The model acts as a *semantic gatekeeper*, scoring `(query, candidate_sentence)` pairs to determine whether the sentence is **factually useful evidence** or a **semantic distractor**.
It is designed for **CPU-only, edge, and offline deployments**, with millisecond-level inference latency.
This model is the core filtering component of **Project Sentinel**.
---
## Model Description
- **Architecture:** TinyBERT (4 layers, 312 hidden size)
- **Type:** Cross-encoder (joint encoding of query and sentence)
- **Task:** Binary fact-checking / evidence verification
- **Base Model:** `huawei-noah/TinyBERT_General_4L_312D`
- **Inference Latency:** ~5.3 ms (CPU)
### Input Format
```
[CLS] query [SEP] candidate_sentence [SEP]
```
- Maximum sequence length: 512 tokens
### Output
- Probability score ∈ [0,1] representing **factual utility**
- Typical deployment threshold: **0.85** (Strict Guard configuration)
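A minimal inference sketch for the input/output contract above. The Hub repo id used in the commented usage is an assumption; substitute the actual checkpoint path. The `score_pair` helper and `is_useful_evidence` gate are illustrative names, not part of the released code.

```python
import torch

def is_useful_evidence(prob: float, threshold: float = 0.85) -> bool:
    """Strict Guard gate: keep a sentence only when P(useful) >= threshold."""
    return prob >= threshold

def score_pair(model, tokenizer, query: str, sentence: str) -> float:
    """Score one (query, candidate_sentence) pair with the cross-encoder."""
    # Joint encoding: [CLS] query [SEP] candidate_sentence [SEP]
    inputs = tokenizer(query, sentence, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 1 is assumed to be the "supporting fact" class.
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Usage (downloads weights, so shown as comments):
# from transformers import AutoTokenizer, AutoModelForSequenceClassification
# tokenizer = AutoTokenizer.from_pretrained("Mehd1SLH/LF_BERT_v1")
# model = AutoModelForSequenceClassification.from_pretrained("Mehd1SLH/LF_BERT_v1")
# prob = score_pair(model, tokenizer, "Who wrote Hamlet?",
#                   "Hamlet is a tragedy by William Shakespeare.")
# keep = is_useful_evidence(prob)  # True only if prob >= 0.85
```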
---
## Intended Use
✔ Semantic filtering for RAG pipelines
✔ Hallucination reduction
✔ Early-exit decision systems
✔ Edge / offline LLM deployments
This model is especially suited for:
- Local document QA systems
- Privacy-sensitive environments
- Resource-constrained hardware (≤ 8 GB RAM)
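The gatekeeper role in a RAG pipeline reduces to a single filtering pass over retrieved sentences. The sketch below uses a stubbed `score` callable standing in for the model's probability output; function names are illustrative.

```python
from typing import Callable, List

def filter_evidence(query: str,
                    candidates: List[str],
                    score: Callable[[str, str], float],
                    threshold: float = 0.85) -> List[str]:
    """Keep only candidate sentences whose (query, sentence) score
    clears the Strict Guard threshold; the rest are treated as
    semantic distractors and dropped before generation."""
    return [s for s in candidates if score(query, s) >= threshold]
```

The surviving sentences are what gets passed to the downstream LLM as context, which is how the filter reduces hallucination at the source.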
---
## Limitations
- Trained on Wikipedia-based QA (HotpotQA)
- English-only
- Sentence-level relevance (not passage-level reasoning)
- Not a factual verifier for open-world claims
Performance may degrade on highly domain-specific or non-factual corpora.
---
## Training Data
The model was trained on a **binary dataset derived from HotpotQA (Distractor setting)**.
### Labels
- **1 – Supporting Fact:** Ground-truth evidence sentences
- **0 – Distractor:** Topically similar but factually insufficient sentences
### Dataset Statistics
| Split | Samples |
|------|--------|
| Train | 69,101 |
| Validation | 7,006 |
The dataset is intentionally **imbalanced**, reflecting real retrieval scenarios.
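One plausible way to derive such binary pairs from a HotpotQA distractor-setting record is sketched below. The field layout (`question`, `context` as `[title, sentences]` lists, `supporting_facts` as `[title, sentence_index]` pairs) follows the raw HotpotQA JSON schema; the exact preprocessing used for this model is not specified in the card.

```python
def make_pairs(example):
    """Turn one HotpotQA-style record into (query, sentence, label) triples:
    label 1 for ground-truth supporting facts, 0 for distractors."""
    gold = {(title, idx) for title, idx in example["supporting_facts"]}
    pairs = []
    for title, sentences in example["context"]:
        for i, sent in enumerate(sentences):
            label = 1 if (title, i) in gold else 0
            pairs.append((example["question"], sent, label))
    return pairs
```

Because most context sentences are distractors, this construction naturally yields the label imbalance noted above.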
---
## Training Procedure
### Hyperparameters
- Learning rate: `1e-5`
- Batch size: `16`
- Epochs: `2`
- Optimizer: AdamW
- Scheduler: Linear
- Seed: `42`
- Loss: Weighted cross-entropy
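Weighted cross-entropy in PyTorch takes per-class weights to counter the label imbalance. The weights below are illustrative placeholders; the card does not state the actual values used.

```python
import torch
from torch import nn

# Upweight the minority "supporting fact" class (weights are placeholders).
class_weights = torch.tensor([1.0, 3.0])
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# Toy batch: two (logit-pair, label) examples.
logits = torch.tensor([[2.0, 0.5], [0.2, 1.5]])
labels = torch.tensor([0, 1])
loss = loss_fn(logits, labels)
```

With `weight` set, misclassifying a supporting fact costs more than misclassifying a distractor, pushing the model toward the high recall reported below.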
### Training Results
| Epoch | Validation Loss | F1 | Accuracy | Precision | Recall | ROC-AUC |
|------|-----------------|----|----------|-----------|--------|--------|
| 1 | 0.4003 | 0.7119 | 0.8290 | 0.6146 | 0.8457 | 0.9038 |
| 2 | 0.4042 | 0.7028 | 0.8167 | 0.5907 | 0.8674 | 0.9064 |
---
## Thresholded Performance (Strict Guard)
- **Decision threshold:** 0.85
- **Hallucination rate:** 5.92%
- **Fact retention:** 60.34%
- **Average latency:** 5.30 ms (CPU)
This configuration prioritizes **trustworthiness over recall**.
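The card does not define these metrics formally; one plausible reading, sketched below with toy data, is that fact retention is the fraction of gold facts that survive the threshold, and hallucination rate is the fraction of kept sentences that are distractors. The helper name and definitions are assumptions.

```python
def strict_guard_metrics(probs, labels, threshold=0.85):
    """probs: model scores; labels: 1 = supporting fact, 0 = distractor.
    Returns (fact_retention, hallucination_rate) at the given threshold."""
    kept = [(p, y) for p, y in zip(probs, labels) if p >= threshold]
    total_facts = sum(labels)
    kept_facts = sum(y for _, y in kept)
    fact_retention = kept_facts / total_facts if total_facts else 0.0
    hallucination_rate = (len(kept) - kept_facts) / len(kept) if kept else 0.0
    return fact_retention, hallucination_rate
```

Raising the threshold trades fact retention for a lower hallucination rate, which is the trade the Strict Guard configuration makes.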
---
## Citation
If you use this model, please cite:
```
@article{salih2026sentinel,
  title={Project Sentinel: Lightweight Semantic Filtering for Edge RAG},
  author={Salih, El Mehdi and Ait El Mouden, Khaoula and Akchouch, Abdelhakim},
  year={2026}
}
```
---
## Contact
**El Mehdi Salih**
Mohammed V University – Rabat
Email: elmehdi_salih@um5.ac.ma