	---
	library_name: transformers
	license: apache-2.0
	base_model: answerdotai/ModernBERT-large
	tags:
	- text-classification
	- credibility
	- conspiracy-detection
	- fake-news
	- misinformation
	- safety
	- modernbert
	- multilingual
	pipeline_tag: text-classification
	language:
	- en
	- fr
	metrics:
	- accuracy
	- f1
	model-index:
	- name: credibility-gate-v3
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      name: ErfanMoosaviMonazzah/fake-news-detection-dataset-English + CredibilityCorpus (multilingual) + synthetic
	      type: custom
	    metrics:
	    - type: accuracy
	      value: 0.9991
	    - type: f1
	      value: 0.9991
	---

	# Credibility Gate v3

	3-class credibility classifier: TRUTHFUL / MIXED / CONSPIRACY. Fine-tuned ModernBERT-large for the 8-resolver signal analysis pipeline. Corpus-enhanced, multilingual-aware successor to v1/v2.

	Developed by EpsilonGreedyAI
	- HuggingFace: https://huggingface.co/EpsilonGreedyAI

	---

	## What's New in v3

	| Version | Dataset | Epochs | Accuracy | F1 | Key Change |
	|---------|---------|:------:|:--------:|:--:|------------|
	| v1 | 30K fake-news + 130 synthetic | 2 | 1.000* | 1.000* | Initial release, 19-example test |
	| v2 | v1 data + 100 balanced MIXED | 2 | 1.000* | 1.000* | Fixed CONF bias (27% to 100% MIXED) |
	| v3 | v2 data + 3,953 CredibilityCorpus examples | 3 | 0.9991 | 0.9991 | Real-world multilingual, 34K total |

	*Small held-out set (19 examples). v3 evaluated on full 3,413-example test split.

	### Key Improvements

	1. Real-world training data: 3,953 examples from CredibilityCorpus (rumors, disinformation, tweets, and news articles in English and French), supplementing the hand-crafted synthetic MIXED examples with authentic ambiguous claims
	2. 34K total dataset: 15,671 TRUTHFUL / 3,496 MIXED / 14,916 CONSPIRACY, up from 30K binary + 130 synthetic
	3. Extended training: 3 epochs instead of 2 (v1/v2), for sharper boundary confidence
	4. Multilingual awareness: French-language credibility examples (hollande.txt, UEFA_Euro_2016_Fr.txt) provide cross-lingual signal exposure
	5. Proper test split: evaluation on 3,413 held-out examples (10%), not a small hand-picked set
	6. Training time: 117 minutes (7,024 s) on an RTX 5060 Ti 17GB with bf16 and gradient checkpointing

	---

	## Model Description

	Credibility Gate v3 is a production-grade content safety classifier that scores input text on a 3-tier credibility spectrum. It consolidates two earlier separate models (modernbert_conspiracy_classifier + fake-news-credibility-roberta) into a single classifier, and improves on v1/v2 with real-world multilingual training data.

	### Labels

	| Label | Meaning | Pipeline Action |
	|-------|---------|----------------|
	| `TRUTHFUL` | Established fact or common knowledge | Route directly to LLM |
	| `MIXED` | Plausible but unverifiable (rumors, anonymous sources, preliminary findings) | Route with warning context |
	| `CONSPIRACY` | False claim, conspiracy theory, or dangerous misinformation | Block or flag for human review |
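
As a minimal sketch of how a caller might act on these labels, the mapping below turns a predicted label into a pipeline action. The action names (`route`, `route_with_warning`, `block_or_review`) and the `gate_action` helper are hypothetical. Falling back to the most conservative action for an unknown label is an assumption, not documented behavior.

```python
# Hypothetical action names; the label table only describes the behavior in prose.
PIPELINE_ACTIONS = {
    "TRUTHFUL": "route",
    "MIXED": "route_with_warning",
    "CONSPIRACY": "block_or_review",
}


def gate_action(label: str) -> str:
    """Return the pipeline action for a predicted label.

    Unknown labels fall back to the most conservative action (assumption).
    """
    return PIPELINE_ACTIONS.get(label, "block_or_review")


for label in ("TRUTHFUL", "MIXED", "CONSPIRACY"):
    print(label, "->", gate_action(label))
```

Keeping the mapping in one dictionary makes the policy easy to audit and to change without touching the classifier call.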

	### Design Philosophy

	Most fact-check models (LIAR, FEVER, PolitiFact-based) fail catastrophically on conspiracy theories — they either bypass them as "not worth checking" (mmbert32k-factcheck-classifier) or actively endorse them as SUPPORTS (distilbert-factcheck). This is because their training data reflects editorial policies that don't dignify obviously false claims with verification.

	Credibility Gate v3 was trained specifically to catch the "beneath refutation" void where radicalization pipelines live. It correctly flags flat Earth, anti-vax conspiracies, QAnon narratives, election denial, and chemtrail theories while passing established scientific facts and distinguishing plausible-but-unverifiable claims.

	---

	## Intended Use

	### Primary Use Case
	Pre-LLM content safety gate in a multi-resolver signal analysis pipeline. Position: after jailbreak detector, before routing-model.

	### Pipeline Position
	```
	REQUEST
	-> (jailbreak detector)
	-> (THIS MODEL — 3-class credibility)
	-> (routing-model — complexity + domain)
	-> (pii-classifier)
	-> (intent-classifier)
	-> (hallucination-checker)
	-> (semantic-router)
	-> (citation verification)
	RESPONSE
	```
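
The ordering above can be sketched as a list of stages run in sequence. This is an illustration only: the `run_stages` helper, the stage signature, and the two stand-in stages are hypothetical, and stopping at the first rejecting stage is an assumption about how the pipeline behaves.

```python
from typing import Callable, List, Tuple

# A stage takes the request text and returns (allowed, note).
Stage = Callable[[str], Tuple[bool, str]]


def run_stages(text: str, stages: List[Tuple[str, Stage]]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that rejects the request."""
    trace = []
    for name, stage in stages:
        allowed, note = stage(text)
        trace.append(f"{name}: {note}")
        if not allowed:
            return False, trace
    return True, trace


# Stand-in stages for illustration; real stages would call the actual models.
def fake_jailbreak_detector(text: str) -> Tuple[bool, str]:
    return ("ignore previous instructions" not in text.lower(), "checked")


def fake_credibility_gate(text: str) -> Tuple[bool, str]:
    label = "CONSPIRACY" if "flat" in text.lower() else "TRUTHFUL"
    return (label != "CONSPIRACY", label)


stages = [
    ("jailbreak detector", fake_jailbreak_detector),
    ("credibility gate", fake_credibility_gate),
]

ok, trace = run_stages("The Earth is flat.", stages)
print(ok, trace)
```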

	### Out-of-Scope
	- Not a fact-verification engine — classifies linguistic patterns, not ground truth
	- Primary training language is English; French examples provide cross-lingual signal but accuracy on non-English text is not validated
	- Not for automated censorship without human oversight
	- Does not handle multimodal content (images, video)

	---

	## Training

	### Architecture
	- Base model: [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
	- Parameters: 396M (28 layers, 1024 hidden, 16 attention heads)
	- Context: 8192 tokens (trained at 512 max length)
	- Optimizations: bfloat16 mixed precision, gradient checkpointing, dynamic padding, fused AdamW optimizer
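
Because the model was trained at a 512-token max length, inputs longer than that fall outside what training covered. One common workaround, shown here as a sketch and not as something this card documents, is to split the token ids into overlapping windows and classify each window separately. The `sliding_windows` helper and the stride of 256 are assumptions, and the sketch ignores special tokens for simplicity.

```python
def sliding_windows(token_ids, max_len=512, stride=256):
    """Split a token id sequence into overlapping windows of at most max_len ids."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if len(token_ids) <= max_len:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows


ids = list(range(1200))  # stand-in for real token ids
chunks = sliding_windows(ids)
print([len(c) for c in chunks])  # [512, 512, 512, 432]
```

How per-window predictions are combined (for example, taking the most severe label) is a policy choice left to the caller.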

	### Training Data

	| Source | Examples | Classes | Notes |
	|--------|:--------:|---------|-------|
	| fake-news-detection-dataset-English | 30,000 | TRUTHFUL + CONSPIRACY | Binary real/fake news articles |
	| CredibilityCorpus: rumors_disinformation.txt | 1,612 | CONSPIRACY (374) + MIXED (1,238) | Real-world rumor tracker data |
	| CredibilityCorpus: hollande.txt | 370 | MIXED | French political claims |
	| CredibilityCorpus: lemon.txt | 269 | MIXED | French news claims |
	| CredibilityCorpus: pin.txt | 678 | MIXED | Multilingual claims |
	| CredibilityCorpus: swine-flu.txt | 1,023 | TRUTHFUL (183) + MIXED (840) | Health-related claims |
	| Synthetic CONSPIRACY | 20 | CONSPIRACY | Hand-crafted conspiracy narratives |
	| Synthetic MIXED | 100 | MIXED | 10 categories x 10 examples each |
	| Synthetic TRUTHFUL | 10 | TRUTHFUL | Established scientific/historical facts |
	| Total | 34,083 | | |

	### Class Distribution
	| Class | Count | % |
	|-------|:-----:|:--:|
	| TRUTHFUL | 15,671 | 46.0% |
	| MIXED | 3,496 | 10.3% |
	| CONSPIRACY | 14,916 | 43.8% |
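
The percentages above follow directly from the counts, as the short check below shows. It also computes inverse-frequency class weights as one possible way to compensate for the small MIXED class. The card does not say that class weights were used in training, so treat that part as an illustrative assumption.

```python
counts = {"TRUTHFUL": 15_671, "MIXED": 3_496, "CONSPIRACY": 14_916}
total = sum(counts.values())

# Share of each class, in percent.
shares = {label: 100 * n / total for label, n in counts.items()}
print(total)  # 34083
print({label: round(pct, 1) for label, pct in shares.items()})
# {'TRUTHFUL': 46.0, 'MIXED': 10.3, 'CONSPIRACY': 43.8}

# Inverse-frequency weights: total / (num_classes * class_count).
# Not stated in the card; shown only as a possible imbalance remedy.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}
print({label: round(w, 2) for label, w in weights.items()})
```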

	### CredibilityCorpus Sources
	3,953 real-world examples from 7 corpus files covering:
	- Rumors & disinformation (rumors_disinformation.txt) — tracked online rumors with verified outcomes
	- French political claims (hollande.txt, lemon.txt) — cross-lingual credibility signals
	- Multilingual claims (pin.txt) — diverse source material
	- Health misinformation (swine-flu.txt) — domain-specific rumor tracking
	- Social media (randomtweets.txt, RihannaConcert.txt, UEFA_Euro_2016*.txt) — real-world tweet-level claims

	### Hyperparameters
	- Epochs: 3
	- Learning rate: 5e-5 (linear decay)
	- Batch size: 12 (effective 24 with gradient accumulation x2)
	- Steps: 3,834 total (1,278 per epoch)
	- Optimizer: AdamW (fused)
	- Max sequence length: 512
	- Precision: bfloat16
	- Gradient checkpointing: enabled
	- Hardware: NVIDIA RTX 5060 Ti (17.1 GB VRAM), CUDA 12.8, Windows 10
	- Training time: 7,024s (117 minutes)
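
The step counts can be reproduced from the batch settings, as sketched below. The sketch assumes that the training set is everything outside the 3,413-example test split (34,083 - 3,413 = 30,670 examples); the card does not state the training-set size explicitly.

```python
import math

total_examples = 34_083
test_examples = 3_413
train_examples = total_examples - test_examples  # assumption: the rest is training data

per_device_batch = 12
grad_accum = 2
epochs = 3
training_seconds = 7_024

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
seconds_per_step = training_seconds / total_steps

print(train_examples, effective_batch, steps_per_epoch, total_steps)
# 30670 24 1278 3834
print(f"{seconds_per_step:.2f} s per optimizer step")
```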

	---

	## Performance

	### Evaluation Metrics (held-out test set, 3,413 examples)
	| Metric | Epoch 1 | Epoch 2 | Epoch 3 |
	|--------|:-------:|:-------:|:-------:|
	| Eval Loss | 0.00837 | 0.00608 | 0.00262 |
	| F1 (weighted) | 0.9976 | 0.9985 | 0.9991 |
	| Accuracy | 0.9977 | 0.9985 | 0.9991 |
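
To put these accuracies in concrete terms, the sketch below converts them into approximate misclassification counts. It assumes the test split holds exactly 3,413 examples; the counts are derived from the rounded accuracies and are not reported figures.

```python
test_examples = 3_413  # assumption: exact size of the held-out split
accuracy_by_epoch = {1: 0.9977, 2: 0.9985, 3: 0.9991}

# Approximate number of misclassified test examples per epoch.
errors = {
    epoch: round(test_examples * (1 - acc))
    for epoch, acc in accuracy_by_epoch.items()
}
print(errors)  # {1: 8, 2: 5, 3: 3}

# Sanity check: each count rounds back to the reported accuracy.
for epoch, n_err in errors.items():
    print(epoch, round(1 - n_err / test_examples, 4))
```

At this scale a single example moves accuracy by about 0.03 percentage points, so differences between epochs rest on a handful of examples.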

	### Training Loss Curve
	| Epoch | Train Loss | Gradient Norm |
	|:-----:|:----------:|:-------------:|
	| 0.0 | 0.5372 | 1.73 |
	| 0.5 | 0.0462 | 0.74 |
	| 1.0 | 0.0311 | 0.59 |
	| 1.5 | 0.0197 | 0.17 |
	| 2.0 | 0.0073 | 0.00 |
	| 2.5 | 0.0001 | 0.00 |
	| 3.0 | 0.0041 | 5.45 |

	Training loss converged by about epoch 2.5 (0.0001). The final logged value at epoch 3.0 is 0.0041, with the gradient norm spiking to 5.45.

	### Smoke Test (v3)
	| Claim | Verdict | Confidence |
	|-------|---------|:----------:|
	| "The Earth is flat and NASA faked the moon landing." | CONSPIRACY | 0.9999 |
	| "The Earth orbits the Sun at 93 million miles." | TRUTHFUL | 1.0000 |
	| "COVID-19 vaccines contain microchips." | CONSPIRACY | 0.9950 |
	| "A new study suggests fasting reduces inflammation." | MIXED | 1.0000 |

	### Conspiracy Detection (7 claims vs baselines)
	| Model | Caught | Notes |
	|-------|:------:|-------|
	| credibility-gate-v3 (this model) | 7/7 | 3-class with real-world MIXED nuance |
	| credibility-gate-v1 | 7/7 | Synthetic MIXED only |
	| modernbert_conspiracy_classifier | 7/7 | Binary only, no credibility scoring |
	| roberta-credibility | 5/7 | Misses "election stolen" and "moon landing" |
	| mmbert32k-factcheck-classifier | 0/7 | Classifies ALL as NO_FACT_CHECK_NEEDED |
	| distilbert-factcheck | 0/7 | Classifies ALL as SUPPORTS (active endorsement) |

	---

	## Usage

	### Quick Start with Transformers

	```python
	from transformers import pipeline

	classifier = pipeline(
	    "text-classification",
	    model="EpsilonGreedyAI/credibility-gate-v3",
	    device=0,  # GPU; use -1 for CPU
	)

	# Classify a single claim
	result = classifier("The Earth is flat and NASA faked the moon landing.")
	print(result)
	# e.g. [{'label': 'CONSPIRACY', 'score': 0.99...}]

	# Batch classification
	texts = [
	    "The Earth orbits the Sun at 93 million miles.",
	    "Anonymous sources claim the CEO is stepping down.",
	    "5G towers are causing the coronavirus.",
	]
	results = classifier(texts)
	for text, r in zip(texts, results):
	    print(f"{r['label']} ({r['score']:.2f}): {text}")
	```

	### Loading with PyTorch

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model = AutoModelForSequenceClassification.from_pretrained(
	    "EpsilonGreedyAI/credibility-gate-v3",
	    dtype=torch.float32,
	)
	tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/credibility-gate-v3")

	# Truncate to 512 tokens, the max sequence length used during training
	inputs = tokenizer(
	    "Climate change is a hoax.",
	    return_tensors="pt",
	    truncation=True,
	    max_length=512,
	)
	with torch.no_grad():
	    outputs = model(**inputs)
	probs = torch.softmax(outputs.logits, dim=-1)
	predicted = probs.argmax(dim=-1).item()
	# id2label keys are integers once the config is loaded
	label = model.config.id2label[predicted]
	print(f"{label}: {probs[0][predicted]:.4f}")
	```

	### Inference Performance
	| Hardware | Latency | Batch Size |
	|----------|:-------:|:----------:|
	| RTX 5060 Ti (GPU) | ~5ms | 1 |
	| RTX 5060 Ti (GPU) | ~15ms | 8 |
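
For capacity planning, these latencies translate into rough throughput figures. The calculation below is a back-of-the-envelope sketch that treats the approximate latencies as exact and assumes requests are processed back to back with no other overhead.

```python
# Approximate latency per batch in milliseconds, keyed by batch size.
latency_ms = {1: 5.0, 8: 15.0}

# Texts classified per second if batches run back to back.
throughput = {bs: bs * 1000.0 / ms for bs, ms in latency_ms.items()}

for bs, tps in throughput.items():
    print(f"batch size {bs}: ~{tps:.0f} texts/s")
```

Under these assumptions, batching 8 texts raises throughput by roughly 2.7x at the cost of about 10 ms of extra latency per request.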

	### Using with ONNX Runtime (CPU deployment)

	```python
	from optimum.onnxruntime import ORTModelForSequenceClassification
	from transformers import AutoTokenizer

	# export=True converts the PyTorch checkpoint to ONNX on load
	model = ORTModelForSequenceClassification.from_pretrained(
	    "EpsilonGreedyAI/credibility-gate-v3",
	    export=True,
	)
	tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/credibility-gate-v3")
	```

	---

	## Limitations

	### Known Weaknesses
	1. Primary language is English — CredibilityCorpus includes French examples for cross-lingual signal, but accuracy on non-English text is not validated against a held-out multilingual test set
	2. MIXED class is smallest (10.3%) — despite CredibilityCorpus addition, MIXED remains the minority class. Real-world class imbalance reflects the data landscape but may affect recall on edge cases
	3. Satire/Sarcasm — may misclassify obvious satire (The Onion) as CONSPIRACY
	4. Novel conspiracies — trained on known conspiracy patterns; emerging or novel conspiracy narratives may not be detected
	5. Confidence calibration — confidence scores are softmax outputs, not calibrated probabilities
	6. Social media noise — several CredibilityCorpus tweet files contained 0 parseable examples; real-time social media ingestion would require dedicated preprocessing
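
Regarding the calibration point above, one widely used post-hoc remedy is temperature scaling: divide the logits by a scalar T before the softmax, with T fitted on held-out data. The sketch below only shows the effect of T. The logits and the value T = 2.5 are made up for illustration, and nothing in this card indicates that the model ships with a fitted temperature.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the output)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, 1.1, 6.0]  # made-up logits, not real model output
raw = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.5)  # hypothetical T

print([round(p, 4) for p in raw])
print([round(p, 4) for p in softened])
```

Temperature scaling leaves the predicted class unchanged and only adjusts how confident the scores look, which is why it is a low-risk first step before thresholding on confidence.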

	### Bias Considerations
	- Training data reflects English-language news media biases
	- CONSPIRACY class is weighted toward Western conspiracy theories
	- CredibilityCorpus sources may reflect the biases of their original curators
	- French-language examples (hollande.txt, lemon.txt) are primarily political claims — not a balanced cross-lingual sample

	---

	## Version History

	| Version | Date | Key Change |
	|---------|------|------------|
	| v1 | 2026-06-15 | Initial: 30K binary + 130 synthetic, 19/19 test accuracy |
	| v2 | 2026-06-15 | Fixed MIXED CONF bias: 100 balanced examples, 19/19 accuracy |
	| v3 | 2026-06-15 | CredibilityCorpus integration: 3,953 real-world examples, 34K total, 3 epochs, 99.91% on full test split |

	---

	## Citation

	```bibtex
	@misc{epsilon-greedy-ai-credibility-gate-v3,
	author = {EpsilonGreedyAI},
	title = {Credibility Gate v3 — Corpus-enhanced 3-class credibility classifier for AI safety pipelines},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/EpsilonGreedyAI/credibility-gate-v3}},
	}
	```

	## License

	Apache 2.0

	---

	Built for a custom multiple-resolver signal analysis pipeline. Trained on Windows 10, RTX 5060 Ti 17GB, Python 3.14, torch 2.11.0+cu128, transformers 5.5.0.