Credibility Gate v3

3-class credibility classifier: TRUTHFUL / MIXED / CONSPIRACY. Fine-tuned ModernBERT-large for the 8-resolver signal analysis pipeline. Corpus-enhanced, multilingual-aware successor to v1/v2.

Developed by EpsilonGreedyAI

HuggingFace: https://huggingface.co/EpsilonGreedyAI

What's New in v3

Version	Dataset	Epochs	Accuracy	F1	Key Change
v1	30K fake-news + 130 synthetic	2	1.000*	1.000*	Initial release, 19-example test
v2	v1 data + 100 balanced MIXED	2	1.000*	1.000*	Fixed CONF bias (27% to 100% MIXED)
v3	v2 data + 3,953 CredibilityCorpus examples	3	0.9991	0.9991	Real-world multilingual, 34K total

*Small held-out set (19 examples). v3 evaluated on full 3,413-example test split.

Key Improvements

Real-world training data: 3,953 examples from CredibilityCorpus (rumors, disinformation, tweets, news articles in English and French), supplementing the hand-crafted synthetic MIXED examples with authentic ambiguous claims
34K total dataset: 15,671 TRUTHFUL / 3,496 MIXED / 14,916 CONSPIRACY, up from 30K binary + 130 synthetic
3 epochs: training extended from 2 epochs (v1/v2) to 3 for sharper boundary confidence
Multilingual awareness: French-language credibility examples (hollande.txt, UEFA_Euro_2016_Fr.txt) provide cross-lingual signal exposure
Proper test split: evaluation on 3,413 held-out examples (10%), not a small hand-picked set
Training time: 117 minutes (7,024s) on RTX 5060 Ti 17GB, bf16 + gradient checkpointing

Model Description

Credibility Gate v3 is a production-grade content safety classifier that scores input text on a 3-tier credibility spectrum. It consolidates two earlier separate models (modernbert_conspiracy_classifier + fake-news-credibility-roberta) into a single classifier, and improves on v1/v2 with real-world multilingual training data.

Labels

Label	Meaning	Pipeline Action
`TRUTHFUL`	Established fact or common knowledge	Route directly to LLM
`MIXED`	Plausible but unverifiable (rumors, anonymous sources, preliminary findings)	Route with warning context
`CONSPIRACY`	False claim, conspiracy theory, or dangerous misinformation	Block or flag for human review
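
The label-to-action mapping in the table can be expressed as a small dispatch function. This is a minimal sketch, not the pipeline's actual routing code: the `Action` enum, the `LABEL_TO_ACTION` dictionary, and the `route` function are hypothetical names introduced for illustration. Only the three label strings come from this model card.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical pipeline actions mirroring the Labels table."""
    ROUTE_TO_LLM = "route directly to LLM"
    ROUTE_WITH_WARNING = "route with warning context"
    BLOCK_OR_REVIEW = "block or flag for human review"


# Label names come from the model card; the mapping structure is an assumption.
LABEL_TO_ACTION = {
    "TRUTHFUL": Action.ROUTE_TO_LLM,
    "MIXED": Action.ROUTE_WITH_WARNING,
    "CONSPIRACY": Action.BLOCK_OR_REVIEW,
}


def route(label: str) -> Action:
    """Return the pipeline action for a predicted label."""
    try:
        return LABEL_TO_ACTION[label]
    except KeyError:
        # Fail loudly on labels the gate does not define.
        raise ValueError(f"unknown label: {label!r}") from None
```

Raising on an unknown label, rather than defaulting to one of the three actions, keeps a misconfigured label map from silently passing content through the gate.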

Design Philosophy

Most fact-check models (LIAR, FEVER, PolitiFact-based) fail catastrophically on conspiracy theories — they either bypass them as "not worth checking" (mmbert32k-factcheck-classifier) or actively endorse them as SUPPORTS (distilbert-factcheck). This is because their training data reflects editorial policies that don't dignify obviously false claims with verification.

Credibility Gate v3 was trained specifically to catch the "beneath refutation" void where radicalization pipelines live. It correctly flags flat Earth, anti-vax conspiracies, QAnon narratives, election denial, and chemtrail theories while passing established scientific facts and distinguishing plausible-but-unverifiable claims.

Intended Use

Primary Use Case

Pre-LLM content safety gate in a multi-resolver signal analysis pipeline. Position: after jailbreak detector, before routing-model.

Pipeline Position

REQUEST
  ->  (jailbreak detector)
  ->  (THIS MODEL — 3-class credibility)
  ->  (routing-model — complexity + domain)
  ->  (pii-classifier)
  ->  (intent-classifier)
  ->  (hallucination-checker)
  ->  (semantic-router)
  ->  (citation verification)
RESPONSE
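
One way to read the diagram is as an ordered chain of stages in which any stage may stop the request. The sketch below is an assumption about how such a chain could be wired, not the actual pipeline: `Stage`, `run_pipeline`, and both stub stages are hypothetical, and a keyword match stands in for the real model calls. Only the first two stages are shown.

```python
from typing import Callable, List, Optional, Tuple

# A stage inspects the text and returns a rejection reason, or None to continue.
Stage = Callable[[str], Optional[str]]


def jailbreak_detector(text: str) -> Optional[str]:
    # Stub: a real detector would be a separate model.
    return "jailbreak" if "ignore previous instructions" in text.lower() else None


def credibility_gate(text: str) -> Optional[str]:
    # Stub standing in for this model; a keyword match replaces the classifier call.
    return "CONSPIRACY" if "flat earth" in text.lower() else None


def run_pipeline(text: str, stages: List[Tuple[str, Stage]]) -> Tuple[bool, str]:
    """Run stages in order and stop at the first stage that rejects the request."""
    for name, stage in stages:
        reason = stage(text)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "passed"


# Order follows the diagram: jailbreak detector first, then the credibility gate.
STAGES = [
    ("jailbreak-detector", jailbreak_detector),
    ("credibility-gate", credibility_gate),
]
```

Calling `run_pipeline("Flat Earth is real", STAGES)` stops at the second stage and reports which stage rejected the request, which is the information a human reviewer would need.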

Out-of-Scope

Not a fact-verification engine — classifies linguistic patterns, not ground truth
Primary training language is English; French examples provide cross-lingual signal but accuracy on non-English text is not validated
Not for automated censorship without human oversight
Does not handle multimodal content (images, video)

Training

Architecture

Base model: answerdotai/ModernBERT-large
Parameters: 396M (28 layers, 1024 hidden, 16 attention heads)
Context: 8192 tokens (trained at 512 max length)
Optimizations: bfloat16 mixed precision, gradient checkpointing, dynamic padding, fused AdamW optimizer

Training Data

Source	Examples	Classes	Notes
fake-news-detection-dataset-English	30,000	TRUTHFUL + CONSPIRACY	Binary real/fake news articles
CredibilityCorpus — rumors_disinformation.txt	1,612	CONSPIRACY (374) + MIXED (1,238)	Real-world rumor tracker data
CredibilityCorpus — hollande.txt	370	MIXED	French political claims
CredibilityCorpus — lemon.txt	269	MIXED	French news claims
CredibilityCorpus — pin.txt	678	MIXED	Multilingual claims
CredibilityCorpus — swine-flu.txt	1,023	TRUTHFUL (183) + MIXED (840)	Health-related claims
Synthetic CONSPIRACY	20	CONSPIRACY	Hand-crafted conspiracy narratives
Synthetic MIXED	100	MIXED	10 categories x 10 examples each
Synthetic TRUTHFUL	10	TRUTHFUL	Established scientific/historical facts
Total	34,083

Class Distribution

Class	Count	%
TRUTHFUL	15,671	46.0%
MIXED	3,496	10.3%
CONSPIRACY	14,916	43.8%
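
The percentages in the table follow directly from the class counts. As a quick check, the sketch below recomputes them and also derives the ratio between the largest and smallest class. The variable names are illustrative.

```python
# Class counts as reported in the Class Distribution table.
counts = {"TRUTHFUL": 15_671, "MIXED": 3_496, "CONSPIRACY": 14_916}

total = sum(counts.values())

# Share of each class, in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance = max(counts.values()) / min(counts.values())

print(total)                # 34083
print(shares)               # {'TRUTHFUL': 46.0, 'MIXED': 10.3, 'CONSPIRACY': 43.8}
print(round(imbalance, 2))  # 4.48
```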

CredibilityCorpus Sources

3,953 real-world examples from 7 corpus files covering:

Rumors & disinformation (rumors_disinformation.txt) — tracked online rumors with verified outcomes
French political claims (hollande.txt, lemon.txt) — cross-lingual credibility signals
Multilingual claims (pin.txt) — diverse source material
Health misinformation (swine-flu.txt) — domain-specific rumor tracking
Social media (randomtweets*.txt, RihannaConcert*.txt, UEFA_Euro_2016*.txt) — real-world tweet-level claims

Hyperparameters

Epochs: 3
Learning rate: 5e-5 (linear decay)
Batch size: 12 (effective 24 with gradient accumulation x2)
Steps: 3,834 total (1,278 per epoch)
Optimizer: AdamW (fused)
Max sequence length: 512
Precision: bfloat16
Gradient checkpointing: enabled
Hardware: NVIDIA RTX 5060 Ti (17.1 GB VRAM), CUDA 12.8, Windows 10
Training time: 7,024s (117 minutes)
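
The step counts above can be reproduced from the batch settings and dataset sizes. The sketch below makes one assumption that the card does not state: every example outside the 3,413-example test split is used for training, with no separate validation split. Under that assumption the arithmetic matches the reported 1,278 steps per epoch and 3,834 total steps.

```python
import math

total_examples = 34_083   # from the Training Data table
test_examples = 3_413     # held-out test split
per_device_batch = 12
grad_accum_steps = 2
epochs = 3

# Assumption: all non-test examples are used for training.
train_examples = total_examples - test_examples

# One optimizer step consumes batch size x accumulation steps examples.
effective_batch = per_device_batch * grad_accum_steps

steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Wall-clock time converted from seconds to minutes.
training_minutes = 7_024 / 60

print(effective_batch, steps_per_epoch, total_steps)  # 24 1278 3834
print(round(training_minutes))                         # 117
```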

Performance

Evaluation Metrics (held-out test set, 3,413 examples)

Metric	Epoch 1	Epoch 2	Epoch 3
Eval Loss	0.00837	0.00608	0.00262
F1 (weighted)	0.9976	0.9985	0.9991
Accuracy	0.9977	0.9985	0.9991
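
Weighted F1 is the mean of the per-class F1 scores, with each class weighted by its number of true examples (its support). The sketch below is a plain-Python illustration of that definition. The toy labels are hypothetical and are not model outputs, and `weighted_f1` is not the evaluation code used for the table.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total  # weight each class by its share of true examples
    return score


# Hypothetical toy data: one MIXED example is misclassified as CONSPIRACY.
y_true = ["TRUTHFUL"] * 4 + ["MIXED"] * 2 + ["CONSPIRACY"] * 4
y_pred = ["TRUTHFUL"] * 4 + ["MIXED", "CONSPIRACY"] + ["CONSPIRACY"] * 4

print(f"{weighted_f1(y_true, y_pred):.3f}")
```

In the toy data accuracy is 0.9 while weighted F1 is about 0.889. The two metrics can diverge when errors fall on a minority class, so agreement between them in the table above is worth noting.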

Training Loss Curve

Epoch	Train Loss	Gradient Norm
0.0	0.5372	1.73
0.5	0.0462	0.74
1.0	0.0311	0.59
1.5	0.0197	0.17
2.0	0.0073	0.00
2.5	0.0001	0.00
3.0	0.0041	5.45

Convergence reached by epoch ~2.5. Loss at epoch 3 endpoint: 0.0041.

Smoke Test (v3)

Claim	Verdict	Confidence
"The Earth is flat and NASA faked the moon landing."	CONSPIRACY	0.9999
"The Earth orbits the Sun at 93 million miles."	TRUTHFUL	1.0000
"COVID-19 vaccines contain microchips."	CONSPIRACY	0.9950
"A new study suggests fasting reduces inflammation."	MIXED	1.0000

Conspiracy Detection (7 claims vs baselines)

Model	Caught	Notes
credibility-gate-v3 (this model)	7/7	3-class with real-world MIXED nuance
credibility-gate-v1	7/7	Synthetic MIXED only
modernbert_conspiracy_classifier	7/7	Binary only, no credibility scoring
roberta-credibility	5/7	Misses "election stolen" and "moon landing"
mmbert32k-factcheck-classifier	0/7	Classifies ALL as NO_FACT_CHECK_NEEDED
distilbert-factcheck	0/7	Classifies ALL as SUPPORTS (active endorsement)

Usage

Quick Start with Transformers

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="EpsilonGreedyAI/credibility-gate-v3",
    device=0  # GPU, or -1 for CPU
)

# Classify a claim
result = classifier("The Earth is flat and NASA faked the moon landing.")
print(result)
# [{'label': 'CONSPIRACY', 'score': 0.99}]

# Batch classification
texts = [
    "The Earth orbits the Sun at 93 million miles.",
    "Anonymous sources claim the CEO is stepping down.",
    "5G towers are causing the coronavirus.",
]
results = classifier(texts)
for text, r in zip(texts, results):
    print(f"{r['label']} ({r['score']:.2f}): {text}")

Loading with PyTorch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "EpsilonGreedyAI/credibility-gate-v3",
    dtype=torch.float32,
)
tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/credibility-gate-v3")

# Truncate to the 512-token length used during training
inputs = tokenizer(
    "Climate change is a hoax.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    predicted = probs.argmax(dim=-1).item()
    # id2label is keyed by int once the config is loaded, so index with the int
    label = model.config.id2label[predicted]
    print(f"{label}: {probs[0][predicted]:.4f}")

Inference Performance

Hardware	Latency	Batch Size
RTX 5060 Ti (GPU)	~5ms	1
RTX 5060 Ti (GPU)	~15ms	8
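
The latencies above translate into approximate throughput figures. The sketch below assumes batches are processed back to back and ignores tokenization and data-transfer overhead, so the results are upper bounds, not measured values. The function name is illustrative.

```python
def texts_per_second(batch_size: int, latency_ms: float) -> float:
    """Throughput implied by a per-batch latency, assuming back-to-back batches."""
    return batch_size / (latency_ms / 1000)


# Approximate latencies from the Inference Performance table (RTX 5060 Ti).
for batch_size, latency_ms in [(1, 5.0), (8, 15.0)]:
    rate = texts_per_second(batch_size, latency_ms)
    print(f"batch={batch_size}: ~{rate:.0f} texts/s")
```

Under these assumptions batch size 1 gives roughly 200 texts per second and batch size 8 roughly 533, so batching improves throughput by a factor of about 2.7 at the cost of 10 ms of extra latency per request.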

Using with ONNX Runtime (CPU deployment)

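from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# export=True converts the PyTorch checkpoint to ONNX when the model is loaded
model = ORTModelForSequenceClassification.from_pretrained(
    "EpsilonGreedyAI/credibility-gate-v3",
    export=True,
)
tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/credibility-gate-v3")

# The ONNX Runtime model can be passed to the standard Transformers pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("5G towers are causing the coronavirus."))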

Limitations

Known Weaknesses

Primary language is English — CredibilityCorpus includes French examples for cross-lingual signal, but accuracy on non-English text is not validated against a held-out multilingual test set
MIXED class is smallest (10.3%) — despite CredibilityCorpus addition, MIXED remains the minority class. Real-world class imbalance reflects the data landscape but may affect recall on edge cases
Satire/Sarcasm — may misclassify obvious satire (The Onion) as CONSPIRACY
Novel conspiracies — trained on known conspiracy patterns; emerging or novel conspiracy narratives may not be detected
Confidence calibration — confidence scores are softmax outputs, not calibrated probabilities
Social media noise — several CredibilityCorpus tweet files contained 0 parseable examples; real-time social media ingestion would require dedicated preprocessing
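
For the confidence calibration point, temperature scaling is one common post-hoc technique: the logits are divided by a scalar T before the softmax, and T is normally fitted on a held-out validation set. The sketch below only illustrates the mechanism. It is not part of the released model, the logits are hypothetical, and T = 2.0 is an arbitrary illustrative value, not a fitted one.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax with temperature; temperature > 1 softens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one input, ordered TRUTHFUL, MIXED, CONSPIRACY.
logits = [0.5, 1.0, 6.0]

raw = softmax(logits)                       # plain softmax confidence
cooled = softmax(logits, temperature=2.0)   # softened confidence, same ranking

print(f"raw={raw[2]:.3f} cooled={cooled[2]:.3f}")
```

Temperature scaling never changes the predicted label, only how confident the score looks, so it can be added downstream of the classifier without affecting accuracy or F1.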

Bias Considerations

Training data reflects English-language news media biases
CONSPIRACY class is weighted toward Western conspiracy theories
CredibilityCorpus sources may reflect the biases of their original curators
French-language examples (hollande.txt, lemon.txt) are primarily political claims — not a balanced cross-lingual sample

Version History

Version	Date	Key Change
v1	2026-06-15	Initial — 30K binary + 130 synthetic, 19/19 test accuracy
v2	2026-06-15	Fixed MIXED CONF bias — 100 balanced examples, 19/19 accuracy
v3	2026-06-15	CredibilityCorpus integration — 3,953 real-world examples, 34K total, 3 epochs, 99.91% on full test split

Citation

@misc{epsilon-greedy-ai-credibility-gate-v3,
  author = {EpsilonGreedyAI},
  title = {Credibility Gate v3 — Corpus-enhanced 3-class credibility classifier for AI safety pipelines},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EpsilonGreedyAI/credibility-gate-v3}},
}

License

Apache 2.0

Built for a custom multi-resolver signal analysis pipeline. Trained on Windows 10, RTX 5060 Ti 17GB, Python 3.14, torch 2.11.0+cu128, transformers 5.5.0.

Safetensors

Model size: 0.4B params
Tensor type: F32

Model tree for EpsilonGreedyAI/EGAI-credibility-gate-v3

Base model: answerdotai/ModernBERT-large

Evaluation results (self-reported)

Metric	Value	Dataset
accuracy	0.999	ErfanMoosaviMonazzah/fake-news-detection-dataset-English + CredibilityCorpus (multilingual) + synthetic
f1	0.999	ErfanMoosaviMonazzah/fake-news-detection-dataset-English + CredibilityCorpus (multilingual) + synthetic