Credibility Gate v3

3-class credibility classifier: TRUTHFUL / MIXED / CONSPIRACY. Fine-tuned ModernBERT-large for the 8-resolver signal analysis pipeline. Corpus-enhanced, multilingual-aware successor to v1/v2.

Developed by EpsilonGreedyAI


What's New in v3

| Version | Dataset | Epochs | Accuracy | F1 | Key Change |
|---|---|---|---|---|---|
| v1 | 30K fake-news + 130 synthetic | 2 | 1.000* | 1.000* | Initial release, 19-example test |
| v2 | v1 data + 100 balanced MIXED | 2 | 1.000* | 1.000* | Fixed CONF bias (27% to 100% MIXED) |
| v3 | v2 data + 3,953 CredibilityCorpus examples | 3 | 0.9991 | 0.9991 | Real-world multilingual, 34K total |

*Small held-out set (19 examples). v3 evaluated on full 3,413-example test split.

Key Improvements

  1. Real-world training data: 3,953 examples from CredibilityCorpus (rumors, disinformation, tweets, and news articles in English and French), supplementing the hand-crafted synthetic MIXED examples with authentic ambiguous claims
  2. 34K total dataset: 15,671 TRUTHFUL / 3,496 MIXED / 14,916 CONSPIRACY, up from 30K binary + 130 synthetic
  3. 3 epochs: training extended from 2 epochs (v1/v2) to 3 for sharper boundary confidence
  4. Multilingual awareness: French-language credibility examples (hollande.txt, UEFA_Euro_2016_Fr.txt) provide cross-lingual signal exposure
  5. Proper test split: evaluation on 3,413 held-out examples (10%), not a small hand-picked set
  6. Training time: 117 minutes (7,024s) on RTX 5060 Ti 17GB, bf16 + gradient checkpointing

Model Description

Credibility Gate v3 is a production-grade content safety classifier that scores input text on a 3-tier credibility spectrum. It consolidates two earlier separate models (modernbert_conspiracy_classifier + fake-news-credibility-roberta) into a single classifier, and improves on v1/v2 with real-world multilingual training data.

Labels

| Label | Meaning | Pipeline Action |
|---|---|---|
| TRUTHFUL | Established fact or common knowledge | Route directly to LLM |
| MIXED | Plausible but unverifiable (rumors, anonymous sources, preliminary findings) | Route with warning context |
| CONSPIRACY | False claim, conspiracy theory, or dangerous misinformation | Block or flag for human review |
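
The label-to-action mapping above can be sketched as a small lookup. This is a minimal illustration, not the pipeline's actual code: the `Action` enum, `LABEL_ACTIONS`, and `gate_action` are hypothetical names, and the fail-closed fallback for unknown labels is an assumption.

```python
from enum import Enum


class Action(Enum):
    ROUTE = "route directly to LLM"
    ROUTE_WITH_WARNING = "route with warning context"
    BLOCK_OR_REVIEW = "block or flag for human review"


# Mapping copied from the Labels table; the names are illustrative only.
LABEL_ACTIONS = {
    "TRUTHFUL": Action.ROUTE,
    "MIXED": Action.ROUTE_WITH_WARNING,
    "CONSPIRACY": Action.BLOCK_OR_REVIEW,
}


def gate_action(prediction: dict) -> Action:
    """Map one classifier output, e.g. {'label': 'MIXED', 'score': 0.97}, to an action.

    Unknown labels fall back to human review (an assumption, chosen to fail closed).
    """
    return LABEL_ACTIONS.get(prediction["label"], Action.BLOCK_OR_REVIEW)
```

The input shape matches what the `text-classification` pipeline returns per input, so `gate_action(classifier(text)[0])` would be one way to wire it in.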

Design Philosophy

Most fact-check models (LIAR, FEVER, PolitiFact-based) fail catastrophically on conspiracy theories: they either bypass them as "not worth checking" (mmbert32k-factcheck-classifier) or actively endorse them as SUPPORTS (distilbert-factcheck). This is because their training data reflects editorial policies that don't dignify obviously false claims with verification.

Credibility Gate v3 was trained specifically to catch the "beneath refutation" void where radicalization pipelines live. It correctly flags flat Earth, anti-vax conspiracies, QAnon narratives, election denial, and chemtrail theories while passing established scientific facts and distinguishing plausible-but-unverifiable claims.


Intended Use

Primary Use Case

Pre-LLM content safety gate in a multi-resolver signal analysis pipeline. Position: after jailbreak detector, before routing-model.

Pipeline Position

REQUEST
  ->  (jailbreak detector)
  ->  (THIS MODEL: 3-class credibility)
  ->  (routing-model: complexity + domain)
  ->  (pii-classifier)
  ->  (intent-classifier)
  ->  (hallucination-checker)
  ->  (semantic-router)
  ->  (citation verification)
RESPONSE
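
One way to read the diagram is as an ordered chain of stages with early exit. The sketch below is hypothetical throughout: the `Stage` contract (return `None` to continue, a reason string to stop), `run_pipeline`, and the keyword-based stub standing in for this model are assumptions made for illustration, and only the first three stages are shown.

```python
from typing import Callable, Optional

# A stage takes the request text and returns None to continue,
# or a short reason string to stop the pipeline (hypothetical contract).
Stage = Callable[[str], Optional[str]]


def run_pipeline(text: str, stages: list[tuple[str, Stage]]) -> str:
    """Run stages in order and stop at the first one that returns a reason."""
    for name, stage in stages:
        reason = stage(text)
        if reason is not None:
            return f"stopped at {name}: {reason}"
    return "passed all stages"


def credibility_gate_stub(text: str) -> Optional[str]:
    # Stand-in for this model; a real stage would call the classifier.
    return "CONSPIRACY" if "flat" in text.lower() else None


stages = [
    ("jailbreak detector", lambda text: None),  # placeholder, always passes
    ("credibility gate", credibility_gate_stub),
    ("routing-model", lambda text: None),       # placeholder, always passes
]

print(run_pipeline("The Earth is flat.", stages))
```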

Out-of-Scope

  • Not a fact-verification engine: classifies linguistic patterns, not ground truth
  • Primary training language is English; French examples provide cross-lingual signal but accuracy on non-English text is not validated
  • Not for automated censorship without human oversight
  • Does not handle multimodal content (images, video)

Training

Architecture

  • Base model: answerdotai/ModernBERT-large
  • Parameters: 396M (28 layers, 1024 hidden, 16 attention heads)
  • Context: 8192 tokens (trained at 512 max length)
  • Optimizations: bfloat16 mixed precision, gradient checkpointing, dynamic padding, fused AdamW optimizer

Training Data

| Source | Examples | Classes | Notes |
|---|---|---|---|
| fake-news-detection-dataset-English | 30,000 | TRUTHFUL + CONSPIRACY | Binary real/fake news articles |
| CredibilityCorpus: rumors_disinformation.txt | 1,612 | CONSPIRACY (374) + MIXED (1,238) | Real-world rumor tracker data |
| CredibilityCorpus: hollande.txt | 370 | MIXED | French political claims |
| CredibilityCorpus: lemon.txt | 269 | MIXED | French news claims |
| CredibilityCorpus: pin.txt | 678 | MIXED | Multilingual claims |
| CredibilityCorpus: swine-flu.txt | 1,023 | TRUTHFUL (183) + MIXED (840) | Health-related claims |
| Synthetic CONSPIRACY | 20 | CONSPIRACY | Hand-crafted conspiracy narratives |
| Synthetic MIXED | 100 | MIXED | 10 categories x 10 examples each |
| Synthetic TRUTHFUL | 10 | TRUTHFUL | Established scientific/historical facts |
| Total | 34,083 | | |

Class Distribution

| Class | Count | % |
|---|---|---|
| TRUTHFUL | 15,671 | 46.0% |
| MIXED | 3,496 | 10.3% |
| CONSPIRACY | 14,916 | 43.8% |
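
The percentages above follow directly from the counts. The short check below reproduces them and, as an illustration only, derives balanced inverse-frequency class weights. The card does not say that class weighting was used in training, so `weights` is a hypothetical option for handling the small MIXED class, not a description of the training setup.

```python
counts = {"TRUTHFUL": 15_671, "MIXED": 3_496, "CONSPIRACY": 14_916}
total = sum(counts.values())  # 34,083 examples

# Share of each class, rounded to one decimal as in the table.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Balanced inverse-frequency weights: total / (num_classes * class_count).
# Hypothetical: not stated to be part of the actual training run.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: {shares[label]}% of data, weight {weights[label]:.2f}")
```

Under this scheme MIXED would receive a weight of about 3.25, while the two majority classes stay below 1.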

CredibilityCorpus Sources

3,953 real-world examples from 7 corpus files covering:

  • Rumors & disinformation (rumors_disinformation.txt): tracked online rumors with verified outcomes
  • French political claims (hollande.txt, lemon.txt): cross-lingual credibility signals
  • Multilingual claims (pin.txt): diverse source material
  • Health misinformation (swine-flu.txt): domain-specific rumor tracking
  • Social media (randomtweets*.txt, RihannaConcert*.txt, UEFA_Euro_2016*.txt): real-world tweet-level claims

Hyperparameters

  • Epochs: 3
  • Learning rate: 5e-5 (linear decay)
  • Batch size: 12 (effective 24 with gradient accumulation x2)
  • Steps: 3,834 total (1,278 per epoch)
  • Optimizer: AdamW (fused)
  • Max sequence length: 512
  • Precision: bfloat16
  • Gradient checkpointing: enabled
  • Hardware: NVIDIA RTX 5060 Ti (17.1 GB VRAM), CUDA 12.8, Windows 10
  • Training time: 7,024s (117 minutes)
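
The step counts above can be checked from the batch settings. The arithmetic below assumes the training split is the full dataset minus the 3,413-example test split, with no separate validation split; that assumption is not stated in the card, though it does reproduce the reported figures.

```python
import math

total_examples = 34_083
test_examples = 3_413  # held-out split reported in this card
train_examples = total_examples - test_examples  # assumes no separate validation split

per_device_batch = 12
grad_accum = 2
epochs = 3

effective_batch = per_device_batch * grad_accum            # 24
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
minutes = 7_024 / 60

print(effective_batch, steps_per_epoch, total_steps, round(minutes))
```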

Performance

Evaluation Metrics (held-out test set, ~3,413 examples)

| Metric | Epoch 1 | Epoch 2 | Epoch 3 |
|---|---|---|---|
| Eval Loss | 0.00837 | 0.00608 | 0.00262 |
| F1 (weighted) | 0.9976 | 0.9985 | 0.9991 |
| Accuracy | 0.9977 | 0.9985 | 0.9991 |
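
The F1 row is support-weighted, meaning each class's F1 is averaged in proportion to its number of true examples. The sketch below shows that computation in plain Python on a made-up ten-example toy set, not the model's actual predictions. It illustrates why a small class such as MIXED has limited influence on the headline number; per-class scores are not reported in this card.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / len(y_true)
    return score


# Toy data (illustrative only): one MIXED example is misread as CONSPIRACY.
y_true = ["TRUTHFUL"] * 4 + ["MIXED"] * 2 + ["CONSPIRACY"] * 4
y_pred = ["TRUTHFUL"] * 4 + ["MIXED", "CONSPIRACY"] + ["CONSPIRACY"] * 4

print(f"weighted F1 = {weighted_f1(y_true, y_pred):.4f}")  # 0.8889, accuracy is 0.9
```

In the toy set MIXED recall is only 0.5, yet the weighted score stays near 0.89 because MIXED accounts for a fifth of the examples.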

Training Loss Curve

| Epoch | Train Loss | Gradient Norm |
|---|---|---|
| 0.0 | 0.5372 | 1.73 |
| 0.5 | 0.0462 | 0.74 |
| 1.0 | 0.0311 | 0.59 |
| 1.5 | 0.0197 | 0.17 |
| 2.0 | 0.0073 | 0.00 |
| 2.5 | 0.0001 | 0.00 |
| 3.0 | 0.0041 | 5.45 |

Convergence reached by epoch ~2.5. Loss at epoch 3 endpoint: 0.0041.

Smoke Test (v3)

| Claim | Verdict | Confidence |
|---|---|---|
| "The Earth is flat and NASA faked the moon landing." | CONSPIRACY | 0.9999 |
| "The Earth orbits the Sun at 93 million miles." | TRUTHFUL | 1.0000 |
| "COVID-19 vaccines contain microchips." | CONSPIRACY | 0.9950 |
| "A new study suggests fasting reduces inflammation." | MIXED | 1.0000 |

Conspiracy Detection (7 claims vs baselines)

| Model | Caught | Notes |
|---|---|---|
| credibility-gate-v3 (this model) | 7/7 | 3-class with real-world MIXED nuance |
| credibility-gate-v1 | 7/7 | Synthetic MIXED only |
| modernbert_conspiracy_classifier | 7/7 | Binary only, no credibility scoring |
| roberta-credibility | 5/7 | Misses "election stolen" and "moon landing" |
| mmbert32k-factcheck-classifier | 0/7 | Classifies ALL as NO_FACT_CHECK_NEEDED |
| distilbert-factcheck | 0/7 | Classifies ALL as SUPPORTS (active endorsement) |

Usage

Quick Start with Transformers

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="EpsilonGreedyAI/credibility-gate-v3",
    device=0  # GPU, or -1 for CPU
)

# Classify a claim
result = classifier("The Earth is flat and NASA faked the moon landing.")
print(result)
# [{'label': 'CONSPIRACY', 'score': 0.99}]

# Batch classification
texts = [
    "The Earth orbits the Sun at 93 million miles.",
    "Anonymous sources claim the CEO is stepping down.",
    "5G towers are causing the coronavirus.",
]
results = classifier(texts)
for text, r in zip(texts, results):
    print(f"{r['label']} ({r['score']:.2f}): {text}")

Loading with PyTorch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "EpsilonGreedyAI/credibility-gate-v3",
    dtype=torch.float32,
)
model.eval()  # inference mode: disables dropout
tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/credibility-gate-v3")

# Truncate to the 512-token length used during training
inputs = tokenizer(
    "Climate change is a hoax.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    predicted = probs.argmax(dim=-1).item()
    # id2label is keyed by int once the config is loaded
    label = model.config.id2label[predicted]
    print(f"{label}: {probs[0, predicted]:.4f}")

Inference Performance

| Hardware | Latency | Batch Size |
|---|---|---|
| RTX 5060 Ti (GPU) | ~5ms | 1 |
| RTX 5060 Ti (GPU) | ~15ms | 8 |

Using with ONNX Runtime (CPU deployment)

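from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# export=True converts the PyTorch checkpoint to ONNX on load
model = ORTModelForSequenceClassification.from_pretrained(
    "EpsilonGreedyAI/credibility-gate-v3",
    export=True,
)
tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/credibility-gate-v3")

# Run one claim through the exported model
inputs = tokenizer("5G towers are causing the coronavirus.", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])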

Limitations

Known Weaknesses

  1. Primary language is English: CredibilityCorpus includes French examples for cross-lingual signal, but accuracy on non-English text is not validated against a held-out multilingual test set
  2. MIXED class is smallest (10.3%): despite the CredibilityCorpus addition, MIXED remains the minority class. This imbalance reflects the data landscape but may reduce recall on edge cases
  3. Satire/Sarcasm: may misclassify obvious satire (The Onion) as CONSPIRACY
  4. Novel conspiracies: trained on known conspiracy patterns; emerging or novel conspiracy narratives may not be detected
  5. Confidence calibration: confidence scores are softmax outputs, not calibrated probabilities
  6. Social media noise: several CredibilityCorpus tweet files contained 0 parseable examples; real-time social media ingestion would require dedicated preprocessing
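
On the calibration point in item 5: one common post-hoc remedy is temperature scaling, where logits are divided by a scalar T before the softmax. The sketch below only illustrates the mechanism. It is not part of this model, the logits are made up, and T=2.5 is an arbitrary illustrative value; in practice T would be fitted on a held-out set.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax over logits divided by a temperature (T > 1 flattens the distribution)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one input, ordered (TRUTHFUL, MIXED, CONSPIRACY).
logits = [0.5, 2.0, 6.0]

raw = softmax(logits)                      # plain softmax, as the model reports
cooled = softmax(logits, temperature=2.5)  # T=2.5 is illustrative, not fitted

print(f"top score: raw {max(raw):.3f}, scaled {max(cooled):.3f}")
```

Scaling changes the reported confidence but not the predicted class, so it leaves accuracy and F1 unchanged.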

Bias Considerations

  • Training data reflects English-language news media biases
  • CONSPIRACY class is weighted toward Western conspiracy theories
  • CredibilityCorpus sources may reflect the biases of their original curators
  • French-language examples (hollande.txt, lemon.txt) are primarily political claims, not a balanced cross-lingual sample

Version History

| Version | Date | Key Change |
|---|---|---|
| v1 | 2026-06-15 | Initial: 30K binary + 130 synthetic, 19/19 test accuracy |
| v2 | 2026-06-15 | Fixed MIXED CONF bias: 100 balanced examples, 19/19 accuracy |
| v3 | 2026-06-15 | CredibilityCorpus integration: 3,953 real-world examples, 34K total, 3 epochs, 99.91% on full test split |

Citation

@misc{epsilon-greedy-ai-credibility-gate-v3,
  author = {EpsilonGreedyAI},
  title = {Credibility Gate v3: Corpus-enhanced 3-class credibility classifier for AI safety pipelines},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EpsilonGreedyAI/credibility-gate-v3}},
}

License

Apache 2.0


Built for a custom multiple-resolver signal analysis pipeline. Trained on Windows 10, RTX 5060 Ti 17GB, Python 3.14, torch 2.11.0+cu128, transformers 5.5.0.

Evaluation results

  • accuracy on ErfanMoosaviMonazzah/fake-news-detection-dataset-English + CredibilityCorpus (multilingual) + synthetic
    self-reported
    0.999
  • f1 on ErfanMoosaviMonazzah/fake-news-detection-dataset-English + CredibilityCorpus (multilingual) + synthetic
    self-reported
    0.999