Credibility Gate v3
3-class credibility classifier: TRUTHFUL / MIXED / CONSPIRACY. Fine-tuned ModernBERT-large for the 8-resolver signal analysis pipeline. Corpus-enhanced, multilingual-aware successor to v1/v2.
Developed by EpsilonGreedyAI
- HuggingFace: https://huggingface.co/EpsilonGreedyAI
What's New in v3
| Version | Dataset | Epochs | Accuracy | F1 | Key Change |
|---|---|---|---|---|---|
| v1 | 30K fake-news + 130 synthetic | 2 | 1.000* | 1.000* | Initial release, 19-example test |
| v2 | v1 data + 100 balanced MIXED | 2 | 1.000* | 1.000* | Fixed CONF bias (27% to 100% MIXED) |
| v3 | v2 data + 3,953 CredibilityCorpus examples | 3 | 0.9991 | 0.9991 | Real-world multilingual, 34K total |
*Small held-out set (19 examples). v3 evaluated on full 3,413-example test split.
Key Improvements
- Real-world training data: 3,953 examples from CredibilityCorpus (rumors, disinformation, tweets, news articles in English and French), replacing hand-crafted synthetic MIXED examples with authentic ambiguous claims
- 34K total dataset: 15,671 TRUTHFUL / 3,496 MIXED / 14,916 CONSPIRACY, up from 30K binary + 130 synthetic
- 3 epochs: extended training for sharper boundary confidence, up from 2 epochs in v1/v2
- Multilingual awareness: French-language credibility examples (hollande.txt, UEFA_Euro_2016_Fr.txt) provide cross-lingual signal exposure
- Proper test split: evaluation on 3,413 held-out examples (10%), not a small hand-picked set
- Training time: 117 minutes (7,024s) on RTX 5060 Ti 17GB, bf16 + gradient checkpointing
Model Description
Credibility Gate v3 is a production-grade content safety classifier that scores input text on a 3-tier credibility spectrum. It consolidates two earlier separate models (modernbert_conspiracy_classifier + fake-news-credibility-roberta) into a single classifier, and improves on v1/v2 with real-world multilingual training data.
Labels
| Label | Meaning | Pipeline Action |
|---|---|---|
| TRUTHFUL | Established fact or common knowledge | Route directly to LLM |
| MIXED | Plausible but unverifiable (rumors, anonymous sources, preliminary findings) | Route with warning context |
| CONSPIRACY | False claim, conspiracy theory, or dangerous misinformation | Block or flag for human review |
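The label-to-action mapping in the table can be sketched as a small dispatch function. This is a minimal illustration, not the pipeline's actual code: the function name `route`, the action strings, and the `min_confidence` fallback to human review are all assumptions introduced here.

```python
# Hypothetical sketch: map one classifier result to a pipeline action.
# Action names and the confidence fallback are illustrative assumptions.
ACTIONS = {
    "TRUTHFUL": "route_to_llm",
    "MIXED": "route_with_warning",
    "CONSPIRACY": "block_or_review",
}


def route(result: dict, min_confidence: float = 0.5) -> str:
    """Return the action for a result shaped like {'label': ..., 'score': ...}."""
    label, score = result["label"], result["score"]
    if label not in ACTIONS:
        raise ValueError(f"unexpected label: {label!r}")
    # Assumption: low-confidence predictions are escalated rather than trusted.
    if score < min_confidence:
        return "block_or_review"
    return ACTIONS[label]
```

The result shape matches what the Transformers `text-classification` pipeline returns per input, so `route(classifier(text)[0])` would be the typical call.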
Design Philosophy
Most fact-check models (LIAR, FEVER, PolitiFact-based) fail catastrophically on conspiracy theories: they either bypass them as "not worth checking" (mmbert32k-factcheck-classifier) or actively endorse them as SUPPORTS (distilbert-factcheck). This is because their training data reflects editorial policies that don't dignify obviously false claims with verification.
Credibility Gate v3 was trained specifically to catch the "beneath refutation" void where radicalization pipelines live. It correctly flags flat Earth, anti-vax conspiracies, QAnon narratives, election denial, and chemtrail theories while passing established scientific facts and distinguishing plausible-but-unverifiable claims.
Intended Use
Primary Use Case
Pre-LLM content safety gate in a multi-resolver signal analysis pipeline. Position: after jailbreak detector, before routing-model.
Pipeline Position
REQUEST
-> (jailbreak detector)
-> (THIS MODEL: 3-class credibility)
-> (routing-model: complexity + domain)
-> (pii-classifier)
-> (intent-classifier)
-> (hallucination-checker)
-> (semantic-router)
-> (citation verification)
RESPONSE
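One way to picture the stage ordering above is a list of callables run in sequence, stopping as soon as a stage blocks the request. The sketch below is an assumption about how such a chain could be wired; `run_pipeline`, the `blocked_by` and `warning` keys, and the stub `credibility_gate` (which reads a precomputed label instead of calling the model) are hypothetical.

```python
from typing import Callable

# Each stage takes the request dict and returns it, possibly annotated.
Stage = Callable[[dict], dict]


def credibility_gate(request: dict) -> dict:
    """Stub gate: a real stage would call the classifier instead of reading a field."""
    label = request.get("credibility", "TRUTHFUL")
    if label == "CONSPIRACY":
        request["blocked_by"] = "credibility_gate"
    elif label == "MIXED":
        request["warning"] = "unverified claim"
    return request


def run_pipeline(request: dict, stages: list[Stage]) -> dict:
    """Run stages in order and stop at the first one that blocks the request."""
    for stage in stages:
        request = stage(request)
        if "blocked_by" in request:
            break
    return request
```

With this shape, a CONSPIRACY verdict prevents the routing model and later resolvers from running, while a MIXED verdict only attaches warning context.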
Out-of-Scope
- Not a fact-verification engine: it classifies linguistic patterns, not ground truth
- Primary training language is English; French examples provide cross-lingual signal but accuracy on non-English text is not validated
- Not for automated censorship without human oversight
- Does not handle multimodal content (images, video)
Training
Architecture
- Base model: answerdotai/ModernBERT-large
- Parameters: 396M (28 layers, 1024 hidden, 16 attention heads)
- Context: 8192 tokens (trained at 512 max length)
- Optimizations: bfloat16 mixed precision, gradient checkpointing, dynamic padding, fused AdamW optimizer
Training Data
| Source | Examples | Classes | Notes |
|---|---|---|---|
| fake-news-detection-dataset-English | 30,000 | TRUTHFUL + CONSPIRACY | Binary real/fake news articles |
| CredibilityCorpus: rumors_disinformation.txt | 1,612 | CONSPIRACY (374) + MIXED (1,238) | Real-world rumor tracker data |
| CredibilityCorpus: hollande.txt | 370 | MIXED | French political claims |
| CredibilityCorpus: lemon.txt | 269 | MIXED | French news claims |
| CredibilityCorpus: pin.txt | 678 | MIXED | Multilingual claims |
| CredibilityCorpus: swine-flu.txt | 1,023 | TRUTHFUL (183) + MIXED (840) | Health-related claims |
| Synthetic CONSPIRACY | 20 | CONSPIRACY | Hand-crafted conspiracy narratives |
| Synthetic MIXED | 100 | MIXED | 10 categories x 10 examples each |
| Synthetic TRUTHFUL | 10 | TRUTHFUL | Established scientific/historical facts |
| Total | 34,083 |
Class Distribution
| Class | Count | % |
|---|---|---|
| TRUTHFUL | 15,671 | 46.0% |
| MIXED | 3,496 | 10.3% |
| CONSPIRACY | 14,916 | 43.8% |
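The percentages in the table follow directly from the counts, as the short check below shows. The inverse-frequency class weights at the end are an illustrative assumption about how the imbalance could be compensated; this card does not state that class weighting was used in training.

```python
# Class counts from the table above.
counts = {"TRUTHFUL": 15_671, "MIXED": 3_496, "CONSPIRACY": 14_916}
total = sum(counts.values())

# Share of each class, rounded to one decimal place as in the table.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Assumption (not documented for this model): inverse-frequency weights,
# weight = total / (num_classes * class_count), so rarer classes weigh more.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: {counts[label]} examples, {shares[label]}%, weight {weights[label]:.2f}")
```

Under this weighting scheme the MIXED class would count roughly four times as much per example as either majority class.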
CredibilityCorpus Sources
3,953 real-world examples from 7 corpus files covering:
- Rumors & disinformation (rumors_disinformation.txt): tracked online rumors with verified outcomes
- French political claims (hollande.txt, lemon.txt): cross-lingual credibility signals
- Multilingual claims (pin.txt): diverse source material
- Health misinformation (swine-flu.txt): domain-specific rumor tracking
- Social media (randomtweets*.txt, RihannaConcert*.txt, UEFA_Euro_2016*.txt): real-world tweet-level claims
Hyperparameters
- Epochs: 3
- Learning rate: 5e-5 (linear decay)
- Batch size: 12 (effective 24 with gradient accumulation x2)
- Steps: 3,834 total (1,278 per epoch)
- Optimizer: AdamW (fused)
- Max sequence length: 512
- Precision: bfloat16
- Gradient checkpointing: enabled
- Hardware: NVIDIA RTX 5060 Ti (17.1 GB VRAM), CUDA 12.8, Windows 10
- Training time: 7,024s (117 minutes)
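The step counts above are consistent with the dataset size and batch settings. The check below is a sketch that assumes the 3,413-example test split is the only held-out data and that a final partial batch counts as one optimizer step; neither detail is stated explicitly in this card.

```python
import math

total_examples = 34_083
test_examples = 3_413        # held-out split reported in this card
effective_batch = 12 * 2     # batch size x gradient accumulation steps
epochs = 3
training_seconds = 7_024

# Assumption: everything outside the test split is used for training.
train_examples = total_examples - test_examples
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
seconds_per_step = training_seconds / total_steps

print(steps_per_epoch, total_steps, round(seconds_per_step, 2))
```

Under those assumptions the numbers reproduce the reported 1,278 steps per epoch and 3,834 total steps, at a little under two seconds per optimizer step.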
Performance
Evaluation Metrics (held-out test set, ~3,413 examples)
| Metric | Epoch 1 | Epoch 2 | Epoch 3 |
|---|---|---|---|
| Eval Loss | 0.00837 | 0.00608 | 0.00262 |
| F1 (weighted) | 0.9976 | 0.9985 | 0.9991 |
| Accuracy | 0.9977 | 0.9985 | 0.9991 |
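For readers unfamiliar with the weighted F1 reported above, the sketch below shows how a support-weighted F1 is conventionally computed from a confusion matrix. It is not the evaluation script used for this model; the function name and the toy matrix are illustrative. The last line estimates how many test examples a 0.9991 accuracy leaves misclassified, assuming the approximate 3,413-example test size.

```python
def weighted_f1(confusion: list[list[int]]) -> float:
    """confusion[i][j] = number of examples with true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])                                  # examples whose true class is k
        predicted = sum(confusion[i][k] for i in range(n_classes))   # examples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support / total                                # weight each class by its support
    return score


# Toy 2-class matrix (hypothetical data, not from this model's evaluation).
toy = [[1, 1], [0, 2]]
print(round(weighted_f1(toy), 4))

# Rough count of misclassified test examples implied by the epoch-3 accuracy.
approx_errors = round(3_413 * (1 - 0.9991))
print(approx_errors)
```

Because each class's F1 is weighted by its support, the small MIXED class contributes only about a tenth of the reported score, so per-class metrics would be needed to judge MIXED performance on its own.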
Training Loss Curve
| Epoch | Train Loss | Gradient Norm |
|---|---|---|
| 0.0 | 0.5372 | 1.73 |
| 0.5 | 0.0462 | 0.74 |
| 1.0 | 0.0311 | 0.59 |
| 1.5 | 0.0197 | 0.17 |
| 2.0 | 0.0073 | 0.00 |
| 2.5 | 0.0001 | 0.00 |
| 3.0 | 0.0041 | 5.45 |
Convergence reached by epoch ~2.5. Loss at epoch 3 endpoint: 0.0041.
Smoke Test (v3)
| Claim | Verdict | Confidence |
|---|---|---|
| "The Earth is flat and NASA faked the moon landing." | CONSPIRACY | 0.9999 |
| "The Earth orbits the Sun at 93 million miles." | TRUTHFUL | 1.0000 |
| "COVID-19 vaccines contain microchips." | CONSPIRACY | 0.9950 |
| "A new study suggests fasting reduces inflammation." | MIXED | 1.0000 |
Conspiracy Detection (7 claims vs baselines)
| Model | Caught | Notes |
|---|---|---|
| credibility-gate-v3 (this model) | 7/7 | 3-class with real-world MIXED nuance |
| credibility-gate-v1 | 7/7 | Synthetic MIXED only |
| modernbert_conspiracy_classifier | 7/7 | Binary only, no credibility scoring |
| roberta-credibility | 5/7 | Misses "election stolen" and "moon landing" |
| mmbert32k-factcheck-classifier | 0/7 | Classifies ALL as NO_FACT_CHECK_NEEDED |
| distilbert-factcheck | 0/7 | Classifies ALL as SUPPORTS (active endorsement) |
Usage
Quick Start with Transformers
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="EpsilonGreedyAI/EGAI-credibility-gate-v3",
    device=0,  # GPU; use -1 for CPU
)

# Classify a single claim
result = classifier("The Earth is flat and NASA faked the moon landing.")
print(result)
# e.g. [{'label': 'CONSPIRACY', 'score': 0.99}]

# Batch classification
texts = [
    "The Earth orbits the Sun at 93 million miles.",
    "Anonymous sources claim the CEO is stepping down.",
    "5G towers are causing the coronavirus.",
]
results = classifier(texts)
for text, r in zip(texts, results):
    print(f"{r['label']} ({r['score']:.2f}): {text}")
Loading with PyTorch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "EpsilonGreedyAI/EGAI-credibility-gate-v3",
    dtype=torch.float32,
)
tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/EGAI-credibility-gate-v3")

# Truncate to 512 tokens, the max sequence length used in training
inputs = tokenizer(
    "Climate change is a hoax.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)
predicted = probs.argmax().item()
# id2label is keyed by int once the config is loaded
label = model.config.id2label[predicted]
print(f"{label}: {probs[0][predicted]:.4f}")
Inference Performance
| Hardware | Latency | Batch Size |
|---|---|---|
| RTX 5060 Ti (GPU) | ~5ms | 1 |
| RTX 5060 Ti (GPU) | ~15ms | 8 |
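The latency figures above translate into throughput as follows. This is plain arithmetic on the approximate values in the table, assuming the latency covers the whole batch; it is not a separate benchmark.

```python
# Batch size -> approximate latency in milliseconds, from the table above.
latency_ms = {1: 5.0, 8: 15.0}

# Texts per second if each latency figure is the time for one full batch.
throughput = {batch: batch / (ms / 1000) for batch, ms in latency_ms.items()}

for batch, per_second in throughput.items():
    print(f"batch {batch}: ~{per_second:.0f} texts/s")
```

On these numbers, batching eight inputs gives well over twice the throughput of single-item inference, at the cost of higher per-request latency.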
Using with ONNX Runtime (CPU deployment)
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model = ORTModelForSequenceClassification.from_pretrained(
    "EpsilonGreedyAI/EGAI-credibility-gate-v3",
    export=True,  # convert the PyTorch checkpoint to ONNX on load
)
tokenizer = AutoTokenizer.from_pretrained("EpsilonGreedyAI/EGAI-credibility-gate-v3")
Limitations
Known Weaknesses
- Primary language is English: CredibilityCorpus includes French examples for cross-lingual signal, but accuracy on non-English text is not validated against a held-out multilingual test set
- MIXED class is smallest (10.3%): despite the CredibilityCorpus addition, MIXED remains the minority class. Real-world class imbalance reflects the data landscape but may affect recall on edge cases
- Satire/Sarcasm: may misclassify obvious satire (The Onion) as CONSPIRACY
- Novel conspiracies: trained on known conspiracy patterns; emerging or novel conspiracy narratives may not be detected
- Confidence calibration: confidence scores are softmax outputs, not calibrated probabilities
- Social media noise: several CredibilityCorpus tweet files contained 0 parseable examples; real-time social media ingestion would require dedicated preprocessing
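On the calibration point: temperature scaling is one common post-hoc technique for softening overconfident softmax scores. The sketch below only illustrates the mechanism and is not something this card reports applying; the temperature value and the example logits are placeholders, and in practice the temperature would be fit on held-out data.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax over logits divided by a temperature (T > 1 flattens the distribution)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


# Placeholder logits for (TRUTHFUL, MIXED, CONSPIRACY); not real model output.
logits = [6.0, 1.0, -2.0]
raw = softmax(logits)
softened = softmax(logits, temperature=2.0)     # hypothetical temperature
print(max(raw), max(softened))
```

Dividing by a temperature never changes which label wins, only how confident the score looks, so it can be added without affecting the reported accuracy.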
Bias Considerations
- Training data reflects English-language news media biases
- CONSPIRACY class is weighted toward Western conspiracy theories
- CredibilityCorpus sources may reflect the biases of their original curators
- French-language examples (hollande.txt, lemon.txt) are primarily political claims, not a balanced cross-lingual sample
Version History
| Version | Date | Key Change |
|---|---|---|
| v1 | 2026-06-15 | Initial: 30K binary + 130 synthetic, 19/19 test accuracy |
| v2 | 2026-06-15 | Fixed MIXED CONF bias: 100 balanced examples, 19/19 accuracy |
| v3 | 2026-06-15 | CredibilityCorpus integration: 3,953 real-world examples, 34K total, 3 epochs, 99.91% on full test split |
Citation
@misc{epsilon-greedy-ai-credibility-gate-v3,
author = {EpsilonGreedyAI},
title = {Credibility Gate v3: Corpus-enhanced 3-class credibility classifier for AI safety pipelines},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/EpsilonGreedyAI/credibility-gate-v3}},
}
License
Apache 2.0
Built for a custom multiple-resolver signal analysis pipeline. Trained on Windows 10, RTX 5060 Ti 17GB, Python 3.14, torch 2.11.0+cu128, transformers 5.5.0.