---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-large
tags:
- text-classification
- credibility
- conspiracy-detection
- fake-news
- misinformation
- safety
- modernbert
- multilingual
pipeline_tag: text-classification
language:
- en
- fr
metrics:
- accuracy
- f1
model-index:
- name: credibility-gate-v3
  results:
  - task:
      type: text-classification
    dataset:
      name: ErfanMoosaviMonazzah/fake-news-detection-dataset-English + CredibilityCorpus (multilingual) + synthetic
      type: custom
    metrics:
    - type: accuracy
      value: 0.9991
    - type: f1
      value: 0.9991
---

# Credibility Gate v3

A 3-class credibility classifier (**TRUTHFUL / MIXED / CONSPIRACY**) fine-tuned from ModernBERT-large for an 8-resolver signal analysis pipeline. It is the corpus-enhanced, multilingual-aware successor to v1 and v2.

Developed by **EpsilonGreedyAI**

- Hugging Face: https://huggingface.co/EpsilonGreedyAI

---

## What's New in v3

| Version | Dataset | Epochs | Accuracy | F1 | Key Change |
|---------|---------|:------:|:--------:|:--:|------------|
| v1 | 30K fake-news + 130 synthetic | 2 | 1.000\* | 1.000\* | Initial release, 19-example test |
| v2 | v1 data + 100 balanced MIXED | 2 | 1.000\* | 1.000\* | Fixed CONF bias (27% to 100% MIXED) |
| **v3** | v2 data + **3,953 CredibilityCorpus examples** | 3 | **0.9991** | **0.9991** | Real-world multilingual data, 34K total |

\* v1 and v2 were scored on a small held-out set (19 examples); v3 is evaluated on the full 3,413-example test split.

### Key Improvements

1. **Real-world training data**: 3,953 examples from CredibilityCorpus (rumors, disinformation, tweets, and news articles in English and French) supplement the hand-crafted synthetic MIXED examples with authentic ambiguous claims
2. **34K total dataset**: 15,671 TRUTHFUL / 3,496 MIXED / 14,916 CONSPIRACY, up from 30K binary + 130 synthetic
3. **3 epochs**: training extended from 2 epochs (v1/v2) to 3 for sharper confidence near class boundaries
4. **Multilingual awareness**: French-language credibility examples (hollande.txt, UEFA_Euro_2016_Fr.txt) provide cross-lingual signal exposure
5. **Proper test split**: evaluation on 3,413 held-out examples (10%) rather than a small hand-picked set
6. **Training time**: 117 minutes (7,024 s) on an RTX 5060 Ti 17GB with bf16 and gradient checkpointing

---

## Model Description

Credibility Gate v3 is a production-grade content safety classifier that scores input text on a 3-tier credibility spectrum. It consolidates two previously separate models (modernbert_conspiracy_classifier and fake-news-credibility-roberta) into a single classifier, and improves on v1/v2 with real-world multilingual training data.

### Labels

| Label | Meaning | Pipeline Action |
|-------|---------|-----------------|
| `TRUTHFUL` | Established fact or common knowledge | Route directly to LLM |
| `MIXED` | Plausible but unverifiable (rumors, anonymous sources, preliminary findings) | Route with warning context |
| `CONSPIRACY` | False claim, conspiracy theory, or dangerous misinformation | Block or flag for human review |
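
The label-to-action mapping in the table above can be written as a small routing function. This is a minimal sketch: the function name `route_by_credibility` and the action strings are hypothetical, and sending unknown labels to review is an assumed fail-safe choice, not documented behavior of the released pipeline. It takes one result dict in the `{'label': ..., 'score': ...}` shape returned by the Transformers `text-classification` pipeline.

```python
# Hypothetical action names mirroring the "Pipeline Action" column above.
ACTIONS = {
    "TRUTHFUL": "route_to_llm",
    "MIXED": "route_with_warning",
    "CONSPIRACY": "flag_for_review",  # the card allows "block or flag for human review"
}


def route_by_credibility(pred: dict) -> str:
    """Map one credibility-gate prediction to a pipeline action.

    Assumption: any label not in ACTIONS is sent to human review
    rather than passed through to the LLM.
    """
    return ACTIONS.get(pred["label"], "flag_for_review")


# Usage: route_by_credibility({"label": "MIXED", "score": 0.97})
```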

### Design Philosophy

Most fact-check models (trained on LIAR, FEVER, or PolitiFact data) fail catastrophically on conspiracy theories: they either bypass them as "not worth checking" (mmbert32k-factcheck-classifier) or actively endorse them as SUPPORTS (distilbert-factcheck). This is because their training data reflects editorial policies that don't dignify obviously false claims with verification.

Credibility Gate v3 was trained specifically to cover this "beneath refutation" void, where radicalization pipelines live. It correctly flags flat Earth, anti-vax conspiracies, QAnon narratives, election denial, and chemtrail theories, while passing established scientific facts and separating out plausible-but-unverifiable claims as MIXED.

---

## Intended Use

### Primary Use Case

Pre-LLM content safety gate in a multi-resolver signal analysis pipeline, positioned after the jailbreak detector and before the routing model.

### Pipeline Position

```
REQUEST
  -> (jailbreak detector)
  -> (THIS MODEL: 3-class credibility)
  -> (routing-model: complexity + domain)
  -> (pii-classifier)
  -> (intent-classifier)
  -> (hallucination-checker)
  -> (semantic-router)
  -> (citation verification)
RESPONSE
```
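
The ordering above is a chain in which any stage can stop a request before it reaches later stages. The sketch below assumes each stage is a plain callable returning a hypothetical `StageResult`; the stage names echo the diagram, but `credibility_stage` is a keyword stub standing in for the real model, and none of these names come from the actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StageResult:
    """Outcome of one pipeline stage (hypothetical structure)."""
    allow: bool
    notes: list[str] = field(default_factory=list)


Stage = Callable[[str], StageResult]


def run_pipeline(text: str, stages: list[tuple[str, Stage]]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first one that blocks.

    Returns (allowed, trace): trace lists each stage that ran, followed
    by any notes (e.g. a verdict or warning context) it attached.
    """
    trace: list[str] = []
    for name, stage in stages:
        result = stage(text)
        trace.append(name)
        trace.extend(f"{name}: {note}" for note in result.notes)
        if not result.allow:
            return False, trace
    return True, trace


def credibility_stage(text: str) -> StageResult:
    # Stub only: a real stage would call the credibility-gate model here.
    if "flat" in text.lower():
        return StageResult(allow=False, notes=["CONSPIRACY"])
    return StageResult(allow=True)


def passthrough(text: str) -> StageResult:
    # Placeholder for stages not modelled in this sketch.
    return StageResult(allow=True)


STAGES = [
    ("jailbreak-detector", passthrough),
    ("credibility-gate", credibility_stage),
    ("routing-model", passthrough),
]
```

Short-circuiting keeps later, more expensive stages from running on requests an earlier gate has already blocked.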

### Out-of-Scope

- Not a fact-verification engine: classifies linguistic patterns, not ground truth
- Primary training language is English; French examples provide cross-lingual signal, but accuracy on non-English text is not validated
- Not for automated censorship without human oversight
- Does not handle multimodal content (images, video)

---

## Training

### Architecture

- **Base model:** [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
- **Parameters:** 396M (28 layers, 1024 hidden, 16 attention heads)
- **Context:** 8192 tokens (trained at 512 max length)
- **Optimizations:** bfloat16 mixed precision, gradient checkpointing, dynamic padding, fused AdamW optimizer
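
Dynamic padding pads each batch only to its longest sequence (capped here at the 512-token training length) instead of always padding to 512, which saves compute when most inputs are short. The pure-Python sketch below illustrates the idea on lists of token ids; the helper name `pad_batch` and the pad id `0` are assumptions for illustration. In Transformers, `DataCollatorWithPadding` is one standard way to get this behavior.

```python
def pad_batch(
    batch: list[list[int]], pad_id: int = 0, max_length: int = 512
) -> tuple[list[list[int]], list[list[int]]]:
    """Pad a batch of token-id lists to the length of its longest member.

    Sequences longer than max_length are truncated first. Returns
    (input_ids, attention_mask); the mask is 1 for real tokens and
    0 for padding. pad_id=0 is an assumption, not ModernBERT's value.
    """
    truncated = [seq[:max_length] for seq in batch]
    target = max((len(seq) for seq in truncated), default=0)
    input_ids = [seq + [pad_id] * (target - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in truncated]
    return input_ids, attention_mask
```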

### Training Data

| Source | Examples | Classes | Notes |
|--------|:--------:|---------|-------|
| fake-news-detection-dataset-English | 30,000 | TRUTHFUL + CONSPIRACY | Binary real/fake news articles |
| CredibilityCorpus: rumors_disinformation.txt | 1,612 | CONSPIRACY (374) + MIXED (1,238) | Real-world rumor tracker data |
| CredibilityCorpus: hollande.txt | 370 | MIXED | French political claims |
| CredibilityCorpus: lemon.txt | 269 | MIXED | French news claims |
| CredibilityCorpus: pin.txt | 678 | MIXED | Multilingual claims |
| CredibilityCorpus: swine-flu.txt | 1,023 | TRUTHFUL (183) + MIXED (840) | Health-related claims |
| Synthetic CONSPIRACY | 20 | CONSPIRACY | Hand-crafted conspiracy narratives |
| Synthetic MIXED | 100 | MIXED | 10 categories × 10 examples each |
| Synthetic TRUTHFUL | 10 | TRUTHFUL | Established scientific/historical facts |
| **Total** | **34,083** | | |

### Class Distribution

| Class | Count | % |
|-------|:-----:|:--:|
| TRUTHFUL | 15,671 | 46.0% |
| MIXED | 3,496 | 10.3% |
| CONSPIRACY | 14,916 | 43.8% |
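
The totals and percentages above can be reproduced from numbers stated in this card. The check below uses only those figures; the variable names are illustrative.

```python
# Per-class counts from the Class Distribution table.
counts = {"TRUTHFUL": 15_671, "MIXED": 3_496, "CONSPIRACY": 14_916}

# 30K fake-news articles + 130 synthetic + 3,953 CredibilityCorpus examples.
sources_total = 30_000 + 130 + 3_953

total = sum(counts.values())  # should equal the Training Data total
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:<11} {counts[label]:>6,}  {pct:.1f}%")
```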

### CredibilityCorpus Sources

3,953 real-world examples from 7 corpus files covering:

- **Rumors & disinformation** (rumors_disinformation.txt): tracked online rumors with verified outcomes
- **French political claims** (hollande.txt, lemon.txt): cross-lingual credibility signals
- **Multilingual claims** (pin.txt): diverse source material
- **Health misinformation** (swine-flu.txt): domain-specific rumor tracking
- **Social media** (randomtweets\*.txt, RihannaConcert\*.txt, UEFA_Euro_2016\*.txt): real-world tweet-level claims

### Hyperparameters

- **Epochs:** 3
- **Learning rate:** 5e-5 (linear decay)
- **Batch size:** 12 (effective 24 with gradient accumulation ×2)
- **Steps:** 3,834 total (1,278 per epoch)
- **Optimizer:** AdamW (fused)
- **Max sequence length:** 512
- **Precision:** bfloat16
- **Gradient checkpointing:** enabled
- **Hardware:** NVIDIA RTX 5060 Ti (17.1 GB VRAM), CUDA 12.8, Windows 10
- **Training time:** 7,024 s (117 minutes)
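
The step counts follow from the batch settings. This arithmetic check assumes the training split is everything except the 3,413 reported test examples (34,083 - 3,413 = 30,670), that training ran on a single GPU, and that each optimizer step consumes `grad_accum` micro-batches with a final partial step rounded up. It is a consistency check, not the training script.

```python
import math

total_examples = 34_083
test_examples = 3_413  # held-out split reported under Performance
# Assumption: training split = total minus the reported test split.
train_examples = total_examples - test_examples

per_device_batch = 12
grad_accum = 2
epochs = 3

micro_batches_per_epoch = math.ceil(train_examples / per_device_batch)
steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs

print(f"effective batch: {per_device_batch * grad_accum}")
print(f"steps/epoch: {steps_per_epoch}, total: {total_steps}")
```

Under these assumptions the result matches the reported 1,278 steps per epoch and 3,834 total.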

---

## Performance

### Evaluation Metrics (held-out test set, ~3,413 examples)

| Metric | Epoch 1 | Epoch 2 | Epoch 3 |
|--------|:-------:|:-------:|:-------:|
| Eval Loss | 0.00837 | 0.00608 | **0.00262** |
| F1 (weighted) | 0.9976 | 0.9985 | **0.9991** |
| Accuracy | 0.9977 | 0.9985 | **0.9991** |
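
The F1 row is a support-weighted average: each class's F1 is weighted by how many test examples carry that label, so the large TRUTHFUL and CONSPIRACY classes dominate the score and weaker MIXED performance can be partly masked. A minimal pure-Python sketch of the metric (the same definition as scikit-learn's `f1_score(..., average="weighted")`), assuming parallel lists of gold and predicted labels; the function names are illustrative.

```python
from collections import Counter


def per_class_prf(gold: list[str], pred: list[str]) -> dict[str, tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_counts = Counter(pred)
    gold_counts = Counter(gold)
    scores = {}
    for label in labels:
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / gold_counts[label] if gold_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def weighted_f1(gold: list[str], pred: list[str]) -> float:
    """Average per-class F1, weighted by each class's support in gold."""
    support = Counter(gold)
    scores = per_class_prf(gold, pred)
    return sum(scores[label][2] * n for label, n in support.items()) / len(gold)
```

Inspecting `per_class_prf(...)["MIXED"]` on the test split gives the per-class recall that the weighted average hides.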

### Training Loss Curve

| Epoch | Train Loss | Gradient Norm |
|:-----:|:----------:|:-------------:|
| 0.0 | 0.5372 | 1.73 |
| 0.5 | 0.0462 | 0.74 |
| 1.0 | 0.0311 | 0.59 |
| 1.5 | 0.0197 | 0.17 |
| 2.0 | 0.0073 | 0.00 |
| 2.5 | 0.0001 | 0.00 |
| 3.0 | 0.0041 | 5.45 |

Training loss converges by roughly epoch 2.5; the logged training loss at the end of epoch 3 is 0.0041.

### Smoke Test (v3)

| Claim | Verdict | Confidence |
|-------|---------|:----------:|
| "The Earth is flat and NASA faked the moon landing." | CONSPIRACY | 0.9999 |
| "The Earth orbits the Sun at 93 million miles." | TRUTHFUL | 1.0000 |
| "COVID-19 vaccines contain microchips." | CONSPIRACY | 0.9950 |
| "A new study suggests fasting reduces inflammation." | MIXED | 1.0000 |

### Conspiracy Detection (7 claims vs. baselines)

| Model | Caught | Notes |
|-------|:------:|-------|
| **credibility-gate-v3 (this model)** | **7/7** | 3-class with real-world MIXED nuance |
| credibility-gate-v1 | 7/7 | Synthetic MIXED only |
| modernbert_conspiracy_classifier | 7/7 | Binary only, no credibility scoring |
| roberta-credibility | 5/7 | Misses "election stolen" and "moon landing" |
| mmbert32k-factcheck-classifier | 0/7 | Classifies ALL as NO_FACT_CHECK_NEEDED |
| distilbert-factcheck | 0/7 | Classifies ALL as SUPPORTS (active endorsement) |

---

## Usage

### Quick Start with Transformers

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="EpsilonGreedyAI/EGAI-credibility-gate-v3",
    device=0,  # GPU index, or -1 for CPU
)

# Classify a single claim
result = classifier("The Earth is flat and NASA faked the moon landing.")
print(result)
# e.g. [{'label': 'CONSPIRACY', 'score': 0.99...}]

# Batch classification
texts = [
    "The Earth orbits the Sun at 93 million miles.",
    "Anonymous sources claim the CEO is stepping down.",
    "5G towers are causing the coronavirus.",
]
results = classifier(texts)
for text, r in zip(texts, results):
    print(f"{r['label']} ({r['score']:.2f}): {text}")
```

### Loading with PyTorch

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "EpsilonGreedyAI/EGAI-credibility-gate-v3"
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    dtype=torch.float32,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Truncate to the 512-token length used in training
inputs = tokenizer(
    "Climate change is a hoax.", return_tensors="pt", truncation=True, max_length=512
)

with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
predicted = probs.argmax(dim=-1).item()
# id2label keys are integers once the config is loaded
label = model.config.id2label[predicted]
print(f"{label}: {probs[0, predicted]:.4f}")
```

### Inference Performance

| Hardware | Latency | Batch Size |
|----------|:-------:|:----------:|
| RTX 5060 Ti (GPU) | ~5 ms | 1 |
| RTX 5060 Ti (GPU) | ~15 ms | 8 |
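
Latency figures like these depend on hardware, input length, and warm-up. A minimal timing sketch, assuming `classify` is any callable that takes a list of texts; the helper name `median_latency_ms` and the warm-up/run counts are illustrative choices. It measures end-to-end wall-clock time, including tokenization and post-processing.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency_ms(
    classify: Callable[[Sequence[str]], object],
    batch: Sequence[str],
    warmup: int = 3,
    runs: int = 20,
) -> float:
    """Median wall-clock time in milliseconds for one call of classify(batch).

    Warm-up calls are discarded so one-time costs (CUDA init, caches)
    do not skew the result.
    """
    for _ in range(warmup):
        classify(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        classify(batch)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


# Usage with the Quick Start pipeline (requires the model); batch_size makes
# the pipeline run the 8 texts as one batch instead of one at a time:
# texts8 = ["The Earth orbits the Sun."] * 8
# print(median_latency_ms(lambda b: classifier(list(b), batch_size=len(b)), texts8))
```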
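
### Using with ONNX Runtime (CPU deployment)

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "EpsilonGreedyAI/EGAI-credibility-gate-v3"
# export=True converts the PyTorch checkpoint to ONNX at load time
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ONNX Runtime models can be wrapped in a standard Transformers pipeline
onnx_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(onnx_classifier("5G towers are causing the coronavirus."))
```

---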

## Limitations

### Known Weaknesses

1. **Primary language is English**: CredibilityCorpus includes French examples for cross-lingual signal, but accuracy on non-English text has not been validated against a held-out multilingual test set
2. **MIXED is the smallest class (10.3%)**: despite the CredibilityCorpus addition, MIXED remains the minority class. This imbalance reflects the real-world data landscape but may reduce recall on edge cases
3. **Satire/Sarcasm**: may misclassify obvious satire (e.g. The Onion) as CONSPIRACY
4. **Novel conspiracies**: trained on known conspiracy patterns; emerging or novel conspiracy narratives may not be detected
5. **Confidence calibration**: confidence scores are softmax outputs, not calibrated probabilities
6. **Social media noise**: several CredibilityCorpus tweet files contained 0 parseable examples; real-time social media ingestion would require dedicated preprocessing
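
The `score` returned by the pipeline is the largest softmax probability over the three logits. The sketch below, using made-up logits, shows why that number should not be read as a calibrated probability: dividing the logits by a temperature leaves the predicted label unchanged but moves the confidence substantially. The logit values, the label order, and the helper names are hypothetical; check `model.config.id2label` for the real order.

```python
import math

LABELS = ["TRUTHFUL", "MIXED", "CONSPIRACY"]  # assumed id order, illustration only


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def top_prediction(logits: list[float], temperature: float = 1.0) -> tuple[str, float]:
    """Return (label, confidence) the way a text-classification pipeline reports it."""
    probs = softmax(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


logits = [-1.0, 0.5, 4.0]  # hypothetical model output
for t in (1.0, 3.0):
    label, score = top_prediction(logits, t)
    print(f"T={t}: {label} {score:.3f}")
```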

### Bias Considerations

- Training data reflects English-language news media biases
- The CONSPIRACY class is weighted toward Western conspiracy theories
- CredibilityCorpus sources may reflect the biases of their original curators
- French-language examples (hollande.txt, lemon.txt) are primarily political claims, not a balanced cross-lingual sample

---

## Version History

| Version | Date | Key Change |
|---------|------|------------|
| v1 | 2026-06-15 | Initial: 30K binary + 130 synthetic, 19/19 test accuracy |
| v2 | 2026-06-15 | Fixed MIXED CONF bias: 100 balanced examples, 19/19 accuracy |
| **v3** | **2026-06-15** | **CredibilityCorpus integration: 3,953 real-world examples, 34K total, 3 epochs, 99.91% on full test split** |

---

## Citation

```bibtex
@misc{epsilon-greedy-ai-credibility-gate-v3,
  author = {EpsilonGreedyAI},
  title = {Credibility Gate v3: Corpus-enhanced 3-class credibility classifier for AI safety pipelines},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EpsilonGreedyAI/EGAI-credibility-gate-v3}},
}
```

## License

Apache 2.0

---

*Built for a custom multiple-resolver signal analysis pipeline. Trained on Windows 10, RTX 5060 Ti 17GB, Python 3.14, torch 2.11.0+cu128, transformers 5.5.0.*