--- |
|
|
license: other |
|
|
license_name: collapse-index-open-model-license |
|
|
license_link: LICENSE.md |
|
|
language: |
|
|
- en |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-classification |
|
|
- distilbert |
|
|
- rhetorical-confidence |
|
|
- behavioral-stability |
|
|
- type-i-ghost-detection |
|
|
- ai-safety |
|
|
base_model: distilbert-base-uncased |
|
|
datasets: |
|
|
- synthetic |
|
|
metrics: |
|
|
- accuracy |
|
|
- f1 |
|
|
pipeline_tag: text-classification |
|
|
--- |
|
|
|
|
|
# ProBERT v1.0 |
|
|
|
|
|
 |
|
|
|
|
|
## What ProBERT Does |
|
|
|
|
|
**Detects rhetorical overconfidence in text.** |
|
|
|
|
|
ProBERT classifies text into three patterns: |
|
|
- ✅ **process_clarity** - Step-by-step reasoning you can verify |
|
|
- ⚠️ **rhetorical_confidence** - Assertive claims without supporting process |
|
|
- 🔄 **scope_blur** - Vague generalizations with ambiguous boundaries |
|
|
|
|
|
Use it to flag risky language in LLM outputs, documentation, support tickets, or any text where confident assertions without reasoning could cause problems. |
|
|
|
|
|
**Trained on just 450 examples (150 per class).** Achieves 95.6% accuracy and shows strong transfer signals to real-world domains it never saw during training. When tested on Yelp reviews, ProBERT and untrained base DistilBERT disagree 84% of the time—suggesting the training added real capability, not just noise. |
|
|
|
|
|
**Why safety teams care:** When you evaluate ProBERT itself under perturbation testing (the Collapse Index protocol), it exhibits **zero Type I errors**—predictions that are stable, confident, and wrong. Most models have 5-15% Type I errors. ProBERT: 0. Additionally, ProBERT is **underconfident by design**: 98.4% accuracy but only 72.5% mean confidence (26% confidence deficit). This conservative calibration means it doubts itself when right rather than being cocky when wrong—the safe direction for production deployments. |
|
|
|
|
|
--- |
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
- [Model Card](#model-card)
- [Model Details](#model-details)
- [Performance](#performance)
- [Training Impact Demonstration](#training-impact-demonstration-unlabeled-transfer)
- [Metrics Explained](#metrics-explained)
- [Reproducibility](#reproducibility)
- [What It Does](#what-it-does)
- [Quick Start](#quick-start)
- [Proposed Use Cases](#proposed-use-cases)
- [License](#license)
- [Citation](#citation)
- [Attributions](#attributions)
- [Design Choices](#design-choices)
- [Limitations](#limitations)
- [Maintenance & Updates](#maintenance--updates)
- [About Derivatives & Model Evaluation](#about-derivatives--model-evaluation)
- [Contact and Resources](#contact-and-resources)
- [Support](#support)
|
|
|
|
|
--- |
|
|
|
|
|
## Model Card |
|
|
|
|
|
**ProBERT v1.0** |
|
|
|
|
|
A 66M-parameter DistilBERT specialist trained to detect rhetorical overconfidence patterns. Fast, stable, and ready for production. |
|
|
|
|
|
### Model Details |
|
|
|
|
|
- **Model Type**: DistilBERT-based sequence classifier |
|
|
- **Parameters**: 66M (runs on CPU, no GPU required) |
|
|
- **Training Data**: 450 examples (150 per class, synthetic) |
|
|
- **Inference Speed**: ~30ms per sample on CPU (Intel i5, 8GB), <5ms on GPU |
|
|
- **Memory**: <500MB RAM required |
|
|
- **Classes**: 3 (process_clarity, rhetorical_confidence, scope_blur) |
|
|
- **License**: Collapse Index Open Model License v1.0 (permissive use + attribution) |
|
|
- **Released**: January 31, 2026 |
|
|
- **SHA256**: `288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11` |
|
|
|
|
|
**Deployment-Ready:** No A100 clusters, no multi-GPU setups, no waiting. Deploy on a basic server, edge device, or even in-browser with ONNX. Production inference costs pennies. |
|
|
|
|
|
### Performance |
|
|
|
|
|
| Metric | Score | |
|
|
|--------|-------| |
|
|
| Test Accuracy | 95.6% | |
|
|
| Macro F1 | 0.955 | |
|
|
| Collapse Index (CI) — Behavioral Stability | 0.003 | |
|
|
| Structural Retention (SRI) — Decision Coherence | 0.997 | |
|
|
| Type I Errors (Stable + Confident + Wrong) | 0 | |
|
|
| **Calibration (ECE)** | **0.263** | |
|
|
| **Confidence on Errors** | **0.673 max, 0.0% ≥ 0.8** | |
|
|
| **Accuracy vs Mean Confidence** | **98.4% acc, 72.5% conf (-26% gap)** | |
|
|
|
|
|
### Training Impact Demonstration (Unlabeled Transfer) |
|
|
|
|
|
**The Question:** Is ProBERT just a renamed DistilBERT, or did training actually matter? |
|
|
|
|
|
**The Test:** ProBERT trained on **450 synthetic examples** vs. vanilla DistilBERT with a **random 3-class classification head** (untrained baseline). Both tested on three real-world datasets they never saw during training (zero-shot, no fine-tuning). |
|
|
|
|
|
**Important:** These datasets are unlabeled for this 3-class taxonomy (process_clarity, rhetorical_confidence, scope_blur). We report confidence and agreement as behavioral transfer signals, not accuracy. No ground-truth labels exist, so no accuracy/F1/ECE can be computed per domain. Calibration (ECE) and accuracy are measured on the labeled 450-sample test split. |
|
|
|
|
|
| Dataset | Domain | Mean max-prob (ProBERT, unlabeled) | Mean max-prob (Base, unlabeled) | Top-1 agreement (unlabeled) | Training Impact | |
|
|
|---------|--------|-------------------------------------|-------------------------------------|------------------------------|-----------------| |
|
|
| **Python Code** | Clear technical | 0.744 | 0.359 | **94%** | 2x confidence boost - Base has weak signal, ProBERT makes it decisive | |
|
|
| **Dolly-15k** | Mixed instructions | 0.413 | 0.361 | **43%** | Pattern recognition - Training teaches structure on general content | |
|
|
| **Yelp Reviews** | Ambiguous narrative | 0.412 | 0.356 | **16%** | Essential learning - Base completely lost, ProBERT learned the pattern | |
|
|
|
|
|
### The Progression (94% → 43% → 16%) |
|
|
|
|
|
**Training matters MORE as content gets more ambiguous:** |
|
|
- **Clear signal (Python code):** Base model's embeddings capture some structure (94% agreement), but ProBERT doubles confidence (0.74 vs 0.36) and eliminates confusion |
|
|
- **Mixed content (Dolly-15k):** Moderate agreement (43%) shows training teaches pattern recognition beyond the pretrained embeddings alone
- **Ambiguous narratives (Yelp):** Low agreement (16%) suggests training is essential here: the base model's predictions are near random, while ProBERT applies the scope_blur pattern it learned
|
|
|
|
|
**Key Findings** (a minimal reproduction sketch follows this list):
|
|
1. **ProBERT is demonstrably different from base DistilBERT** - This isn't a renamed model. 450 synthetic examples produced strong behavioral transfer to completely unseen real-world domains |
|
|
2. **Strong data efficiency** - 450 training examples produce 84% disagreement with the base model on Yelp (16% agreement: the base model has no usable signal, while ProBERT applies the learned pattern)
|
|
3. **Self-calibrating confidence** - High confidence (0.74) on clear signals, low confidence (0.40) on ambiguous data, no retraining required |
|
|
4. **Training impact scales with ambiguity** - On content where base models fail (16% agreement), ProBERT's training made the difference |
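
The agreement and mean max-prob numbers above are straightforward to compute for any unlabeled corpus. Below is a minimal sketch, assuming a small hand-written `texts` list; the untrained baseline is recreated by attaching a random 3-class head to `distilbert-base-uncased`. This is not the original evaluation script.

```python
# Sketch: compare ProBERT against an untrained 3-class DistilBERT head on unlabeled text.
# The `texts` list is a placeholder; swap in your own unlabeled corpus.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

probert = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
base = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

texts = [
    "def add(a, b):\n    return a + b",
    "The food was amazing and the staff could not have been friendlier.",
]

def top1_and_confidence(model, texts):
    """Return top-1 class indices and max softmax probabilities for a batch of texts."""
    with torch.no_grad():
        enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
        probs = torch.softmax(model(**enc).logits, dim=-1)
    conf, top1 = probs.max(dim=-1)
    return top1, conf

p_top1, p_conf = top1_and_confidence(probert, texts)
b_top1, b_conf = top1_and_confidence(base, texts)

print(f"Mean max-prob (ProBERT): {p_conf.mean().item():.3f}")
print(f"Mean max-prob (Base):    {b_conf.mean().item():.3f}")
print(f"Top-1 agreement:         {(p_top1 == b_top1).float().mean().item():.0%}")
```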
|
|
|
|
|
### Metrics Explained |
|
|
|
|
|
**Standard Metrics:** |
|
|
- **Test Accuracy (95.6%)**: Correct predictions on held-out test set |
|
|
- **Macro F1 (0.955)**: Balanced performance across all three classes |
|
|
|
|
|
**Behavioral Stability Metrics (Collapse Index Protocol):** |
|
|
|
|
|
- **Collapse Index (CI)**: Measures prediction stability under benign perturbations (typos, reformatting, synonyms). Lower is better. |
|
|
- CI ≤ 0.15 = Stable ✅ |
|
|
- CI > 0.45 = Unstable ⚠️ |
|
|
- **ProBERT: 0.003** (near-perfect stability) |
|
|
|
|
|
- **Structural Retention Index (SRI)**: Measures decision coherence—whether the model holds its reasoning structure across input variants. Higher is better. |
|
|
- SRI ≥ 0.85 = Good coherence ✅ |
|
|
- SRI < 0.40 = Breakdown 🚨 |
|
|
- **ProBERT: 0.997** (excellent coherence) |
|
|
|
|
|
- **Type I Errors**: Predictions that are stable (low CI), confident (high probability), but **wrong**. These are dangerous because they look like correct predictions behaviorally. Most models have 5-15% Type I errors. **ProBERT: 0**. |
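
The perturbation protocol behind these numbers is proprietary (see Evaluation Transparency below), but the shape of a Type I check is simple: generate benign variants of an input, measure how often the predicted class flips, and flag predictions that are stable, confident, and wrong. The sketch below uses hand-written variants and illustrative thresholds; it is not the Collapse Index protocol.

```python
# Illustrative only: a toy flip-rate check in the spirit of CI / Type I screening.
# Variant generation and thresholds here are placeholders, not the proprietary protocol.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

def classify(text):
    with torch.no_grad():
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    conf, label = probs.max(dim=-1)
    return label.item(), conf.item()

def type_i_ghost(text, variants, true_label, conf_threshold=0.8, flip_threshold=0.15):
    """Flag a prediction that is stable (low flip rate), confident, and wrong."""
    label, conf = classify(text)
    flip_rate = sum(classify(v)[0] != label for v in variants) / max(len(variants), 1)
    return flip_rate <= flip_threshold and conf >= conf_threshold and label != true_label

# Hand-written benign rewordings; a real protocol would generate these systematically.
text = "This revolutionary AI will transform your business"
variants = [
    "This revolutionary AI will transform your business!",
    "this revolutionary ai will transform your busines",   # typo + lowercase
    "This groundbreaking AI will transform your company",  # synonyms
]
print(type_i_ghost(text, variants, true_label=1))  # index 1 = rhetorical_confidence
```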
|
|
|
|
|
**Calibration Metrics (Post-Training Validation):** |
|
|
|
|
|
- **ECE (Expected Calibration Error)**: Measures alignment between confidence and accuracy. Range 0-1, lower is better. A computation sketch for all three calibration numbers follows this list.
|
|
- ECE ≤ 0.05 = Well-calibrated ✅ |
|
|
- ECE > 0.15 = Miscalibrated ⚠️ |
|
|
- **ProBERT: 0.263** (high, but from underconfidence—see below) |
|
|
|
|
|
- **Confidence Gap**: Accuracy minus mean confidence. Positive gap = underconfident (too cautious), negative gap = overconfident (too bold). |
|
|
- **ProBERT: +26%** (98.4% accuracy, 72.5% mean confidence) |
|
|
- Model doubts itself when RIGHT, not cocky when wrong |
|
|
|
|
|
- **High-Confidence Error Rate**: What % of high-confidence predictions (≥0.8) are wrong? |
|
|
- Most models: 3-10% errors at high confidence |
|
|
- **ProBERT: 0%** (perfect separation—all 7 errors below 0.7 confidence) |
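
Here is a minimal sketch of how these calibration numbers can be computed from a per-example predictions file. The `confidence` and `correct` column names are assumptions for illustration; `eval_calibration.py` in the repository is the authoritative implementation.

```python
# Sketch of ECE, confidence gap, and high-confidence error rate from per-example predictions.
# Column names are assumed; eval_calibration.py in the repo is the reference implementation.
import numpy as np
import pandas as pd

df = pd.read_csv("probert_training_20260131_004706.csv")  # per-example predictions
conf = df["confidence"].to_numpy()          # max softmax probability per example (assumed column)
correct = df["correct"].to_numpy(bool)      # prediction matched the label (assumed column)

# Expected Calibration Error: bin-weighted gap between accuracy and mean confidence.
bins = np.linspace(0.0, 1.0, 11)
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (conf > lo) & (conf <= hi)
    if mask.any():
        ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())

print(f"ECE:                   {ece:.3f}")
print(f"Confidence gap:        {correct.mean() - conf.mean():+.1%}")  # positive = underconfident

high = conf >= 0.8
if high.any():
    print(f"High-conf error rate:  {(~correct[high]).mean():.1%}")
else:
    print("High-conf error rate:  n/a (no predictions at or above 0.8 confidence)")
```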
|
|
|
|
|
**What this means:** ProBERT doesn't just predict accurately, it predicts *consistently and coherently* across different wordings of the same input. The high ECE comes from **conservative calibration**—ProBERT is less confident than it should be, which makes it safer for production (won't confidently output wrong answers). When combined with perturbation testing, you get a complete picture of model reliability. |
|
|
|
|
|
**Transparency Note:** Current calibration metrics measured on synthetic test set (450 samples). For production use cases requiring OOD calibration validation, we recommend evaluating on domain-specific held-out data to confirm conservative calibration holds. |
|
|
|
|
|
#### Reproducibility
|
|
|
|
|
All calibration metrics can be reproduced using the included evaluation script: |
|
|
|
|
|
```bash |
|
|
# Auto-detect mode (uses defaults) |
|
|
python eval_calibration.py --probert |
|
|
|
|
|
# Explicit paths (for custom locations) |
|
|
python eval_calibration.py \ |
|
|
--model_dir probert_model \ |
|
|
--csv probert_training_20260131_004706.csv |
|
|
``` |
|
|
|
|
|
The `--probert` flag auto-detects the model directory and latest predictions CSV. The script computes ECE, confidence gaps, and high-confidence error rates. Full source included in the model repository for transparency. |
|
|
|
|
|
**Evaluation Transparency:** |
|
|
|
|
|
| Component | Status | |
|
|
|-----------|--------| |
|
|
| Metric definitions (CI, SRI, Type I) | Open (see case study) | |
|
|
| Perturbation protocol | Proprietary | |
|
|
| Evaluation thresholds | Fixed (documented above) | |
|
|
| Full methodology | Available via evaluation services | |
|
|
|
|
|
### What It Does |
|
|
|
|
|
ProBERT classifies text into three patterns: |
|
|
|
|
|
| Class | Description | Example | |
|
|
|-------|-------------|---------| |
|
|
| **process_clarity** | Step-by-step, testable reasoning | "Step 1: Check input. Step 2: Validate schema. If invalid, return error." | |
|
|
| **rhetorical_confidence** | Authority without process | "This revolutionary approach will transform your business and guarantee results." | |
|
|
| **scope_blur** | Vague generalizations | "Trust your intuition and embrace the journey. The universe has a plan." | |
|
|
|
|
|
**Important:** ProBERT flags `rhetorical_confidence` as a **risk signal, not a truth judgment**. Some domains (executive summaries, medical conclusions, legal holdings) legitimately require confident language without step-by-step exposition. Context determines appropriateness—ProBERT provides the signal, you provide the judgment. |
|
|
|
|
|
### Quick Start |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0") |
|
|
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0") |
|
|
|
|
|
text = "This revolutionary AI will transform your business" |
|
|
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)[0]
|
|
|
|
|
# [process_clarity, rhetorical_confidence, scope_blur] |
|
|
print(f"Scores: {probs}") |
|
|
# → rhetorical_confidence will be highest (~0.67) |
|
|
``` |
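
The same check also works through the high-level `pipeline` API. Label strings in the output depend on the `id2label` mapping in the model config; if it is absent you will see generic `LABEL_0`-style names in the class order listed above.

```python
from transformers import pipeline

clf = pipeline("text-classification", model="collapseindex/ProBERT-1.0", top_k=None)

for score in clf("Step 1: Check input. Step 2: Validate schema. If invalid, return error."):
    print(score)  # one dict per class, e.g. {'label': 'process_clarity', 'score': ...}
```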
|
|
|
|
|
### Proposed Use Cases |
|
|
|
|
|
**Safety & Compliance:** |
|
|
|
|
|
1. **LLM Output Validation**: Flag when your model makes assertions without showing its work |
|
|
2. **Medical/Legal Documentation**: Detect confident claims without explicit reasoning (liability risk) |
|
|
3. **Prompt Injection Detection**: Catch authority-without-reasoning attempts to override system instructions |
|
|
4. **Regulatory Filing Review**: Ensure procedures documented with *how*, not just mandates |
|
|
|
|
|
**Output Quality:** |
|
|
|
|
|
5. **LLM Output Filtering**: Keep only high-clarity responses and reject rhetorical patterns (a minimal filtering sketch follows this list)
|
|
6. **Chatbot Moderation**: Flag confident hallucinations before deployment |
|
|
7. **Customer Support Grading**: Distinguish confident-but-vague responses from clear solutions |
|
|
8. **Grant/Research Proposal Screening**: Detect overclaims without methodology |
|
|
|
|
|
**Data & Training:** |
|
|
|
|
|
9. **Training Data Cleaning**: Filter instruction datasets for process-driven examples only |
|
|
10. **Synthetic Data Detection**: LLM-generated text often pairs rhetorical patterns with no process chain
|
|
11. **Code Review Automation**: Flag comments that are rhetorical vs genuinely explanatory |
|
|
12. **Resume Parsing**: Detect buzzword-heavy claims vs specific accomplishments |
|
|
|
|
|
**Measurement & Comparison:** |
|
|
|
|
|
13. **Safety Benchmarking**: Compare models on their ability to avoid Type I failures |
|
|
14. **CI Stability Anchor**: Combine with behavior metrics (ProBERT scores + perturbation tests = definitive Type I measurement) |
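
A minimal sketch of use case 5 (LLM output filtering) is shown below. The 0.5 threshold and the hard keep/drop decision are illustrative assumptions; tune both against your own data.

```python
# Sketch for use case 5: drop LLM outputs that score high on rhetorical_confidence.
# The 0.5 threshold is an illustrative default, not a recommended production value.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

CLASSES = ["process_clarity", "rhetorical_confidence", "scope_blur"]  # index order per this card

def keep_response(text: str, max_rhetorical: float = 0.5) -> bool:
    """Return True if the response stays below the rhetorical_confidence threshold."""
    with torch.no_grad():
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    return probs[CLASSES.index("rhetorical_confidence")].item() < max_rhetorical

candidates = [
    "Step 1: Parse the config. Step 2: Retry failed requests with backoff.",
    "Our approach is guaranteed to work flawlessly in every scenario.",
]
print([text for text in candidates if keep_response(text)])
```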
|
|
|
|
|
### License |
|
|
|
|
|
**Collapse Index Open Model License v1.0** - A permissive license designed to maximize adoption while protecting methodology and evaluation claims. |
|
|
|
|
|
**What you CAN do (no cost, no permission needed):** |
|
|
- ✅ Use commercially (including SaaS, products, internal tools) |
|
|
- ✅ Create derivatives (fine-tune, distill, ensemble, etc.) |
|
|
- ✅ Distribute and redistribute (including modified versions) |
|
|
- ✅ Use for research, education, or personal projects |
|
|
|
|
|
**What you MUST do:** |
|
|
- 📝 **Attribution**: Include "Built with ProBERT™" in documentation/UI |
|
|
- 📝 Provide copyright notice and link to license |
|
|
|
|
|
**What you CAN'T do without authorization:** |
|
|
- ❌ Claim "Collapse Index validated" or "CI-evaluated" without providing validation data OR obtaining official evaluation services |
|
|
- ❌ Remove or bypass safety/calibration mechanisms |
|
|
- ❌ Use ProBERT™, Collapse Index™, or Type I Ghost Detection™ trademarks to imply endorsement |
|
|
|
|
|
**License terminates if you:** |
|
|
- Sue us for patent infringement |
|
|
- Remove safety mechanisms from the model |
|
|
- Make false evaluation claims |
|
|
|
|
|
**Key Protection:** The license is permissive (like Apache 2.0) for model use, but protects the **Collapse Index evaluation methodology**. You can train derivatives freely, but can't claim they're "Type I ghost validated" without backing it up. |
|
|
|
|
|
**Full license text:** [LICENSE.md](LICENSE.md) |
|
|
|
|
|
### Citation |
|
|
|
|
|
```bibtex |
|
|
@software{kwon2026probert, |
|
|
author = {Kwon, Alex}, |
|
|
title = {ProBERT: Process-First BERT for Rhetorical Confidence Detection}, |
|
|
version = {1.0}, |
|
|
year = {2026}, |
|
|
month = jan, |
|
|
note = {66M-parameter specialist achieving 95.6\% accuracy with zero Type I ghosts}, |
|
|
url = {https://huggingface.co/collapseindex/ProBERT-1.0}, |
|
|
orcid = {0009-0002-2566-5538}, |
|
|
} |
|
|
``` |
|
|
|
|
|
### Attributions |
|
|
|
|
|
**ProBERT** is built on [DistilBERT](https://github.com/huggingface/transformers), which is distributed under the Apache 2.0 license. See [ATTRIBUTIONS.md](ATTRIBUTIONS.md) for full license text. |
|
|
|
|
|
### Design Choices |
|
|
|
|
|
**Why Synthetic Training?** |
|
|
|
|
|
Modern datasets are contaminated. Real LinkedIn posts have been through GPT/Claude. Customer support tickets got the "AI improve this" treatment. Grant proposals use the ChatGPT rewrite button. Research papers get polished by Anthropic's writing assistant. |
|
|
|
|
|
Training on clean synthetic data means ProBERT learned *actual rhetorical patterns*, not LLM artifacts. So when it detects `rhetorical_confidence`, you're getting signal about genuine overconfident reasoning—not just "this smells like ChatGPT polished it." |
|
|
|
|
|
**The upside**: Clean signal, zero LLM contamination, measures what matters. |
|
|
**The tradeoff**: May not generalize perfectly to highly domain-specific professional jargon (but that's a feature, not a bug—domain-specific jargon *should* be validated separately). |
|
|
|
|
|
### Limitations |
|
|
|
|
|
- **English only**: Trained on English text patterns |
|
|
- **128 token max**: Documents longer than 128 tokens are truncated; see the windowing sketch after this list
|
|
- **3 classes**: No finer-grained pattern distinctions are available within these categories
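
For the 128-token limit, longer documents can be scored by splitting them into overlapping windows and aggregating per-window probabilities. A minimal sketch follows, assuming mean-pooling over windows is acceptable for your use case; the aggregation strategy is not specified by this card.

```python
# Sketch: score documents longer than 128 tokens by overlapping windows, then average probabilities.
# Mean aggregation and the stride value are assumptions; adjust for your use case.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

def score_long_document(text: str, window: int = 128, stride: int = 64) -> torch.Tensor:
    enc = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=window,
        stride=stride,
        return_overflowing_tokens=True,  # one row per overlapping window
    )
    with torch.no_grad():
        logits = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
    # Averaged probabilities in order [process_clarity, rhetorical_confidence, scope_blur]
    return torch.softmax(logits, dim=-1).mean(dim=0)

long_text = "Step 1: gather requirements. Step 2: draft the schema. " * 50
print(score_long_document(long_text))
```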
|
|
|
|
|
### Maintenance & Updates |
|
|
|
|
|
ProBERT-1.0 is production-frozen.
|
|
|
|
|
- **Bug reports** - Submit via [GitHub issues](https://github.com/collapseindex/ProBERT-1.0/issues) |
|
|
- **Feature requests** - Welcome; evaluated for ProBERT-2.0 planning
|
|
- **Update cadence** - Quarterly, or as needed for critical fixes
|
|
- **Versions** - All versions available on HuggingFace with full changelogs |
|
|
|
|
|
ProBERT prioritizes stability over rapid iteration. Once deployed, you can trust the weights won't change unexpectedly. |
|
|
|
|
|
**Versioning:** |
|
|
- **ProBERT-1.0** - You are here (frozen) |
|
|
- **ProBERT-1.1** - Bug fixes + minor improvements (if needed) |
|
|
- **ProBERT-2.0** - Major retraining (multilingual, larger dataset, new architecture) |
|
|
|
|
|
### About Derivatives & Model Evaluation
|
|
|
|
|
Planning to fine-tune ProBERT or improve your own model? We recommend validating on Collapse Index stability metrics, a methodology that measures Type I ghosts, coherence degradation, and behavioral stability. |
|
|
|
|
|
**[Get your training evaluated](https://collapseindex.org/evals.html)** - Whether you're fine-tuning ProBERT, benchmarking your own model, or validating a derivative, we offer custom evaluation using the same proprietary methodology that validated ProBERT. |
|
|
|
|
|
### Contact and Resources |
|
|
|
|
|
**Collapse Index Labs** |
|
|
|
|
|
For safety teams, research institutions, or labs building Type I ghost detection into your pipeline: |
|
|
|
|
|
**ask@collapseindex.org** |
|
|
|
|
|
**Case Study**: https://collapseindex.org/case-studies/template.html?s=probert-case-study |
|
|
|
|
|
**GitHub**: https://github.com/collapseindex/ProBERT-1.0 |
|
|
|
|
|
**HuggingFace**: https://huggingface.co/collapseindex/ProBERT-1.0 |
|
|
|
|
|
**Website**: https://collapseindex.org/ |
|
|
|
|
|
### Support |
|
|
|
|
|
ProBERT is free and open-source. If you find it useful, consider supporting continued development: |
|
|
|
|
|
**[☕ Buy me a coffee](https://ko-fi.com/collapseindex)** - Help fund ProBERT maintenance and future versions. |
|
|
|
|
|
--- |
|
|
|