+---
+license: other
+license_name: collapse-index-open-model-license
+license_link: LICENSE.md
+language:
+- en
+library_name: transformers
+tags:
+- text-classification
+- distilbert
+- rhetorical-confidence
+- behavioral-stability
+- type-i-ghost-detection
+- ai-safety
+base_model: distilbert-base-uncased
+datasets:
+- synthetic
+metrics:
+- accuracy
+- f1
+pipeline_tag: text-classification
+---
+# ProBERT v1.0
+![ProBERT Banner](probertbanner.png)
+## What ProBERT Does
+**Detects rhetorical overconfidence in text.**
+ProBERT classifies text into three patterns:
+- ✅ **process_clarity** - Step-by-step reasoning you can verify
+- ⚠️ **rhetorical_confidence** - Assertive claims without supporting process
+- 🔄 **scope_blur** - Vague generalizations with ambiguous boundaries
+Use it to flag risky language in LLM outputs, documentation, support tickets, or any text where confident assertions without reasoning could cause problems.
+**Why safety teams care:** Under perturbation testing with the Collapse Index protocol, ProBERT itself produced **zero Type I errors**, meaning no predictions that were stable, confident, and wrong. Most models have 5-15% Type I errors. This makes it a reliable signal for downstream safety systems.
+---
+## Table of Contents
+- [Model Card](#model-card)
+  - [Model Details](#model-details)
+  - [Performance](#performance)
+  - [Metrics Explained](#metrics-explained)
+  - [What It Does](#what-it-does)
+  - [Quick Start](#quick-start)
+- [Proposed Use Cases](#proposed-use-cases)
+- [Design Choices](#design-choices)
+- [Limitations](#limitations)
+- [Maintenance & Updates](#maintenance--updates)
+- [License](#license)
+- [Citation](#citation)
+- [Attributions](#attributions)
+- [About Derivatives & Model Evaluation](#about-derivatives--model-evaluation)
+- [Contact and Resources](#contact-and-resources)
+- [Support](#support)
+---
+## Model Card
+**ProBERT v1.0**
+A 66M-parameter DistilBERT specialist trained to detect rhetorical overconfidence patterns. Fast, stable, and ready for production.
+### Model Details
+- **Model Type**: DistilBERT-based sequence classifier
+- **Parameters**: 66M (runs on CPU, no GPU required)
+- **Inference Speed**: ~30ms per sample on CPU (Intel i5, 8GB), <5ms on GPU
+- **Memory**: <500MB RAM required
+- **Classes**: 3 (process_clarity, rhetorical_confidence, scope_blur)
+- **License**: Collapse Index Open Model License v1.0 (permissive use + attribution)
+- **Released**: January 31, 2026
+- **SHA256**: `288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11`
+**Deployment-Ready:** No A100 clusters, no multi-GPU setups, no waiting. Deploy on a basic server, edge device, or even in-browser with ONNX. Production inference costs pennies.
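The SHA256 value listed under Model Details can be used to check that a downloaded artifact was not altered. The following is a minimal sketch, assuming the published hash refers to a single file; the model card does not say which file it covers, so the helper names and the file name in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path

# Published value copied from the Model Details list.
EXPECTED_SHA256 = "288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare digests case-insensitively (hex may be upper- or lowercase)."""
    return sha256_of_file(path) == expected.upper()
```

For example, `matches_published_hash("model.safetensors")` would return `True` only if that file (a hypothetical name here) hashes to the published value.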
+### Performance
+| Metric | Score |
+|--------|-------|
+| Test Accuracy | 95.6% |
+| Macro F1 | 0.955 |
+| Collapse Index (CI) — Behavioral Stability | 0.003 |
+| Structural Retention (SRI) — Decision Coherence | 0.997 |
+| Type I Errors (Stable + Confident + Wrong) | 0 |
+### Baseline Comparison: ProBERT vs. Vanilla DistilBERT
+**The Question:** Is ProBERT just a renamed DistilBERT, or did training actually matter?
+**The Test:** ProBERT (trained specialist) vs. vanilla DistilBERT with a **random 3-class classification head** (untrained baseline) on three real-world datasets (zero-shot, no fine-tuning):
+| Dataset | Domain | ProBERT Conf | Base Conf | Agreement | Training Impact |
+|---------|--------|--------------|-----------|-----------|-----------------|
+| **Python Code** | Clear technical | 0.744 | 0.359 | **94%** | 2x confidence boost - Base has weak signal, ProBERT makes it decisive |
+| **Dolly-15k** | Mixed instructions | 0.413 | 0.361 | **43%** | Pattern recognition - Training teaches structure on general content |
+| **Yelp Reviews** | Ambiguous narrative | 0.412 | 0.356 | **16%** | Essential learning - Base completely lost, ProBERT learned the pattern |
+### The Progression (94% → 43% → 16%)
+**Training matters MORE as content gets more ambiguous:**
+- **Clear signal (Python code):** The two models agree on 94% of predictions, which suggests the base model's embeddings capture some structure, but ProBERT doubles confidence (0.74 vs 0.36)
+- **Mixed content (Dolly-15k):** Agreement drops to 43%, which suggests training teaches pattern recognition beyond embeddings alone
+- **Ambiguous narratives (Yelp):** Agreement falls to 16%, the largest gap of the three. The base model's random head gives no usable signal here, while ProBERT applies the scope_blur pattern it learned
+**Key Findings:**
+1. **ProBERT is demonstrably different from base DistilBERT** - This isn't a renamed model. Trained only on synthetic data, it diverges sharply from the untrained baseline on real-world domains
+2. **Self-calibrating confidence** - High confidence (0.74) on clear signals, low confidence (0.41) on ambiguous data, no retraining required
+3. **Training impact scales with ambiguity** - On content where base models fail (16% agreement), ProBERT's training made the difference
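The comparison table's columns can be reproduced from raw predictions with a few lines of code. This sketch assumes "Agreement" means the fraction of samples where both models pick the same top label, and "Conf" means the average top-class probability; the model card does not define either column, so both readings, and the function names, are assumptions.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of samples on which two models predict the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("prediction lists must have the same length")
    if not labels_a:
        raise ValueError("need at least one prediction")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def mean_confidence(prob_rows):
    """Average of each sample's highest class probability."""
    if not prob_rows:
        raise ValueError("need at least one probability row")
    return sum(max(row) for row in prob_rows) / len(prob_rows)


# Toy values for illustration only; these are not ProBERT results.
probert_labels = ["process_clarity", "scope_blur", "scope_blur", "process_clarity"]
base_labels = ["process_clarity", "scope_blur", "process_clarity", "process_clarity"]
print(agreement_rate(probert_labels, base_labels))
```

Under these assumptions, a low agreement rate only shows that the two models behave differently; it does not by itself say which one is correct on unlabeled data.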
+### Metrics Explained
+**Standard Metrics:**
+- **Test Accuracy (95.6%)**: Correct predictions on held-out test set
+- **Macro F1 (0.955)**: Balanced performance across all three classes
+**Behavioral Stability Metrics (Collapse Index Protocol):**
+- **Collapse Index (CI)**: Measures prediction stability under benign perturbations (typos, reformatting, synonyms). Lower is better.
+  - CI ≤ 0.15 = Stable ✅
+  - CI > 0.45 = Unstable ⚠️
+  - **ProBERT: 0.003** (near-perfect stability)
+- **Structural Retention Index (SRI)**: Measures decision coherence—whether the model holds its reasoning structure across input variants. Higher is better.
+  - SRI ≥ 0.85 = Good coherence ✅
+  - SRI < 0.40 = Breakdown 🚨
+  - **ProBERT: 0.997** (excellent coherence)
+- **Type I Errors**: Predictions that are stable (low CI), confident (high probability), but **wrong**. These are dangerous because they look like correct predictions behaviorally. Most models have 5-15% Type I errors. **ProBERT: 0**.
+**What this means:** ProBERT doesn't just predict accurately, it predicts *consistently and coherently* across different wordings of the same input. When combined with perturbation testing, you get a complete picture of model reliability.
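The thresholds and the Type I definition above translate into a simple check. This is a sketch under stated assumptions, not the Collapse Index implementation: the perturbation protocol is proprietary, so the code assumes you already have a per-sample CI score from your own perturbation tests, and the 0.8 confidence cutoff is a placeholder because the model card gives no number for "confident".

```python
from dataclasses import dataclass

CI_STABLE_MAX = 0.15    # documented: CI <= 0.15 is stable
CI_UNSTABLE_MIN = 0.45  # documented: CI > 0.45 is unstable
CONFIDENCE_MIN = 0.8    # ASSUMPTION: the card does not define a confidence cutoff


@dataclass
class Prediction:
    predicted: str
    actual: str
    confidence: float  # top-class probability
    ci: float          # per-sample CI score (assumed to come from your own tests)


def ci_band(ci):
    """Map a CI score onto the documented bands."""
    if ci <= CI_STABLE_MAX:
        return "stable"
    if ci > CI_UNSTABLE_MIN:
        return "unstable"
    return "intermediate"  # the card does not name the range between thresholds


def is_type_i(pred, confidence_min=CONFIDENCE_MIN):
    """Stable, confident, and wrong: the Type I pattern described above."""
    return (
        ci_band(pred.ci) == "stable"
        and pred.confidence >= confidence_min
        and pred.predicted != pred.actual
    )


def type_i_rate(preds, confidence_min=CONFIDENCE_MIN):
    """Share of all predictions that match the Type I pattern."""
    if not preds:
        raise ValueError("need at least one prediction")
    return sum(is_type_i(p, confidence_min) for p in preds) / len(preds)
```

Whether Type I rates are reported over all predictions or only over errors is not stated in the card; this sketch divides by all predictions.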
+**Evaluation Transparency:**
+| Component | Status |
+|-----------|--------|
+| Metric definitions (CI, SRI, Type I) | Open (see case study) |
+| Perturbation protocol | Proprietary |
+| Evaluation thresholds | Fixed (documented above) |
+| Full methodology | Available via evaluation services |
+### What It Does
+ProBERT classifies text into three patterns:
+| Class | Description | Example |
+|-------|-------------|---------|
+| **process_clarity** | Step-by-step, testable reasoning | "Step 1: Check input. Step 2: Validate schema. If invalid, return error." |
+| **rhetorical_confidence** | Authority without process | "This revolutionary approach will transform your business and guarantee results." |
+| **scope_blur** | Vague generalizations | "Trust your intuition and embrace the journey. The universe has a plan." |
+**Important:** ProBERT flags `rhetorical_confidence` as a **risk signal, not a truth judgment**. Some domains (executive summaries, medical conclusions, legal holdings) legitimately require confident language without step-by-step exposition. Context determines appropriateness—ProBERT provides the signal, you provide the judgment.
+### Quick Start
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+# Load the fine-tuned classifier and its tokenizer from the Hub
+model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
+tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")
+text = "This revolutionary AI will transform your business"
+# Inputs longer than 128 tokens are truncated
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    outputs = model(**inputs)
+probs = torch.softmax(outputs.logits, dim=1)[0]
+# Class order: [process_clarity, rhetorical_confidence, scope_blur]
+print(f"Scores: {probs}")
+# → rhetorical_confidence will be highest (~0.67)
+```
+### Proposed Use Cases
+**Safety & Compliance:**
+1. **LLM Output Validation**: Flag when your model makes assertions without showing its work
+2. **Medical/Legal Documentation**: Detect confident claims without explicit reasoning (liability risk)
+3. **Prompt Injection Detection**: Catch authority-without-reasoning attempts to override system instructions
+4. **Regulatory Filing Review**: Ensure procedures are documented with the *how*, not just the mandate
+**Output Quality:**
+5. **LLM Output Filtering**: Keep only high-clarity responses, reject rhetorical patterns
+6. **Chatbot Moderation**: Flag confident hallucinations before deployment
+7. **Customer Support Grading**: Distinguish confident-but-vague responses from clear solutions
+8. **Grant/Research Proposal Screening**: Detect overclaims without methodology
+**Data & Training:**
+9. **Training Data Cleaning**: Filter instruction datasets for process-driven examples only
+10. **Synthetic Data Detection**: Flag machine-generated text that shows rhetorical patterns without a process chain
+11. **Code Review Automation**: Flag comments that are rhetorical rather than genuinely explanatory
+12. **Resume Parsing**: Detect buzzword-heavy claims vs specific accomplishments
+**Measurement & Comparison:**
+13. **Safety Benchmarking**: Compare models on their ability to avoid Type I failures
+14. **CI Stability Anchor**: Combine with behavior metrics (ProBERT scores + perturbation tests = definitive Type I measurement)
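Several of the use cases above, such as LLM output filtering and training data cleaning, come down to keeping text whose top class is `process_clarity`. The sketch below shows that pattern with the scorer passed in as a function. `score_fn` is a hypothetical callable that returns three probabilities in the class order shown in Quick Start, and the 0.6 cutoff is an illustrative assumption, not a recommended value.

```python
# Class order as listed in the Quick Start comment.
LABELS = ("process_clarity", "rhetorical_confidence", "scope_blur")


def top_label(probs):
    """Return (label, probability) for the highest-scoring class."""
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per class")
    index = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[index], probs[index]


def keep_clear_responses(responses, score_fn, min_clarity=0.6):
    """Keep responses whose top class is process_clarity above a cutoff."""
    kept = []
    for text in responses:
        label, prob = top_label(score_fn(text))
        if label == "process_clarity" and prob >= min_clarity:
            kept.append(text)
    return kept


def stub_scorer(text):
    """Stand-in for a real model call, used only to make the example runnable."""
    if text.startswith("Step"):
        return [0.8, 0.1, 0.1]
    return [0.2, 0.7, 0.1]


print(keep_clear_responses(["Step 1: Check input.", "Guaranteed results!"], stub_scorer))
```

In practice `score_fn` would wrap the tokenizer and model calls from Quick Start; passing it in keeps the filtering logic testable without loading the model.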
+### License
+**Collapse Index Open Model License v1.0** - A permissive license designed to maximize adoption while protecting methodology and evaluation claims.
+**What you CAN do (no cost, no permission needed):**
+- ✅ Use commercially (including SaaS, products, internal tools)
+- ✅ Create derivatives (fine-tune, distill, ensemble, etc.)
+- ✅ Distribute and redistribute (including modified versions)
+- ✅ Use for research, education, or personal projects
+**What you MUST do:**
+- 📝 **Attribution**: Include "Built with ProBERT™" in documentation/UI
+- 📝 Provide copyright notice and link to license
+**What you CAN'T do without authorization:**
+- ❌ Claim "Collapse Index validated" or "CI-evaluated" without providing validation data OR obtaining official evaluation services
+- ❌ Remove or bypass safety/calibration mechanisms
+- ❌ Use ProBERT™, Collapse Index™, or Type I Ghost Detection™ trademarks to imply endorsement
+**License terminates if you:**
+- Sue us for patent infringement
+- Remove safety mechanisms from the model
+- Make false evaluation claims
+**Key Protection:** The license is permissive (like Apache 2.0) for model use, but protects the **Collapse Index evaluation methodology**. You can train derivatives freely, but can't claim they're "Type I ghost validated" without backing it up.
+**Full license text:** [LICENSE.md](LICENSE.md)
+### Citation
+```bibtex
+@software{kwon2026probert,
+  author       = {Kwon, Alex},
+  title        = {ProBERT: Process-First BERT for Rhetorical Confidence Detection},
+  version      = {1.0},
+  year         = {2026},
+  month        = jan,
+  note         = {66M-parameter specialist achieving 95.6\% accuracy with zero Type I ghosts},
+  url          = {https://huggingface.co/collapseindex/ProBERT-1.0},
+  orcid        = {0009-0002-2566-5538},
+}
+```
+### Attributions
+**ProBERT** is built on [DistilBERT](https://github.com/huggingface/transformers), which is distributed under the Apache 2.0 license. See [ATTRIBUTIONS.md](ATTRIBUTIONS.md) for full license text.
+### Design Choices
+**Why Synthetic Training?**
+Many modern datasets are contaminated with LLM-edited text. Real LinkedIn posts have been through GPT/Claude. Customer support tickets got the "AI improve this" treatment. Grant proposals use the ChatGPT rewrite button. Research papers get polished by AI writing assistants.
+Training on clean synthetic data means ProBERT learned *actual rhetorical patterns*, not LLM artifacts. So when it detects `rhetorical_confidence`, you're getting signal about genuine overconfident reasoning—not just "this smells like ChatGPT polished it."
+**The upside**: Clean signal, zero LLM contamination, measures what matters.
+**The tradeoff**: May not generalize perfectly to highly domain-specific professional jargon (but that's a feature, not a bug—domain-specific jargon *should* be validated separately).
+### Limitations
+- **English only**: Trained on English text patterns
+- **128 token max**: Longer documents will be truncated
+- **3 classes**: Fine-grained pattern distinction within these categories not available
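One common way to work around the 128-token limit is to split a long document into overlapping windows and classify each window separately. This is a generic sliding-window sketch, not something the model card prescribes; the function name and the `window` and `stride` defaults are assumptions, with `window=126` chosen to leave room for the `[CLS]` and `[SEP]` tokens that the DistilBERT tokenizer adds.

```python
def window_tokens(token_ids, window=126, stride=96):
    """Split a token sequence into overlapping windows.

    Consecutive windows overlap by (window - stride) tokens so that a
    sentence cut at one boundary appears whole in the next window.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    if not token_ids:
        return []
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


# Illustration with integer stand-ins for real token IDs.
chunks = window_tokens(list(range(300)))
print(len(chunks), [len(c) for c in chunks])
```

How to combine per-window scores (for example, flagging a document if any window is `rhetorical_confidence`) is a separate design choice that depends on the use case.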
+### Maintenance & Updates
+ProBERT-1.0 is production-frozen: the released weights will not change.
+- **Bug reports** - Submit via [GitHub issues](https://github.com/collapseindex/ProBERT-1.0/issues)
+- **Feature requests** - Accepted but evaluated for ProBERT-2.0 planning
+- **Update cadence** - Quarterly or as-needed for critical fixes, released as new versions
+- **Versions** - All versions available on Hugging Face with full changelogs
+ProBERT prioritizes stability over rapid iteration. Once deployed, you can trust the weights won't change unexpectedly.
+**Versioning:**
+- **ProBERT-1.0** - You are here (frozen)
+- **ProBERT-1.1** - Bug fixes + minor improvements (if needed)
+- **ProBERT-2.0** - Major retraining (multilingual, larger dataset, new architecture)
+### About Derivatives & Model Evaluation
+Planning to fine-tune ProBERT or improve your own model? We recommend validating on Collapse Index stability metrics, a methodology that measures Type I ghosts, coherence degradation, and behavioral stability.
+**[Get your training evaluated](https://collapseindex.org/evals.html)** - Whether you're fine-tuning ProBERT, benchmarking your own model, or validating a derivative, we offer custom evaluation using the same proprietary methodology that validated ProBERT.
+### Contact and Resources
+**Collapse Index Labs**
+For safety teams, research institutions, or labs building Type I ghost detection into your pipeline:
+**ask@collapseindex.org**
+**Case Study**: https://collapseindex.org/case-studies/template.html?s=probert-case-study
+**GitHub**: https://github.com/collapseindex/ProBERT-1.0
+**HuggingFace**: https://huggingface.co/collapseindex/ProBERT-1.0
+**Website**: https://collapseindex.org/
+### Support
+ProBERT is free and open-source. If you find it useful, consider supporting continued development:
+**[☕ Buy me a coffee](https://ko-fi.com/collapseindex)** - Help fund ProBERT maintenance and future versions.
+---