---
license: other
license_name: collapse-index-open-model-license
license_link: LICENSE.md
language:
- en
library_name: transformers
tags:
- text-classification
- distilbert
- rhetorical-confidence
- behavioral-stability
- type-i-ghost-detection
- ai-safety
base_model: distilbert-base-uncased
datasets:
- synthetic
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# ProBERT v1.0

## What ProBERT Does
Detects rhetorical overconfidence in text.
ProBERT classifies text into three patterns:
- ✅ process_clarity - Step-by-step reasoning you can verify
- ⚠️ rhetorical_confidence - Assertive claims without supporting process
- 🔄 scope_blur - Vague generalizations with ambiguous boundaries
Use it to flag risky language in LLM outputs, documentation, support tickets, or any text where confident assertions without reasoning could cause problems.
Trained on just 450 examples (150 per class), ProBERT achieves 95.6% test accuracy and shows strong transfer signals to real-world domains it never saw during training. On Yelp reviews, ProBERT and an untrained base DistilBERT disagree 84% of the time, suggesting training added real capability, not just noise.

**Why safety teams care:** Under perturbation testing (the Collapse Index protocol), ProBERT exhibits zero Type I errors: predictions that are stable, confident, and wrong. Most models show 5-15% Type I errors; ProBERT shows 0. It is also underconfident by design: 98.4% accuracy against only 72.5% mean confidence (a 26-point confidence gap). This conservative calibration means it doubts itself when right rather than asserting confidently when wrong, the safe direction for production deployments.
## Table of Contents
- Model Card
- Proposed Use Cases
- License
- Citation
- Attributions
- Design Choices
- Limitations
- Maintenance & Updates
- About Derivatives & Model Evaluation
- Contact and Resources
- Support
## Model Card

**ProBERT v1.0** - A 66M-parameter DistilBERT specialist trained to detect rhetorical overconfidence patterns. Fast, stable, and ready for production.

### Model Details
- Model Type: DistilBERT-based sequence classifier
- Parameters: 66M (runs on CPU, no GPU required)
- Training Data: 450 examples (150 per class, synthetic)
- Inference Speed: ~30ms per sample on CPU (Intel i5, 8GB), <5ms on GPU
- Memory: <500MB RAM required
- Classes: 3 (process_clarity, rhetorical_confidence, scope_blur)
- License: Collapse Index Open Model License v1.0 (permissive use + attribution)
- Released: January 31, 2026
- SHA256: `288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11`
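To check your download against the published hash, a minimal sketch; the `model.safetensors` filename is an assumption, so substitute the actual weight file from your local snapshot:

```python
# Sketch: verify a downloaded weight file against the published SHA256.
# "model.safetensors" is an assumed filename; substitute the actual file
# from your local snapshot.
import hashlib

digest = hashlib.sha256(open("model.safetensors", "rb").read()).hexdigest()
print(digest.upper() == "288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11")
```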
**Deployment-Ready:** No A100 clusters, no multi-GPU setups, no waiting. Deploy on a basic server, edge device, or even in-browser with ONNX. Production inference costs pennies.
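For the ONNX path, a minimal export sketch assuming the separate `optimum` package (`pip install optimum[onnxruntime]`); this tooling is not bundled with the model:

```python
# Sketch: export ProBERT to ONNX via Hugging Face optimum (assumed installed).
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

ort_model = ORTModelForSequenceClassification.from_pretrained(
    "collapseindex/ProBERT-1.0", export=True  # convert PyTorch weights to ONNX
)
ort_model.save_pretrained("probert-onnx")  # writes model.onnx + config
AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0").save_pretrained("probert-onnx")
```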
### Performance
| Metric | Score |
|---|---|
| Test Accuracy | 95.6% |
| Macro F1 | 0.955 |
| Collapse Index (CI) — Behavioral Stability | 0.003 |
| Structural Retention (SRI) — Decision Coherence | 0.997 |
| Type I Errors (Stable + Confident + Wrong) | 0 |
| Calibration (ECE) | 0.263 |
| Confidence on Errors | 0.673 max, 0.0% ≥ 0.8 |
| Accuracy vs Mean Confidence | 98.4% acc, 72.5% conf (+26-point gap) |
### Training Impact Demonstration (Unlabeled Transfer)
**The Question:** Is ProBERT just a renamed DistilBERT, or did training actually matter?

**The Test:** ProBERT trained on 450 synthetic examples vs. vanilla DistilBERT with a random 3-class classification head (untrained baseline). Both tested on three real-world datasets they never saw during training (zero-shot, no fine-tuning).

**Important:** These datasets are unlabeled for this 3-class taxonomy (process_clarity, rhetorical_confidence, scope_blur). We report confidence and agreement as behavioral transfer signals, not accuracy; with no ground-truth labels, no per-domain accuracy, F1, or ECE can be computed. Calibration (ECE) and accuracy are measured on the labeled 450-sample test split.
| Dataset | Domain | Mean max-prob (ProBERT, unlabeled) | Mean max-prob (Base, unlabeled) | Top-1 agreement (unlabeled) | Training Impact |
|---|---|---|---|---|---|
| Python Code | Clear technical | 0.744 | 0.359 | 94% | 2x confidence boost - Base has weak signal, ProBERT makes it decisive |
| Dolly-15k | Mixed instructions | 0.413 | 0.361 | 43% | Pattern recognition - Training teaches structure on general content |
| Yelp Reviews | Ambiguous narrative | 0.412 | 0.356 | 16% | Essential learning - Base completely lost, ProBERT learned the pattern |
#### The Progression (94% → 43% → 16%)
Training matters MORE as content gets more ambiguous:
- Clear signal (Python code): Base model's embeddings capture some structure (94% agreement), but ProBERT doubles confidence (0.74 vs 0.36) and eliminates confusion
- Mixed content (Dolly-15k): Moderate disagreement (43%) shows training teaches pattern recognition beyond embeddings alone
- Ambiguous narratives (Yelp): Massive disagreement (16% agreement) suggests training is essential - the base model predicts near-randomly while ProBERT learned the scope_blur pattern
**Key Findings:**
- ProBERT is demonstrably different from base DistilBERT - this isn't a renamed model. 450 synthetic examples produced strong behavioral transfer to completely unseen real-world domains
- Extreme data efficiency - from 450 training examples to 84% disagreement with the base model on Yelp (16% agreement means the base model is guessing while ProBERT applies a learned pattern)
- Self-calibrating confidence - high confidence (0.74) on clear signals, low confidence (~0.41) on ambiguous data, no retraining required
- Training impact scales with ambiguity - on content where the base model fails (16% agreement), ProBERT's training made the difference (a measurement sketch follows below)
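The two signals in the table above (mean max-prob and top-1 agreement) can be reproduced on any unlabeled text list; a minimal sketch, where the baseline is `distilbert-base-uncased` loaded with a freshly initialized 3-class head as in the test setup:

```python
# Sketch: compute the two unlabeled transfer signals reported above for any
# list of texts. The baseline is distilbert-base-uncased with a freshly
# initialized (random) 3-class head, per the test setup; expect an
# "initialized from scratch" warning when loading it.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def batch_probs(model, tokenizer, texts):
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=128)
    with torch.no_grad():
        return torch.softmax(model(**enc).logits, dim=-1)

tok = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")
probert = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)

texts = ["Step 1: Check the input. Step 2: Validate the schema.",
         "This revolutionary approach guarantees results."]  # stand-in corpus
p, b = batch_probs(probert, tok, texts), batch_probs(base, tok, texts)
print("mean max-prob (ProBERT):", p.max(-1).values.mean().item())
print("mean max-prob (base):   ", b.max(-1).values.mean().item())
print("top-1 agreement:        ", (p.argmax(-1) == b.argmax(-1)).float().mean().item())
```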
### Metrics Explained

**Standard Metrics:**
- Test Accuracy (95.6%): Correct predictions on held-out test set
- Macro F1 (0.955): Balanced performance across all three classes
**Behavioral Stability Metrics (Collapse Index Protocol):**

**Collapse Index (CI):** Measures prediction stability under benign perturbations (typos, reformatting, synonyms). Lower is better.
- CI ≤ 0.15 = Stable ✅
- CI > 0.45 = Unstable ⚠️
- ProBERT: 0.003 (near-perfect stability)
**Structural Retention Index (SRI):** Measures decision coherence: whether the model holds its reasoning structure across input variants. Higher is better.
- SRI ≥ 0.85 = Good coherence ✅
- SRI < 0.40 = Breakdown 🚨
- ProBERT: 0.997 (excellent coherence)
**Type I Errors:** Predictions that are stable (low CI), confident (high probability), but wrong. These are dangerous because they behave exactly like correct predictions. Most models have 5-15% Type I errors. ProBERT: 0.
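The official perturbation protocol is proprietary (see Evaluation Transparency below), but a toy sketch conveys the idea: measure how often the top-1 prediction flips under benign rewrites, then flag predictions that are simultaneously stable, confident, and wrong. Here `perturb` is a hypothetical placeholder, and the thresholds echo the documented ones above; this is not the official implementation:

```python
# Illustrative toy only: the official Collapse Index perturbation protocol is
# proprietary. flip_rate() is a stand-in for CI, and perturb is a hypothetical
# callable that applies one benign edit (typo, reformatting, synonym swap).
import torch

def flip_rate(model, tokenizer, text, perturb, n=20):
    """Fraction of benign perturbations that change the top-1 prediction."""
    def top1(t):
        enc = tokenizer(t, return_tensors="pt", truncation=True, max_length=128)
        with torch.no_grad():
            return model(**enc).logits.argmax(-1).item()
    original = top1(text)
    return sum(top1(perturb(text)) != original for _ in range(n)) / n

def is_type_i_error(pred, label, confidence, ci, conf_thresh=0.8, ci_thresh=0.15):
    """Stable (low CI) + confident (high probability) + wrong: the dangerous case."""
    return ci <= ci_thresh and confidence >= conf_thresh and pred != label
```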
**Calibration Metrics (Post-Training Validation):**

**ECE (Expected Calibration Error):** Measures alignment between confidence and accuracy. Range 0-1, lower is better.
- ECE ≤ 0.05 = Well-calibrated ✅
- ECE > 0.15 = Miscalibrated ⚠️
- ProBERT: 0.263 (high, but driven by underconfidence; see below)
**Confidence Gap:** Accuracy minus mean confidence. A positive gap means underconfident (too cautious); a negative gap means overconfident (too bold).
- ProBERT: +26% (98.4% accuracy, 72.5% mean confidence)
- Model doubts itself when RIGHT, not cocky when wrong
**High-Confidence Error Rate:** What percentage of high-confidence predictions (≥0.8) are wrong?
- Most models: 3-10% errors at high confidence
- ProBERT: 0% (perfect separation; all 7 errors fall below 0.7 confidence)
**What this means:** ProBERT doesn't just predict accurately; it predicts consistently and coherently across different wordings of the same input. The high ECE comes from conservative calibration: ProBERT is less confident than it should be, which makes it safer for production (it won't confidently output wrong answers). Combined with perturbation testing, this gives a complete picture of model reliability.
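A minimal sketch of these three calibration quantities, computed from per-sample confidences and correctness flags (for example, two columns of the predictions CSV used below); this is standard 10-bin ECE, not the bundled `eval_calibration.py`:

```python
# Sketch: ECE, confidence gap, and high-confidence error rate from per-sample
# confidences and correctness flags (e.g., two columns of a predictions CSV).
# Standard 10-bin ECE; this is not the bundled eval_calibration.py.
import numpy as np

def calibration_report(confidence, correct, n_bins=10, hi=0.8):
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.clip((confidence * n_bins).astype(int), 0, n_bins - 1)
    ece = sum(
        (bins == b).mean() * abs(correct[bins == b].mean() - confidence[bins == b].mean())
        for b in range(n_bins) if (bins == b).any()
    )
    hi_mask = confidence >= hi
    return {
        "ece": ece,
        "confidence_gap": correct.mean() - confidence.mean(),  # positive = underconfident
        "high_conf_error_rate": 1.0 - correct[hi_mask].mean() if hi_mask.any() else float("nan"),
    }
```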
**Transparency Note:** Current calibration metrics are measured on the synthetic test set (450 samples). For production use cases requiring OOD calibration validation, we recommend evaluating on domain-specific held-out data to confirm that the conservative calibration holds.
**Reproducibility:**

All calibration metrics can be reproduced with the included evaluation script:

```bash
# Auto-detect mode (uses defaults)
python eval_calibration.py --probert

# Explicit paths (for custom locations)
python eval_calibration.py \
  --model_dir probert_model \
  --csv probert_training_20260131_004706.csv
```

The `--probert` flag auto-detects the model directory and the latest predictions CSV. The script computes ECE, confidence gaps, and high-confidence error rates. Full source is included in the model repository for transparency.
**Evaluation Transparency:**
| Component | Status |
|---|---|
| Metric definitions (CI, SRI, Type I) | Open (see case study) |
| Perturbation protocol | Proprietary |
| Evaluation thresholds | Fixed (documented above) |
| Full methodology | Available via evaluation services |
### What It Does
ProBERT classifies text into three patterns:
| Class | Description | Example |
|---|---|---|
| process_clarity | Step-by-step, testable reasoning | "Step 1: Check input. Step 2: Validate schema. If invalid, return error." |
| rhetorical_confidence | Authority without process | "This revolutionary approach will transform your business and guarantee results." |
| scope_blur | Vague generalizations | "Trust your intuition and embrace the journey. The universe has a plan." |
**Important:** ProBERT flags rhetorical_confidence as a risk signal, not a truth judgment. Some domains (executive summaries, medical conclusions, legal holdings) legitimately require confident language without step-by-step exposition. Context determines appropriateness: ProBERT provides the signal, you provide the judgment.
### Quick Start

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

text = "This revolutionary AI will transform your business"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=1)[0]

# Class order: [process_clarity, rhetorical_confidence, scope_blur]
print(f"Scores: {probs}")
# → rhetorical_confidence will be highest (~0.67)
```
## Proposed Use Cases

**Safety & Compliance:**
- LLM Output Validation: Flag when your model makes assertions without showing its work
- Medical/Legal Documentation: Detect confident claims without explicit reasoning (liability risk)
- Prompt Injection Detection: Catch authority-without-reasoning attempts to override system instructions
- Regulatory Filing Review: Ensure procedures are documented with the how, not just the mandate

**Output Quality:**
- LLM Output Filtering: Keep only high-clarity responses, reject rhetorical patterns (see the sketch after this list)
- Chatbot Moderation: Flag confident hallucinations before deployment
- Customer Support Grading: Distinguish confident-but-vague responses from clear solutions
- Grant/Research Proposal Screening: Detect overclaims without methodology

**Data & Training:**
- Training Data Cleaning: Filter instruction datasets for process-driven examples only
- Synthetic Data Detection: Flag ML-generated text, which tends to pair rhetorical patterns with a missing process chain
- Code Review Automation: Flag comments that are rhetorical rather than genuinely explanatory
- Resume Parsing: Detect buzzword-heavy claims vs. specific accomplishments

**Measurement & Comparison:**
- Safety Benchmarking: Compare models on their ability to avoid Type I failures
- CI Stability Anchor: Combine ProBERT scores with perturbation tests for a definitive Type I measurement
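As an illustration of the filtering use cases above, a minimal gating sketch. The 0.6 threshold is a hypothetical starting point rather than a validated value, and the label names assume the hub config maps class ids to the names documented above:

```python
# Sketch: gate LLM outputs on ProBERT's risk signal. The 0.6 threshold and the
# choice to block only rhetorical_confidence are illustrative, not validated;
# label names assume the hub config maps class ids to the documented names.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="collapseindex/ProBERT-1.0", top_k=None)

def flag_risky(text, threshold=0.6):
    scores = {d["label"]: d["score"] for d in classifier([text], truncation=True)[0]}
    return scores.get("rhetorical_confidence", 0.0) >= threshold, scores

risky, scores = flag_risky("This guarantees a 10x return, trust me.")
print(risky, scores)
```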
## License
Collapse Index Open Model License v1.0 - A permissive license designed to maximize adoption while protecting methodology and evaluation claims.
**What you CAN do (no cost, no permission needed):**
- ✅ Use commercially (including SaaS, products, internal tools)
- ✅ Create derivatives (fine-tune, distill, ensemble, etc.)
- ✅ Distribute and redistribute (including modified versions)
- ✅ Use for research, education, or personal projects

**What you MUST do:**
- 📝 Attribution: Include "Built with ProBERT™" in documentation/UI
- 📝 Provide copyright notice and link to license

**What you CAN'T do without authorization:**
- ❌ Claim "Collapse Index validated" or "CI-evaluated" without providing validation data OR obtaining official evaluation services
- ❌ Remove or bypass safety/calibration mechanisms
- ❌ Use ProBERT™, Collapse Index™, or Type I Ghost Detection™ trademarks to imply endorsement

**License terminates if you:**
- Sue us for patent infringement
- Remove safety mechanisms from the model
- Make false evaluation claims

**Key Protection:** The license is permissive (like Apache 2.0) for model use, but protects the Collapse Index evaluation methodology. You can train derivatives freely, but you can't claim they're "Type I ghost validated" without backing it up.

Full license text: [LICENSE.md](LICENSE.md)
## Citation

```bibtex
@software{kwon2026probert,
  author  = {Kwon, Alex},
  title   = {ProBERT: Process-First BERT for Rhetorical Confidence Detection},
  version = {1.0},
  year    = {2026},
  month   = jan,
  note    = {66M-parameter specialist achieving 95.6\% accuracy with zero Type I ghosts},
  url     = {https://huggingface.co/collapseindex/ProBERT-1.0},
  orcid   = {0009-0002-2566-5538},
}
```
## Attributions

ProBERT is built on DistilBERT, which is distributed under the Apache 2.0 license. See ATTRIBUTIONS.md for the full license text.
## Design Choices

### Why Synthetic Training?
Modern datasets are contaminated. Real LinkedIn posts have been through GPT/Claude. Customer support tickets got the "AI improve this" treatment. Grant proposals use the ChatGPT rewrite button. Research papers get polished by Anthropic's writing assistant.
Training on clean synthetic data means ProBERT learned actual rhetorical patterns, not LLM artifacts. When it detects rhetorical_confidence, you're getting a signal about genuine overconfident reasoning, not just "this smells like ChatGPT polished it."

The upside: clean signal, zero LLM contamination, and it measures what matters. The tradeoff: it may not generalize perfectly to highly domain-specific professional jargon (a feature, not a bug; domain-specific jargon should be validated separately).
## Limitations

- English only: Trained on English text patterns
- 128-token max: Longer documents are truncated
- 3 classes: No fine-grained pattern distinctions within these categories
## Maintenance & Updates

ProBERT-1.0 is production-frozen.

- Bug reports - Submit via GitHub issues
- Feature requests - Accepted, but evaluated for ProBERT-2.0 planning
- Update cadence - Quarterly, or as needed for critical fixes
- Versions - All versions available on HuggingFace with full changelogs
- Versions - All versions available on HuggingFace with full changelogs
ProBERT prioritizes stability over rapid iteration. Once deployed, you can trust the weights won't change unexpectedly.
**Versioning:**
- ProBERT-1.0 - You are here (frozen)
- ProBERT-1.1 - Bug fixes + minor improvements (if needed)
- ProBERT-2.0 - Major retraining (multilingual, larger dataset, new architecture)
## About Derivatives & Model Evaluation

Planning to fine-tune ProBERT or improve your own model? We recommend validating against Collapse Index stability metrics, a methodology that measures Type I ghosts, coherence degradation, and behavioral stability.

Get your training evaluated: whether you're fine-tuning ProBERT, benchmarking your own model, or validating a derivative, we offer custom evaluation using the same proprietary methodology that validated ProBERT.
## Contact and Resources

**Collapse Index Labs**

For safety teams, research institutions, or labs building Type I ghost detection into your pipeline:

- Case Study: https://collapseindex.org/case-studies/template.html?s=probert-case-study
- GitHub: https://github.com/collapseindex/ProBERT-1.0
- HuggingFace: https://huggingface.co/collapseindex/ProBERT-1.0
- Website: https://collapseindex.org/
## Support

ProBERT is free and open-source. If you find it useful, consider supporting continued development:

☕ Buy me a coffee - Help fund ProBERT maintenance and future versions.
