--- |
|
|
license: other |
|
|
license_name: collapse-index-open-model-license |
|
|
license_link: LICENSE.md |
|
|
language: |
|
|
- en |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-classification |
|
|
- distilbert |
|
|
- rhetorical-confidence |
|
|
- behavioral-stability |
|
|
- type-i-ghost-detection |
|
|
- ai-safety |
|
|
base_model: distilbert-base-uncased |
|
|
datasets: |
|
|
- synthetic |
|
|
metrics: |
|
|
- accuracy |
|
|
- f1 |
|
|
pipeline_tag: text-classification |
|
|
--- |
|
|
|
|
|
# ProBERT v1.0 |
|
|
|
|
|
 |
|
|
|
|
|
## What ProBERT Does |
|
|
|
|
|
**Detects rhetorical overconfidence in text.** |
|
|
|
|
|
ProBERT classifies text into three patterns: |
|
|
- ✅ **process_clarity** - Step-by-step reasoning you can verify |
|
|
- ⚠️ **rhetorical_confidence** - Assertive claims without supporting process |
|
|
- 🔄 **scope_blur** - Vague generalizations with ambiguous boundaries |
|
|
|
|
|
Use it to flag risky language in LLM outputs, documentation, support tickets, or any text where confident assertions without reasoning could cause problems. |
|
|
|
|
|
**Trained on just 450 examples (150 per class).** Achieves 95.6% accuracy and shows strong transfer signals to real-world domains it never saw during training. When tested on Yelp reviews, ProBERT and untrained base DistilBERT disagree 84% of the time—suggesting the training added real capability, not just noise. |
|
|
|
|
|
**Why safety teams care:** When you evaluate ProBERT itself under perturbation testing (the Collapse Index protocol), it exhibits **zero Type I errors**—predictions that are stable, confident, and wrong. Most models have 5-15% Type I errors. ProBERT: 0. Additionally, ProBERT is **underconfident by design**: 98.4% accuracy but only 72.5% mean confidence (26% confidence deficit). This conservative calibration means it doubts itself when right rather than being cocky when wrong—the safe direction for production deployments. |
|
|
|
|
|
--- |
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
- [Model Card](#model-card)
- [Model Details](#model-details)
- [Performance](#performance)
- [Training Impact Demonstration](#training-impact-demonstration-unlabeled-transfer)
- [Metrics Explained](#metrics-explained)
- [Reproducibility](#reproducibility)
- [What It Does](#what-it-does)
- [Quick Start](#quick-start)
- [Proposed Use Cases](#proposed-use-cases)
- [License](#license)
- [Citation](#citation)
- [Attributions](#attributions)
- [Design Choices](#design-choices)
- [Limitations](#limitations)
- [Maintenance & Updates](#maintenance--updates)
- [About Derivatives & Model Evaluation](#about-derivatives--model-evaluation)
- [Contact and Resources](#contact-and-resources)
- [Support](#support)
|
|
|
|
|
--- |
|
|
|
|
|
## Model Card |
|
|
|
|
|
**ProBERT v1.0** |
|
|
|
|
|
A 66M-parameter DistilBERT specialist trained to detect rhetorical overconfidence patterns. Fast, stable, and ready for production. |
|
|
|
|
|
### Model Details |
|
|
|
|
|
- **Model Type**: DistilBERT-based sequence classifier |
|
|
- **Parameters**: 66M (runs on CPU, no GPU required) |
|
|
- **Training Data**: 450 examples (150 per class, synthetic) |
|
|
- **Inference Speed**: ~30ms per sample on CPU (Intel i5, 8GB), <5ms on GPU |
|
|
- **Memory**: <500MB RAM required |
|
|
- **Classes**: 3 (process_clarity, rhetorical_confidence, scope_blur) |
|
|
- **License**: Collapse Index Open Model License v1.0 (permissive use + attribution) |
|
|
- **Released**: January 31, 2026 |
|
|
- **SHA256**: `288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11` |
|
|
|
|
|
**Deployment-Ready:** No A100 clusters, no multi-GPU setups, no waiting. Deploy on a basic server, edge device, or even in-browser with ONNX. Production inference costs pennies. |
|
|
|
|
|
### Performance |
|
|
|
|
|
| Metric | Score | |
|
|
|--------|-------| |
|
|
| Test Accuracy | 95.6% | |
|
|
| Macro F1 | 0.955 | |
|
|
| Collapse Index (CI) — Behavioral Stability | 0.003 | |
|
|
| Structural Retention (SRI) — Decision Coherence | 0.997 | |
|
|
| Type I Errors (Stable + Confident + Wrong) | 0 | |
|
|
| **Calibration (ECE)** | **0.263** | |
|
|
| **Confidence on Errors** | **0.673 max, 0.0% ≥ 0.8** | |
|
|
| **Accuracy vs Mean Confidence** | **98.4% acc, 72.5% conf (-26% gap)** | |
|
|
|
|
|
### Training Impact Demonstration (Unlabeled Transfer) |
|
|
|
|
|
**The Question:** Is ProBERT just a renamed DistilBERT, or did training actually matter? |
|
|
|
|
|
**The Test:** ProBERT trained on **450 synthetic examples** vs. vanilla DistilBERT with a **random 3-class classification head** (untrained baseline). Both tested on three real-world datasets they never saw during training (zero-shot, no fine-tuning). |
|
|
|
|
|
**Important:** These datasets are unlabeled for this 3-class taxonomy (process_clarity, rhetorical_confidence, scope_blur). We report confidence and agreement as behavioral transfer signals, not accuracy. No ground-truth labels exist, so no accuracy/F1/ECE can be computed per domain. Calibration (ECE) and accuracy are measured on the labeled 450-sample test split. |
|
|
|
|
|
| Dataset | Domain | Mean max-prob (ProBERT, unlabeled) | Mean max-prob (Base, unlabeled) | Top-1 agreement (unlabeled) | Training Impact | |
|
|
|---------|--------|-------------------------------------|-------------------------------------|------------------------------|-----------------| |
|
|
| **Python Code** | Clear technical | 0.744 | 0.359 | **94%** | 2x confidence boost - Base has weak signal, ProBERT makes it decisive | |
|
|
| **Dolly-15k** | Mixed instructions | 0.413 | 0.361 | **43%** | Pattern recognition - Training teaches structure on general content | |
|
|
| **Yelp Reviews** | Ambiguous narrative | 0.412 | 0.356 | **16%** | Essential learning - Base completely lost, ProBERT learned the pattern | |
|
|
|
|
|
### The Progression (94% → 43% → 16%) |
|
|
|
|
|
**Training matters MORE as content gets more ambiguous:** |
|
|
- **Clear signal (Python code):** Base model's embeddings capture some structure (94% agreement), but ProBERT doubles confidence (0.74 vs 0.36) and eliminates confusion |
|
|
- **Mixed content (Dolly-15k):** Moderate agreement (43%) shows training teaches pattern recognition beyond the pretrained embeddings alone
- **Ambiguous narratives (Yelp):** Low agreement (16%) suggests training is essential here: the base model's predictions are near random, while ProBERT applies the scope_blur pattern it learned
|
|
|
|
|
**Key Findings** (a minimal reproduction sketch follows this list):
|
|
1. **ProBERT is demonstrably different from base DistilBERT** - This isn't a renamed model. 450 synthetic examples produced strong behavioral transfer to completely unseen real-world domains |
|
|
2. **Strong data efficiency** - 450 training examples produce 84% disagreement with the base model on Yelp (16% agreement: the base model has no usable signal, while ProBERT applies the learned pattern)
|
|
3. **Self-calibrating confidence** - High confidence (0.74) on clear signals, low confidence (0.40) on ambiguous data, no retraining required |
|
|
4. **Training impact scales with ambiguity** - On content where base models fail (16% agreement), ProBERT's training made the difference |
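
The agreement and mean max-prob numbers above are straightforward to compute for any unlabeled corpus. Below is a minimal sketch, assuming a small hand-written `texts` list; the untrained baseline is recreated by attaching a random 3-class head to `distilbert-base-uncased`. This is not the original evaluation script.

```python
# Sketch: compare ProBERT against an untrained 3-class DistilBERT head on unlabeled text.
# The `texts` list is a placeholder; swap in your own unlabeled corpus.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

probert = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
base = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

texts = [
    "def add(a, b):\n    return a + b",
    "The food was amazing and the staff could not have been friendlier.",
]

def top1_and_confidence(model, texts):
    """Return top-1 class indices and max softmax probabilities for a batch of texts."""
    with torch.no_grad():
        enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
        probs = torch.softmax(model(**enc).logits, dim=-1)
    conf, top1 = probs.max(dim=-1)
    return top1, conf

p_top1, p_conf = top1_and_confidence(probert, texts)
b_top1, b_conf = top1_and_confidence(base, texts)

print(f"Mean max-prob (ProBERT): {p_conf.mean().item():.3f}")
print(f"Mean max-prob (Base):    {b_conf.mean().item():.3f}")
print(f"Top-1 agreement:         {(p_top1 == b_top1).float().mean().item():.0%}")
```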
|
|
|
|
|
### Metrics Explained |
|
|
|
|
|
**Standard Metrics:** |
|
|
- **Test Accuracy (95.6%)**: Correct predictions on held-out test set |
|
|
- **Macro F1 (0.955)**: Balanced performance across all three classes |
|
|
|
|
|
**Behavioral Stability Metrics (Collapse Index Protocol):** |
|
|
|
|
|
- **Collapse Index (CI)**: Measures prediction stability under benign perturbations (typos, reformatting, synonyms). Lower is better. |
|
|
- CI ≤ 0.15 = Stable ✅ |
|
|
- CI > 0.45 = Unstable ⚠️ |
|
|
- **ProBERT: 0.003** (near-perfect stability) |
|
|
|
|
|
- **Structural Retention Index (SRI)**: Measures decision coherence—whether the model holds its reasoning structure across input variants. Higher is better. |
|
|
- SRI ≥ 0.85 = Good coherence ✅ |
|
|
- SRI < 0.40 = Breakdown 🚨 |
|
|
- **ProBERT: 0.997** (excellent coherence) |
|
|
|
|
|
- **Type I Errors**: Predictions that are stable (low CI), confident (high probability), but **wrong**. These are dangerous because they look like correct predictions behaviorally. Most models have 5-15% Type I errors. **ProBERT: 0**. |
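
The perturbation protocol behind these numbers is proprietary (see Evaluation Transparency below), but the shape of a Type I check is simple: generate benign variants of an input, measure how often the predicted class flips, and flag predictions that are stable, confident, and wrong. The sketch below uses hand-written variants and illustrative thresholds; it is not the Collapse Index protocol.

```python
# Illustrative only: a toy flip-rate check in the spirit of CI / Type I screening.
# Variant generation and thresholds here are placeholders, not the proprietary protocol.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

def classify(text):
    with torch.no_grad():
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    conf, label = probs.max(dim=-1)
    return label.item(), conf.item()

def type_i_ghost(text, variants, true_label, conf_threshold=0.8, flip_threshold=0.15):
    """Flag a prediction that is stable (low flip rate), confident, and wrong."""
    label, conf = classify(text)
    flip_rate = sum(classify(v)[0] != label for v in variants) / max(len(variants), 1)
    return flip_rate <= flip_threshold and conf >= conf_threshold and label != true_label

# Hand-written benign rewordings; a real protocol would generate these systematically.
text = "This revolutionary AI will transform your business"
variants = [
    "This revolutionary AI will transform your business!",
    "this revolutionary ai will transform your busines",   # typo + lowercase
    "This groundbreaking AI will transform your company",  # synonyms
]
print(type_i_ghost(text, variants, true_label=1))  # index 1 = rhetorical_confidence
```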
|
|
|
|
|
**Calibration Metrics (Post-Training Validation):** |
|
|
|
|
|
- **ECE (Expected Calibration Error)**: Measures alignment between confidence and accuracy. Range 0-1, lower is better. A computation sketch for all three calibration numbers follows this list.
|
|
- ECE ≤ 0.05 = Well-calibrated ✅ |
|
|
- ECE > 0.15 = Miscalibrated ⚠️ |
|
|
- **ProBERT: 0.263** (high, but from underconfidence—see below) |
|
|
|
|
|
- **Confidence Gap**: Accuracy minus mean confidence. Positive gap = underconfident (too cautious), negative gap = overconfident (too bold). |
|
|
- **ProBERT: +26%** (98.4% accuracy, 72.5% mean confidence) |
|
|
- Model doubts itself when RIGHT, not cocky when wrong |
|
|
|
|
|
- **High-Confidence Error Rate**: What % of high-confidence predictions (≥0.8) are wrong? |
|
|
- Most models: 3-10% errors at high confidence |
|
|
- **ProBERT: 0%** (perfect separation—all 7 errors below 0.7 confidence) |
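
Here is a minimal sketch of how these calibration numbers can be computed from a per-example predictions file. The `confidence` and `correct` column names are assumptions for illustration; `eval_calibration.py` in the repository is the authoritative implementation.

```python
# Sketch of ECE, confidence gap, and high-confidence error rate from per-example predictions.
# Column names are assumed; eval_calibration.py in the repo is the reference implementation.
import numpy as np
import pandas as pd

df = pd.read_csv("probert_training_20260131_004706.csv")  # per-example predictions
conf = df["confidence"].to_numpy()          # max softmax probability per example (assumed column)
correct = df["correct"].to_numpy(bool)      # prediction matched the label (assumed column)

# Expected Calibration Error: bin-weighted gap between accuracy and mean confidence.
bins = np.linspace(0.0, 1.0, 11)
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (conf > lo) & (conf <= hi)
    if mask.any():
        ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())

print(f"ECE:                   {ece:.3f}")
print(f"Confidence gap:        {correct.mean() - conf.mean():+.1%}")  # positive = underconfident

high = conf >= 0.8
if high.any():
    print(f"High-conf error rate:  {(~correct[high]).mean():.1%}")
else:
    print("High-conf error rate:  n/a (no predictions at or above 0.8 confidence)")
```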
|
|
|
|
|
**What this means:** ProBERT doesn't just predict accurately, it predicts *consistently and coherently* across different wordings of the same input. The high ECE comes from **conservative calibration**—ProBERT is less confident than it should be, which makes it safer for production (won't confidently output wrong answers). When combined with perturbation testing, you get a complete picture of model reliability. |
|
|
|
|
|
**Transparency Note:** Current calibration metrics measured on synthetic test set (450 samples). For production use cases requiring OOD calibration validation, we recommend evaluating on domain-specific held-out data to confirm conservative calibration holds. |
|
|
|
|
|
#### Reproducibility
|
|
|
|
|
All calibration metrics can be reproduced using the included evaluation script: |
|
|
|
|
|
```bash |
|
|
# Auto-detect mode (uses defaults) |
|
|
python eval_calibration.py --probert |
|
|
|
|
|
# Explicit paths (for custom locations) |
|
|
python eval_calibration.py \ |
|
|
--model_dir probert_model \ |
|
|
--csv probert_training_20260131_004706.csv |
|
|
``` |
|
|
|
|
|
The `--probert` flag auto-detects the model directory and latest predictions CSV. The script computes ECE, confidence gaps, and high-confidence error rates. Full source included in the model repository for transparency. |
|
|
|
|
|
**Evaluation Transparency:** |
|
|
|
|
|
| Component | Status | |
|
|
|-----------|--------| |
|
|
| Metric definitions (CI, SRI, Type I) | Open (see case study) | |
|
|
| Perturbation protocol | Proprietary | |
|
|
| Evaluation thresholds | Fixed (documented above) | |
|
|
| Full methodology | Available via evaluation services | |
|
|
|
|
|
### What It Does |
|
|
|
|
|
ProBERT classifies text into three patterns: |
|
|
|
|
|
| Class | Description | Example | |
|
|
|-------|-------------|---------| |
|
|
| **process_clarity** | Step-by-step, testable reasoning | "Step 1: Check input. Step 2: Validate schema. If invalid, return error." | |
|
|
| **rhetorical_confidence** | Authority without process | "This revolutionary approach will transform your business and guarantee results." | |
|
|
| **scope_blur** | Vague generalizations | "Trust your intuition and embrace the journey. The universe has a plan." | |
|
|
|
|
|
**Important:** ProBERT flags `rhetorical_confidence` as a **risk signal, not a truth judgment**. Some domains (executive summaries, medical conclusions, legal holdings) legitimately require confident language without step-by-step exposition. Context determines appropriateness—ProBERT provides the signal, you provide the judgment. |
|
|
|
|
|
### Quick Start |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0") |
|
|
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0") |
|
|
|
|
|
text = "This revolutionary AI will transform your business" |
|
|
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)[0]
|
|
|
|
|
# [process_clarity, rhetorical_confidence, scope_blur] |
|
|
print(f"Scores: {probs}") |
|
|
# → rhetorical_confidence will be highest (~0.67) |
|
|
``` |
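
The same check also works through the high-level `pipeline` API. Label strings in the output depend on the `id2label` mapping in the model config; if it is absent you will see generic `LABEL_0`-style names in the class order listed above.

```python
from transformers import pipeline

clf = pipeline("text-classification", model="collapseindex/ProBERT-1.0", top_k=None)

for score in clf("Step 1: Check input. Step 2: Validate schema. If invalid, return error."):
    print(score)  # one dict per class, e.g. {'label': 'process_clarity', 'score': ...}
```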
|
|
|
|
|
### Proposed Use Cases |
|
|
|
|
|
**Safety & Compliance:** |
|
|
|
|
|
1. **LLM Output Validation**: Flag when your model makes assertions without showing its work |
|
|
2. **Medical/Legal Documentation**: Detect confident claims without explicit reasoning (liability risk) |
|
|
3. **Prompt Injection Detection**: Catch authority-without-reasoning attempts to override system instructions |
|
|
4. **Regulatory Filing Review**: Ensure procedures documented with *how*, not just mandates |
|
|
|
|
|
**Output Quality:** |
|
|
|
|
|
5. **LLM Output Filtering**: Keep only high-clarity responses and reject rhetorical patterns (a minimal filtering sketch follows this list)
|
|
6. **Chatbot Moderation**: Flag confident hallucinations before deployment |
|
|
7. **Customer Support Grading**: Distinguish confident-but-vague responses from clear solutions |
|
|
8. **Grant/Research Proposal Screening**: Detect overclaims without methodology |
|
|
|
|
|
**Data & Training:** |
|
|
|
|
|
9. **Training Data Cleaning**: Filter instruction datasets for process-driven examples only |
|
|
10. **Synthetic Data Detection**: LLM-generated text often pairs rhetorical patterns with no process chain
|
|
11. **Code Review Automation**: Flag comments that are rhetorical vs genuinely explanatory |
|
|
12. **Resume Parsing**: Detect buzzword-heavy claims vs specific accomplishments |
|
|
|
|
|
**Measurement & Comparison:** |
|
|
|
|
|
13. **Safety Benchmarking**: Compare models on their ability to avoid Type I failures |
|
|
14. **CI Stability Anchor**: Combine with behavior metrics (ProBERT scores + perturbation tests = definitive Type I measurement) |
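
A minimal sketch of use case 5 (LLM output filtering) is shown below. The 0.5 threshold and the hard keep/drop decision are illustrative assumptions; tune both against your own data.

```python
# Sketch for use case 5: drop LLM outputs that score high on rhetorical_confidence.
# The 0.5 threshold is an illustrative default, not a recommended production value.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

CLASSES = ["process_clarity", "rhetorical_confidence", "scope_blur"]  # index order per this card

def keep_response(text: str, max_rhetorical: float = 0.5) -> bool:
    """Return True if the response stays below the rhetorical_confidence threshold."""
    with torch.no_grad():
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    return probs[CLASSES.index("rhetorical_confidence")].item() < max_rhetorical

candidates = [
    "Step 1: Parse the config. Step 2: Retry failed requests with backoff.",
    "Our approach is guaranteed to work flawlessly in every scenario.",
]
print([text for text in candidates if keep_response(text)])
```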
|
|
|
|
|
### License |
|
|
|
|
|
**Collapse Index Open Model License v1.0** - A permissive license designed to maximize adoption while protecting methodology and evaluation claims. |
|
|
|
|
|
**What you CAN do (no cost, no permission needed):** |
|
|
- ✅ Use commercially (including SaaS, products, internal tools) |
|
|
- ✅ Create derivatives (fine-tune, distill, ensemble, etc.) |
|
|
- ✅ Distribute and redistribute (including modified versions) |
|
|
- ✅ Use for research, education, or personal projects |
|
|
|
|
|
**What you MUST do:** |
|
|
- 📝 **Attribution**: Include "Built with ProBERT™" in documentation/UI |
|
|
- 📝 Provide copyright notice and link to license |
|
|
|
|
|
**What you CAN'T do without authorization:** |
|
|
- ❌ Claim "Collapse Index validated" or "CI-evaluated" without providing validation data OR obtaining official evaluation services |
|
|
- ❌ Remove or bypass safety/calibration mechanisms |
|
|
- ❌ Use ProBERT™, Collapse Index™, or Type I Ghost Detection™ trademarks to imply endorsement |
|
|
|
|
|
**License terminates if you:** |
|
|
- Sue us for patent infringement |
|
|
- Remove safety mechanisms from the model |
|
|
- Make false evaluation claims |
|
|
|
|
|
**Key Protection:** The license is permissive (like Apache 2.0) for model use, but protects the **Collapse Index evaluation methodology**. You can train derivatives freely, but can't claim they're "Type I ghost validated" without backing it up. |
|
|
|
|
|
**Full license text:** [LICENSE.md](LICENSE.md) |
|
|
|
|
|
### Citation |
|
|
|
|
|
```bibtex |
|
|
@software{kwon2026probert, |
|
|
author = {Kwon, Alex}, |
|
|
title = {ProBERT: Process-First BERT for Rhetorical Confidence Detection}, |
|
|
version = {1.0}, |
|
|
year = {2026}, |
|
|
month = jan, |
|
|
note = {66M-parameter specialist achieving 95.6\% accuracy with zero Type I ghosts}, |
|
|
url = {https://huggingface.co/collapseindex/ProBERT-1.0}, |
|
|
orcid = {0009-0002-2566-5538}, |
|
|
} |
|
|
``` |
|
|
|
|
|
### Attributions |
|
|
|
|
|
**ProBERT** is built on [DistilBERT](https://github.com/huggingface/transformers), which is distributed under the Apache 2.0 license. See [ATTRIBUTIONS.md](ATTRIBUTIONS.md) for full license text. |
|
|
|
|
|
### Design Choices |
|
|
|
|
|
**Why Synthetic Training?** |
|
|
|
|
|
Modern datasets are contaminated. Real LinkedIn posts have been through GPT/Claude. Customer support tickets got the "AI improve this" treatment. Grant proposals use the ChatGPT rewrite button. Research papers get polished by Anthropic's writing assistant. |
|
|
|
|
|
Training on clean synthetic data means ProBERT learned *actual rhetorical patterns*, not LLM artifacts. So when it detects `rhetorical_confidence`, you're getting signal about genuine overconfident reasoning—not just "this smells like ChatGPT polished it." |
|
|
|
|
|
**The upside**: Clean signal, zero LLM contamination, measures what matters. |
|
|
**The tradeoff**: May not generalize perfectly to highly domain-specific professional jargon (but that's a feature, not a bug—domain-specific jargon *should* be validated separately). |
|
|
|
|
|
### Limitations |
|
|
|
|
|
- **English only**: Trained on English text patterns |
|
|
- **128 token max**: Documents longer than 128 tokens are truncated; see the windowing sketch after this list
|
|
- **3 classes**: No finer-grained pattern distinctions are available within these categories
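
For the 128-token limit, longer documents can be scored by splitting them into overlapping windows and aggregating per-window probabilities. A minimal sketch follows, assuming mean-pooling over windows is acceptable for your use case; the aggregation strategy is not specified by this card.

```python
# Sketch: score documents longer than 128 tokens by overlapping windows, then average probabilities.
# Mean aggregation and the stride value are assumptions; adjust for your use case.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")

def score_long_document(text: str, window: int = 128, stride: int = 64) -> torch.Tensor:
    enc = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=window,
        stride=stride,
        return_overflowing_tokens=True,  # one row per overlapping window
    )
    with torch.no_grad():
        logits = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
    # Averaged probabilities in order [process_clarity, rhetorical_confidence, scope_blur]
    return torch.softmax(logits, dim=-1).mean(dim=0)

long_text = "Step 1: gather requirements. Step 2: draft the schema. " * 50
print(score_long_document(long_text))
```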
|
|
|
|
|
### Maintenance & Updates |
|
|
|
|
|
ProBERT-1.0 is production-frozen.
|
|
|
|
|
- **Bug reports** - Submit via [GitHub issues](https://github.com/collapseindex/ProBERT-1.0/issues) |
|
|
- **Feature requests** - Welcome; evaluated for ProBERT-2.0 planning
|
|
- **Update cadence** - Quarterly, or as needed for critical fixes
|
|
- **Versions** - All versions available on HuggingFace with full changelogs |
|
|
|
|
|
ProBERT prioritizes stability over rapid iteration. Once deployed, you can trust the weights won't change unexpectedly. |
|
|
|
|
|
**Versioning:** |
|
|
- **ProBERT-1.0** - You are here (frozen) |
|
|
- **ProBERT-1.1** - Bug fixes + minor improvements (if needed) |
|
|
- **ProBERT-2.0** - Major retraining (multilingual, larger dataset, new architecture) |
|
|
|
|
|
### About Derivatives & Model Evaluation
|
|
|
|
|
Planning to fine-tune ProBERT or improve your own model? We recommend validating on Collapse Index stability metrics, a methodology that measures Type I ghosts, coherence degradation, and behavioral stability. |
|
|
|
|
|
**[Get your training evaluated](https://collapseindex.org/evals.html)** - Whether you're fine-tuning ProBERT, benchmarking your own model, or validating a derivative, we offer custom evaluation using the same proprietary methodology that validated ProBERT. |
|
|
|
|
|
### Contact and Resources |
|
|
|
|
|
**Collapse Index Labs** |
|
|
|
|
|
For safety teams, research institutions, or labs building Type I ghost detection into your pipeline: |
|
|
|
|
|
**ask@collapseindex.org** |
|
|
|
|
|
**Case Study**: https://collapseindex.org/case-studies/template.html?s=probert-case-study |
|
|
|
|
|
**GitHub**: https://github.com/collapseindex/ProBERT-1.0 |
|
|
|
|
|
**HuggingFace**: https://huggingface.co/collapseindex/ProBERT-1.0 |
|
|
|
|
|
**Website**: https://collapseindex.org/ |
|
|
|
|
|
### Support |
|
|
|
|
|
ProBERT is free and open-source. If you find it useful, consider supporting continued development: |
|
|
|
|
|
**[☕ Buy me a coffee](https://ko-fi.com/collapseindex)** - Help fund ProBERT maintenance and future versions. |
|
|
|
|
|
--- |
|
|
|