collapseindex committed on
Commit
02ebf5a
·
verified ·
1 Parent(s): 14a0518

Update README.md

Files changed (1)
  1. README.md +315 -5
README.md CHANGED
@@ -1,5 +1,315 @@
1
- ---
2
- license: other
3
- license_name: collapse-index-open-model-license
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ license_name: collapse-index-open-model-license
4
+ license_link: LICENSE.md
5
+ language:
6
+ - en
7
+ library_name: transformers
8
+ tags:
9
+ - text-classification
10
+ - distilbert
11
+ - rhetorical-confidence
12
+ - behavioral-stability
13
+ - type-i-ghost-detection
14
+ - ai-safety
15
+ base_model: distilbert-base-uncased
16
+ datasets:
17
+ - synthetic
18
+ metrics:
19
+ - accuracy
20
+ - f1
21
+ pipeline_tag: text-classification
22
+ ---
23
+
24
+ # ProBERT v1.0
25
+
26
+ ![ProBERT Banner](probertbanner.png)
27
+
28
+ ## What ProBERT Does
29
+
30
+ **Detects rhetorical overconfidence in text.**
31
+
32
+ ProBERT classifies text into three patterns:
33
+ - ✅ **process_clarity** - Step-by-step reasoning you can verify
34
+ - ⚠️ **rhetorical_confidence** - Assertive claims without supporting process
35
+ - 🔄 **scope_blur** - Vague generalizations with ambiguous boundaries
36
+
37
+ Use it to flag risky language in LLM outputs, documentation, support tickets, or any text where confident assertions without reasoning could cause problems.
38
+
39
+ **Why safety teams care:** When you evaluate ProBERT itself under perturbation testing (the Collapse Index protocol), it exhibits **zero Type I errors**—predictions that are stable, confident, and wrong. Most models have 5-15% Type I errors. ProBERT: 0. This makes it a reliable signal for downstream safety systems.
40
+
41
+ ---
42
+
43
+ ## Table of Contents
44
+
45
+ - [Model Card](#model-card)
46
+ - [Model Details](#model-details)
47
+ - [Performance](#performance)
48
+ - [Metrics Explained](#metrics-explained)
49
+ - [What It Does](#what-it-does)
50
+ - [Quick Start](#quick-start)
51
+ - [Proposed Use Cases](#proposed-use-cases)
52
+ - [Design Choices](#design-choices)
53
+ - [Limitations](#limitations)
54
+ - [Maintenance & Updates](#maintenance--updates)
55
+ - [License](#license)
56
+ - [Citation](#citation)
57
+ - [Attributions](#attributions)
58
+ - [About Derivatives & Model Evaluation](#about-derivatives--model-evaluation)
59
+ - [Contact and Resources](#contact-and-resources)
60
+ - [Support](#support)
61
+
62
+ ---
63
+
64
+ ## Model Card
65
+
66
+ **ProBERT v1.0**
67
+
68
+ A 66M-parameter DistilBERT specialist trained to detect rhetorical overconfidence patterns. Fast, stable, and ready for production.
69
+
70
+ ### Model Details
71
+
72
+ - **Model Type**: DistilBERT-based sequence classifier
73
+ - **Parameters**: 66M (runs on CPU, no GPU required)
74
+ - **Inference Speed**: ~30ms per sample on CPU (Intel i5, 8GB), <5ms on GPU
75
+ - **Memory**: <500MB RAM required
76
+ - **Classes**: 3 (process_clarity, rhetorical_confidence, scope_blur)
77
+ - **License**: Collapse Index Open Model License v1.0 (permissive use + attribution)
78
+ - **Released**: January 31, 2026
79
+ - **SHA256**: `288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11`
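The published SHA256 can be checked against a downloaded file before loading it. The sketch below is a minimal illustration using only the standard library. The README does not say which file the hash covers, so treating it as the hash of the weights file is an assumption, and the helper names `sha256_of_file` and `matches_expected` are hypothetical.

```python
import hashlib

# Digest copied from the Model Details list above.
EXPECTED_SHA256 = "288520E28AEC14D1BFA2474E2694CAF612070DCA839AAECDA3B95F12FE418A11"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare case-insensitively; the README prints the digest in upper case."""
    return sha256_of_file(path).lower() == expected.lower()
```

Call `matches_expected(path_to_downloaded_file)` on whichever artifact the hash turns out to describe; a `False` result means the file differs from the one that was hashed.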
80
+
81
+ **Deployment-Ready:** No A100 clusters, no multi-GPU setups, no waiting. Deploy on a basic server, edge device, or even in-browser with ONNX. Production inference costs pennies.
82
+
83
+ ### Performance
84
+
85
+ | Metric | Score |
86
+ |--------|-------|
87
+ | Test Accuracy | 95.6% |
88
+ | Macro F1 | 0.955 |
89
+ | Collapse Index (CI) — Behavioral Stability | 0.003 |
90
+ | Structural Retention (SRI) — Decision Coherence | 0.997 |
91
+ | Type I Errors (Stable + Confident + Wrong) | 0 |
92
+
93
+ ### Baseline Comparison: ProBERT vs. Vanilla DistilBERT
94
+
95
+ **The Question:** Is ProBERT just a renamed DistilBERT, or did training actually matter?
96
+
97
+ **The Test:** ProBERT (trained specialist) vs. vanilla DistilBERT with a **random 3-class classification head** (untrained baseline) on three real-world datasets (zero-shot, no fine-tuning):
98
+
99
+ | Dataset | Domain | ProBERT Conf | Base Conf | Agreement | Training Impact |
100
+ |---------|--------|--------------|-----------|-----------|-----------------|
101
+ | **Python Code** | Clear technical | 0.744 | 0.359 | **94%** | 2x confidence boost - Base has weak signal, ProBERT makes it decisive |
102
+ | **Dolly-15k** | Mixed instructions | 0.413 | 0.361 | **43%** | Pattern recognition - Training teaches structure on general content |
103
+ | **Yelp Reviews** | Ambiguous narrative | 0.412 | 0.356 | **16%** | Essential learning - Base completely lost, ProBERT learned the pattern |
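The README does not spell out how the table columns were computed. As a rough sketch, assuming "Agreement" is the fraction of samples on which both models pick the same top label and "Conf" is the mean of the highest class probability per sample, the two quantities could be computed as follows. The function names are hypothetical.

```python
def agreement_rate(preds_a, preds_b):
    """Fraction of samples where two models predict the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must not be empty")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


def mean_confidence(prob_rows):
    """Mean of the top class probability across samples (assumed 'Conf')."""
    if not prob_rows:
        raise ValueError("prob_rows must not be empty")
    return sum(max(row) for row in prob_rows) / len(prob_rows)
```

Under these assumptions, a 3-class model that guesses uniformly would have a mean confidence near 0.33, which is close to the baseline's 0.36 in the table.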
104
+
105
+ ### The Progression (94% → 43% → 16%)
106
+
107
+ **Training matters MORE as content gets more ambiguous:**
108
+ - **Clear signal (Python code):** Base model's embeddings capture some structure (94% agreement), but ProBERT roughly doubles prediction confidence (0.74 vs 0.36)
109
+ - **Mixed content (Dolly-15k):** Low agreement (43%) suggests training teaches pattern recognition beyond what the embeddings alone provide
110
+ - **Ambiguous narratives (Yelp):** Very low agreement (16%) indicates training is essential here: the baseline's random head gives no usable signal, while ProBERT learned the scope_blur pattern
111
+
112
+ **Key Findings:**
113
+ 1. **ProBERT is demonstrably different from base DistilBERT** - This isn't a renamed model: training on synthetic data changed its confidence on all three real-world datasets and its predicted labels on most Dolly-15k and Yelp samples
114
+ 2. **Self-calibrating confidence** - High confidence (0.74) on clear signals, low confidence (0.41) on ambiguous data, with no retraining required
115
+ 3. **Training impact scales with ambiguity** - Agreement with the untrained baseline falls from 94% on clear technical text to 16% on ambiguous narratives, so training changes the most predictions where the content is least clear
116
+
117
+ ### Metrics Explained
118
+
119
+ **Standard Metrics:**
120
+ - **Test Accuracy (95.6%)**: Correct predictions on held-out test set
121
+ - **Macro F1 (0.955)**: Balanced performance across all three classes
122
+
123
+ **Behavioral Stability Metrics (Collapse Index Protocol):**
124
+
125
+ - **Collapse Index (CI)**: Measures prediction stability under benign perturbations (typos, reformatting, synonyms). Lower is better.
126
+ - CI ≤ 0.15 = Stable ✅
127
+ - CI > 0.45 = Unstable ⚠️
128
+ - **ProBERT: 0.003** (near-perfect stability)
129
+
130
+ - **Structural Retention Index (SRI)**: Measures decision coherence—whether the model holds its reasoning structure across input variants. Higher is better.
131
+ - SRI ≥ 0.85 = Good coherence ✅
132
+ - SRI < 0.40 = Breakdown 🚨
133
+ - **ProBERT: 0.997** (excellent coherence)
134
+
135
+ - **Type I Errors**: Predictions that are stable (low CI), confident (high probability), but **wrong**. These are dangerous because they look like correct predictions behaviorally. Most models have 5-15% Type I errors. **ProBERT: 0**.
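The documented thresholds can be written down as simple banding functions. This is only a sketch of the published cut-offs, not the Collapse Index protocol itself, which the README describes as proprietary. The "intermediate" band name, the per-sample use of CI, and the `min_confidence` value of 0.9 are assumptions introduced for illustration.

```python
def ci_band(ci):
    """Band a Collapse Index value using the thresholds listed above."""
    if ci <= 0.15:
        return "stable"
    if ci > 0.45:
        return "unstable"
    return "intermediate"  # the README does not name this middle range


def sri_band(sri):
    """Band a Structural Retention Index value using the listed thresholds."""
    if sri >= 0.85:
        return "good"
    if sri < 0.40:
        return "breakdown"
    return "intermediate"  # the README does not name this middle range


def is_type_i(ci, confidence, correct, min_confidence=0.9):
    """Stable + confident + wrong. min_confidence is a hypothetical cut-off."""
    return ci_band(ci) == "stable" and confidence >= min_confidence and not correct
```

With the reported scores, `ci_band(0.003)` falls in the stable band and `sri_band(0.997)` in the good band.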
136
+
137
+ **What this means:** ProBERT doesn't just predict accurately, it predicts *consistently and coherently* across different wordings of the same input. Accuracy alone misses this; adding perturbation testing gives a fuller picture of model reliability.
138
+
139
+ **Evaluation Transparency:**
140
+
141
+ | Component | Status |
142
+ |-----------|--------|
143
+ | Metric definitions (CI, SRI, Type I) | Open (see case study) |
144
+ | Perturbation protocol | Proprietary |
145
+ | Evaluation thresholds | Fixed (documented above) |
146
+ | Full methodology | Available via evaluation services |
147
+
148
+ ### What It Does
149
+
150
+ ProBERT classifies text into three patterns:
151
+
152
+ | Class | Description | Example |
153
+ |-------|-------------|---------|
154
+ | **process_clarity** | Step-by-step, testable reasoning | "Step 1: Check input. Step 2: Validate schema. If invalid, return error." |
155
+ | **rhetorical_confidence** | Authority without process | "This revolutionary approach will transform your business and guarantee results." |
156
+ | **scope_blur** | Vague generalizations | "Trust your intuition and embrace the journey. The universe has a plan." |
157
+
158
+ **Important:** ProBERT flags `rhetorical_confidence` as a **risk signal, not a truth judgment**. Some domains (executive summaries, medical conclusions, legal holdings) legitimately require confident language without step-by-step exposition. Context determines appropriateness—ProBERT provides the signal, you provide the judgment.
159
+
160
+ ### Quick Start
161
+
162
+ ```python
163
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
164
+ import torch
165
+
166
+ model = AutoModelForSequenceClassification.from_pretrained("collapseindex/ProBERT-1.0")
167
+ tokenizer = AutoTokenizer.from_pretrained("collapseindex/ProBERT-1.0")
168
+
169
+ text = "This revolutionary AI will transform your business"
170
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
171
+ outputs = model(**inputs)
172
+ probs = torch.softmax(outputs.logits, dim=1)[0]
173
+
174
+ # Class order: [process_clarity, rhetorical_confidence, scope_blur]
175
+ print(f"Scores: {probs}")
176
+ # → rhetorical_confidence will be highest (~0.67)
177
+ ```
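To turn the score vector into a named prediction, a small helper can pair each probability with its class. The sketch below assumes the class order given in the Quick Start comment; it may be worth confirming that order against `model.config.id2label` on the loaded checkpoint. The names `LABELS` and `top_label` are hypothetical.

```python
# Class order as stated in the Quick Start comment (an assumption to verify).
LABELS = ("process_clarity", "rhetorical_confidence", "scope_blur")


def top_label(probs, labels=LABELS):
    """Return (label, score) for the highest-probability class."""
    # float() also accepts the 0-d tensors produced by iterating a 1-D tensor.
    scores = [float(p) for p in probs]
    if len(scores) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]
```

With the Quick Start variables, `top_label(probs.tolist())` would return the winning class name together with its probability.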
178
+
179
+ ### Proposed Use Cases
180
+
181
+ **Safety & Compliance:**
182
+
183
+ 1. **LLM Output Validation**: Flag when your model makes assertions without showing its work
184
+ 2. **Medical/Legal Documentation**: Detect confident claims without explicit reasoning (liability risk)
185
+ 3. **Prompt Injection Detection**: Catch authority-without-reasoning attempts to override system instructions
186
+ 4. **Regulatory Filing Review**: Ensure procedures are documented with the *how*, not just mandates
187
+
188
+ **Output Quality:**
189
+
190
+ 5. **LLM Output Filtering**: Keep only high-clarity responses, reject rhetorical patterns
191
+ 6. **Chatbot Moderation**: Flag confident hallucinations before deployment
192
+ 7. **Customer Support Grading**: Distinguish confident-but-vague responses from clear solutions
193
+ 8. **Grant/Research Proposal Screening**: Detect overclaims without methodology
194
+
195
+ **Data & Training:**
196
+
197
+ 9. **Training Data Cleaning**: Filter instruction datasets for process-driven examples only
198
+ 10. **Synthetic Data Detection**: Flag ML-generated text that shows rhetorical patterns without a process chain
199
+ 11. **Code Review Automation**: Flag comments that are rhetorical vs genuinely explanatory
200
+ 12. **Resume Parsing**: Detect buzzword-heavy claims vs specific accomplishments
201
+
202
+ **Measurement & Comparison:**
203
+
204
+ 13. **Safety Benchmarking**: Compare models on their ability to avoid Type I failures
205
+ 14. **CI Stability Anchor**: Combine with behavior metrics (ProBERT scores + perturbation tests = definitive Type I measurement)
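Several of the use cases above reduce to a gate over ProBERT's scores. The following is a minimal sketch of such a gate, assuming probabilities arrive in the order [process_clarity, rhetorical_confidence, scope_blur]. The `min_clarity` threshold of 0.6 and the function names are hypothetical and would need tuning per domain. Anything that is not clearly process-driven is routed to review rather than rejected, in line with treating the output as a risk signal.

```python
LABELS = ("process_clarity", "rhetorical_confidence", "scope_blur")


def triage(probs, min_clarity=0.6):
    """Return 'keep' for confident process_clarity, otherwise 'review'."""
    scores = [float(p) for p in probs]
    top = LABELS[max(range(len(scores)), key=scores.__getitem__)]
    # min_clarity is an illustrative threshold, not a documented value.
    if top == "process_clarity" and scores[0] >= min_clarity:
        return "keep"
    return "review"


def filter_responses(scored_responses, min_clarity=0.6):
    """Keep the texts whose scores pass the gate; input is (text, probs) pairs."""
    return [
        text
        for text, probs in scored_responses
        if triage(probs, min_clarity) == "keep"
    ]
```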
206
+
207
+ ### License
208
+
209
+ **Collapse Index Open Model License v1.0** - A permissive license designed to maximize adoption while protecting methodology and evaluation claims.
210
+
211
+ **What you CAN do (no cost, no permission needed):**
212
+ - ✅ Use commercially (including SaaS, products, internal tools)
213
+ - ✅ Create derivatives (fine-tune, distill, ensemble, etc.)
214
+ - ✅ Distribute and redistribute (including modified versions)
215
+ - ✅ Use for research, education, or personal projects
216
+
217
+ **What you MUST do:**
218
+ - 📝 **Attribution**: Include "Built with ProBERT™" in documentation/UI
219
+ - 📝 Provide copyright notice and link to license
220
+
221
+ **What you CAN'T do without authorization:**
222
+ - ❌ Claim "Collapse Index validated" or "CI-evaluated" without providing validation data OR obtaining official evaluation services
223
+ - ❌ Remove or bypass safety/calibration mechanisms
224
+ - ❌ Use ProBERT™, Collapse Index™, or Type I Ghost Detection™ trademarks to imply endorsement
225
+
226
+ **License terminates if you:**
227
+ - Sue us for patent infringement
228
+ - Remove safety mechanisms from the model
229
+ - Make false evaluation claims
230
+
231
+ **Key Protection:** The license is permissive (like Apache 2.0) for model use, but protects the **Collapse Index evaluation methodology**. You can train derivatives freely, but can't claim they're "Type I ghost validated" without backing it up.
232
+
233
+ **Full license text:** [LICENSE.md](LICENSE.md)
234
+
235
+ ### Citation
236
+
237
+ ```bibtex
238
+ @software{kwon2026probert,
239
+ author = {Kwon, Alex},
240
+ title = {ProBERT: Process-First BERT for Rhetorical Confidence Detection},
241
+ version = {1.0},
242
+ year = {2026},
243
+ month = jan,
244
+ note = {66M-parameter specialist achieving 95.6\% accuracy with zero Type I ghosts},
245
+ url = {https://huggingface.co/collapseindex/ProBERT-1.0},
246
+ orcid = {0009-0002-2566-5538},
247
+ }
248
+ ```
249
+
250
+ ### Attributions
251
+
252
+ **ProBERT** is built on [DistilBERT](https://github.com/huggingface/transformers), which is distributed under the Apache 2.0 license. See [ATTRIBUTIONS.md](ATTRIBUTIONS.md) for full license text.
253
+
254
+ ### Design Choices
255
+
256
+ **Why Synthetic Training?**
257
+
258
+ Many modern datasets are contaminated with LLM-edited text. LinkedIn posts are routinely run through GPT or Claude, customer support tickets get the "AI improve this" treatment, grant proposals go through a ChatGPT rewrite, and research papers are polished by AI writing assistants.
259
+
260
+ Training on clean synthetic data means ProBERT learned *actual rhetorical patterns*, not LLM artifacts. So when it detects `rhetorical_confidence`, you're getting signal about genuine overconfident reasoning—not just "this smells like ChatGPT polished it."
261
+
262
+ **The upside**: Clean signal, zero LLM contamination, measures what matters.
263
+ **The tradeoff**: May not generalize perfectly to highly domain-specific professional jargon (but that's a feature, not a bug—domain-specific jargon *should* be validated separately).
264
+
265
+ ### Limitations
266
+
267
+ - **English only**: Trained on English text patterns
268
+ - **128 token max**: Longer documents will be truncated
269
+ - **3 classes**: Fine-grained pattern distinction within these categories is not available
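One common way to work around a fixed token limit is to split long inputs into overlapping windows and score each window separately. The sketch below shows only the windowing step on a list of token ids. It assumes two positions per window are reserved for the `[CLS]` and `[SEP]` tokens that a BERT-style tokenizer adds; the `stride` overlap of 32 is a hypothetical choice, and how to combine per-window scores is left open because the README does not describe ProBERT's behavior on windowed input.

```python
def window_token_ids(token_ids, max_length=128, stride=32, num_special=2):
    """Split token ids into overlapping windows that fit the 128-token limit."""
    body = max_length - num_special  # room left after the special tokens
    if body <= 0 or not 0 <= stride < body:
        raise ValueError("stride must satisfy 0 <= stride < max_length - num_special")
    step = body - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body])
        if start + body >= len(token_ids):
            break  # this window reached the end of the input
        start += step
    return windows
```

Each window can then be wrapped with the special tokens and passed to the model in place of a single truncated input.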
270
+
271
+ ### Maintenance & Updates
272
+
273
+ ProBERT-1.0 is production frozen.
274
+
275
+ - **Bug reports** - Submit via [GitHub issues](https://github.com/collapseindex/ProBERT-1.0/issues)
276
+ - **Feature requests** - Accepted but evaluated for ProBERT-2.0 planning
277
+ - **Update cadence** - Quarterly or as needed for critical fixes
278
+ - **Versions** - All versions available on HuggingFace with full changelogs
279
+
280
+ ProBERT prioritizes stability over rapid iteration. Once deployed, you can trust the weights won't change unexpectedly.
281
+
282
+ **Versioning:**
283
+ - **ProBERT-1.0** - You are here (frozen)
284
+ - **ProBERT-1.1** - Bug fixes + minor improvements (if needed)
285
+ - **ProBERT-2.0** - Major retraining (multilingual, larger dataset, new architecture)
286
+
287
+ **About Derivatives & Model Evaluation:**
288
+
289
+ Planning to fine-tune ProBERT or improve your own model? We recommend validating on Collapse Index stability metrics, a methodology that measures Type I ghosts, coherence degradation, and behavioral stability.
290
+
291
+ **[Get your training evaluated](https://collapseindex.org/evals.html)** - Whether you're fine-tuning ProBERT, benchmarking your own model, or validating a derivative, we offer custom evaluation using the same proprietary methodology that validated ProBERT.
292
+
293
+ ### Contact and Resources
294
+
295
+ **Collapse Index Labs**
296
+
297
+ For safety teams, research institutions, or labs building Type I ghost detection into their pipelines:
298
+
299
+ **ask@collapseindex.org**
300
+
301
+ **Case Study**: https://collapseindex.org/case-studies/template.html?s=probert-case-study
302
+
303
+ **GitHub**: https://github.com/collapseindex/ProBERT-1.0
304
+
305
+ **HuggingFace**: https://huggingface.co/collapseindex/ProBERT-1.0
306
+
307
+ **Website**: https://collapseindex.org/
308
+
309
+ ### Support
310
+
311
+ ProBERT is free and open-source. If you find it useful, consider supporting continued development:
312
+
313
+ **[☕ Buy me a coffee](https://ko-fi.com/collapseindex)** - Help fund ProBERT maintenance and future versions.
314
+
315
+ ---