repro path

README.md CHANGED

@@ -167,10 +167,16 @@ Key Findings:
 All calibration metrics can be reproduced using the included evaluation script:
 
 ```bash
+# Auto-detect mode (uses defaults)
 python eval_calibration.py --probert
+
+# Explicit paths (for custom locations)
+python eval_calibration.py \
+    --model_dir probert_model \
+    --csv probert_training_20260131_004706.csv
 ```
 
-
+The `--probert` flag auto-detects the model directory and latest predictions CSV. The script computes ECE, confidence gaps, and high-confidence error rates. Full source included in the model repository for transparency.
 
 **Evaluation Transparency:**
 
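The paragraph added in this commit says the script reports ECE, confidence gaps, and high-confidence error rates. As a rough illustration of what those metrics mean (this is not `eval_calibration.py`'s actual implementation — the 10-bin equal-width scheme, the gap definition, and the 0.9 confidence threshold are all assumptions), they can be computed as:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted mean of
    |accuracy - mean confidence| across bins (equal-width bins assumed)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# Toy predictions: confidence of each prediction and whether it was correct.
confs = np.array([0.95, 0.95, 0.6, 0.6])
hits = np.array([1, 1, 1, 0])

print(expected_calibration_error(confs, hits))  # ECE over 10 bins
print(confs.mean() - hits.mean())               # confidence gap (assumed: mean confidence - accuracy)
print(1 - hits[confs >= 0.9].mean())            # error rate among high-confidence (>= 0.9, assumed) predictions
```

Running the actual script against the shipped predictions CSV is the authoritative way to reproduce the README's numbers; the sketch above only shows the shape of each metric.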