Reading a Picarones report

🇫🇷 Version française

This guide explains how to read a Picarones HTML report, the self-contained file produced by `picarones report --output report.html`. The report is the primary deliverable of a benchmark and is written for both engineers and domain experts (archivists, paleographers, project managers).

Anatomy

A report is organized into five main views, shown as tabs in the navigation:

Ranking — sortable table of all engines with CER, WER, MER, WIL, ligature/diacritic scores, anchor score, etc.
Gallery — grid view of all documents with color-coded CER badges per engine.
Document — per-document detail with synchronized N-way diff between ground truth and each engine output.
Analyses — statistical charts: CER histogram, radar chart, correlation plots, calibration diagrams, Pareto front, etc.
Characters — Unicode confusion matrix and ligature analysis.
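
To make the headline metrics in the Ranking view concrete, here is a minimal sketch of the textbook definitions of CER and WER: the edit distance between ground truth and engine output, divided by the length of the ground truth. It is an illustration only. Picarones may normalize text or handle edge cases differently, and the function names below are not taken from its code base.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)
```

Both rates can exceed 1.0 when the engine output is much longer than the ground truth, which is worth keeping in mind when reading outliers in the Gallery badges.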

Above the tabs, you'll find:

The factual narrative synthesis (Sprint 19): 3–5 sentences summarizing the salient facts (global leader, statistical ties, outliers, regression flags). Every number cited in the synthesis is traceable to the underlying JSON data — no LLM, no hallucination risk.
The Critical Difference Diagram (Sprint 18, Demšar 2006): visual representation of which engines are statistically indistinguishable.
The Pareto front (Sprint 20): cost vs CER trade-off analysis.
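
As an aid to reading the Pareto front, the sketch below shows the usual definition applied to a cost vs CER trade-off: an engine is on the front if no other engine is at least as good on both axes and strictly better on one. The `EngineResult` type, the engine names and the figures are hypothetical, and this is not necessarily how Picarones computes its chart.

```python
from typing import List, NamedTuple


class EngineResult(NamedTuple):
    name: str
    cost: float  # hypothetical cost unit; lower is better
    cer: float   # character error rate; lower is better


def pareto_front(results: List[EngineResult]) -> List[EngineResult]:
    """Return the non-dominated engines, sorted by increasing cost."""
    front = []
    for r in results:
        dominated = any(
            o.cost <= r.cost and o.cer <= r.cer
            and (o.cost < r.cost or o.cer < r.cer)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.cost)


# Made-up figures, for illustration only.
sample = [
    EngineResult("engine_a", cost=1.0, cer=0.10),
    EngineResult("engine_b", cost=2.0, cer=0.05),
    EngineResult("engine_c", cost=3.0, cer=0.08),  # dominated by engine_b
    EngineResult("engine_d", cost=0.5, cer=0.20),
]
front_names = [r.name for r in pareto_front(sample)]
```

In this example `engine_c` is off the front because `engine_b` is both cheaper and more accurate; the remaining three engines each represent a defensible trade-off.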

Side panels

Two side panels enrich the report:

Glossary (? icon next to each metric): each entry gives the definition, what the metric measures, how to use it, its limits, and a primary reference. There are 25 bilingual entries; click ? to open one.
Advanced mode (⚙ button in nav): a column visibility picker, per-stratum filters (script type), and an opt-in personal composite score that carries an explicit "no universal weighting" warning. All settings are stored in the URL, so a configured view can be shared as a link.
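
To illustrate what keeping settings in the URL means in practice, here is a generic sketch of round-tripping a view configuration through a query string. The parameter names `cols` and `stratum` and the example values are assumptions made for this illustration; the actual keys used by the report are not documented here.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode


def encode_settings(columns: List[str], stratum: str) -> str:
    """Serialize view settings into a query string (hypothetical keys)."""
    return urlencode({"cols": ",".join(columns), "stratum": stratum})


def decode_settings(query: str) -> Tuple[List[str], str]:
    """Recover the settings from a query string produced by encode_settings."""
    params = parse_qs(query)
    columns = params["cols"][0].split(",")
    stratum = params["stratum"][0]
    return columns, stratum


query = encode_settings(["cer", "wer"], "manuscript")
restored = decode_settings(query)
```

Because the whole configuration lives in the link, a colleague who opens it sees the same columns and filters without any server-side state.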

Export

A "⬇ CSV" button in the navigation exports the current view (with all customization filters applied) to CSV for Excel/LibreOffice. JSON, ALTO XML and PAGE XML exports are available via CLI flags on picarones run and picarones report.

`--lazy-images` mode for large corpora

Sprint A5 (item M-16). By default, the HTML report is a single, portable file: all images are embedded as base64 within the HTML. This is convenient for sharing by email, but the file becomes heavy beyond ~50 documents:

| Corpus size | Inline HTML | Lazy HTML |
|---|---|---|
| 10 docs | ~5 MB | ~3 MB + ~2 MB assets |
| 50 docs | ~50 MB | ~3 MB + ~10 MB assets |
| 500 docs | ~250 MB (slow to load) | ~3 MB + ~100 MB lazy-loaded |
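
Part of the inline overhead comes from base64 itself, which encodes every 3 bytes of image data as 4 ASCII characters. The sketch below estimates the size of an inline report from a set of image sizes. It is back-of-the-envelope arithmetic with hypothetical function names, not a reproduction of how the figures in the table were measured.

```python
from typing import Iterable


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def estimate_inline_size(html_bytes: int, image_sizes: Iterable[int]) -> int:
    """Rough size of an HTML file that embeds every image as base64."""
    return html_bytes + sum(base64_size(size) for size in image_sizes)


# Hypothetical example: a 1,000-byte HTML shell plus two small images.
estimated = estimate_inline_size(1_000, [300, 3])
```

Embedding therefore costs about a third more than the raw image bytes, before counting the browser's cost of parsing one very large file.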

For digital libraries benchmarking thousands of documents, enable lazy mode:

picarones report --results results.json --output report.html --lazy-images

The report remains portable as long as `report.html` and the `report-assets/` folder are copied together and kept side by side. Images are referenced by relative path and loaded by the browser on demand (via the HTML5 `loading="lazy"` attribute).
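
Since a lazy report only works when the HTML file and its assets folder travel together, a small helper can make the copy step harder to get wrong. This is a sketch under stated assumptions: `copy_lazy_report` is a hypothetical helper, not part of Picarones, and it assumes the assets folder is named `report-assets` and sits next to the HTML file as described above.

```python
import shutil
from pathlib import Path


def copy_lazy_report(report: Path, dest_dir: Path,
                     assets_name: str = "report-assets") -> Path:
    """Copy a lazy-mode report and its sibling assets folder into dest_dir."""
    assets = report.parent / assets_name
    if not report.is_file():
        raise FileNotFoundError(f"report not found: {report}")
    if not assets.is_dir():
        raise FileNotFoundError(f"assets folder not found: {assets}")

    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / report.name
    shutil.copy2(report, target)
    # Keep the folder name unchanged so relative image paths still resolve.
    shutil.copytree(assets, dest_dir / assets_name, dirs_exist_ok=True)
    return target
```

Calling `copy_lazy_report(Path("report.html"), Path("share"))` would produce `share/report.html` and `share/report-assets/`, and it fails early if the assets folder is missing instead of producing a report with broken images.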
