refactor(report): split generator.py (1063 → 431 lines) by concern
Sprint "splitting generator.py". The file now only orchestrates
Jinja rendering and the ReportGenerator class; all data
construction and image I/O have been extracted into dedicated
submodules.
What changes physically:
- generator.py: 1063 → 431 lines (-60%).
  Keeps: the ReportGenerator class (init + generate + from_json
  + a new _build_section_html method that groups the 12 calls
  to the conditional renderers), _build_jinja_env, _TEMPLATES_DIR.
  Re-exports backward-compatible aliases: _build_report_data, _cer_color,
  _cer_bg, _externalize_images_to_dir, _encode_image_b64,
  _encode_images_b64_from_result, _load_vendor_js, _pct, _safe.
- picarones/report/assets.py (new, 179 lines):
  load_vendor_js, encode_image_b64, encode_images_b64_from_result,
  externalize_images_to_dir. All binary image and vendor I/O.
- picarones/report/report_data/ (new package):
  • __init__.py (102 lines): build_report_data orchestrator.
  • _helpers.py (30): safe_round, percent_string.
  • engines.py (103): per-engine summary (engines_summary).
  • documents.py (167): gallery + detail + Sprint 7 difficulty.
  • statistics.py (216): Wilcoxon, Friedman/Nemenyi, bootstrap,
    reliability curves, Venn, error clusters, correlations.
  • scatter.py (56): Sprint 10: Gini vs CER, ratio vs anchor.
  • pareto.py (123): Sprint 19: 3 Pareto fronts + pricing meta.
- render_helpers.py +60 lines (332 → 392): adds cer_step_color
  and cer_step_bg (discrete 4-step CER scale).
Conceptual boundaries (not arbitrary): each submodule maps to a
self-contained block that is expected to evolve independently of the
others. Building pareto has a documented side effect
(in-place annotation of engines_summary with mean_duration_seconds
and cost); this is the only ordering dependency in the package.
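The ordering constraint can be illustrated with a minimal sketch. All names here are hypothetical stand-ins (the real `engines.py` and `pareto.py` are not shown in this diff), and only one of the three Pareto fronts is modeled: CER vs. mean duration, both minimized.

```python
from typing import Any, Optional

Entry = dict[str, Any]


def build_engines_summary(engines: list[tuple[str, float]]) -> list[Entry]:
    """Hypothetical stand-in for engines.py: one summary dict per engine."""
    return [{"name": name, "cer": cer} for name, cer in engines]


def build_pareto(
    engines_summary: list[Entry],
    durations: dict[str, float],
    costs: dict[str, float],
) -> list[str]:
    """Hypothetical stand-in for pareto.py: annotate in place, then compute a front."""
    # Side effect: every entry gains two keys that later code may read.
    for entry in engines_summary:
        entry["mean_duration_seconds"] = durations.get(entry["name"])
        entry["cost"] = costs.get(entry["name"])

    # CER-vs-duration front: scan by increasing duration and keep an engine
    # only if its CER is strictly better than every faster engine's.
    timed = [e for e in engines_summary if e["mean_duration_seconds"] is not None]
    timed.sort(key=lambda e: (e["mean_duration_seconds"], e["cer"]))
    front: list[str] = []
    best_cer: Optional[float] = None
    for e in timed:
        if best_cer is None or e["cer"] < best_cer:
            front.append(e["name"])
            best_cer = e["cer"]
    return front


def build_report_data(engines, durations, costs) -> dict[str, Any]:
    engines_summary = build_engines_summary(engines)
    # Must run before anything that reads mean_duration_seconds or cost.
    pareto_front = build_pareto(engines_summary, durations, costs)
    return {"engines_summary": engines_summary, "pareto": pareto_front}


data = build_report_data(
    [("a", 0.10), ("b", 0.05), ("c", 0.20)],
    durations={"a": 1.0, "b": 3.0, "c": 2.0},
    costs={"a": 0.0, "b": 0.5},
)
print(data["pareto"])  # ['a', 'b']
```

Engine "c" is dominated by "a" (slower and worse), so it drops out. Swapping the two calls in `build_report_data` would not raise an error, but any consumer placed between them would see entries without `mean_duration_seconds`; that is why the dependency is worth documenting.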
Invariant calibration:
- FILE_BUDGETS: generator.py budget tightened 1250 → 500 (locks in
  the gain; 431 + ~15% margin). pipeline_render and
  philological_render were also trimmed slightly, thanks to the helpers
  consolidated in the previous commit.
- test_views.py::test_generator_imports_three_views: assertions
  widened to accept both wiring conventions (keyword
  argument OR dict key splatted via **section_html).
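One way to accept both wiring conventions in a source-level test is sketched below. The actual assertions in test_views.py are not shown; `is_wired` and the view names `gallery_html` and `document_html` are hypothetical.

```python
import re


def is_wired(source: str, name: str) -> bool:
    """True if `name` appears as a keyword argument or as a string dict key."""
    # `name=` but not `name==` (keyword argument).
    kwarg = re.search(rf"\b{re.escape(name)}\s*=(?!=)", source)
    # `"name":` or `'name':` (dict key, later splatted with **).
    dict_key = re.search(rf"[\"']{re.escape(name)}[\"']\s*:", source)
    return kwarg is not None or dict_key is not None


# Convention 1: each view passed as a keyword argument.
OLD = "html = template.render(gallery_html=gallery_html, document_html=doc_html)"

# Convention 2: views collected in a dict, then splatted.
NEW = """
section_html = {"gallery_html": gallery, "document_html": doc}
html = template.render(**section_html)
"""


def render(**kwargs: str) -> list[str]:
    """Stand-in callee: reports which keyword arguments it received."""
    return sorted(kwargs)


# At runtime both conventions deliver the same keyword arguments.
assert render(**{"gallery_html": "", "document_html": ""}) == render(
    gallery_html="", document_html=""
)
```

Matching on source text is brittle by nature. The widened check keeps the original intent ("the view is actually wired in") without pinning the test to one call style.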
Non-regression:
- 3830 passed, 2 skipped (identical to the previous commit).
- 1 pre-existing failure (tests/docs/test_readme_dual_lang.py),
  unrelated.
- All tests that import _build_report_data, _cer_color,
  _externalize_images_to_dir from picarones.report.generator
  keep working through the backward-compatible aliases.
- picarones/report/assets.py +179 -0
- picarones/report/generator.py +145 -777
- picarones/report/render_helpers.py +56 -0
- picarones/report/report_data/__init__.py +102 -0
- picarones/report/report_data/_helpers.py +30 -0
- picarones/report/report_data/documents.py +167 -0
- picarones/report/report_data/engines.py +103 -0
- picarones/report/report_data/pareto.py +123 -0
- picarones/report/report_data/scatter.py +56 -0
- picarones/report/report_data/statistics.py +216 -0
- tests/architecture/test_file_budgets.py +8 -3
- tests/report/test_views.py +21 -5
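The figures above are mutually consistent, and this can be checked with a few lines of arithmetic: 1063 - 777 + 145 = 431, and the 500-line budget leaves about 16% of headroom (431 × 1.15 ≈ 496, rounded up to 500). The `over_budget` helper is a hypothetical sketch of what a FILE_BUDGETS check can look like; the real test_file_budgets.py is not shown here.

```python
def lines_after(before: int, added: int, removed: int) -> int:
    """Line count of a file once a diff with this diffstat is applied."""
    return before - removed + added


def over_budget(line_counts: dict[str, int], budgets: dict[str, int]) -> dict[str, int]:
    """Files whose line count exceeds their budget (unbudgeted files pass)."""
    return {
        path: n
        for path, n in line_counts.items()
        if path in budgets and n > budgets[path]
    }


GEN = "picarones/report/generator.py"
after = lines_after(1063, added=145, removed=777)
print(after)                       # 431
print(round(1 - after / 1063, 3))  # 0.595, i.e. the "-60%"
print(round(500 / after - 1, 3))   # 0.16 of headroom under the new budget

# The old budget would let the file grow back to its former size unnoticed;
# the new one flags it.
print(over_budget({GEN: 1063}, {GEN: 1250}))  # {}
print(over_budget({GEN: 1063}, {GEN: 500}))   # {'picarones/report/generator.py': 1063}
```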
picarones/report/assets.py
@@ -0,0 +1,179 @@
+"""Loading and preparation of the HTML report assets.
+
+This module gathers everything related to the binary resources
+embedded in or referenced by the report:
+
+- ``load_vendor_js`` reads a vendored JS file (Chart.js, etc.).
+- ``encode_image_b64`` resizes an image and encodes it as a data URI.
+- ``encode_images_b64_from_result`` iterates over a BenchmarkResult.
+- ``externalize_images_to_dir`` writes the images to disk next to
+  the HTML (``--lazy-images`` mode from Sprint A5).
+
+Extracted from ``picarones/report/generator.py`` during the splitting
+sprint: isolates image and vendor I/O from the rest of the orchestration.
+"""
+
+from __future__ import annotations
+
+import base64
+import io
+import logging
+from pathlib import Path
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+
+logger = logging.getLogger(__name__)
+
+#: Directory holding the embedded JS resources.
+_VENDOR_DIR = Path(__file__).parent / "vendor"
+
+
+def load_vendor_js(name: str) -> str:
+    """Read a vendored JS file and return its content.
+
+    If the file does not exist, return a JS comment that keeps
+    the report valid (no SyntaxError in the browser).
+    """
+    p = _VENDOR_DIR / name
+    if p.exists():
+        return p.read_text(encoding="utf-8")
+    return f"/* vendor/{name} non trouvé */"
+
+
+def encode_image_b64(image_path: str, max_width: int = 1200) -> str:
+    """Read an image, resize it if needed, and return a base64 data URI."""
+    try:
+        from PIL import Image
+
+        p = Path(image_path)
+        if not p.exists():
+            return ""
+        with Image.open(p) as img:
+            if img.width > max_width:
+                ratio = max_width / img.width
+                new_h = max(1, int(img.height * ratio))
+                img = img.resize((max_width, new_h), Image.LANCZOS)
+            # Convert to RGB to avoid mode issues (RGBA, palette, ...).
+            if img.mode not in ("RGB", "L"):
+                img = img.convert("RGB")
+            buf = io.BytesIO()
+            fmt = "JPEG" if p.suffix.lower() in (".jpg", ".jpeg") else "PNG"
+            img.save(buf, format=fmt, optimize=True, quality=85)
+            b64 = base64.b64encode(buf.getvalue()).decode("ascii")
+            mime = "image/jpeg" if fmt == "JPEG" else "image/png"
+            return f"data:{mime};base64,{b64}"
+    except Exception:  # noqa: BLE001 (silent fallback on the report side)
+        return ""
+
+
+def encode_images_b64_from_result(
+    benchmark: "BenchmarkResult", max_width: int = 1200,
+) -> dict[str, str]:
+    """Encode all the images of a BenchmarkResult as base64.
+
+    Returns
+    -------
+    dict
+        ``{doc_id: data_uri}``
+    """
+    images: dict[str, str] = {}
+    if not benchmark.engine_reports:
+        return images
+    for dr in benchmark.engine_reports[0].document_results:
+        if dr.image_path and dr.doc_id not in images:
+            uri = encode_image_b64(dr.image_path, max_width=max_width)
+            if uri:
+                images[dr.doc_id] = uri
+    return images
+
+
+def externalize_images_to_dir(
+    benchmark: "BenchmarkResult",
+    output_dir: Path,
+    max_width: int = 1200,
+    asset_subdir: str = "report-assets",
+) -> dict[str, str]:
+    """Sprint A5 (item M-16): write the images to disk in a
+    subdirectory next to the HTML, and return ``{doc_id: relative_url}``.
+
+    "Lazy loading" mode: instead of embedding each image as
+    base64 in the HTML (50 MB+ for a 100-document corpus,
+    ~200 MB+ for 1,000 documents), the images are written out as local
+    PNG/JPEG files. The HTML references them via
+    ``<img src="report-assets/…">`` with ``loading="lazy"`` on the
+    browser side.
+
+    The report stays self-contained as long as the user copies the
+    ``report-assets/`` directory along with the HTML (see CLI ``--lazy-images``).
+
+    Parameters
+    ----------
+    benchmark:
+        Benchmark result (reads ``image_path`` from each DocumentResult).
+    output_dir:
+        Directory where the HTML will be written; the assets subdirectory
+        is created inside it, next to the HTML.
+    max_width:
+        Maximum width after resizing (consistent with
+        ``encode_image_b64``).
+    asset_subdir:
+        Name of the assets subdirectory (default ``"report-assets"``).
+
+    Returns
+    -------
+    dict[str, str]
+        ``{doc_id: "report-assets/<doc_id>.png"}`` (relative URL
+        usable directly in an HTML ``src`` attribute).
+    """
+    from PIL import Image
+
+    assets_dir = output_dir / asset_subdir
+    assets_dir.mkdir(parents=True, exist_ok=True)
+    out: dict[str, str] = {}
+
+    seen_ids: set[str] = set()
+    for engine_report in benchmark.engine_reports:
+        for dr in engine_report.document_results:
+            doc_id = dr.doc_id
+            if doc_id in seen_ids:
+                continue
+            seen_ids.add(doc_id)
+            try:
+                src = Path(dr.image_path)
+                if not src.exists():
+                    continue
+                # File name derived from doc_id, normalized to strip
+                # characters that are unsafe for the filesystem.
+                safe_id = "".join(
+                    c if c.isalnum() or c in "._-" else "_" for c in doc_id
+                )
+                dest = assets_dir / f"{safe_id}{src.suffix.lower() or '.png'}"
+                with Image.open(src) as img:
+                    if img.width > max_width:
+                        ratio = max_width / img.width
+                        new_h = max(1, int(img.height * ratio))
+                        img = img.resize((max_width, new_h), Image.LANCZOS)
+                    if img.mode not in ("RGB", "L"):
+                        img = img.convert("RGB")
+                    fmt = "JPEG" if dest.suffix in (".jpg", ".jpeg") else "PNG"
+                    img.save(dest, format=fmt, optimize=True, quality=85)
+                # Relative URL (POSIX style, even on Windows, for HTML).
+                out[doc_id] = f"{asset_subdir}/{dest.name}"
+            except Exception as exc:  # noqa: BLE001 (silent fallback + warning)
+                logger.warning(
+                    "[report] échec d'externalisation de l'image %s : %s — "
+                    "le rapport ignorera cette image",
+                    dr.image_path,
+                    exc,
+                )
+    return out
+
+
+__all__ = [
+    "load_vendor_js",
+    "encode_image_b64",
+    "encode_images_b64_from_result",
+    "externalize_images_to_dir",
+]
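The file-naming rule inside `externalize_images_to_dir` can be isolated and exercised without Pillow or a BenchmarkResult. `safe_asset_name` and `asset_url` are hypothetical helper names; the body mirrors the inline logic of the diff above.

```python
from pathlib import Path


def safe_asset_name(doc_id: str, image_path: str) -> str:
    """File name used under the assets directory for one document.

    Mirrors the inline logic of externalize_images_to_dir: every character
    that is neither alphanumeric nor one of "._-" becomes "_", and the
    source suffix is lower-cased (".png" when the source has none).
    """
    safe_id = "".join(c if c.isalnum() or c in "._-" else "_" for c in doc_id)
    return f"{safe_id}{Path(image_path).suffix.lower() or '.png'}"


def asset_url(doc_id: str, image_path: str, asset_subdir: str = "report-assets") -> str:
    # Always "/" separators: the value goes into an HTML src attribute.
    return f"{asset_subdir}/{safe_asset_name(doc_id, image_path)}"


print(asset_url("BnF ms. fr/12", "scans/f12.JPG"))  # report-assets/BnF_ms._fr_12.jpg
print(safe_asset_name("a/b", "x.png") == safe_asset_name("a_b", "y.png"))  # True
```

Two details follow from the code as written. First, distinct doc_ids can map to the same file name. Since `seen_ids` deduplicates on `doc_id`, not on the sanitized name, the second image would overwrite the first. Second, a non-JPEG source such as `.tif` keeps its suffix in `dest` but is saved with `format="PNG"`.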
@@ -11,667 +11,51 @@ Vues disponibles
|
|
| 11 |
2. Galerie — grille d'images avec badge CER coloré
|
| 12 |
3. Document — image zoomable + diff coloré GT / OCR par moteur
|
| 13 |
4. Analyses — histogramme CER + graphique radar
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
"""
|
| 15 |
|
| 16 |
from __future__ import annotations
|
| 17 |
|
| 18 |
-
import base64
|
| 19 |
-
import io
|
| 20 |
import json
|
| 21 |
import logging
|
| 22 |
from pathlib import Path
|
| 23 |
from typing import Any, Optional
|
| 24 |
|
| 25 |
-
logger = logging.getLogger(__name__)
|
| 26 |
-
|
| 27 |
-
# ---------------------------------------------------------------------------
|
| 28 |
-
# Ressources vendor (embarquées dans le rapport HTML)
|
| 29 |
-
# ---------------------------------------------------------------------------
|
| 30 |
-
|
| 31 |
-
_VENDOR_DIR = Path(__file__).parent / "vendor"
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
def _load_vendor_js(name: str) -> str:
|
| 35 |
-
"""Lit un fichier JS vendorisé et retourne son contenu."""
|
| 36 |
-
p = _VENDOR_DIR / name
|
| 37 |
-
if p.exists():
|
| 38 |
-
return p.read_text(encoding="utf-8")
|
| 39 |
-
return f"/* vendor/{name} non trouvé */"
|
| 40 |
-
|
| 41 |
from picarones.core.results import BenchmarkResult
|
| 42 |
-
from picarones.
|
| 43 |
-
from picarones.
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
|
|
|
|
|
|
|
|
|
| 54 |
)
|
| 55 |
-
from picarones.measurements.pricing import build_costs_for_benchmark, load_pricing_database
|
| 56 |
-
from picarones.measurements.difficulty import compute_all_difficulties, difficulty_label
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
# ---------------------------------------------------------------------------
|
| 60 |
-
# Helpers
|
| 61 |
-
# ---------------------------------------------------------------------------
|
| 62 |
-
|
| 63 |
-
def _encode_image_b64(image_path: str, max_width: int = 1200) -> str:
|
| 64 |
-
"""Lit une image, la redimensionne si besoin, et retourne un data-URI base64."""
|
| 65 |
-
try:
|
| 66 |
-
from PIL import Image
|
| 67 |
-
p = Path(image_path)
|
| 68 |
-
if not p.exists():
|
| 69 |
-
return ""
|
| 70 |
-
with Image.open(p) as img:
|
| 71 |
-
if img.width > max_width:
|
| 72 |
-
ratio = max_width / img.width
|
| 73 |
-
new_h = max(1, int(img.height * ratio))
|
| 74 |
-
img = img.resize((max_width, new_h), Image.LANCZOS)
|
| 75 |
-
# Convertir en RGB pour éviter les problèmes de mode (RGBA, palette…)
|
| 76 |
-
if img.mode not in ("RGB", "L"):
|
| 77 |
-
img = img.convert("RGB")
|
| 78 |
-
buf = io.BytesIO()
|
| 79 |
-
fmt = "JPEG" if p.suffix.lower() in (".jpg", ".jpeg") else "PNG"
|
| 80 |
-
img.save(buf, format=fmt, optimize=True, quality=85)
|
| 81 |
-
b64 = base64.b64encode(buf.getvalue()).decode("ascii")
|
| 82 |
-
mime = "image/jpeg" if fmt == "JPEG" else "image/png"
|
| 83 |
-
return f"data:{mime};base64,{b64}"
|
| 84 |
-
except Exception:
|
| 85 |
-
return ""
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def _externalize_images_to_dir(
|
| 89 |
-
benchmark: "BenchmarkResult",
|
| 90 |
-
output_dir: Path,
|
| 91 |
-
max_width: int = 1200,
|
| 92 |
-
asset_subdir: str = "report-assets",
|
| 93 |
-
) -> dict[str, str]:
|
| 94 |
-
"""Sprint A5 (item M-16) — écrit les images sur disque dans un
|
| 95 |
-
sous-dossier à côté du HTML, et retourne ``{doc_id: url_relative}``.
|
| 96 |
-
|
| 97 |
-
Mode « lazy loading » : au lieu d'embarquer chaque image en
|
| 98 |
-
base64 dans le HTML (50 MB+ pour un corpus de 100 documents,
|
| 99 |
-
~200 MB+ pour 1 000 documents), on les externalise en fichiers
|
| 100 |
-
PNG/JPEG locaux. Le HTML les référence via ``<img src="report-assets/…">``
|
| 101 |
-
avec ``loading="lazy"`` côté navigateur.
|
| 102 |
-
|
| 103 |
-
Le rapport reste auto-portant si l'utilisateur copie le dossier
|
| 104 |
-
``report-assets/`` à côté du HTML (cf. CLI ``--lazy-images``).
|
| 105 |
-
|
| 106 |
-
Parameters
|
| 107 |
-
----------
|
| 108 |
-
benchmark:
|
| 109 |
-
Résultat de benchmark (lit ``image_path`` de chaque DocumentResult).
|
| 110 |
-
output_dir:
|
| 111 |
-
Dossier où le HTML sera écrit ; le sous-dossier d'assets sera
|
| 112 |
-
créé à côté.
|
| 113 |
-
max_width:
|
| 114 |
-
Largeur max du redimensionnement (cohérent avec
|
| 115 |
-
``_encode_image_b64``).
|
| 116 |
-
asset_subdir:
|
| 117 |
-
Nom du sous-dossier d'assets (défaut ``"report-assets"``).
|
| 118 |
-
|
| 119 |
-
Returns
|
| 120 |
-
-------
|
| 121 |
-
dict[str, str]
|
| 122 |
-
``{doc_id: "report-assets/<doc_id>.png"}`` (URL relative
|
| 123 |
-
consommable directement dans un attribut HTML ``src``).
|
| 124 |
-
"""
|
| 125 |
-
from PIL import Image
|
| 126 |
-
|
| 127 |
-
assets_dir = output_dir / asset_subdir
|
| 128 |
-
assets_dir.mkdir(parents=True, exist_ok=True)
|
| 129 |
-
out: dict[str, str] = {}
|
| 130 |
-
|
| 131 |
-
seen_ids: set[str] = set()
|
| 132 |
-
for engine_report in benchmark.engine_reports:
|
| 133 |
-
for dr in engine_report.document_results:
|
| 134 |
-
doc_id = dr.doc_id
|
| 135 |
-
if doc_id in seen_ids:
|
| 136 |
-
continue
|
| 137 |
-
seen_ids.add(doc_id)
|
| 138 |
-
try:
|
| 139 |
-
src = Path(dr.image_path)
|
| 140 |
-
if not src.exists():
|
| 141 |
-
continue
|
| 142 |
-
# Nom de fichier dérivé du doc_id, normalisé sans
|
| 143 |
-
# caractères dangereux pour le filesystem.
|
| 144 |
-
safe_id = "".join(
|
| 145 |
-
c if c.isalnum() or c in "._-" else "_" for c in doc_id
|
| 146 |
-
)
|
| 147 |
-
dest = assets_dir / f"{safe_id}{src.suffix.lower() or '.png'}"
|
| 148 |
-
with Image.open(src) as img:
|
| 149 |
-
if img.width > max_width:
|
| 150 |
-
ratio = max_width / img.width
|
| 151 |
-
new_h = max(1, int(img.height * ratio))
|
| 152 |
-
img = img.resize((max_width, new_h), Image.LANCZOS)
|
| 153 |
-
if img.mode not in ("RGB", "L"):
|
| 154 |
-
img = img.convert("RGB")
|
| 155 |
-
fmt = "JPEG" if dest.suffix in (".jpg", ".jpeg") else "PNG"
|
| 156 |
-
img.save(dest, format=fmt, optimize=True, quality=85)
|
| 157 |
-
# URL relative (POSIX style même sur Windows pour HTML).
|
| 158 |
-
out[doc_id] = f"{asset_subdir}/{dest.name}"
|
| 159 |
-
except Exception as exc: # noqa: BLE001 — fallback silencieux + warning
|
| 160 |
-
logger.warning(
|
| 161 |
-
"[report] échec d'externalisation de l'image %s : %s — "
|
| 162 |
-
"le rapport ignorera cette image",
|
| 163 |
-
dr.image_path,
|
| 164 |
-
exc,
|
| 165 |
-
)
|
| 166 |
-
return out
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
def _encode_images_b64_from_result(benchmark: "BenchmarkResult", max_width: int = 1200) -> dict[str, str]:
|
| 170 |
-
"""Encode toutes les images d'un BenchmarkResult en base64.
|
| 171 |
-
|
| 172 |
-
Returns
|
| 173 |
-
-------
|
| 174 |
-
dict
|
| 175 |
-
``{doc_id: data_uri}``
|
| 176 |
-
"""
|
| 177 |
-
images: dict[str, str] = {}
|
| 178 |
-
if not benchmark.engine_reports:
|
| 179 |
-
return images
|
| 180 |
-
for dr in benchmark.engine_reports[0].document_results:
|
| 181 |
-
if dr.image_path and dr.doc_id not in images:
|
| 182 |
-
uri = _encode_image_b64(dr.image_path, max_width=max_width)
|
| 183 |
-
if uri:
|
| 184 |
-
images[dr.doc_id] = uri
|
| 185 |
-
return images
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
def _cer_color(cer: float) -> str:
|
| 189 |
-
"""Retourne une couleur CSS pour un score CER donné (0→vert, 1→rouge)."""
|
| 190 |
-
from picarones.report.colors import COLOR_GREEN, COLOR_YELLOW, COLOR_ORANGE, COLOR_RED
|
| 191 |
-
if cer < 0.05:
|
| 192 |
-
return COLOR_GREEN
|
| 193 |
-
if cer < 0.15:
|
| 194 |
-
return COLOR_YELLOW
|
| 195 |
-
if cer < 0.30:
|
| 196 |
-
return COLOR_ORANGE
|
| 197 |
-
return COLOR_RED
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
def _cer_bg(cer: float) -> str:
|
| 201 |
-
from picarones.report.colors import BG_GREEN, BG_YELLOW, BG_ORANGE, BG_RED
|
| 202 |
-
if cer < 0.05:
|
| 203 |
-
return BG_GREEN
|
| 204 |
-
if cer < 0.15:
|
| 205 |
-
return BG_YELLOW
|
| 206 |
-
if cer < 0.30:
|
| 207 |
-
return BG_ORANGE
|
| 208 |
-
return BG_RED
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
def _pct(v: Optional[float], decimals: int = 2) -> str:
|
| 212 |
-
if v is None:
|
| 213 |
-
return "—"
|
| 214 |
-
return f"{v * 100:.{decimals}f} %"
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
def _safe(v: Optional[float], decimals: int = 4) -> float:
|
| 218 |
-
return round(v or 0.0, decimals)
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
# ---------------------------------------------------------------------------
|
| 222 |
-
# Préparation des données
|
| 223 |
-
# ---------------------------------------------------------------------------
|
| 224 |
-
|
| 225 |
-
def _build_report_data(benchmark: BenchmarkResult, images_b64: dict[str, str]) -> dict:
|
| 226 |
-
"""Transforme un BenchmarkResult en dict JSON pour le rapport HTML."""
|
| 227 |
-
|
| 228 |
-
engines_summary = []
|
| 229 |
-
for report in benchmark.engine_reports:
|
| 230 |
-
agg = report.aggregated_metrics
|
| 231 |
-
diplo_agg = agg.get("cer_diplomatic", {})
|
| 232 |
-
entry: dict = {
|
| 233 |
-
"name": report.engine_name,
|
| 234 |
-
"version": report.engine_version,
|
| 235 |
-
"cer": _safe(agg.get("cer", {}).get("mean")),
|
| 236 |
-
"wer": _safe(agg.get("wer", {}).get("mean")),
|
| 237 |
-
"mer": _safe(agg.get("mer", {}).get("mean")),
|
| 238 |
-
"wil": _safe(agg.get("wil", {}).get("mean")),
|
| 239 |
-
"cer_median": _safe(agg.get("cer", {}).get("median")),
|
| 240 |
-
"cer_min": _safe(agg.get("cer", {}).get("min")),
|
| 241 |
-
"cer_max": _safe(agg.get("cer", {}).get("max")),
|
| 242 |
-
"doc_count": agg.get("document_count", 0),
|
| 243 |
-
"failed": agg.get("failed_count", 0),
|
| 244 |
-
# CER diplomatique (après normalisation historique : ſ=s, u=v, i=j…)
|
| 245 |
-
"cer_diplomatic": _safe(diplo_agg.get("mean")) if diplo_agg else None,
|
| 246 |
-
"cer_diplomatic_profile": diplo_agg.get("profile"),
|
| 247 |
-
# Distribution pour l'histogramme : liste des CER individuels
|
| 248 |
-
"cer_values": [
|
| 249 |
-
_safe(dr.metrics.cer)
|
| 250 |
-
for dr in report.document_results
|
| 251 |
-
if dr.metrics.error is None
|
| 252 |
-
],
|
| 253 |
-
"cer_diplomatic_values": [
|
| 254 |
-
_safe(dr.metrics.cer_diplomatic)
|
| 255 |
-
for dr in report.document_results
|
| 256 |
-
if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
|
| 257 |
-
],
|
| 258 |
-
# Champs pipeline OCR+LLM (vides pour les moteurs OCR seuls)
|
| 259 |
-
"is_pipeline": report.is_pipeline,
|
| 260 |
-
"pipeline_info": report.pipeline_info,
|
| 261 |
-
# Sprint 5 — métriques avancées patrimoniales
|
| 262 |
-
"ligature_score": _safe(report.ligature_score) if report.ligature_score is not None else None,
|
| 263 |
-
"diacritic_score": _safe(report.diacritic_score) if report.diacritic_score is not None else None,
|
| 264 |
-
"aggregated_confusion": report.aggregated_confusion,
|
| 265 |
-
"aggregated_taxonomy": report.aggregated_taxonomy,
|
| 266 |
-
"aggregated_structure": report.aggregated_structure,
|
| 267 |
-
"aggregated_image_quality": report.aggregated_image_quality,
|
| 268 |
-
# Sprint 10 — distribution des erreurs + hallucinations VLM
|
| 269 |
-
"gini": _safe(report.aggregated_line_metrics.get("gini_mean")) if report.aggregated_line_metrics else None,
|
| 270 |
-
"cer_p90": _safe(report.aggregated_line_metrics.get("percentiles", {}).get("p90")) if report.aggregated_line_metrics else None,
|
| 271 |
-
"cer_p99": _safe(report.aggregated_line_metrics.get("percentiles", {}).get("p99")) if report.aggregated_line_metrics else None,
|
| 272 |
-
"catastrophic_rate_30": _safe(report.aggregated_line_metrics.get("catastrophic_rate", {}).get("0.3")) if report.aggregated_line_metrics else None,
|
| 273 |
-
"aggregated_line_metrics": report.aggregated_line_metrics,
|
| 274 |
-
"anchor_score": _safe(report.aggregated_hallucination.get("anchor_score_mean")) if report.aggregated_hallucination else None,
|
| 275 |
-
"length_ratio": _safe(report.aggregated_hallucination.get("length_ratio_mean")) if report.aggregated_hallucination else None,
|
| 276 |
-
"hallucinating_doc_rate": _safe(report.aggregated_hallucination.get("hallucinating_doc_rate")) if report.aggregated_hallucination else None,
|
| 277 |
-
"aggregated_hallucination": report.aggregated_hallucination,
|
| 278 |
-
# Sprint 41 — NER agrégé (None si aucun calcul effectué)
|
| 279 |
-
"aggregated_ner": report.aggregated_ner,
|
| 280 |
-
# Sprint 43 — calibration agrégée (None si aucune confidence
|
| 281 |
-
# n'a été exposée par le moteur sur ce corpus)
|
| 282 |
-
"aggregated_calibration": report.aggregated_calibration,
|
| 283 |
-
# Sprint 62 — profil philologique agrégé (None si aucun
|
| 284 |
-
# signal philologique sur le corpus pour ce moteur)
|
| 285 |
-
"aggregated_philological": report.aggregated_philological,
|
| 286 |
-
# Sprint 86 — A.II.5 (recherchabilité fuzzy + séquences
|
| 287 |
-
# numériques). None si aucun document n'a de signal.
|
| 288 |
-
"aggregated_searchability": report.aggregated_searchability,
|
| 289 |
-
"aggregated_numerical_sequences": (
|
| 290 |
-
report.aggregated_numerical_sequences
|
| 291 |
-
),
|
| 292 |
-
# Sprint 87 — A.II.2 (delta Flesch agrégé)
|
| 293 |
-
"aggregated_readability": report.aggregated_readability,
|
| 294 |
-
"is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
|
| 295 |
-
}
|
| 296 |
-
engines_summary.append(entry)
|
| 297 |
-
|
| 298 |
-
# Documents (vue galerie + vue détail)
|
| 299 |
-
# On collecte tous les doc_ids depuis l'union de tous les moteurs,
|
| 300 |
-
# en préservant l'ordre d'apparition (premier moteur d'abord, puis compléments).
|
| 301 |
-
seen_doc_ids: set[str] = set()
|
| 302 |
-
doc_ids_ordered: list[str] = []
|
| 303 |
-
for report in benchmark.engine_reports:
|
| 304 |
-
for dr in report.document_results:
|
| 305 |
-
if dr.doc_id not in seen_doc_ids:
|
| 306 |
-
seen_doc_ids.add(dr.doc_id)
|
| 307 |
-
doc_ids_ordered.append(dr.doc_id)
|
| 308 |
-
|
| 309 |
-
# Index croisé : doc_id → {engine_name → DocumentResult}
|
| 310 |
-
doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
|
| 311 |
-
for report in benchmark.engine_reports:
|
| 312 |
-
for dr in report.document_results:
|
| 313 |
-
doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
|
| 314 |
-
|
| 315 |
-
documents = []
|
| 316 |
-
for doc_id in doc_ids_ordered:
|
| 317 |
-
engine_results = []
|
| 318 |
-
gt = ""
|
| 319 |
-
image_path = ""
|
| 320 |
-
for engine_name in [r.engine_name for r in benchmark.engine_reports]:
|
| 321 |
-
dr = doc_engine_map[doc_id].get(engine_name)
|
| 322 |
-
if dr is None:
|
| 323 |
-
continue
|
| 324 |
-
gt = dr.ground_truth
|
| 325 |
-
image_path = dr.image_path
|
| 326 |
-
diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
|
| 327 |
-
er_entry: dict = {
|
| 328 |
-
"engine": engine_name,
|
| 329 |
-
"hypothesis": dr.hypothesis,
|
| 330 |
-
"cer": _safe(dr.metrics.cer),
|
| 331 |
-
"cer_diplomatic": _safe(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
|
| 332 |
-
"wer": _safe(dr.metrics.wer),
|
| 333 |
-
"mer": _safe(dr.metrics.mer),
|
| 334 |
-
"wil": _safe(dr.metrics.wil),
|
| 335 |
-
"duration": dr.duration_seconds,
|
| 336 |
-
"error": dr.engine_error,
|
| 337 |
-
"diff": diff_ops,
|
| 338 |
-
}
|
| 339 |
-
# Champs spécifiques aux pipelines OCR+LLM
|
| 340 |
-
if dr.ocr_intermediate is not None:
|
| 341 |
-
er_entry["ocr_intermediate"] = dr.ocr_intermediate
|
| 342 |
-
er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
|
| 343 |
-
er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
|
| 344 |
-
if dr.pipeline_metadata:
|
| 345 |
-
on = dr.pipeline_metadata.get("over_normalization")
|
| 346 |
-
if on is not None:
|
| 347 |
-
er_entry["over_normalization"] = on
|
| 348 |
-
er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
|
| 349 |
-
# Sprint 5 — métriques avancées par document
|
| 350 |
-
if dr.char_scores is not None:
|
| 351 |
-
er_entry["ligature_score"] = _safe(dr.char_scores.get("ligature", {}).get("score"))
|
| 352 |
-
er_entry["diacritic_score"] = _safe(dr.char_scores.get("diacritic", {}).get("score"))
|
| 353 |
-
if dr.taxonomy is not None:
|
| 354 |
-
er_entry["taxonomy"] = dr.taxonomy
|
| 355 |
-
if dr.structure is not None:
|
| 356 |
-
er_entry["structure"] = dr.structure
|
| 357 |
-
if dr.image_quality is not None:
|
| 358 |
-
er_entry["image_quality"] = dr.image_quality
|
| 359 |
-
# Sprint 10
|
| 360 |
-
if dr.line_metrics is not None:
|
| 361 |
-
er_entry["line_metrics"] = dr.line_metrics
|
| 362 |
-
if dr.hallucination_metrics is not None:
|
| 363 |
-
er_entry["hallucination_metrics"] = dr.hallucination_metrics
|
| 364 |
-
engine_results.append(er_entry)
|
| 365 |
-
|
| 366 |
-
# CER moyen sur ce document (pour le badge galerie)
|
| 367 |
-
cer_values = [er["cer"] for er in engine_results if er["error"] is None]
|
| 368 |
-
mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
|
| 369 |
-
best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
|
| 370 |
-
|
| 371 |
-
# Script type (depuis metadata par document si disponible)
|
| 372 |
-
script_type = ""
|
| 373 |
-
first_dr = doc_engine_map[doc_id].get(
|
| 374 |
-
benchmark.engine_reports[0].engine_name if benchmark.engine_reports else None
|
| 375 |
-
)
|
| 376 |
-
if first_dr and first_dr.image_quality:
|
| 377 |
-
script_type = first_dr.image_quality.get("script_type", "")
|
| 378 |
-
|
| 379 |
-
documents.append({
|
| 380 |
-
"doc_id": doc_id,
|
| 381 |
-
"image_path": image_path,
|
| 382 |
-
"image_b64": images_b64.get(doc_id, ""),
|
| 383 |
-
"ground_truth": gt,
|
| 384 |
-
"mean_cer": _safe(mean_cer),
|
| 385 |
-
"best_engine": best_engine["engine"] if best_engine else "",
|
| 386 |
-
"engine_results": engine_results,
|
| 387 |
-
"script_type": script_type,
|
| 388 |
-
})
|
| 389 |
-
|
| 390 |
-
# ── Sprint 7 — Score de difficulté intrinsèque ───────────────────────
|
| 391 |
-
gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
|
| 392 |
-
cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
|
| 393 |
-
iq_map: dict[str, float] = {}
|
| 394 |
-
for report in benchmark.engine_reports:
|
| 395 |
-
for dr in report.document_results:
|
| 396 |
-
```diff
-            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = _safe(dr.metrics.cer)
-            if dr.image_quality and "quality_score" in dr.image_quality:
-                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
-    difficulty_scores = compute_all_difficulties(
-        doc_ids=doc_ids_ordered,
-        ground_truths=gt_map,
-        cer_map=cer_map,
-        image_quality_map=iq_map or None,
-    )
-    # Ajouter difficulty_score à chaque document
-    for doc in documents:
-        ds = difficulty_scores.get(doc["doc_id"])
-        if ds:
-            doc["difficulty_score"] = _safe(ds.score)
-            doc["difficulty_label"] = difficulty_label(ds.score)
-        else:
-            doc["difficulty_score"] = 0.5
-            doc["difficulty_label"] = "Modéré"
-
-    # ── Sprint 7 — Tests statistiques (Wilcoxon pairwise + bootstrap CI) ─
-    engine_cer_map_stats: dict[str, list[float]] = {}
-    for report in benchmark.engine_reports:
-        vals = [_safe(dr.metrics.cer) for dr in report.document_results if dr.metrics.error is None]
-        if vals:
-            engine_cer_map_stats[report.engine_name] = vals
-
-    pairwise_stats = compute_pairwise_stats(engine_cer_map_stats)
-
-    # ── Sprint 17 — Friedman + Nemenyi ──────────────────────────────────
-    # Alignement strict sur le même ordre de documents : on reconstruit la
-    # map à partir des documents communs à tous les moteurs, sinon Friedman
-    # n'est pas applicable.
-    engine_cer_aligned: dict[str, list[float]] = {}
-    common_doc_ids: Optional[set[str]] = None
-    for report in benchmark.engine_reports:
-        doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
-        common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids
-    if common_doc_ids:
-        ordered_common = [d for d in doc_ids_ordered if d in common_doc_ids]
-        for report in benchmark.engine_reports:
-            dr_by_id = {dr.doc_id: dr for dr in report.document_results}
-            engine_cer_aligned[report.engine_name] = [
-                _safe(dr_by_id[d].metrics.cer) for d in ordered_common
-            ]
-
-    friedman = friedman_test(engine_cer_aligned) if engine_cer_aligned else {
-        "statistic": 0.0, "p_value": 1.0, "significant": False,
-        "df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
-        "interpretation": "Test de Friedman non calculé — aucun document commun.",
-        "error": "no_common_documents",
-    }
-    nemenyi = nemenyi_posthoc(engine_cer_aligned) if engine_cer_aligned else {
-        "alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
-        "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
-        "engines_sorted": [], "significant_matrix": [], "tied_groups": [],
-        "error": "no_common_documents",
-    }
```
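The Friedman block above only runs on documents that every engine scored without error, taken in corpus order; otherwise it falls back to a `no_common_documents` placeholder. `friedman_test` and `nemenyi_posthoc` are not shown in this diff, so what follows is only a sketch of the alignment step and of the per-document mean ranks that Friedman's test is built on. The function names and the `{engine: {doc_id: cer}}` input shape are assumptions, not the picarones API.

```python
from typing import Optional


def align_on_common_docs(
    per_engine: dict[str, dict[str, float]],
    doc_order: list[str],
) -> dict[str, list[float]]:
    """Keep only documents scored by every engine, in corpus order."""
    common: Optional[set[str]] = None
    for scores in per_engine.values():
        common = set(scores) if common is None else common & set(scores)
    if not common:
        return {}  # the diff falls back to a "no_common_documents" dict here
    ordered = [d for d in doc_order if d in common]
    return {eng: [scores[d] for d in ordered] for eng, scores in per_engine.items()}


def mean_ranks(aligned: dict[str, list[float]]) -> dict[str, float]:
    """Rank engines within each document (1 = lowest CER, ties share the
    average of the ranks they span), then average the ranks over documents."""
    if not aligned:
        return {}
    engines = list(aligned)
    n_blocks = len(aligned[engines[0]])
    if n_blocks == 0:
        return {}
    totals = dict.fromkeys(engines, 0.0)
    for i in range(n_blocks):
        row = sorted(engines, key=lambda e: aligned[e][i])
        start = 0
        while start < len(row):
            end = start
            # Extend the tie group while the next engine has exactly the same CER.
            while end + 1 < len(row) and aligned[row[end + 1]][i] == aligned[row[start]][i]:
                end += 1
            shared_rank = (start + end) / 2 + 1  # ranks are 1-based
            for e in row[start:end + 1]:
                totals[e] += shared_rank
            start = end + 1
    return {e: totals[e] / n_blocks for e in engines}
```

Because tied engines share the average rank, the mean ranks of k engines always sum to k(k+1)/2, which is a cheap sanity check on any implementation.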
```diff
-    bootstrap_cis: list[dict] = []
-    for engine_name, vals in engine_cer_map_stats.items():
-        lo, hi = bootstrap_ci(vals)
-        mean_v = sum(vals) / len(vals) if vals else 0.0
-        bootstrap_cis.append({
-            "engine": engine_name,
-            "mean": _safe(mean_v),
-            "ci_lower": _safe(lo),
-            "ci_upper": _safe(hi),
-        })
```
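`bootstrap_ci(vals)` returns a `(lo, hi)` pair, but its resampling scheme is not visible in this diff. A minimal percentile-bootstrap sketch, assuming 2000 resamples, a 95 % interval and a fixed seed (all three are hypothetical defaults, as is the function name):

```python
import random
from statistics import fmean


def bootstrap_ci_sketch(
    values: list[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of ``values``."""
    if not values:
        return 0.0, 0.0
    rng = random.Random(seed)  # fixed seed: the same report twice gives the same CI
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]
```

Fixing the seed keeps the generated HTML reproducible from one run to the next; a production implementation might prefer another interval method (BCa, for instance).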
```diff
-    # ── Sprint 7 — Courbes de fiabilité ──────────────────────────────────
-    reliability_curves: list[dict] = []
-    for report in benchmark.engine_reports:
-        vals = [_safe(dr.metrics.cer) for dr in report.document_results if dr.metrics.error is None]
-        curve = compute_reliability_curve(vals)
-        reliability_curves.append({
-            "engine": report.engine_name,
-            "points": curve,
-        })
-
-    # ── Sprint 7 — Venn des erreurs communes / exclusives ────────────────
-    # Construire les ensembles d'erreurs par moteur : {engine → set(doc_id:gt_tok:hyp_tok)}
-    venn_error_sets: dict[str, set[str]] = {}
-    for report in benchmark.engine_reports:
-        error_set: set[str] = set()
-        for dr in report.document_results:
-            ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
-            for op in ops:
-                if op["op"] in ("replace", "delete", "insert"):
-                    key = f"{dr.doc_id}:{op.get('old', op.get('text',''))}:{op.get('new', op.get('text',''))}"
-                    error_set.add(key)
-        venn_error_sets[report.engine_name] = error_set
-
-    venn_data = compute_venn_data(venn_error_sets)
```
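Each Venn set holds one string key per word-level edit, `doc_id:old:new`, so two engines share an error only when they make the same substitution, deletion or insertion on the same document. The sketch below reproduces that keying with `difflib.SequenceMatcher` standing in for `compute_word_diff` (whose output format is only inferred from the `op`/`old`/`new` fields used above) and computes the two-engine overlap a Venn diagram displays. `compute_venn_data`'s real output shape is not shown, so the dict returned by the hypothetical `venn_two` is an assumption.

```python
from difflib import SequenceMatcher


def word_diff_ops(gt: str, hyp: str) -> list[dict]:
    """Hypothetical stand-in for compute_word_diff: word-level edit operations."""
    a, b = gt.split(), hyp.split()
    ops = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replace / delete / insert
            ops.append({"op": tag, "old": " ".join(a[i1:i2]), "new": " ".join(b[j1:j2])})
    return ops


def error_keys(doc_id: str, gt: str, hyp: str) -> set[str]:
    """One key per edit, in the same doc_id:old:new shape as the diff."""
    return {f"{doc_id}:{op['old']}:{op['new']}" for op in word_diff_ops(gt, hyp)}


def venn_two(sets: dict[str, set[str]], a: str, b: str) -> dict[str, int]:
    """Sizes of the three regions of a two-engine Venn diagram."""
    return {
        "common": len(sets[a] & sets[b]),
        f"only_{a}": len(sets[a] - sets[b]),
        f"only_{b}": len(sets[b] - sets[a]),
    }


gt = "le roy de france"
sets = {
    "A": error_keys("d1", gt, "le roi de france"),
    "B": error_keys("d1", gt, "le roi de frãce"),
}
print(venn_two(sets, "A", "B"))  # {'common': 1, 'only_A': 0, 'only_B': 1}
```

Because the key is a `:`-joined string, a token that itself contains `:` could collide with another key; a `(doc_id, old, new)` tuple would avoid that.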
```diff
-    # ── Sprint 7 — Clustering des patterns d'erreurs ─────────────────────
-    error_data_all: list[dict] = []
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            error_data_all.append({
-                "engine": report.engine_name,
-                "gt": dr.ground_truth,
-                "hypothesis": dr.hypothesis,
-            })
-    error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
-    error_clusters = [c.as_dict() for c in error_clusters_raw]
-
-    # ── Sprint 7 — Matrice de corrélation ────────────────────────────────
-    # Pour chaque moteur : une liste de dicts métriques par document
-    correlation_per_engine: list[dict] = []
-    for report in benchmark.engine_reports:
-        metrics_list = []
-        for dr in report.document_results:
-            if dr.metrics.error is not None:
-                continue
-            entry: dict[str, float] = {
-                "cer": _safe(dr.metrics.cer),
-                "wer": _safe(dr.metrics.wer),
-                "mer": _safe(dr.metrics.mer),
-                "wil": _safe(dr.metrics.wil),
-            }
-            if dr.image_quality:
-                entry["quality_score"] = _safe(dr.image_quality.get("quality_score", 0.5))
-                entry["sharpness"] = _safe(dr.image_quality.get("sharpness_score", 0.5))
-            if dr.char_scores:
-                entry["ligature"] = _safe(dr.char_scores.get("ligature", {}).get("score", 0.5))
-                entry["diacritic"] = _safe(dr.char_scores.get("diacritic", {}).get("score", 0.5))
-            metrics_list.append(entry)
-        if metrics_list:
-            corr = compute_correlation_matrix(metrics_list)
-            correlation_per_engine.append({
-                "engine": report.engine_name,
-                **corr,
-            })
```
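The correlation input is one dict per document, but the optional fields (`quality_score`, `sharpness`, `ligature`, `diacritic`) only appear when image-quality or character scores exist, so rows can carry different keys. How `compute_correlation_matrix` handles that is not visible in this diff. The sketch below assumes one simple policy, keeping only the metrics present in every row, and computes Pearson coefficients by hand; the function names and the `labels`/`matrix` output keys are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when a column is constant (undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_matrix(rows: list[dict[str, float]]) -> dict:
    """Pearson matrix over the metrics present in every row (assumed policy)."""
    if not rows:
        return {"labels": [], "matrix": []}
    # Optional metrics missing from some documents are dropped entirely.
    keys = sorted(set.intersection(*(set(r) for r in rows)))
    cols = {k: [r[k] for r in rows] for k in keys}
    return {
        "labels": keys,
        "matrix": [[pearson(cols[a], cols[b]) for b in keys] for a in keys],
    }
```

An alternative policy would compute each pair on the documents where both metrics exist (pairwise-complete observations), which keeps more data at the cost of matrix cells based on different sample sizes.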
```diff
-    # ── Sprint 10 — Données scatter plots ─────────────────────────────────
-    # Scatter 1 : Gini vs CER moyen (moteurs)
-    gini_vs_cer = []
-    for report in benchmark.engine_reports:
-        gini_val = report.aggregated_line_metrics.get("gini_mean") if report.aggregated_line_metrics else None
-        cer_val = report.mean_cer
-        if gini_val is not None and cer_val is not None:
-            gini_vs_cer.append({
-                "engine": report.engine_name,
-                "cer": _safe(cer_val),
-                "gini": _safe(gini_val),
-                "is_pipeline": report.is_pipeline,
-            })
-
-    # ── Sprint 19 — Coûts et frontière de Pareto ────────────────────────
-    # Durée moyenne mesurée par moteur sur le benchmark courant (sec/page)
-    durations_by_engine: dict[str, float] = {}
-    for report in benchmark.engine_reports:
-        durs = [dr.duration_seconds for dr in report.document_results
-                if dr.duration_seconds is not None]
-        if durs:
-            durations_by_engine[report.engine_name] = sum(durs) / len(durs)
-
-    pricing_defaults, _ = load_pricing_database()
-    costs_by_engine = build_costs_for_benchmark(
-        engines_summary, durations_by_engine,
-    )
-    # Annoter chaque résumé moteur avec son coût et sa durée
-    for entry in engines_summary:
-        name = entry["name"]
-        entry["mean_duration_seconds"] = round(durations_by_engine.get(name, 0.0), 4) \
-            if name in durations_by_engine else None
-        entry["cost"] = costs_by_engine.get(name)
```
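The cost block averages `duration_seconds` per engine, skipping documents without a measurement, and then writes `mean_duration_seconds` (and `cost`) back into each `engines_summary` dict. The commit message documents this in-place annotation as the only ordering dependency of the new `report_data` package, so anything that reads those two fields must run after it. A standalone sketch of the averaging and annotation steps, with hypothetical function names and a plain `{engine: [durations]}` input:

```python
from typing import Optional


def mean_durations(per_engine: dict[str, list[Optional[float]]]) -> dict[str, float]:
    """Mean seconds per page, ignoring documents with no measured duration."""
    out: dict[str, float] = {}
    for name, durations in per_engine.items():
        known = [d for d in durations if d is not None]
        if known:  # an engine with no measurement gets no entry at all
            out[name] = sum(known) / len(known)
    return out


def annotate_durations(engines_summary: list[dict], durations: dict[str, float]) -> None:
    """Mutate each summary dict in place, as the Pareto block does."""
    for entry in engines_summary:
        name = entry["name"]
        entry["mean_duration_seconds"] = (
            round(durations[name], 4) if name in durations else None
        )
```

In the original expression, the `0.0` default of `durations_by_engine.get(name, 0.0)` is never used, because the conditional already checks `name in durations_by_engine`; indexing directly, as here, is equivalent.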
```diff
-    # Front Pareto sur (CER moyen, coût €/1000 pages) — moteurs avec les deux dispos
-    pareto_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
-        if cer is None or cost is None:
-            continue
-        pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
-    pareto_front_engines = compute_pareto_front(
-        pareto_points, objectives=("cer", "cost"),
-    )
-
-    # Front Pareto secondaire (CER, vitesse) pour le toggle "vitesse"
-    pareto_speed_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        dur = entry.get("mean_duration_seconds")
-        if cer is None or dur is None:
-            continue
-        pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
-    pareto_front_speed = compute_pareto_front(
-        pareto_speed_points, objectives=("cer", "dur"),
-    )
-
-    # Front Pareto carbone (CER, g CO2 / 1000 pages) — étiqueté expérimental
-    pareto_co2_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
-        if cer is None or co2 is None:
-            continue
-        pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
-    pareto_front_co2 = compute_pareto_front(
-        pareto_co2_points, objectives=("cer", "co2"),
-    )
-
-    pareto_data = {
-        "cost": {
-            "points": pareto_points,
-            "front": pareto_front_engines,
-            "axis_label": "Coût (€ / 1000 pages)",
-        },
-        "speed": {
-            "points": pareto_speed_points,
-            "front": pareto_front_speed,
-            "axis_label": "Temps moyen (s / page)",
-        },
-        "co2": {
-            "points": pareto_co2_points,
-            "front": pareto_front_co2,
-            "axis_label": "Empreinte carbone (g CO₂ / 1000 pages, expérimental)",
-        },
-        "pricing_meta": {
-            "last_updated": pricing_defaults.last_updated,
-            "currency": pricing_defaults.currency,
-            "hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
-            "hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
-            "grid_intensity_local": pricing_defaults.grid_intensity_local,
-            "grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
-        },
-    }
```
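Each of the three fronts keeps the engines that no other engine beats on both axes at once, with CER and the second objective (cost per 1000 pages, mean seconds per page, or g CO₂ per 1000 pages) both minimised. `compute_pareto_front`'s implementation and return type are not shown here; the quadratic sketch below assumes it returns the engine names of non-dominated points, which is cheap enough for a handful of engines. The function name `pareto_front` is hypothetical.

```python
def pareto_front(points: list[dict], objectives: tuple[str, str]) -> list[str]:
    """Engines whose point is not dominated; every objective is minimised."""
    front = []
    for p in points:
        dominated = any(
            all(q[o] <= p[o] for o in objectives)      # q at least as good everywhere
            and any(q[o] < p[o] for o in objectives)   # and strictly better somewhere
            for q in points
            if q is not p
        )
        if not dominated:
            front.append(p["engine"])
    return front
```

Two engines with identical (CER, cost) do not dominate each other, so both stay on the front.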
```diff
-    # Scatter 2 : ratio longueur vs score d'ancrage (moteurs)
-    ratio_vs_anchor = []
-    for report in benchmark.engine_reports:
-        if report.aggregated_hallucination:
-            ratio_vs_anchor.append({
-                "engine": report.engine_name,
-                "length_ratio": _safe(report.aggregated_hallucination.get("length_ratio_mean", 1.0)),
-                "anchor_score": _safe(report.aggregated_hallucination.get("anchor_score_mean", 1.0)),
-                "hallucinating_rate": _safe(report.aggregated_hallucination.get("hallucinating_doc_rate", 0.0)),
-                "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
-            })
 
-
-        "meta": {
-            "corpus_name": benchmark.corpus_name,
-            "corpus_source": benchmark.corpus_source,
-            "document_count": benchmark.document_count,
-            "run_date": benchmark.run_date,
-            "picarones_version": benchmark.picarones_version,
-            "metadata": benchmark.metadata,
-        },
-        "ranking": benchmark.ranking(),
-        "engines": engines_summary,
-        "documents": documents,
-        # Sprint 7
-        "statistics": {
-            "pairwise_wilcoxon": pairwise_stats,
-            "bootstrap_cis": bootstrap_cis,
-            # Sprint 17 — Friedman multi-moteurs + post-hoc Nemenyi + CDD
-            "friedman": friedman,
-            "nemenyi": nemenyi,
-        },
-        "reliability_curves": reliability_curves,
-        "venn_data": venn_data,
-        "error_clusters": error_clusters,
-        "correlation_per_engine": correlation_per_engine,
-        # Sprint 10
-        "gini_vs_cer": gini_vs_cer,
-        "ratio_vs_anchor": ratio_vs_anchor,
-        # Sprint 19 — vue Pareto coût/qualité avec variantes d'axe
-        "pareto": pareto_data,
-        # Sprint 36 — analyse inter-moteurs (divergence taxonomique +
-        # complémentarité / oracle). ``None`` si moins de 2 moteurs.
-        "inter_engine_analysis": benchmark.inter_engine_analysis,
-        # Sprint 45-46 — stratification par script_type
-        "available_strata": benchmark.available_strata(),
-        "stratified_ranking": benchmark.stratified_ranking() or None,
-        "corpus_homogeneity": benchmark.corpus_homogeneity(),
-    }
 
 
 # ---------------------------------------------------------------------------
```
```diff
@@ -691,8 +75,8 @@ def _build_jinja_env():
     Autoescape désactivé : le comportement est équivalent à celui du
     ``_HTML_TEMPLATE.format()`` historique. Les variables injectées
     (JSON embarqué, SVG généré, synthèse narrative issue de templates
-    internes) sont toutes produites par le code Picarones et ne
-    pas d'échappement HTML.
     """
     from jinja2 import Environment, FileSystemLoader
     env = Environment(
```
```diff
@@ -834,174 +218,158 @@ class ReportGenerator:
         glossary = load_glossary(self.lang)
         glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
 
-
-
         from picarones.report.inter_engine_render import (
             build_divergence_matrix_html,
             build_oracle_gap_html,
         )
-
-            report_data.get("inter_engine_analysis"),
-            labels=labels,
-        )
-        oracle_gap_html = build_oracle_gap_html(
-            report_data.get("inter_engine_analysis"),
-            labels=labels,
-        )
-
-        # Sprint 41 — section NER (résumé F1 par moteur + heatmap par
-        # catégorie). Vide si aucun moteur n'a de aggregated_ner.
         from picarones.report.ner_render import (
             build_ner_per_category_html,
             build_ner_summary_html,
         )
-        ner_summary_html = build_ner_summary_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-        ner_per_category_html = build_ner_per_category_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-
         # Sprint 43 — section calibration (tableau ECE/MCE + grille de
-        # reliability diagrams par moteur).
-        # de aggregated_calibration.
         from picarones.report.calibration_render import (
             build_calibration_summary_html,
             build_reliability_diagrams_grid_html,
         )
-
-            report_data.get("engines", []),
-            labels=labels,
-        )
-        reliability_diagrams_html = build_reliability_diagrams_grid_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-
-        # Sprint 46 — section stratifiée (tableau par strate). Vide si
-        # aucune strate disponible.
         from picarones.report.stratification_render import (
             build_stratified_ranking_html,
         )
-
-            report_data.get("stratified_ranking"),
-            report_data.get("available_strata"),
-            report_data.get("corpus_homogeneity"),
-            labels=labels,
-        )
-
-        # Sprint 62 — profil philologique (6 sections adaptive sur les
-        # modules philologiques Sprints 55-60). Vide si aucun moteur
-        # n'a de aggregated_philological.
         from picarones.report.philological_render import (
             build_philological_profile_html,
         )
-
-            report_data.get("engines", []),
-            labels=labels,
-        )
-
-        # Sprint 86 — A.II.5 : recherchabilité fuzzy +
-        # séquences numériques. Adaptive : "" si aucun signal.
         from picarones.report.searchability_render import (
             build_searchability_summary_html,
         )
         from picarones.report.numerical_sequences_render import (
             build_numerical_sequences_html,
         )
-        searchability_html = build_searchability_summary_html(
-            report_data.get("engines", []), labels=labels,
-        )
-        numerical_sequences_html = build_numerical_sequences_html(
-            report_data.get("engines", []), labels=labels,
-        )
-
         # Sprint 87 — A.II.2 : lisibilité (delta Flesch).
-        # Adaptive : "" si aucun moteur n'a de signal.
         from picarones.report.readability_render import (
             build_readability_summary_html,
         )
-        readability_html = build_readability_summary_html(
-            report_data.get("engines", []), labels=labels,
-        )
-
         # Sprint 89 — A.II.8b : spécialisation inter-moteurs.
-        # Adaptive : "" si moins de 2 moteurs avec taxonomie.
         from picarones.report.specialization_render import (
             build_specialization_html,
         )
-        #
-
-
-
-
             tax = eng.get("aggregated_taxonomy")
             if isinstance(tax, dict):
                 counts = tax.get("counts") if "counts" in tax else tax
                 if isinstance(counts, dict) and counts:
-
                         k: float(v) for k, v in counts.items()
                         if isinstance(v, (int, float))
                     }
-        specialization_html = build_specialization_html(
-            _taxos, labels=labels,
-        )
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-            numerical_sequences_html=numerical_sequences_html,
-            readability_html=readability_html,
-            specialization_html=specialization_html,
             # Chantier 3 — vues thématiques composées
-            economics_view_html
-
-
-
-
-
-
 
     @classmethod
     def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":
```
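Both the removed inline code and the new `_build_section_html` normalise `aggregated_taxonomy` the same way: accept either a `{"counts": {...}}` wrapper or a flat mapping, keep only numeric values as floats, and drop engines with no usable counts. Pulled out as a standalone function (the name `taxonomy_counts` is hypothetical), the loop becomes easy to unit-test:

```python
def taxonomy_counts(engines: list[dict]) -> dict[str, dict[str, float]]:
    """Map engine name -> numeric taxonomy counts; engines without counts are skipped."""
    taxos: dict[str, dict[str, float]] = {}
    for eng in engines:
        tax = eng.get("aggregated_taxonomy")
        if not isinstance(tax, dict):
            continue
        # Accept both {"counts": {...}} and a flat {category: count} mapping.
        counts = tax.get("counts") if "counts" in tax else tax
        if isinstance(counts, dict) and counts:
            taxos[eng.get("name", "?")] = {
                k: float(v) for k, v in counts.items() if isinstance(v, (int, float))
            }
    return taxos
```

Note that `bool` is a subclass of `int` in Python, so a `True` value would pass the `isinstance` filter and be counted as `1.0`.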
```diff
 2. Galerie — grille d'images avec badge CER coloré
 3. Document — image zoomable + diff coloré GT / OCR par moteur
 4. Analyses — histogramme CER + graphique radar
+
+Architecture
+------------
+Ce module est l'**orchestrateur**. Les responsabilités lourdes sont
+découpées en sous-modules :
+
+- :mod:`picarones.report.assets` — chargement vendor.js, encodage
+  base64 d'images, externalisation lazy.
+- :mod:`picarones.report.report_data` — construction du dict JSON
+  passé au template (engines, documents, statistiques, Pareto, etc.).
+- :mod:`picarones.report.render_helpers` — couleurs / SVG mutualisés.
+
+Les noms ``_build_report_data``, ``_cer_color``, ``_cer_bg``,
+``_externalize_images_to_dir``, ``_encode_image_b64``,
+``_encode_images_b64_from_result``, ``_load_vendor_js``, ``_pct``,
+``_safe`` sont conservés en alias rétrocompat — plusieurs tests les
+importent directement.
 """
 
 from __future__ import annotations
 
 import json
 import logging
 from pathlib import Path
 from typing import Any, Optional
 
 from picarones.core.results import BenchmarkResult
+from picarones.measurements.statistics import build_critical_difference_svg
+from picarones.report.assets import (
+    encode_image_b64 as _encode_image_b64,
+    encode_images_b64_from_result as _encode_images_b64_from_result,
+    externalize_images_to_dir as _externalize_images_to_dir,
+    load_vendor_js as _load_vendor_js,
+)
+from picarones.report.render_helpers import (
+    cer_step_bg as _cer_bg,
+    cer_step_color as _cer_color,
+)
+from picarones.report.report_data import build_report_data as _build_report_data
+from picarones.report.report_data._helpers import (
+    percent_string as _pct,
+    safe_round as _safe,
 )
 
+logger = logging.getLogger(__name__)
 
 
 # ---------------------------------------------------------------------------
```
```diff
     Autoescape désactivé : le comportement est équivalent à celui du
     ``_HTML_TEMPLATE.format()`` historique. Les variables injectées
     (JSON embarqué, SVG généré, synthèse narrative issue de templates
+    internes) sont toutes produites par le code Picarones et ne
+    nécessitent pas d'échappement HTML.
     """
     from jinja2 import Environment, FileSystemLoader
     env = Environment(
```
```diff
         glossary = load_glossary(self.lang)
         glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
 
+        section_html = self._build_section_html(report_data, labels)
+
+        env = _build_jinja_env()
+        template = env.get_template("base.html.j2")
+        html = template.render(
+            corpus_name=self.benchmark.corpus_name,
+            picarones_version=self.benchmark.picarones_version,
+            report_data_json=report_json,
+            i18n_json=i18n_json,
+            html_lang=labels.get("html_lang", "fr"),
+            chartjs_inline=chartjs_js,
+            critical_difference_svg=cdd_svg,
+            friedman=report_data.get("statistics", {}).get("friedman", {}),
+            synthesis=synthesis,
+            glossary_json=glossary_json,
+            **section_html,
+        )
+
+        output_path.write_text(html, encoding="utf-8")
+        return output_path.resolve()
+
+    def _build_section_html(
+        self, report_data: dict, labels: dict[str, str],
+    ) -> dict[str, str]:
+        """Construit toutes les sections HTML conditionnelles du rapport.
+
+        Chaque renderer (NER, calibration, philologie, etc.) est appelé
+        de manière indépendante. Une section retourne ``""`` si aucun
+        moteur n'a de signal pour elle — le template gère l'affichage
+        conditionnel.
+
+        Returns
+        -------
+        dict[str, str]
+            Map ``{nom_de_section: html}`` à splatter dans
+            ``template.render(**section_html)``.
+        """
+        engines = report_data.get("engines", [])
+
+        # Sprint 37 — section inter-moteurs (matrice de divergence + oracle).
         from picarones.report.inter_engine_render import (
             build_divergence_matrix_html,
             build_oracle_gap_html,
         )
+        # Sprint 41 — section NER (résumé F1 par moteur + heatmap par catégorie).
         from picarones.report.ner_render import (
             build_ner_per_category_html,
             build_ner_summary_html,
         )
         # Sprint 43 — section calibration (tableau ECE/MCE + grille de
+        # reliability diagrams par moteur).
         from picarones.report.calibration_render import (
             build_calibration_summary_html,
             build_reliability_diagrams_grid_html,
         )
+        # Sprint 46 — section stratifiée (tableau par strate).
         from picarones.report.stratification_render import (
             build_stratified_ranking_html,
         )
+        # Sprint 62 — profil philologique (6 sections adaptive).
         from picarones.report.philological_render import (
             build_philological_profile_html,
         )
+        # Sprint 86 — A.II.5 : recherchabilité fuzzy + séquences numériques.
         from picarones.report.searchability_render import (
             build_searchability_summary_html,
         )
         from picarones.report.numerical_sequences_render import (
             build_numerical_sequences_html,
         )
         # Sprint 87 — A.II.2 : lisibilité (delta Flesch).
         from picarones.report.readability_render import (
             build_readability_summary_html,
         )
         # Sprint 89 — A.II.8b : spécialisation inter-moteurs.
         from picarones.report.specialization_render import (
             build_specialization_html,
         )
+        # Chantier 3 (post-Sprint 97) — 3 vues thématiques composées.
+        from picarones.report.views import (
+            build_advanced_taxonomy_view_html,
+            build_diagnostics_view_html,
+            build_economics_view_html,
+        )
+
+        # Spécialisation : construit une map {engine: counts} depuis les
+        # ``aggregated_taxonomy`` ; un moteur sans taxonomie est exclu.
+        taxos: dict = {}
+        for eng in engines:
             tax = eng.get("aggregated_taxonomy")
             if isinstance(tax, dict):
                 counts = tax.get("counts") if "counts" in tax else tax
                 if isinstance(counts, dict) and counts:
+                    taxos[eng.get("name", "?")] = {
                         k: float(v) for k, v in counts.items()
                         if isinstance(v, (int, float))
                     }
 
+        return {
+            # Sprint 37
+            "divergence_matrix_html": build_divergence_matrix_html(
+                report_data.get("inter_engine_analysis"), labels=labels,
+            ),
+            "oracle_gap_html": build_oracle_gap_html(
+                report_data.get("inter_engine_analysis"), labels=labels,
+            ),
+            # Sprint 41
+            "ner_summary_html": build_ner_summary_html(engines, labels=labels),
+            "ner_per_category_html": build_ner_per_category_html(engines, labels=labels),
+            # Sprint 43
+            "calibration_summary_html": build_calibration_summary_html(
+                engines, labels=labels,
+            ),
+            "reliability_diagrams_html": build_reliability_diagrams_grid_html(
+                engines, labels=labels,
+            ),
+            # Sprint 46
+            "stratified_ranking_html": build_stratified_ranking_html(
+                report_data.get("stratified_ranking"),
+                report_data.get("available_strata"),
+                report_data.get("corpus_homogeneity"),
+                labels=labels,
+            ),
+            # Sprint 62
+            "philological_profile_html": build_philological_profile_html(
+                engines, labels=labels,
+            ),
+            # Sprint 86
+            "searchability_html": build_searchability_summary_html(
+                engines, labels=labels,
+            ),
+            "numerical_sequences_html": build_numerical_sequences_html(
+                engines, labels=labels,
+            ),
+            # Sprint 87
+            "readability_html": build_readability_summary_html(
+                engines, labels=labels,
+            ),
+            # Sprint 89
+            "specialization_html": build_specialization_html(taxos, labels=labels),
             # Chantier 3 — vues thématiques composées
+            "economics_view_html": build_economics_view_html(
+                report_data, labels=labels,
+                engine_reports=self.benchmark.engine_reports,
+            ),
+            "advanced_taxonomy_view_html": build_advanced_taxonomy_view_html(
+                report_data, labels=labels,
+            ),
+            "diagnostics_view_html": build_diagnostics_view_html(
+                report_data, labels=labels,
+            ),
+        }
 
     @classmethod
     def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":
```
```diff
@@ -187,6 +187,60 @@ def text_color_for_bg(intensity: float, *, threshold: float = 0.55) -> str:
     return "#fff" if intensity > threshold else "#222"
 
 
 # ──────────────────────────────────────────────────────────────────
 # API publique : grille SVG
 # ──────────────────────────────────────────────────────────────────
@@ -328,6 +382,8 @@ __all__ = [
     "DIVERGING_NEGATIVE_RGB",
     "DIVERGING_NEUTRAL_RGB",
     "DIVERGING_POSITIVE_RGB",
     "color_traffic_light",
     "color_single_gradient",
     "color_diverging",
```
| 187 |
return "#fff" if intensity > threshold else "#222"
|
| 188 |
|
| 189 |
|
| 190 |
+
# ──────────────────────────────────────────────────────────────────
|
| 191 |
+
# API publique : barème CER par paliers (badges du rapport)
|
| 192 |
+
# ──────────────────────────────────────────────────────────────────
|
| 193 |
+
#
|
| 194 |
+
# Les badges de qualité du rapport (galerie, tableau de classement)
|
| 195 |
+
# n'utilisent pas un dégradé continu mais un barème discret à 4
|
| 196 |
+
# paliers calibrés sur les seuils éditoriaux usuels :
|
| 197 |
+
#
|
| 198 |
+
# < 5 % : vert (qualité publication directe)
|
| 199 |
+
# < 15 % : jaune (relecture humaine légère)
|
| 200 |
+
# < 30 % : orange (relecture humaine systématique)
|
| 201 |
+
# ≥ 30 % : rouge (catastrophique, à reprendre)
|
| 202 |
+
#
|
| 203 |
+
# Les couleurs sont importées de :mod:`picarones.report.colors`
|
| 204 |
+
# (palette Okabe-Ito daltonien-friendly active par défaut).
|
| 205 |
+
|
| 206 |
+
|
| 207 |
+
def cer_step_color(cer: float) -> str:
|
| 208 |
+
"""Couleur de texte CSS pour un score CER, par paliers.
|
| 209 |
+
|
| 210 |
+
Voir le barème dans le bloc de documentation ci-dessus.
|
| 211 |
+
"""
|
| 212 |
+
from picarones.report.colors import (
|
| 213 |
+
COLOR_GREEN,
|
| 214 |
+
COLOR_ORANGE,
|
| 215 |
+
COLOR_RED,
|
| 216 |
+
COLOR_YELLOW,
|
| 217 |
+
)
|
| 218 |
+
if cer < 0.05:
|
| 219 |
+
return COLOR_GREEN
|
| 220 |
+
if cer < 0.15:
|
| 221 |
+
return COLOR_YELLOW
|
| 222 |
+
if cer < 0.30:
|
| 223 |
+
return COLOR_ORANGE
|
| 224 |
+
return COLOR_RED
|
| 225 |
+
|
| 226 |
+
|
| 227 |
+
def cer_step_bg(cer: float) -> str:
|
| 228 |
+
"""Couleur de fond CSS associée à :func:`cer_step_color`."""
|
| 229 |
+
from picarones.report.colors import (
|
| 230 |
+
BG_GREEN,
|
| 231 |
+
BG_ORANGE,
|
| 232 |
+
BG_RED,
|
| 233 |
+
BG_YELLOW,
|
| 234 |
+
)
|
| 235 |
+
if cer < 0.05:
|
| 236 |
+
return BG_GREEN
|
| 237 |
+
if cer < 0.15:
|
| 238 |
+
return BG_YELLOW
|
| 239 |
+
if cer < 0.30:
|
| 240 |
+
return BG_ORANGE
|
| 241 |
+
return BG_RED
|
| 242 |
+
|
| 243 |
+
|
| 244 |
# ──────────────────────────────────────────────────────────────────
|
| 245 |
# API publique : grille SVG
|
| 246 |
# ──────────────────────────────────────────────────────────────────
|
|
|
|
| 382 |
"DIVERGING_NEGATIVE_RGB",
|
| 383 |
"DIVERGING_NEUTRAL_RGB",
|
| 384 |
"DIVERGING_POSITIVE_RGB",
|
| 385 |
+
"cer_step_color",
|
| 386 |
+
"cer_step_bg",
|
| 387 |
"color_traffic_light",
|
| 388 |
"color_single_gradient",
|
| 389 |
"color_diverging",
|
picarones/report/report_data/__init__.py
@@ -0,0 +1,102 @@
+"""Builds the data dict consumed by the Jinja template.
+
+Before the split, ``picarones.report.generator._build_report_data``
+took 463 lines to turn a :class:`BenchmarkResult` into a
+Jinja-ready dict. That function stacked independent blocks, one per
+sprint: engines, documents, statistics, scatter plots,
+Pareto front, etc.
+
+This sub-package splits the construction into thematic modules:
+
+- :mod:`engines`: per-engine summary (``engines_summary``).
+- :mod:`documents`: gallery view + detail view + Sprint 7 difficulty.
+- :mod:`statistics`: Wilcoxon, Friedman, Nemenyi, bootstrap CIs,
+  reliability curves, Venn, error clusters, correlations.
+- :mod:`scatter`: Sprint 10, Gini vs CER, ratio vs anchor.
+- :mod:`pareto`: Sprint 19, 3 Pareto fronts + pricing metadata.
+
+The public API :func:`build_report_data` calls these modules in
+the right order (the costs computed by the Pareto module enrich, in
+place, the ``engines_summary`` produced by :mod:`engines`).
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+
+from picarones.report.report_data.documents import (
+    annotate_documents_with_difficulty,
+    build_documents,
+)
+from picarones.report.report_data.engines import build_engines_summary
+from picarones.report.report_data.pareto import build_pareto_section
+from picarones.report.report_data.scatter import (
+    build_gini_vs_cer,
+    build_ratio_vs_anchor,
+)
+from picarones.report.report_data.statistics import (
+    build_bootstrap_cis,
+    build_correlation_per_engine,
+    build_error_clusters,
+    build_friedman_and_nemenyi,
+    build_pairwise_wilcoxon,
+    build_reliability_curves,
+    build_venn_data,
+)
+
+
+def build_report_data(
+    benchmark: "BenchmarkResult", images_b64: dict[str, str],
+) -> dict:
+    """Turns a :class:`BenchmarkResult` into a dict for the HTML report.
+
+    Order matters: :mod:`pareto` reads and enriches, in place,
+    the ``engines_summary`` produced by :mod:`engines`.
+    """
+    engines_summary = build_engines_summary(benchmark)
+    documents = build_documents(benchmark, images_b64)
+    annotate_documents_with_difficulty(benchmark, documents)
+
+    pareto_data = build_pareto_section(engines_summary, benchmark)
+
+    return {
+        "meta": {
+            "corpus_name": benchmark.corpus_name,
+            "corpus_source": benchmark.corpus_source,
+            "document_count": benchmark.document_count,
+            "run_date": benchmark.run_date,
+            "picarones_version": benchmark.picarones_version,
+            "metadata": benchmark.metadata,
+        },
+        "ranking": benchmark.ranking(),
+        "engines": engines_summary,
+        "documents": documents,
+        # Sprint 7
+        "statistics": {
+            "pairwise_wilcoxon": build_pairwise_wilcoxon(benchmark),
+            "bootstrap_cis": build_bootstrap_cis(benchmark),
+            **build_friedman_and_nemenyi(benchmark),
+        },
+        "reliability_curves": build_reliability_curves(benchmark),
+        "venn_data": build_venn_data(benchmark),
+        "error_clusters": build_error_clusters(benchmark),
+        "correlation_per_engine": build_correlation_per_engine(benchmark),
+        # Sprint 10
+        "gini_vs_cer": build_gini_vs_cer(benchmark),
+        "ratio_vs_anchor": build_ratio_vs_anchor(benchmark),
+        # Sprint 19: cost/quality Pareto view with alternative axes
+        "pareto": pareto_data,
+        # Sprint 36: inter-engine analysis (taxonomic divergence +
+        # complementarity / oracle). ``None`` if fewer than 2 engines.
+        "inter_engine_analysis": benchmark.inter_engine_analysis,
+        # Sprints 45-46: stratification by script_type
+        "available_strata": benchmark.available_strata(),
+        "stratified_ranking": benchmark.stratified_ranking() or None,
+        "corpus_homogeneity": benchmark.corpus_homogeneity(),
+    }
+
+
+__all__ = ["build_report_data"]
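The ordering constraint in `build_report_data` comes from aliasing. `build_pareto_section` mutates the very dicts that are later returned under `"engines"`, and it also reads them to build its points. The toy sketch below uses hypothetical stand-ins (`build_summary`, `build_costs`, `build_data`), not the picarones builders, and example engine names. It shows that the summary must exist before the Pareto step annotates it, and that the annotation is then visible through the returned `"engines"` list.

```python
def build_summary(names: list[str]) -> list[dict]:
    """Hypothetical stand-in for build_engines_summary."""
    return [{"name": n, "cer": 0.1 * (i + 1)} for i, n in enumerate(names)]


def build_costs(summary: list[dict], costs: dict[str, float]) -> dict:
    """Hypothetical stand-in for build_pareto_section: annotates entries in place."""
    for entry in summary:
        entry["cost"] = costs.get(entry["name"])
    # Points reference the same dict objects as the summary.
    return {"points": [e for e in summary if e["cost"] is not None]}


def build_data(names: list[str], costs: dict[str, float]) -> dict:
    summary = build_summary(names)
    # Must run after build_summary: it adds the "cost" key to each entry.
    pareto = build_costs(summary, costs)
    return {"engines": summary, "pareto": pareto}


data = build_data(["tesseract", "kraken"], {"tesseract": 0.0})
print(data["engines"])
```

Because no copy is made, `"engines"` and `"pareto"` share entry objects. Any later mutation of one section is visible in the other.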
picarones/report/report_data/_helpers.py
@@ -0,0 +1,30 @@
+"""Internal numeric helpers for the report_data sub-package.
+
+Small utility functions shared by all the section builders
+(engines, documents, statistics, scatter, pareto). Do not import
+them from outside the sub-package: these helpers are specific to
+the conventions of the JSON dict consumed by the template.
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+
+def safe_round(v: Optional[float], decimals: int = 4) -> float:
+    """Rounds an optional float; ``None`` becomes ``0.0``."""
+    return round(v or 0.0, decimals)
+
+
+def percent_string(v: Optional[float], decimals: int = 2) -> str:
+    """Formats a ratio ∈ [0, 1] as a percentage string: ``0.4723 → "47.23 %"``.
+
+    ``None`` → ``"—"``. Kept for backward compatibility with possible
+    external callers (historical Sprint 7 code).
+    """
+    if v is None:
+        return "—"
+    return f"{v * 100:.{decimals}f} %"
+
+
+__all__ = ["safe_round", "percent_string"]
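The snippet below checks the edge cases of the two helpers. Their bodies are copied here so it runs without picarones installed, and the placeholder string is written as an escape. Note that `v or 0.0` maps both `None` and `0` to `0.0`, so `safe_round` alone cannot tell "missing" from "zero". This is why callers that need the distinction, such as the `cer_diplomatic` field in `documents.py`, test `is not None` before calling it.

```python
from typing import Optional


def safe_round(v: Optional[float], decimals: int = 4) -> float:
    """Round an optional float; None becomes 0.0."""
    return round(v or 0.0, decimals)


def percent_string(v: Optional[float], decimals: int = 2) -> str:
    """Format a ratio in [0, 1] as a percentage string; None becomes a dash placeholder."""
    if v is None:
        return "\u2014"
    return f"{v * 100:.{decimals}f} %"


print(safe_round(None))         # None is treated as 0.0
print(safe_round(0.123456))     # rounded to 4 decimals by default
print(percent_string(0.4723))   # the example from the docstring
print(percent_string(None))     # placeholder for a missing value
```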
picarones/report/report_data/documents.py
@@ -0,0 +1,167 @@
+"""Builds the ``documents`` list (gallery view + detail view).
+
+For each document in the corpus, gathers every engine's
+hypothesis with its metrics, the character-level diff, and
+the fields specific to OCR+LLM pipelines (intermediate output, mode,
+over-normalization).
+
+:func:`annotate_documents_with_difficulty` then enriches each
+document with its intrinsic difficulty score (Sprint 7).
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from picarones.core.diff_utils import compute_char_diff, compute_word_diff
+from picarones.measurements.difficulty import (
+    compute_all_difficulties,
+    difficulty_label,
+)
+from picarones.report.report_data._helpers import safe_round
+
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+
+
+def build_documents(
+    benchmark: "BenchmarkResult", images_b64: dict[str, str],
+) -> list[dict]:
+    """Returns the ordered list of documents, ready for the template.
+
+    Documents keep their order of first appearance (first engine
+    first, then any extra documents from later engines when some
+    documents are not covered by every engine).
+    """
+    seen_doc_ids: set[str] = set()
+    doc_ids_ordered: list[str] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            if dr.doc_id not in seen_doc_ids:
+                seen_doc_ids.add(dr.doc_id)
+                doc_ids_ordered.append(dr.doc_id)
+
+    # Cross index: doc_id → {engine_name → DocumentResult}
+    doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
+
+    documents: list[dict] = []
+    engine_names = [r.engine_name for r in benchmark.engine_reports]
+    for doc_id in doc_ids_ordered:
+        engine_results: list[dict] = []
+        gt = ""
+        image_path = ""
+        for engine_name in engine_names:
+            dr = doc_engine_map[doc_id].get(engine_name)
+            if dr is None:
+                continue
+            gt = dr.ground_truth
+            image_path = dr.image_path
+            er_entry = _build_engine_result_entry(engine_name, dr)
+            engine_results.append(er_entry)
+
+        # Mean CER on this document (for the gallery badge)
+        cer_values = [er["cer"] for er in engine_results if er["error"] is None]
+        mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
+        best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
+
+        # Script type (from per-document metadata, when available)
+        script_type = ""
+        first_engine = engine_names[0] if engine_names else None
+        first_dr = doc_engine_map[doc_id].get(first_engine)
+        if first_dr and first_dr.image_quality:
+            script_type = first_dr.image_quality.get("script_type", "")
+
+        documents.append({
+            "doc_id": doc_id,
+            "image_path": image_path,
+            "image_b64": images_b64.get(doc_id, ""),
+            "ground_truth": gt,
+            "mean_cer": safe_round(mean_cer),
+            "best_engine": best_engine["engine"] if best_engine else "",
+            "engine_results": engine_results,
+            "script_type": script_type,
+        })
+    return documents
+
+
+def _build_engine_result_entry(engine_name: str, dr) -> dict:
+    """Builds one engine entry for a given document (extracted for readability)."""
+    diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
+    er_entry: dict = {
+        "engine": engine_name,
+        "hypothesis": dr.hypothesis,
+        "cer": safe_round(dr.metrics.cer),
+        "cer_diplomatic": safe_round(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
+        "wer": safe_round(dr.metrics.wer),
+        "mer": safe_round(dr.metrics.mer),
+        "wil": safe_round(dr.metrics.wil),
+        "duration": dr.duration_seconds,
+        "error": dr.engine_error,
+        "diff": diff_ops,
+    }
+    # Fields specific to OCR+LLM pipelines
+    if dr.ocr_intermediate is not None:
+        er_entry["ocr_intermediate"] = dr.ocr_intermediate
+        er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
+        er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
+    if dr.pipeline_metadata:
+        on = dr.pipeline_metadata.get("over_normalization")
+        if on is not None:
+            er_entry["over_normalization"] = on
+        er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
+    # Sprint 5: advanced per-document metrics
+    if dr.char_scores is not None:
+        er_entry["ligature_score"] = safe_round(dr.char_scores.get("ligature", {}).get("score"))
+        er_entry["diacritic_score"] = safe_round(dr.char_scores.get("diacritic", {}).get("score"))
+    if dr.taxonomy is not None:
+        er_entry["taxonomy"] = dr.taxonomy
+    if dr.structure is not None:
+        er_entry["structure"] = dr.structure
+    if dr.image_quality is not None:
+        er_entry["image_quality"] = dr.image_quality
+    # Sprint 10
+    if dr.line_metrics is not None:
+        er_entry["line_metrics"] = dr.line_metrics
+    if dr.hallucination_metrics is not None:
+        er_entry["hallucination_metrics"] = dr.hallucination_metrics
+    return er_entry
+
+
+def annotate_documents_with_difficulty(
+    benchmark: "BenchmarkResult", documents: list[dict],
+) -> None:
+    """Annotates each document dict with its difficulty score (Sprint 7).
+
+    Modifies ``documents`` in place. The defaults ``0.5`` /
+    ``"Modéré"`` are used when the difficulty could not be
+    computed (for example, on a degenerate corpus).
+    """
+    doc_ids_ordered = [d["doc_id"] for d in documents]
+    gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
+    cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
+    iq_map: dict[str, float] = {}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = safe_round(dr.metrics.cer)
+            if dr.image_quality and "quality_score" in dr.image_quality:
+                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
+    difficulty_scores = compute_all_difficulties(
+        doc_ids=doc_ids_ordered,
+        ground_truths=gt_map,
+        cer_map=cer_map,
+        image_quality_map=iq_map or None,
+    )
+    for doc in documents:
+        ds = difficulty_scores.get(doc["doc_id"])
+        if ds:
+            doc["difficulty_score"] = safe_round(ds.score)
+            doc["difficulty_label"] = difficulty_label(ds.score)
+        else:
+            doc["difficulty_score"] = 0.5
+            doc["difficulty_label"] = "Modéré"
+
+
+__all__ = ["build_documents", "annotate_documents_with_difficulty"]
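The first half of `build_documents` builds an order-preserving union of document ids across engines, then a `doc_id → {engine → result}` index. The standalone sketch below shows that step. The `DocResult` dataclass is a hypothetical minimal stand-in for picarones' `DocumentResult` and models only the attributes used here, and `index_documents` is likewise a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class DocResult:
    """Hypothetical minimal stand-in for picarones' DocumentResult."""
    doc_id: str
    cer: float


def index_documents(
    reports: dict[str, list[DocResult]],
) -> tuple[list[str], dict[str, dict[str, DocResult]]]:
    """Return doc ids in first-seen order and a doc_id -> {engine -> result} map."""
    ordered: list[str] = []
    index: dict[str, dict[str, DocResult]] = {}
    for engine, results in reports.items():
        for dr in results:
            if dr.doc_id not in index:
                # The first engine that mentions a document fixes its position.
                ordered.append(dr.doc_id)
                index[dr.doc_id] = {}
            index[dr.doc_id][engine] = dr
    return ordered, index


reports = {
    "engine_a": [DocResult("p1", 0.02), DocResult("p2", 0.10)],
    "engine_b": [DocResult("p2", 0.08), DocResult("p3", 0.40)],
}
ordered, index = index_documents(reports)
print(ordered)              # p3 is only covered by engine_b, so it comes last
print(sorted(index["p2"]))  # both engines covered p2
```

The diff does this in two passes: a `seen_doc_ids` set first, then a separate loop that fills the map. Folding both into one pass, as above, gives the same result, because the map's keys double as the seen-set.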
picarones/report/report_data/engines.py
@@ -0,0 +1,103 @@
+"""Builds the per-engine summary (``engines_summary``).
+
+For each ``EngineReport``, gathers aggregated metrics (CER, WER,
+MER, WIL), the CER distribution for the histogram, advanced
+heritage-text metrics (Sprint 5), error distribution (Sprint 10), NER
+(Sprint 41), calibration (Sprint 43), philological profile (Sprint
+62), searchability + numerical sequences (Sprint 86), readability
+(Sprint 87) and OCR+LLM pipeline indicators.
+
+Costs (mean duration, price per 1k pages, CO₂) are added
+later by :mod:`picarones.report.report_data.pareto`, which
+needs them to compute the fronts.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from picarones.report.report_data._helpers import safe_round
+
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+
+
+def build_engines_summary(benchmark: "BenchmarkResult") -> list[dict]:
+    """Returns the list of engine dicts, one entry per ``EngineReport``."""
+    engines_summary: list[dict] = []
+    for report in benchmark.engine_reports:
+        agg = report.aggregated_metrics
+        diplo_agg = agg.get("cer_diplomatic", {})
+
+        line_metrics = report.aggregated_line_metrics
+        halluc = report.aggregated_hallucination
+
+        entry: dict = {
+            "name": report.engine_name,
+            "version": report.engine_version,
+            "cer": safe_round(agg.get("cer", {}).get("mean")),
+            "wer": safe_round(agg.get("wer", {}).get("mean")),
+            "mer": safe_round(agg.get("mer", {}).get("mean")),
+            "wil": safe_round(agg.get("wil", {}).get("mean")),
+            "cer_median": safe_round(agg.get("cer", {}).get("median")),
+            "cer_min": safe_round(agg.get("cer", {}).get("min")),
+            "cer_max": safe_round(agg.get("cer", {}).get("max")),
+            "doc_count": agg.get("document_count", 0),
+            "failed": agg.get("failed_count", 0),
+            # Diplomatic CER (after historical normalization: ſ=s, u=v, i=j…)
+            "cer_diplomatic": safe_round(diplo_agg.get("mean")) if diplo_agg else None,
+            "cer_diplomatic_profile": diplo_agg.get("profile"),
+            # Distribution for the histogram: list of per-document CERs
+            "cer_values": [
+                safe_round(dr.metrics.cer)
+                for dr in report.document_results
+                if dr.metrics.error is None
+            ],
+            "cer_diplomatic_values": [
+                safe_round(dr.metrics.cer_diplomatic)
+                for dr in report.document_results
+                if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
+            ],
+            # OCR+LLM pipeline fields (empty for OCR-only engines)
+            "is_pipeline": report.is_pipeline,
+            "pipeline_info": report.pipeline_info,
+            # Sprint 5: advanced heritage-text metrics
+            "ligature_score": safe_round(report.ligature_score) if report.ligature_score is not None else None,
+            "diacritic_score": safe_round(report.diacritic_score) if report.diacritic_score is not None else None,
+            "aggregated_confusion": report.aggregated_confusion,
+            "aggregated_taxonomy": report.aggregated_taxonomy,
+            "aggregated_structure": report.aggregated_structure,
+            "aggregated_image_quality": report.aggregated_image_quality,
+            # Sprint 10: error distribution + VLM hallucinations
+            "gini": safe_round(line_metrics.get("gini_mean")) if line_metrics else None,
+            "cer_p90": safe_round(line_metrics.get("percentiles", {}).get("p90")) if line_metrics else None,
+            "cer_p99": safe_round(line_metrics.get("percentiles", {}).get("p99")) if line_metrics else None,
+            "catastrophic_rate_30": safe_round(line_metrics.get("catastrophic_rate", {}).get("0.3")) if line_metrics else None,
+            "aggregated_line_metrics": line_metrics,
+            "anchor_score": safe_round(halluc.get("anchor_score_mean")) if halluc else None,
+            "length_ratio": safe_round(halluc.get("length_ratio_mean")) if halluc else None,
+            "hallucinating_doc_rate": safe_round(halluc.get("hallucinating_doc_rate")) if halluc else None,
+            "aggregated_hallucination": halluc,
+            # Sprint 41: aggregated NER (None if it was not computed)
+            "aggregated_ner": report.aggregated_ner,
+            # Sprint 43: aggregated calibration (None if the engine exposed
+            # no confidence scores on this corpus)
+            "aggregated_calibration": report.aggregated_calibration,
+            # Sprint 62: aggregated philological profile (None if there is
+            # no philological signal on the corpus for this engine)
+            "aggregated_philological": report.aggregated_philological,
+            # Sprint 86: A.II.5 (fuzzy searchability + numerical
+            # sequences). None if no document has any signal.
+            "aggregated_searchability": report.aggregated_searchability,
+            "aggregated_numerical_sequences": (
+                report.aggregated_numerical_sequences
+            ),
+            # Sprint 87: A.II.2 (aggregated Flesch delta)
+            "aggregated_readability": report.aggregated_readability,
+            "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
+        }
+        engines_summary.append(entry)
+    return engines_summary
+
+
+__all__ = ["build_engines_summary"]
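Most optional fields in the engine entry follow the same pattern: round the value when it exists, otherwise keep `None`. This lets the template show that a metric was not computed instead of a misleading `0.0` from `safe_round`. The sketch below captures the pattern with `round_or_none`, a hypothetical helper that is not part of the diff, and uses hypothetical input values shaped like the `line_metrics` keys read above.

```python
from typing import Optional


def round_or_none(v: Optional[float], decimals: int = 4) -> Optional[float]:
    """Round v, but keep None so 'not computed' stays distinct from 0.0."""
    return None if v is None else round(v, decimals)


# Hypothetical aggregated line metrics, shaped like the keys read in engines.py.
line_metrics = {"gini_mean": 0.41237, "percentiles": {"p90": 0.183333}}
percentiles = line_metrics.get("percentiles", {})
entry = {
    "gini": round_or_none(line_metrics.get("gini_mean")),
    "cer_p90": round_or_none(percentiles.get("p90")),
    "cer_p99": round_or_none(percentiles.get("p99")),  # key absent, stays None
}
print(entry)
```

This differs slightly from the diff. There, a key missing inside a non-empty `line_metrics` dict (for example `p99`) still becomes `0.0` through `safe_round`. Only a missing `line_metrics` yields `None`.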
picarones/report/report_data/pareto.py
@@ -0,0 +1,123 @@
+"""Cost/quality Pareto front (Sprint 19).
+
+Builds three Pareto fronts with alternative axes:
+
+- ``cost``: CER vs cost in € / 1000 pages.
+- ``speed``: CER vs mean duration per page.
+- ``co2``: CER vs carbon footprint (g CO₂ / 1000 pages, experimental).
+
+**Side effect**: :func:`build_pareto_section` enriches, in place,
+the ``engines_summary`` it receives with the fields
+``mean_duration_seconds`` and ``cost`` (cost per 1000 pages + pricing
+breakdown). This shared responsibility is documented in the
+sub-package's ``__init__.py`` module.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from picarones.measurements.pricing import (
+    build_costs_for_benchmark,
+    load_pricing_database,
+)
+from picarones.measurements.statistics import compute_pareto_front
+
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+
+
+def build_pareto_section(
+    engines_summary: list[dict], benchmark: "BenchmarkResult",
+) -> dict:
+    """Builds the ``pareto`` block of the report dict.
+
+    Annotates each entry of ``engines_summary`` in place with
+    ``mean_duration_seconds`` and ``cost``.
+    """
+    durations_by_engine: dict[str, float] = {}
+    for report in benchmark.engine_reports:
+        durs = [
+            dr.duration_seconds
+            for dr in report.document_results
+            if dr.duration_seconds is not None
+        ]
+        if durs:
+            durations_by_engine[report.engine_name] = sum(durs) / len(durs)
+
+    pricing_defaults, _ = load_pricing_database()
+    costs_by_engine = build_costs_for_benchmark(
+        engines_summary, durations_by_engine,
+    )
+    # Annotate each engine summary in place with its cost and duration.
+    for entry in engines_summary:
+        name = entry["name"]
+        entry["mean_duration_seconds"] = (
+            round(durations_by_engine.get(name, 0.0), 4)
+            if name in durations_by_engine else None
+        )
+        entry["cost"] = costs_by_engine.get(name)
+
+    pareto_points = []
+    for entry in engines_summary:
+        cer = entry.get("cer")
+        cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
+        if cer is None or cost is None:
+            continue
+        pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
+    pareto_front_engines = compute_pareto_front(
+        pareto_points, objectives=("cer", "cost"),
+    )
+
+    pareto_speed_points = []
+    for entry in engines_summary:
+        cer = entry.get("cer")
+        dur = entry.get("mean_duration_seconds")
+        if cer is None or dur is None:
+            continue
+        pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
+    pareto_front_speed = compute_pareto_front(
+        pareto_speed_points, objectives=("cer", "dur"),
+    )
+
+    pareto_co2_points = []
+    for entry in engines_summary:
+        cer = entry.get("cer")
+        co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
+        if cer is None or co2 is None:
+            continue
+        pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
+    pareto_front_co2 = compute_pareto_front(
+        pareto_co2_points, objectives=("cer", "co2"),
+    )
+
+    return {
+        "cost": {
+            "points": pareto_points,
+            "front": pareto_front_engines,
+            "axis_label": "Coût (€ / 1000 pages)",
+        },
+        "speed": {
+            "points": pareto_speed_points,
+            "front": pareto_front_speed,
+            "axis_label": "Temps moyen (s / page)",
+        },
+        "co2": {
+            "points": pareto_co2_points,
+            "front": pareto_front_co2,
+            "axis_label": (
+                "Empreinte carbone (g CO₂ / 1000 pages, expérimental)"
+            ),
+        },
+        "pricing_meta": {
+            "last_updated": pricing_defaults.last_updated,
+            "currency": pricing_defaults.currency,
+            "hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
+            "hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
+            "grid_intensity_local": pricing_defaults.grid_intensity_local,
+            "grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
+        },
+    }
+
+
+__all__ = ["build_pareto_section"]
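`compute_pareto_front` lives in `picarones.measurements.statistics` and is not part of this diff, so its exact return value is an assumption here. Suppose it returns the engines that no other point dominates when every objective is minimized (lower CER, lower cost, shorter duration, less CO₂). A minimal O(n²) sketch of that technique, written as a hypothetical `pareto_front` function, looks like this:

```python
def pareto_front(points: list[dict], objectives: tuple[str, ...]) -> list[str]:
    """Return the engine names of the non-dominated points (all objectives minimized).

    A point p is dominated when some other point q is <= p on every objective
    and strictly < p on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            all(q[k] <= p[k] for k in objectives)
            and any(q[k] < p[k] for k in objectives)
            for q in points
            if q is not p
        )
        if not dominated:
            front.append(p["engine"])
    return front


points = [
    {"engine": "A", "cer": 0.04, "cost": 12.0},
    {"engine": "B", "cer": 0.08, "cost": 0.0},
    {"engine": "C", "cer": 0.09, "cost": 3.0},   # worse than B on both axes
    {"engine": "D", "cer": 0.03, "cost": 40.0},
]
print(pareto_front(points, objectives=("cer", "cost")))
```

The point dicts here use the same shape that `build_pareto_section` builds (`engine` plus one key per objective). So the same function would work unchanged for the `("cer", "dur")` and `("cer", "co2")` fronts.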
|
|
@@ -0,0 +1,56 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"""Report scatter plots (Sprint 10).

- ``gini_vs_cer``: correlation between Gini (concentration of errors)
  and mean CER, per engine.
- ``ratio_vs_anchor``: OCR/GT length ratio vs anchoring score,
  per engine (reveals VLM hallucinations).
"""

from __future__ import annotations

from typing import TYPE_CHECKING

from picarones.report.report_data._helpers import safe_round

if TYPE_CHECKING:
    from picarones.core.results import BenchmarkResult


def build_gini_vs_cer(benchmark: "BenchmarkResult") -> list[dict]:
    """Scatter of the Gini coefficient of the error distribution vs mean CER."""
    gini_vs_cer: list[dict] = []
    for report in benchmark.engine_reports:
        line_metrics = report.aggregated_line_metrics
        gini_val = line_metrics.get("gini_mean") if line_metrics else None
        cer_val = report.mean_cer
        if gini_val is not None and cer_val is not None:
            gini_vs_cer.append({
                "engine": report.engine_name,
                "cer": safe_round(cer_val),
                "gini": safe_round(gini_val),
                "is_pipeline": report.is_pipeline,
            })
    return gini_vs_cer


def build_ratio_vs_anchor(benchmark: "BenchmarkResult") -> list[dict]:
    """Scatter of length ratio vs anchoring score (VLM detection)."""
    ratio_vs_anchor: list[dict] = []
    for report in benchmark.engine_reports:
        halluc = report.aggregated_hallucination
        if not halluc:
            continue
        ratio_vs_anchor.append({
            "engine": report.engine_name,
            "length_ratio": safe_round(halluc.get("length_ratio_mean", 1.0)),
            "anchor_score": safe_round(halluc.get("anchor_score_mean", 1.0)),
            "hallucinating_rate": safe_round(halluc.get("hallucinating_doc_rate", 0.0)),
            "is_vlm": (
                report.pipeline_info.get("is_vlm", False)
                if report.pipeline_info else False
            ),
        })
    return ratio_vs_anchor


__all__ = ["build_gini_vs_cer", "build_ratio_vs_anchor"]
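`build_gini_vs_cer` only reads a precomputed `gini_mean` from `aggregated_line_metrics`; how that value is computed is not part of this diff. For intuition, here is a sketch of the Gini coefficient over per-line error counts, using the standard closed form on sorted values: 0 means errors are spread evenly across lines, and the maximum, (n - 1) / n, means every error sits on a single line. The name `gini` and the input shape are assumptions; Picarones may compute and average the per-document values differently.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative values: 0 = perfectly even spread."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no errors at all: nothing is concentrated
    xs = sorted(values)
    # Closed form over ascending values with 1-based ranks i:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Errors per line for two hypothetical 4-line documents.
print(gini([1, 1, 1, 1]))  # 0.0, evenly spread
print(gini([0, 0, 0, 4]))  # 0.75, all on one line
```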
@@ -0,0 +1,216 @@
"""Statistical sections of the report (Sprint 7 + Sprint 17).

Builds the following blocks:

- ``pairwise_wilcoxon``: Wilcoxon tests for each pair of engines.
- ``bootstrap_cis``: bootstrap confidence intervals per engine.
- ``friedman`` + ``nemenyi``: Sprint 17, multi-engine.
- ``reliability_curves``: reliability curves per engine.
- ``venn_data``: Venn diagram of shared/exclusive errors.
- ``error_clusters``: clustering of error patterns.
- ``correlation_per_engine``: correlation matrix per engine.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Optional

from picarones.core.diff_utils import compute_word_diff
from picarones.measurements.statistics import (
    bootstrap_ci,
    cluster_errors,
    compute_correlation_matrix,
    compute_pairwise_stats,
    compute_reliability_curve,
    compute_venn_data,
    friedman_test,
    nemenyi_posthoc,
)
from picarones.report.report_data._helpers import safe_round

if TYPE_CHECKING:
    from picarones.core.results import BenchmarkResult


def _engine_cer_values(benchmark: "BenchmarkResult") -> dict[str, list[float]]:
    """Map ``engine_name → [valid per-document CERs]``."""
    out: dict[str, list[float]] = {}
    for report in benchmark.engine_reports:
        vals = [
            safe_round(dr.metrics.cer)
            for dr in report.document_results
            if dr.metrics.error is None
        ]
        if vals:
            out[report.engine_name] = vals
    return out


def build_pairwise_wilcoxon(benchmark: "BenchmarkResult") -> list[dict]:
    """Wilcoxon tests for each pair of engines (Sprint 7)."""
    return compute_pairwise_stats(_engine_cer_values(benchmark))


def build_bootstrap_cis(benchmark: "BenchmarkResult") -> list[dict]:
    """Bootstrap confidence intervals per engine (Sprint 7)."""
    bootstrap_cis: list[dict] = []
    for engine_name, vals in _engine_cer_values(benchmark).items():
        lo, hi = bootstrap_ci(vals)
        mean_v = sum(vals) / len(vals) if vals else 0.0
        bootstrap_cis.append({
            "engine": engine_name,
            "mean": safe_round(mean_v),
            "ci_lower": safe_round(lo),
            "ci_upper": safe_round(hi),
        })
    return bootstrap_cis


def build_friedman_and_nemenyi(benchmark: "BenchmarkResult") -> dict:
    """Friedman test + Nemenyi post-hoc (Sprint 17, multi-engine).

    Strict alignment on a single document order: the map is rebuilt
    from the documents shared by all engines; without such documents,
    Friedman is not applicable.

    Returns
    -------
    dict
        ``{"friedman": {...}, "nemenyi": {...}}`` to merge into
        the ``statistics`` section of the report.
    """
    # Ordered list of doc_ids, by order of first appearance.
    seen: set[str] = set()
    doc_ids_ordered: list[str] = []
    for report in benchmark.engine_reports:
        for dr in report.document_results:
            if dr.doc_id not in seen:
                seen.add(dr.doc_id)
                doc_ids_ordered.append(dr.doc_id)

    common_doc_ids: Optional[set[str]] = None
    for report in benchmark.engine_reports:
        doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
        common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids

    engine_cer_aligned: dict[str, list[float]] = {}
    if common_doc_ids:
        ordered_common = [d for d in doc_ids_ordered if d in common_doc_ids]
        for report in benchmark.engine_reports:
            dr_by_id = {dr.doc_id: dr for dr in report.document_results}
            engine_cer_aligned[report.engine_name] = [
                safe_round(dr_by_id[d].metrics.cer) for d in ordered_common
            ]

    if engine_cer_aligned:
        friedman = friedman_test(engine_cer_aligned)
        nemenyi = nemenyi_posthoc(engine_cer_aligned)
    else:
        friedman = {
            "statistic": 0.0, "p_value": 1.0, "significant": False,
            "df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
            "interpretation": "Test de Friedman non calculé — aucun document commun.",
            "error": "no_common_documents",
        }
        nemenyi = {
            "alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
            "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
            "engines_sorted": [], "significant_matrix": [], "tied_groups": [],
            "error": "no_common_documents",
        }
    return {"friedman": friedman, "nemenyi": nemenyi}


def build_reliability_curves(benchmark: "BenchmarkResult") -> list[dict]:
    """Reliability curves per engine (Sprint 7)."""
    reliability_curves: list[dict] = []
    for report in benchmark.engine_reports:
        vals = [
            safe_round(dr.metrics.cer)
            for dr in report.document_results
            if dr.metrics.error is None
        ]
        curve = compute_reliability_curve(vals)
        reliability_curves.append({
            "engine": report.engine_name,
            "points": curve,
        })
    return reliability_curves


def build_venn_data(benchmark: "BenchmarkResult") -> dict:
    """Venn diagram of shared / exclusive errors (Sprint 7).

    Builds the per-engine error sets:
    ``{engine → set("doc_id:gt_tok:hyp_tok")}``.
    """
    venn_error_sets: dict[str, set[str]] = {}
    for report in benchmark.engine_reports:
        error_set: set[str] = set()
        for dr in report.document_results:
            ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
            for op in ops:
                if op["op"] in ("replace", "delete", "insert"):
                    key = (
                        f"{dr.doc_id}:"
                        f"{op.get('old', op.get('text', ''))}:"
                        f"{op.get('new', op.get('text', ''))}"
                    )
                    error_set.add(key)
        venn_error_sets[report.engine_name] = error_set
    return compute_venn_data(venn_error_sets)


def build_error_clusters(benchmark: "BenchmarkResult") -> list[dict]:
    """Clustering of error patterns (Sprint 7)."""
    error_data_all: list[dict] = []
    for report in benchmark.engine_reports:
        for dr in report.document_results:
            error_data_all.append({
                "engine": report.engine_name,
                "gt": dr.ground_truth,
                "hypothesis": dr.hypothesis,
            })
    error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
    return [c.as_dict() for c in error_clusters_raw]


def build_correlation_per_engine(benchmark: "BenchmarkResult") -> list[dict]:
    """Per-engine correlation matrix between domain metrics (Sprint 7)."""
    correlation_per_engine: list[dict] = []
    for report in benchmark.engine_reports:
        metrics_list: list[dict[str, float]] = []
        for dr in report.document_results:
            if dr.metrics.error is not None:
                continue
            entry: dict[str, float] = {
                "cer": safe_round(dr.metrics.cer),
                "wer": safe_round(dr.metrics.wer),
                "mer": safe_round(dr.metrics.mer),
                "wil": safe_round(dr.metrics.wil),
            }
            if dr.image_quality:
                entry["quality_score"] = safe_round(dr.image_quality.get("quality_score", 0.5))
                entry["sharpness"] = safe_round(dr.image_quality.get("sharpness_score", 0.5))
            if dr.char_scores:
                entry["ligature"] = safe_round(dr.char_scores.get("ligature", {}).get("score", 0.5))
                entry["diacritic"] = safe_round(dr.char_scores.get("diacritic", {}).get("score", 0.5))
            metrics_list.append(entry)
        if metrics_list:
            corr = compute_correlation_matrix(metrics_list)
            correlation_per_engine.append({
                "engine": report.engine_name,
                **corr,
            })
    return correlation_per_engine


__all__ = [
    "build_pairwise_wilcoxon",
    "build_bootstrap_cis",
    "build_friedman_and_nemenyi",
    "build_reliability_curves",
    "build_venn_data",
    "build_error_clusters",
    "build_correlation_per_engine",
]
|
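`build_bootstrap_cis` delegates the interval itself to `bootstrap_ci(vals)`, whose resampling scheme and defaults are not part of this diff. A percentile bootstrap of the mean is one common choice; the sketch below assumes 1000 resamples, a 95 % level and a fixed seed, all hypothetical parameters of a stand-in named `bootstrap_ci_sketch`.

```python
import random


def bootstrap_ci_sketch(
    values: list[float],
    n_resamples: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap CI of the mean (hypothetical stand-in for bootstrap_ci)."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed: the same report on every run
    n = len(values)
    # Resample with replacement n_resamples times and keep each resample's mean.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of those means.
    alpha = (1.0 - confidence) / 2.0
    lo_idx = int(alpha * (n_resamples - 1))
    hi_idx = int((1.0 - alpha) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]


cers = [0.0, 0.1, 0.2, 0.3, 0.4]
print(bootstrap_ci_sketch(cers))
```

With a single repeated value the interval collapses to that value, and changing `seed` moves the bounds slightly while keeping the sample mean inside them.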
@@ -36,13 +36,18 @@ FILE_BUDGETS: dict[str, int] = {
     # --- God modules: budget = current size + 15 % margin.
     # Shrinking them will be the goal of a dedicated refactor sprint.
     "picarones/measurements/statistics.py": 1300,  # currently 1128
-    "picarones/report/generator.py": 1250,  # currently 1063
     "picarones/measurements/runner.py": 1200,  # currently 1019
+    # --- Refactor (sprint "splitting generator.py"): went from
+    # 1063 to 431 lines by extracting code into picarones/report/assets.py
+    # and the picarones/report/report_data/ sub-package. Budget tightened
+    # to 500 to lock in the gain; any growth past 500 is
+    # a signal to split the file again.
+    "picarones/report/generator.py": 500,  # currently 431
     # --- Large domain files.
     "picarones/measurements/robustness.py": 850,  # currently 731
-    "picarones/report/pipeline_render.py":
+    "picarones/report/pipeline_render.py": 815,  # currently 707 (shrunk)
     "picarones/core/results.py": 750,  # currently 636
-    "picarones/report/philological_render.py":
+    "picarones/report/philological_render.py": 700,  # currently 595 (shrunk)
     "picarones/measurements/history.py": 725,  # currently 615
     "picarones/measurements/modern_archives.py": 700,  # currently 599
     "picarones/measurements/builtin_hooks.py": 700,  # currently 590
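`FILE_BUDGETS` maps each path to a maximum line count; the invariant test that enforces it is not part of this hunk. A minimal sketch of such a check, assuming budgets count physical lines and paths are relative to the repository root (the helper `over_budget` is hypothetical):

```python
from pathlib import Path


def over_budget(root: Path, budgets: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {path: (actual_lines, budget)} for every file above its budget.

    A missing file raises FileNotFoundError, which also surfaces stale
    entries left behind after a rename or a split.
    """
    violations: dict[str, tuple[int, int]] = {}
    for rel_path, budget in budgets.items():
        text = (root / rel_path).read_text(encoding="utf-8")
        n_lines = len(text.splitlines())
        if n_lines > budget:
            violations[rel_path] = (n_lines, budget)
    return violations
```

A test would then assert `not over_budget(repo_root, FILE_BUDGETS)`. With generator.py at 431 lines against a budget of 500, the refactor leaves 69 lines of headroom before the check fails.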
@@ -333,7 +333,12 @@ class TestDetailsShell:
 class TestGeneratorWiring:
     def test_generator_imports_three_views(self):
         """generator.py must import the 3 automatic views (economics,
-        advanced_taxonomy, diagnostics) to pass them to the template.
+        advanced_taxonomy, diagnostics) to pass them to the template.
+
+        Tolerates both wiring conventions: a named argument
+        ``economics_view_html=...`` or a dict key ``"economics_view_html"``
+        splatted via ``**section_html`` (cf. ``_build_section_html``).
+        """
         from pathlib import Path
 
         gen_src = (
@@ -343,10 +348,21 @@ class TestGeneratorWiring:
         assert "build_economics_view_html" in gen_src
         assert "build_advanced_taxonomy_view_html" in gen_src
         assert "build_diagnostics_view_html" in gen_src
-        # And the 3 variables must be
-
-
-
+        # And the 3 variables must be wired to the template, either
+        # via an explicit argument (``var=...``) or via a splatted
+        # dict key (``"var": ...``).
+        for name in (
+            "economics_view_html",
+            "advanced_taxonomy_view_html",
+            "diagnostics_view_html",
+        ):
+            assert (
+                f"{name}=" in gen_src
+                or f'"{name}"' in gen_src
+            ), (
+                f"variable {name!r} is neither a named argument nor a dict key "
+                "in generator.py"
+            )
 
     def test_template_uses_three_views(self):
         from pathlib import Path
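The updated test accepts two ways of wiring the three views into the template: a named argument (`economics_view_html=...`) or a quoted dict key splatted via `**section_html`. The sketch below shows why both are equivalent at render time. `render` is a hypothetical stand-in for a Jinja template's `render(**context)`, and `build_section_html` only mimics the role of `_build_section_html`; neither reproduces the real code.

```python
def render(**context: str) -> str:
    """Hypothetical stand-in for a template's render(**context)."""
    return "|".join(f"{k}={context[k]}" for k in sorted(context))


def build_section_html() -> dict[str, str]:
    """Hypothetical: one dict entry per conditional section renderer."""
    return {
        "economics_view_html": "<section>eco</section>",
        "advanced_taxonomy_view_html": "<section>tax</section>",
        "diagnostics_view_html": "<section>diag</section>",
    }


def is_wired(name: str, source: str) -> bool:
    """Same textual check as the test: named argument or quoted dict key."""
    return f"{name}=" in source or f'"{name}"' in source


# Convention 1: explicit keyword arguments.
explicit = render(
    economics_view_html="<section>eco</section>",
    advanced_taxonomy_view_html="<section>tax</section>",
    diagnostics_view_html="<section>diag</section>",
)
# Convention 2: dict splatted into the call; the template sees the same context.
splatted = render(**build_section_html())
print(explicit == splatted)  # True
```

Because the test only searches the source text, it would also pass if the quoted name appeared in a comment or an unrelated string; it checks presence, not actual wiring.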
|