Space: Ma-Ri-Ba-Ku / Picarones

Claude committed about 1 month ago

Commit d641f6e (unverified) · 1 parent: 2d6c41d

refactor(report): split generator.py (1063 → 431 lines) by concern

Sprint "generator.py split". The file now only orchestrates Jinja
rendering and holds the ReportGenerator class; all data construction
and image I/O are extracted into dedicated submodules.
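The commit message says that ReportGenerator keeps a new `_build_section_html` method grouping the 12 conditional renderer calls, and that the template may receive their output as keys of a dict splatted via `**section_html`. Below is a minimal, standard-library-only sketch of that pattern. The renderer keys `ner_html` and `calibration_html`, the `render_page` stand-in for the Jinja `render` call, and the convention that a renderer returns `None` when its section does not apply are assumptions made for illustration. They are not the project's code.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical renderer signature: report data in, HTML fragment out,
# or None when the section does not apply to this benchmark.
Renderer = Callable[[dict], Optional[str]]


def build_section_html(report_data: dict, renderers: dict[str, Renderer]) -> dict[str, str]:
    """Call each conditional renderer once; inapplicable sections become ''."""
    return {key: render(report_data) or "" for key, render in renderers.items()}


def render_page(*, ner_html: str = "", calibration_html: str = "") -> str:
    # Stand-in for the Jinja template.render(...) call: each section
    # arrives as a keyword argument named after its dict key.
    return f"<main>{ner_html}{calibration_html}</main>"


# Hypothetical renderers: each one emits its section only when the data exists.
renderers: dict[str, Renderer] = {
    "ner_html": lambda d: "<section>NER</section>" if d.get("ner") else None,
    "calibration_html": lambda d: "<section>Calibration</section>" if d.get("calibration") else None,
}

section_html = build_section_html({"ner": {"f1": 0.9}}, renderers)
print(render_page(**section_html))  # <main><section>NER</section></main>
```

Splatting the dict keeps the template call unchanged when a section is added. Only the renderer table grows.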

What changes concretely:

- generator.py: 1063 → 431 lines (-60 %).
Keeps: the ReportGenerator class (init + generate + from_json
+ a new _build_section_html method that groups the 12 calls
to the conditional renderers), _build_jinja_env, _TEMPLATES_DIR.
Re-exports backward-compatible aliases: _build_report_data, _cer_color,
_cer_bg, _externalize_images_to_dir, _encode_image_b64,
_encode_images_b64_from_result, _load_vendor_js, _pct, _safe.

- picarones/report/assets.py (new, 179 lines):
load_vendor_js, encode_image_b64, encode_images_b64_from_result,
externalize_images_to_dir. All binary image and vendor I/O.

- picarones/report/report_data/ (new package):
• __init__.py (102 lines): build_report_data orchestrator.
• _helpers.py (30): safe_round, percent_string.
• engines.py (103): per-engine summary (engines_summary).
• documents.py (167): gallery + detail view + Sprint 7 difficulty.
• statistics.py (216): Wilcoxon, Friedman/Nemenyi, bootstrap,
reliability curves, Venn, error clusters, correlations.
• scatter.py (56): Sprint 10, Gini vs CER, length ratio vs anchor score.
• pareto.py (123): Sprint 19, 3 Pareto fronts + pricing metadata.

- render_helpers.py +60 lines (332 → 392): adds cer_step_color
and cer_step_bg (discrete 4-tier CER color scale).
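The removed `_cer_color` and `_cer_bg` (visible in the generator.py diff further down) map a CER to four tiers with cut-offs at 5 %, 15 % and 30 %. The sketch below shows how a discrete four-tier helper such as `cer_step_color` could express that scale, assuming it keeps those thresholds. The colour strings are placeholders for the constants in `picarones.report.colors`, whose values are not shown in this commit. The shared `cer_step` function that takes a palette is a hypothetical way to derive `cer_step_bg` from the same thresholds.

```python
from __future__ import annotations

from bisect import bisect_right

# Tier boundaries copied from the removed _cer_color: < 0.05, < 0.15, < 0.30, rest.
CER_THRESHOLDS = (0.05, 0.15, 0.30)

# Placeholder palettes; the real values live in picarones.report.colors.
STEP_COLORS = ("green", "gold", "orange", "red")
STEP_BGS = ("#e8f5e9", "#fffde7", "#fff3e0", "#ffebee")


def cer_step(cer: float, palette: tuple[str, ...]) -> str:
    """Return the palette entry for the tier that contains `cer`."""
    # bisect_right counts the thresholds <= cer, which is the tier index.
    # A CER exactly on a boundary (e.g. 0.05) goes to the upper tier,
    # matching the strict `cer < 0.05` comparisons of the original.
    return palette[bisect_right(CER_THRESHOLDS, cer)]


def cer_step_color(cer: float) -> str:
    return cer_step(cer, STEP_COLORS)


def cer_step_bg(cer: float) -> str:
    return cer_step(cer, STEP_BGS)
```

Any CER at or above 0.30, including values above 1.0, lands in the last tier.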

Conceptual boundaries (not arbitrary ones): each submodule maps to a
self-contained block that will evolve independently of the others.
Building the Pareto data has one documented side effect (in-place
annotation of engines_summary with mean_duration_seconds and cost);
it is the package's only ordering dependency.
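The ordering constraint is easier to see in a stripped-down orchestrator. In this sketch the function names echo the submodules listed above, but every body is a hypothetical simplification. The real `pareto.py` builds three fronts with pricing metadata, whereas this version computes a single CER-vs-cost front and assumes a flat price per document.

```python
from __future__ import annotations

from statistics import mean


def build_engines_summary(runs: dict[str, dict]) -> list[dict]:
    # Hypothetical stand-in for report_data/engines.py: one entry per engine.
    return [{"name": name, "cer": r["cer"]} for name, r in runs.items()]


def build_pareto(engines_summary: list[dict], runs: dict[str, dict],
                 price_per_doc: dict[str, float]) -> list[str]:
    # Documented side effect: each summary entry is annotated IN PLACE.
    for entry in engines_summary:
        durations = runs[entry["name"]]["durations"]
        entry["mean_duration_seconds"] = mean(durations) if durations else None
        entry["cost"] = price_per_doc.get(entry["name"], 0.0) * len(durations)
    # Single front (CER vs cost, both minimized): keep every engine that no
    # other engine matches or beats on both axes while differing on one.
    front = []
    for e in engines_summary:
        dominated = any(
            o["cer"] <= e["cer"] and o["cost"] <= e["cost"]
            and (o["cer"], o["cost"]) != (e["cer"], e["cost"])
            for o in engines_summary
        )
        if not dominated:
            front.append(e["name"])
    return front


def build_report_data(runs: dict[str, dict], price_per_doc: dict[str, float]) -> dict:
    engines_summary = build_engines_summary(runs)
    # Order matters: pareto must run after engines (it mutates the entries)
    # and before engines_summary is handed to the template or serialized.
    pareto_front = build_pareto(engines_summary, runs, price_per_doc)
    return {"engines": engines_summary, "pareto_front": pareto_front}
```

Calling `build_pareto` before `build_engines_summary` is impossible here by construction. If another builder read `engines_summary` before `build_pareto` ran, however, it would silently miss `cost` and `mean_duration_seconds`.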

Calibrating the invariants:

- FILE_BUDGETS: generator.py budget tightened from 1250 to 500 (locks
in the gain: 431 + ~15 % headroom). pipeline_render and
philological_render are also tightened slightly, thanks to the helpers
consolidated in the previous commit.
- test_views.py::test_generator_imports_three_views: assertions
broadened to accept both wiring conventions (keyword argument OR
dict key splatted via **section_html).

Non-regression:

- 3830 passed, 2 skipped (same as the previous commit).
- 1 pre-existing failure (tests/docs/test_readme_dual_lang.py),
unrelated to this change.
- All tests that import _build_report_data, _cer_color or
_externalize_images_to_dir from picarones.report.generator
keep working through the backward-compatible aliases.
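The alias guarantee in the last point can be checked by identity rather than by behaviour. In this sketch `broken_aliases` and the `SimpleNamespace` stand-ins for the two modules are hypothetical. In the real suite, the modules returned by `importlib.import_module("picarones.report.generator")` and `importlib.import_module("picarones.report.assets")` would take their place.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any

_MISSING = object()


def broken_aliases(old_module: Any, aliases: dict[str, tuple[Any, str]]) -> list[str]:
    """Return the old names that are missing or not bound to the new object.

    `aliases` maps an old private name (e.g. "_encode_image_b64") to the
    module that now owns the function and its new public name.
    """
    broken = []
    for old_name, (new_module, new_name) in aliases.items():
        old_obj = getattr(old_module, old_name, _MISSING)
        new_obj = getattr(new_module, new_name, _MISSING)
        # Identity, not equality: the alias must be the very same function
        # object, not a stale copy left behind in generator.py.
        if old_obj is _MISSING or old_obj is not new_obj:
            broken.append(old_name)
    return broken


# Stand-ins for picarones.report.assets and picarones.report.generator.
def encode_image_b64(image_path: str, max_width: int = 1200) -> str:
    return ""


assets = SimpleNamespace(encode_image_b64=encode_image_b64)
generator = SimpleNamespace(_encode_image_b64=encode_image_b64)
print(broken_aliases(generator, {"_encode_image_b64": (assets, "encode_image_b64")}))  # []
```

The identity check does not cover monkeypatching. A test that patches `picarones.report.generator._encode_image_b64` only rebinds the old name, so code in `assets.py` that calls `encode_image_b64` directly (as `encode_images_b64_from_result` does) still uses the original function.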

Files changed (12)

picarones/report/assets.py +179 -0
picarones/report/generator.py +145 -777
picarones/report/render_helpers.py +56 -0
picarones/report/report_data/__init__.py +102 -0
picarones/report/report_data/_helpers.py +30 -0
picarones/report/report_data/documents.py +167 -0
picarones/report/report_data/engines.py +103 -0
picarones/report/report_data/pareto.py +123 -0
picarones/report/report_data/scatter.py +56 -0
picarones/report/report_data/statistics.py +216 -0
tests/architecture/test_file_budgets.py +8 -3
tests/report/test_views.py +21 -5

picarones/report/assets.py ADDED

	@@ -0,0 +1,179 @@

+"""Chargement et préparation des assets du rapport HTML.
+Ce module concentre tout ce qui touche aux ressources binaires
+embarquées ou référencées par le rapport :
+- ``load_vendor_js`` lit un fichier JS vendorisé (Chart.js, etc.).
+- ``encode_image_b64`` redimensionne et encode une image en data-URI.
+- ``encode_images_b64_from_result`` itère sur un BenchmarkResult.
+- ``externalize_images_to_dir`` écrit les images sur disque à côté
+  du HTML (mode ``--lazy-images`` du Sprint A5).
+Extrait de ``picarones/report/generator.py`` lors du sprint de
+découpage : isole l'I/O image et vendor du reste de l'orchestration.
+"""
+from __future__ import annotations
+import base64
+import io
+import logging
+from pathlib import Path
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+logger = logging.getLogger(__name__)
+#: Directory holding the embedded JS resources.
+_VENDOR_DIR = Path(__file__).parent / "vendor"
+def load_vendor_js(name: str) -> str:
+    """Lit un fichier JS vendorisé et retourne son contenu.
+    Si le fichier n'existe pas, retourne un commentaire JS qui
+    garde le rapport valide (pas de SyntaxError côté navigateur).
+    """
+    p = _VENDOR_DIR / name
+    if p.exists():
+        return p.read_text(encoding="utf-8")
+    return f"/* vendor/{name} non trouvé */"
+def encode_image_b64(image_path: str, max_width: int = 1200) -> str:
+    """Lit une image, la redimensionne si besoin, et retourne un data-URI base64."""
+    try:
+        from PIL import Image
+        p = Path(image_path)
+        if not p.exists():
+            return ""
+        with Image.open(p) as img:
+            if img.width > max_width:
+                ratio = max_width / img.width
+                new_h = max(1, int(img.height * ratio))
+                img = img.resize((max_width, new_h), Image.LANCZOS)
+            # Convert to RGB to avoid mode issues (RGBA, palette…)
+            if img.mode not in ("RGB", "L"):
+                img = img.convert("RGB")
+            buf = io.BytesIO()
+            fmt = "JPEG" if p.suffix.lower() in (".jpg", ".jpeg") else "PNG"
+            img.save(buf, format=fmt, optimize=True, quality=85)
+            b64 = base64.b64encode(buf.getvalue()).decode("ascii")
+            mime = "image/jpeg" if fmt == "JPEG" else "image/png"
+            return f"data:{mime};base64,{b64}"
+    except Exception:  # noqa: BLE001 - silent fallback on the report side
+        return ""
+def encode_images_b64_from_result(
+    benchmark: "BenchmarkResult", max_width: int = 1200,
+) -> dict[str, str]:
+    """Encode toutes les images d'un BenchmarkResult en base64.
+    Returns
+    -------
+    dict
+        ``{doc_id: data_uri}``
+    """
+    images: dict[str, str] = {}
+    if not benchmark.engine_reports:
+        return images
+    for dr in benchmark.engine_reports[0].document_results:
+        if dr.image_path and dr.doc_id not in images:
+            uri = encode_image_b64(dr.image_path, max_width=max_width)
+            if uri:
+                images[dr.doc_id] = uri
+    return images
+def externalize_images_to_dir(
+    benchmark: "BenchmarkResult",
+    output_dir: Path,
+    max_width: int = 1200,
+    asset_subdir: str = "report-assets",
+) -> dict[str, str]:
+    """Sprint A5 (item M-16) — écrit les images sur disque dans un
+    sous-dossier à côté du HTML, et retourne ``{doc_id: url_relative}``.
+    Mode « lazy loading » : au lieu d'embarquer chaque image en
+    base64 dans le HTML (50 MB+ pour un corpus de 100 documents,
+    ~200 MB+ pour 1 000 documents), on les externalise en fichiers
+    PNG/JPEG locaux. Le HTML les référence via
+    ``<img src="report-assets/…">`` avec ``loading="lazy"`` côté
+    navigateur.
+    Le rapport reste auto-portant si l'utilisateur copie le dossier
+    ``report-assets/`` à côté du HTML (cf. CLI ``--lazy-images``).
+    Parameters
+    ----------
+    benchmark:
+        Résultat de benchmark (lit ``image_path`` de chaque DocumentResult).
+    output_dir:
+        Dossier où le HTML sera écrit ; le sous-dossier d'assets sera
+        créé à côté.
+    max_width:
+        Largeur max du redimensionnement (cohérent avec
+        ``encode_image_b64``).
+    asset_subdir:
+        Nom du sous-dossier d'assets (défaut ``"report-assets"``).
+    Returns
+    -------
+    dict[str, str]
+        ``{doc_id: "report-assets/<doc_id>.png"}`` (URL relative
+        consommable directement dans un attribut HTML ``src``).
+    """
+    from PIL import Image
+    assets_dir = output_dir / asset_subdir
+    assets_dir.mkdir(parents=True, exist_ok=True)
+    out: dict[str, str] = {}
+    seen_ids: set[str] = set()
+    for engine_report in benchmark.engine_reports:
+        for dr in engine_report.document_results:
+            doc_id = dr.doc_id
+            if doc_id in seen_ids:
+                continue
+            seen_ids.add(doc_id)
+            try:
+                src = Path(dr.image_path)
+                if not src.exists():
+                    continue
+                # File name derived from doc_id, normalized to strip
+                # characters that are unsafe for the filesystem.
+                safe_id = "".join(
+                    c if c.isalnum() or c in "._-" else "_" for c in doc_id
+                )
+                dest = assets_dir / f"{safe_id}{src.suffix.lower() or '.png'}"
+                with Image.open(src) as img:
+                    if img.width > max_width:
+                        ratio = max_width / img.width
+                        new_h = max(1, int(img.height * ratio))
+                        img = img.resize((max_width, new_h), Image.LANCZOS)
+                    if img.mode not in ("RGB", "L"):
+                        img = img.convert("RGB")
+                    fmt = "JPEG" if dest.suffix in (".jpg", ".jpeg") else "PNG"
+                    img.save(dest, format=fmt, optimize=True, quality=85)
+                # Relative URL (POSIX style, even on Windows, for HTML).
+                out[doc_id] = f"{asset_subdir}/{dest.name}"
+            except Exception as exc:  # noqa: BLE001 - silent fallback + warning
+                logger.warning(
+                    "[report] échec d'externalisation de l'image %s : %s — "
+                    "le rapport ignorera cette image",
+                    dr.image_path,
+                    exc,
+                )
+    return out
+__all__ = [
+    "load_vendor_js",
+    "encode_image_b64",
+    "encode_images_b64_from_result",
+    "externalize_images_to_dir",
+]
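Two pure computations inside `externalize_images_to_dir` are easy to check in isolation: the filename sanitization and the aspect-ratio-preserving resize (also used by `encode_image_b64`). The helper names `safe_asset_name` and `scaled_size` are hypothetical. Their bodies copy the expressions from the function above.

```python
from __future__ import annotations


def safe_asset_name(doc_id: str, suffix: str) -> str:
    """Same rule as externalize_images_to_dir: keep alphanumerics and "._-",
    replace every other character with "_", lower-case the extension and
    fall back to ".png" when the source file has none."""
    safe_id = "".join(c if c.isalnum() or c in "._-" else "_" for c in doc_id)
    return f"{safe_id}{suffix.lower() or '.png'}"


def scaled_size(width: int, height: int, max_width: int = 1200) -> tuple[int, int]:
    """Target size for Image.resize: cap the width, keep the aspect ratio,
    and never let the height drop to 0."""
    if width <= max_width:
        return width, height
    ratio = max_width / width
    return max_width, max(1, int(height * ratio))


print(safe_asset_name("BnF/ms 12:f3", ".JPG"))  # BnF_ms_12_f3.jpg
print(scaled_size(2400, 3000))                   # (1200, 1500)
```

The mapping is not injective. Doc ids such as `a/b` and `a:b` both become `a_b`, and because `seen_ids` deduplicates by doc_id rather than by file name, the second image overwrites the first in `report-assets/`. Note also that `str.isalnum` accepts non-ASCII letters, so an id like `é1` is kept as-is.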

picarones/report/generator.py CHANGED

@@ -11,667 +11,51 @@ Available views
 2. Gallery   : image grid with a colored CER badge
 3. Document  : zoomable image + colored GT / OCR diff per engine
 4. Analyses  : CER histogram + radar chart
 """
 from __future__ import annotations
-import base64
-import io
 import json
 import logging
 from pathlib import Path
 from typing import Any, Optional
-logger = logging.getLogger(__name__)
-# ---------------------------------------------------------------------------
-# Vendor resources (embedded in the HTML report)
-# ---------------------------------------------------------------------------
-_VENDOR_DIR = Path(__file__).parent / "vendor"
-def _load_vendor_js(name: str) -> str:
-    """Lit un fichier JS vendorisé et retourne son contenu."""
-    p = _VENDOR_DIR / name
-    if p.exists():
-        return p.read_text(encoding="utf-8")
-    return f"/* vendor/{name} non trouvé */"
 from picarones.core.results import BenchmarkResult
-from picarones.core.diff_utils import compute_char_diff, compute_word_diff
-from picarones.measurements.statistics import (
-    compute_pairwise_stats,
-    compute_reliability_curve,
-    compute_correlation_matrix,
-    compute_venn_data,
-    cluster_errors,
-    bootstrap_ci,
-    friedman_test,
-    nemenyi_posthoc,
-    build_critical_difference_svg,
-    compute_pareto_front,
 )
-from picarones.measurements.pricing import build_costs_for_benchmark, load_pricing_database
-from picarones.measurements.difficulty import compute_all_difficulties, difficulty_label
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-def _encode_image_b64(image_path: str, max_width: int = 1200) -> str:
-    """Lit une image, la redimensionne si besoin, et retourne un data-URI base64."""
-    try:
-        from PIL import Image
-        p = Path(image_path)
-        if not p.exists():
-            return ""
-        with Image.open(p) as img:
-            if img.width > max_width:
-                ratio = max_width / img.width
-                new_h = max(1, int(img.height * ratio))
-                img = img.resize((max_width, new_h), Image.LANCZOS)
-            # Convert to RGB to avoid mode issues (RGBA, palette…)
-            if img.mode not in ("RGB", "L"):
-                img = img.convert("RGB")
-            buf = io.BytesIO()
-            fmt = "JPEG" if p.suffix.lower() in (".jpg", ".jpeg") else "PNG"
-            img.save(buf, format=fmt, optimize=True, quality=85)
-            b64 = base64.b64encode(buf.getvalue()).decode("ascii")
-            mime = "image/jpeg" if fmt == "JPEG" else "image/png"
-            return f"data:{mime};base64,{b64}"
-    except Exception:
-        return ""
-def _externalize_images_to_dir(
-    benchmark: "BenchmarkResult",
-    output_dir: Path,
-    max_width: int = 1200,
-    asset_subdir: str = "report-assets",
-) -> dict[str, str]:
-    """Sprint A5 (item M-16) — écrit les images sur disque dans un
-    sous-dossier à côté du HTML, et retourne ``{doc_id: url_relative}``.
-    Mode « lazy loading » : au lieu d'embarquer chaque image en
-    base64 dans le HTML (50 MB+ pour un corpus de 100 documents,
-    ~200 MB+ pour 1 000 documents), on les externalise en fichiers
-    PNG/JPEG locaux. Le HTML les référence via ``<img src="report-assets/…">``
-    avec ``loading="lazy"`` côté navigateur.
-    Le rapport reste auto-portant si l'utilisateur copie le dossier
-    ``report-assets/`` à côté du HTML (cf. CLI ``--lazy-images``).
-    Parameters
-    ----------
-    benchmark:
-        Résultat de benchmark (lit ``image_path`` de chaque DocumentResult).
-    output_dir:
-        Dossier où le HTML sera écrit ; le sous-dossier d'assets sera
-        créé à côté.
-    max_width:
-        Largeur max du redimensionnement (cohérent avec
-        ``_encode_image_b64``).
-    asset_subdir:
-        Nom du sous-dossier d'assets (défaut ``"report-assets"``).
-    Returns
-    -------
-    dict[str, str]
-        ``{doc_id: "report-assets/<doc_id>.png"}`` (URL relative
-        consommable directement dans un attribut HTML ``src``).
-    """
-    from PIL import Image
-    assets_dir = output_dir / asset_subdir
-    assets_dir.mkdir(parents=True, exist_ok=True)
-    out: dict[str, str] = {}
-    seen_ids: set[str] = set()
-    for engine_report in benchmark.engine_reports:
-        for dr in engine_report.document_results:
-            doc_id = dr.doc_id
-            if doc_id in seen_ids:
-                continue
-            seen_ids.add(doc_id)
-            try:
-                src = Path(dr.image_path)
-                if not src.exists():
-                    continue
-                # Nom de fichier dérivé du doc_id, normalisé sans
-                # caractères dangereux pour le filesystem.
-                safe_id = "".join(
-                    c if c.isalnum() or c in "._-" else "_" for c in doc_id
-                )
-                dest = assets_dir / f"{safe_id}{src.suffix.lower() or '.png'}"
-                with Image.open(src) as img:
-                    if img.width > max_width:
-                        ratio = max_width / img.width
-                        new_h = max(1, int(img.height * ratio))
-                        img = img.resize((max_width, new_h), Image.LANCZOS)
-                    if img.mode not in ("RGB", "L"):
-                        img = img.convert("RGB")
-                    fmt = "JPEG" if dest.suffix in (".jpg", ".jpeg") else "PNG"
-                    img.save(dest, format=fmt, optimize=True, quality=85)
-                # Relative URL (POSIX style, even on Windows, for HTML).
-                out[doc_id] = f"{asset_subdir}/{dest.name}"
-            except Exception as exc:  # noqa: BLE001 - silent fallback + warning
-                logger.warning(
-                    "[report] échec d'externalisation de l'image %s : %s — "
-                    "le rapport ignorera cette image",
-                    dr.image_path,
-                    exc,
-                )
-    return out
-def _encode_images_b64_from_result(benchmark: "BenchmarkResult", max_width: int = 1200) -> dict[str, str]:
-    """Encode toutes les images d'un BenchmarkResult en base64.
-    Returns
-    -------
-    dict
-        ``{doc_id: data_uri}``
-    """
-    images: dict[str, str] = {}
-    if not benchmark.engine_reports:
-        return images
-    for dr in benchmark.engine_reports[0].document_results:
-        if dr.image_path and dr.doc_id not in images:
-            uri = _encode_image_b64(dr.image_path, max_width=max_width)
-            if uri:
-                images[dr.doc_id] = uri
-    return images
-def _cer_color(cer: float) -> str:
-    """Retourne une couleur CSS pour un score CER donné (0→vert, 1→rouge)."""
-    from picarones.report.colors import COLOR_GREEN, COLOR_YELLOW, COLOR_ORANGE, COLOR_RED
-    if cer < 0.05:
-        return COLOR_GREEN
-    if cer < 0.15:
-        return COLOR_YELLOW
-    if cer < 0.30:
-        return COLOR_ORANGE
-    return COLOR_RED
-def _cer_bg(cer: float) -> str:
-    from picarones.report.colors import BG_GREEN, BG_YELLOW, BG_ORANGE, BG_RED
-    if cer < 0.05:
-        return BG_GREEN
-    if cer < 0.15:
-        return BG_YELLOW
-    if cer < 0.30:
-        return BG_ORANGE
-    return BG_RED
-def _pct(v: Optional[float], decimals: int = 2) -> str:
-    if v is None:
-        return "—"
-    return f"{v * 100:.{decimals}f} %"
-def _safe(v: Optional[float], decimals: int = 4) -> float:
-    return round(v or 0.0, decimals)
-# ---------------------------------------------------------------------------
-# Data preparation
-# ---------------------------------------------------------------------------
-def _build_report_data(benchmark: BenchmarkResult, images_b64: dict[str, str]) -> dict:
-    """Transforme un BenchmarkResult en dict JSON pour le rapport HTML."""
-    engines_summary = []
-    for report in benchmark.engine_reports:
-        agg = report.aggregated_metrics
-        diplo_agg = agg.get("cer_diplomatic", {})
-        entry: dict = {
-            "name": report.engine_name,
-            "version": report.engine_version,
-            "cer":  _safe(agg.get("cer", {}).get("mean")),
-            "wer":  _safe(agg.get("wer", {}).get("mean")),
-            "mer":  _safe(agg.get("mer", {}).get("mean")),
-            "wil":  _safe(agg.get("wil", {}).get("mean")),
-            "cer_median": _safe(agg.get("cer", {}).get("median")),
-            "cer_min":    _safe(agg.get("cer", {}).get("min")),
-            "cer_max":    _safe(agg.get("cer", {}).get("max")),
-            "doc_count":  agg.get("document_count", 0),
-            "failed":     agg.get("failed_count", 0),
-            # Diplomatic CER (after historical normalization: ſ=s, u=v, i=j…)
-            "cer_diplomatic": _safe(diplo_agg.get("mean")) if diplo_agg else None,
-            "cer_diplomatic_profile": diplo_agg.get("profile"),
-            # Histogram distribution: list of per-document CER values
-            "cer_values": [
-                _safe(dr.metrics.cer)
-                for dr in report.document_results
-                if dr.metrics.error is None
-            ],
-            "cer_diplomatic_values": [
-                _safe(dr.metrics.cer_diplomatic)
-                for dr in report.document_results
-                if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
-            ],
-            # OCR+LLM pipeline fields (empty for OCR-only engines)
-            "is_pipeline": report.is_pipeline,
-            "pipeline_info": report.pipeline_info,
-            # Sprint 5: advanced cultural-heritage metrics
-            "ligature_score": _safe(report.ligature_score) if report.ligature_score is not None else None,
-            "diacritic_score": _safe(report.diacritic_score) if report.diacritic_score is not None else None,
-            "aggregated_confusion": report.aggregated_confusion,
-            "aggregated_taxonomy": report.aggregated_taxonomy,
-            "aggregated_structure": report.aggregated_structure,
-            "aggregated_image_quality": report.aggregated_image_quality,
-            # Sprint 10: error distribution + VLM hallucinations
-            "gini": _safe(report.aggregated_line_metrics.get("gini_mean")) if report.aggregated_line_metrics else None,
-            "cer_p90": _safe(report.aggregated_line_metrics.get("percentiles", {}).get("p90")) if report.aggregated_line_metrics else None,
-            "cer_p99": _safe(report.aggregated_line_metrics.get("percentiles", {}).get("p99")) if report.aggregated_line_metrics else None,
-            "catastrophic_rate_30": _safe(report.aggregated_line_metrics.get("catastrophic_rate", {}).get("0.3")) if report.aggregated_line_metrics else None,
-            "aggregated_line_metrics": report.aggregated_line_metrics,
-            "anchor_score": _safe(report.aggregated_hallucination.get("anchor_score_mean")) if report.aggregated_hallucination else None,
-            "length_ratio": _safe(report.aggregated_hallucination.get("length_ratio_mean")) if report.aggregated_hallucination else None,
-            "hallucinating_doc_rate": _safe(report.aggregated_hallucination.get("hallucinating_doc_rate")) if report.aggregated_hallucination else None,
-            "aggregated_hallucination": report.aggregated_hallucination,
-            # Sprint 41: aggregated NER (None if nothing was computed)
-            "aggregated_ner": report.aggregated_ner,
-            # Sprint 43: aggregated calibration (None if the engine exposed
-            # no confidence scores on this corpus)
-            "aggregated_calibration": report.aggregated_calibration,
-            # Sprint 62: aggregated philological profile (None if there is no
-            # philological signal on the corpus for this engine)
-            "aggregated_philological": report.aggregated_philological,
-            # Sprint 86: A.II.5 (fuzzy searchability + numeric
-            # sequences). None if no document carries a signal.
-            "aggregated_searchability": report.aggregated_searchability,
-            "aggregated_numerical_sequences": (
-                report.aggregated_numerical_sequences
-            ),
-            # Sprint 87: A.II.2 (aggregated Flesch delta)
-            "aggregated_readability": report.aggregated_readability,
-            "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
-        }
-        engines_summary.append(entry)
-    # Documents (gallery view + detail view)
-    # Collect every doc_id from the union of all engines, preserving
-    # order of appearance (first engine first, then the additions).
-    seen_doc_ids: set[str] = set()
-    doc_ids_ordered: list[str] = []
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            if dr.doc_id not in seen_doc_ids:
-                seen_doc_ids.add(dr.doc_id)
-                doc_ids_ordered.append(dr.doc_id)
-    # Cross index: doc_id → {engine_name → DocumentResult}
-    doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
-    documents = []
-    for doc_id in doc_ids_ordered:
-        engine_results = []
-        gt = ""
-        image_path = ""
-        for engine_name in [r.engine_name for r in benchmark.engine_reports]:
-            dr = doc_engine_map[doc_id].get(engine_name)
-            if dr is None:
-                continue
-            gt = dr.ground_truth
-            image_path = dr.image_path
-            diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
-            er_entry: dict = {
-                "engine": engine_name,
-                "hypothesis": dr.hypothesis,
-                "cer": _safe(dr.metrics.cer),
-                "cer_diplomatic": _safe(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
-                "wer": _safe(dr.metrics.wer),
-                "mer": _safe(dr.metrics.mer),
-                "wil": _safe(dr.metrics.wil),
-                "duration": dr.duration_seconds,
-                "error": dr.engine_error,
-                "diff": diff_ops,
-            }
-            # Fields specific to OCR+LLM pipelines
-            if dr.ocr_intermediate is not None:
-                er_entry["ocr_intermediate"] = dr.ocr_intermediate
-                er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
-                er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
-            if dr.pipeline_metadata:
-                on = dr.pipeline_metadata.get("over_normalization")
-                if on is not None:
-                    er_entry["over_normalization"] = on
-                er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
-            # Sprint 5: advanced per-document metrics
-            if dr.char_scores is not None:
-                er_entry["ligature_score"] = _safe(dr.char_scores.get("ligature", {}).get("score"))
-                er_entry["diacritic_score"] = _safe(dr.char_scores.get("diacritic", {}).get("score"))
-            if dr.taxonomy is not None:
-                er_entry["taxonomy"] = dr.taxonomy
-            if dr.structure is not None:
-                er_entry["structure"] = dr.structure
-            if dr.image_quality is not None:
-                er_entry["image_quality"] = dr.image_quality
-            # Sprint 10
-            if dr.line_metrics is not None:
-                er_entry["line_metrics"] = dr.line_metrics
-            if dr.hallucination_metrics is not None:
-                er_entry["hallucination_metrics"] = dr.hallucination_metrics
-            engine_results.append(er_entry)
-        # Mean CER on this document (for the gallery badge)
-        cer_values = [er["cer"] for er in engine_results if er["error"] is None]
-        mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
-        best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
-        # Script type (from per-document metadata when available)
-        script_type = ""
-        first_dr = doc_engine_map[doc_id].get(
-            benchmark.engine_reports[0].engine_name if benchmark.engine_reports else None
-        )
-        if first_dr and first_dr.image_quality:
-            script_type = first_dr.image_quality.get("script_type", "")
-        documents.append({
-            "doc_id": doc_id,
-            "image_path": image_path,
-            "image_b64": images_b64.get(doc_id, ""),
-            "ground_truth": gt,
-            "mean_cer": _safe(mean_cer),
-            "best_engine": best_engine["engine"] if best_engine else "",
-            "engine_results": engine_results,
-            "script_type": script_type,
-        })
-    # ── Sprint 7: intrinsic difficulty score ───────────────────────────
-    gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
-    cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
-    iq_map: dict[str, float] = {}
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = _safe(dr.metrics.cer)
-            if dr.image_quality and "quality_score" in dr.image_quality:
-                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
-    difficulty_scores = compute_all_difficulties(
-        doc_ids=doc_ids_ordered,
-        ground_truths=gt_map,
-        cer_map=cer_map,
-        image_quality_map=iq_map or None,
-    )
-    # Attach difficulty_score to each document
-    for doc in documents:
-        ds = difficulty_scores.get(doc["doc_id"])
-        if ds:
-            doc["difficulty_score"] = _safe(ds.score)
-            doc["difficulty_label"] = difficulty_label(ds.score)
-        else:
-            doc["difficulty_score"] = 0.5
-            doc["difficulty_label"] = "Modéré"
-    # ── Sprint 7: statistical tests (pairwise Wilcoxon + bootstrap CI) ─
-    engine_cer_map_stats: dict[str, list[float]] = {}
-    for report in benchmark.engine_reports:
-        vals = [_safe(dr.metrics.cer) for dr in report.document_results if dr.metrics.error is None]
-        if vals:
-            engine_cer_map_stats[report.engine_name] = vals
-    pairwise_stats = compute_pairwise_stats(engine_cer_map_stats)
-    # ── Sprint 17: Friedman + Nemenyi ──────────────────────────────────
-    # Strict alignment on the same document order: the map is rebuilt
-    # from the documents shared by all engines; otherwise Friedman
-    # does not apply.
-    engine_cer_aligned: dict[str, list[float]] = {}
-    common_doc_ids: Optional[set[str]] = None
-    for report in benchmark.engine_reports:
-        doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
-        common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids
-    if common_doc_ids:
-        ordered_common = [d for d in doc_ids_ordered if d in common_doc_ids]
-        for report in benchmark.engine_reports:
-            dr_by_id = {dr.doc_id: dr for dr in report.document_results}
-            engine_cer_aligned[report.engine_name] = [
-                _safe(dr_by_id[d].metrics.cer) for d in ordered_common
-            ]
-    friedman = friedman_test(engine_cer_aligned) if engine_cer_aligned else {
-        "statistic": 0.0, "p_value": 1.0, "significant": False,
-        "df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
-        "interpretation": "Test de Friedman non calculé — aucun document commun.",
-        "error": "no_common_documents",
-    }
-    nemenyi = nemenyi_posthoc(engine_cer_aligned) if engine_cer_aligned else {
-        "alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
-        "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
-        "engines_sorted": [], "significant_matrix": [], "tied_groups": [],
-        "error": "no_common_documents",
-    }
-    bootstrap_cis: list[dict] = []
-    for engine_name, vals in engine_cer_map_stats.items():
-        lo, hi = bootstrap_ci(vals)
-        mean_v = sum(vals) / len(vals) if vals else 0.0
-        bootstrap_cis.append({
-            "engine": engine_name,
-            "mean": _safe(mean_v),
-            "ci_lower": _safe(lo),
-            "ci_upper": _safe(hi),
-        })
-    # ── Sprint 7: reliability curves ──────────────────────────────────
-    reliability_curves: list[dict] = []
-    for report in benchmark.engine_reports:
-        vals = [_safe(dr.metrics.cer) for dr in report.document_results if dr.metrics.error is None]
-        curve = compute_reliability_curve(vals)
-        reliability_curves.append({
-            "engine": report.engine_name,
-            "points": curve,
-        })
-    # ── Sprint 7: Venn diagram of shared / exclusive errors ───────────
-    # Build per-engine error sets: {engine → set(doc_id:gt_tok:hyp_tok)}
-    venn_error_sets: dict[str, set[str]] = {}
-    for report in benchmark.engine_reports:
-        error_set: set[str] = set()
-        for dr in report.document_results:
-            ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
-            for op in ops:
-                if op["op"] in ("replace", "delete", "insert"):
-                    key = f"{dr.doc_id}:{op.get('old', op.get('text',''))}:{op.get('new', op.get('text',''))}"
-                    error_set.add(key)
-        venn_error_sets[report.engine_name] = error_set
-    venn_data = compute_venn_data(venn_error_sets)
-    # ── Sprint 7: error-pattern clustering ─────────────────────────────
-    error_data_all: list[dict] = []
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            error_data_all.append({
-                "engine": report.engine_name,
-                "gt": dr.ground_truth,
-                "hypothesis": dr.hypothesis,
-            })
-    error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
-    error_clusters = [c.as_dict() for c in error_clusters_raw]
-    # ── Sprint 7 — Matrice de corrélation ────────────────────────────────
-    # Pour chaque moteur : une liste de dicts métriques par document
-    correlation_per_engine: list[dict] = []
-    for report in benchmark.engine_reports:
-        metrics_list = []
-        for dr in report.document_results:
-            if dr.metrics.error is not None:
-                continue
-            entry: dict[str, float] = {
-                "cer": _safe(dr.metrics.cer),
-                "wer": _safe(dr.metrics.wer),
-                "mer": _safe(dr.metrics.mer),
-                "wil": _safe(dr.metrics.wil),
-            }
-            if dr.image_quality:
-                entry["quality_score"] = _safe(dr.image_quality.get("quality_score", 0.5))
-                entry["sharpness"] = _safe(dr.image_quality.get("sharpness_score", 0.5))
-            if dr.char_scores:
-                entry["ligature"] = _safe(dr.char_scores.get("ligature", {}).get("score", 0.5))
-                entry["diacritic"] = _safe(dr.char_scores.get("diacritic", {}).get("score", 0.5))
-            metrics_list.append(entry)
-        if metrics_list:
-            corr = compute_correlation_matrix(metrics_list)
-            correlation_per_engine.append({
-                "engine": report.engine_name,
-                **corr,
-            })
-    # ── Sprint 10 — Données scatter plots ─────────────────────────────────
-    # Scatter 1 : Gini vs CER moyen (moteurs)
-    gini_vs_cer = []
-    for report in benchmark.engine_reports:
-        gini_val = report.aggregated_line_metrics.get("gini_mean") if report.aggregated_line_metrics else None
-        cer_val = report.mean_cer
-        if gini_val is not None and cer_val is not None:
-            gini_vs_cer.append({
-                "engine": report.engine_name,
-                "cer": _safe(cer_val),
-                "gini": _safe(gini_val),
-                "is_pipeline": report.is_pipeline,
-            })
-    # ── Sprint 19 — Coûts et frontière de Pareto ────────────────────────
-    # Durée moyenne mesurée par moteur sur le benchmark courant (sec/page)
-    durations_by_engine: dict[str, float] = {}
-    for report in benchmark.engine_reports:
-        durs = [dr.duration_seconds for dr in report.document_results
-                if dr.duration_seconds is not None]
-        if durs:
-            durations_by_engine[report.engine_name] = sum(durs) / len(durs)
-    pricing_defaults, _ = load_pricing_database()
-    costs_by_engine = build_costs_for_benchmark(
-        engines_summary, durations_by_engine,
-    )
-    # Annoter chaque résumé moteur avec son coût et sa durée
-    for entry in engines_summary:
-        name = entry["name"]
-        entry["mean_duration_seconds"] = round(durations_by_engine.get(name, 0.0), 4) \
-            if name in durations_by_engine else None
-        entry["cost"] = costs_by_engine.get(name)
-    # Front Pareto sur (CER moyen, coût €/1000 pages) — moteurs avec les deux dispos
-    pareto_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
-        if cer is None or cost is None:
-            continue
-        pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
-    pareto_front_engines = compute_pareto_front(
-        pareto_points, objectives=("cer", "cost"),
-    )
-    # Front Pareto secondaire (CER, vitesse) pour le toggle "vitesse"
-    pareto_speed_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        dur = entry.get("mean_duration_seconds")
-        if cer is None or dur is None:
-            continue
-        pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
-    pareto_front_speed = compute_pareto_front(
-        pareto_speed_points, objectives=("cer", "dur"),
-    )
-    # Front Pareto carbone (CER, g CO2 / 1000 pages) — étiqueté expérimental
-    pareto_co2_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
-        if cer is None or co2 is None:
-            continue
-        pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
-    pareto_front_co2 = compute_pareto_front(
-        pareto_co2_points, objectives=("cer", "co2"),
-    )
-    pareto_data = {
-        "cost": {
-            "points": pareto_points,
-            "front": pareto_front_engines,
-            "axis_label": "Coût (€ / 1000 pages)",
-        },
-        "speed": {
-            "points": pareto_speed_points,
-            "front": pareto_front_speed,
-            "axis_label": "Temps moyen (s / page)",
-        },
-        "co2": {
-            "points": pareto_co2_points,
-            "front": pareto_front_co2,
-            "axis_label": "Empreinte carbone (g CO₂ / 1000 pages, expérimental)",
-        },
-        "pricing_meta": {
-            "last_updated": pricing_defaults.last_updated,
-            "currency": pricing_defaults.currency,
-            "hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
-            "hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
-            "grid_intensity_local": pricing_defaults.grid_intensity_local,
-            "grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
-        },
-    }
-    # Scatter 2 : ratio longueur vs score d'ancrage (moteurs)
-    ratio_vs_anchor = []
-    for report in benchmark.engine_reports:
-        if report.aggregated_hallucination:
-            ratio_vs_anchor.append({
-                "engine": report.engine_name,
-                "length_ratio": _safe(report.aggregated_hallucination.get("length_ratio_mean", 1.0)),
-                "anchor_score": _safe(report.aggregated_hallucination.get("anchor_score_mean", 1.0)),
-                "hallucinating_rate": _safe(report.aggregated_hallucination.get("hallucinating_doc_rate", 0.0)),
-                "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
-            })
-    return {
-        "meta": {
-            "corpus_name": benchmark.corpus_name,
-            "corpus_source": benchmark.corpus_source,
-            "document_count": benchmark.document_count,
-            "run_date": benchmark.run_date,
-            "picarones_version": benchmark.picarones_version,
-            "metadata": benchmark.metadata,
-        },
-        "ranking": benchmark.ranking(),
-        "engines": engines_summary,
-        "documents": documents,
-        # Sprint 7
-        "statistics": {
-            "pairwise_wilcoxon": pairwise_stats,
-            "bootstrap_cis": bootstrap_cis,
-            # Sprint 17 — Friedman multi-moteurs + post-hoc Nemenyi + CDD
-            "friedman": friedman,
-            "nemenyi": nemenyi,
-        },
-        "reliability_curves": reliability_curves,
-        "venn_data": venn_data,
-        "error_clusters": error_clusters,
-        "correlation_per_engine": correlation_per_engine,
-        # Sprint 10
-        "gini_vs_cer": gini_vs_cer,
-        "ratio_vs_anchor": ratio_vs_anchor,
-        # Sprint 19 — vue Pareto coût/qualité avec variantes d'axe
-        "pareto": pareto_data,
-        # Sprint 36 — analyse inter-moteurs (divergence taxonomique +
-        # complémentarité / oracle).  ``None`` si moins de 2 moteurs.
-        "inter_engine_analysis": benchmark.inter_engine_analysis,
-        # Sprint 45-46 — stratification par script_type
-        "available_strata": benchmark.available_strata(),
-        "stratified_ranking": benchmark.stratified_ranking() or None,
-        "corpus_homogeneity": benchmark.corpus_homogeneity(),
-    }
 # ---------------------------------------------------------------------------
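The three Pareto blocks in the removed `_build_report_data` (cost, speed, CO₂) all delegate to `compute_pareto_front(points, objectives=(...))`. Its body is not part of this diff. Below is a minimal, hypothetical stand-in called `pareto_front_sketch`. It assumes every objective is minimized (lower CER, lower cost, lower duration) and that the function returns the names of the non-dominated engines, which is what the variable name `pareto_front_engines` suggests. The sample numbers are made up.

```python
from typing import Iterable, Sequence


def pareto_front_sketch(
    points: Iterable[dict], objectives: Sequence[str] = ("cer", "cost"),
) -> list[str]:
    """Return the engine names that no other point dominates.

    Hypothetical stand-in for ``compute_pareto_front``: all objectives are
    assumed to be minimized. The O(n^2) scan is fine for a handful of engines.
    """
    pts = list(points)
    front: list[str] = []
    for p in pts:
        dominated = False
        for q in pts:
            if q is p:
                continue
            # q dominates p if it is no worse on every objective
            # and strictly better on at least one of them.
            no_worse = all(q[k] <= p[k] for k in objectives)
            better = any(q[k] < p[k] for k in objectives)
            if no_worse and better:
                dominated = True
                break
        if not dominated:
            front.append(p["engine"])
    return front


# Made-up values, shaped like ``pareto_points`` above.
points = [
    {"engine": "engine_a", "cer": 0.12, "cost": 0.5},
    {"engine": "engine_b", "cer": 0.08, "cost": 1.0},
    {"engine": "engine_c", "cer": 0.05, "cost": 20.0},
    {"engine": "engine_d", "cer": 0.10, "cost": 2.0},  # dominated by engine_b
]
print(pareto_front_sketch(points))  # ['engine_a', 'engine_b', 'engine_c']
```

Because the objective keys are passed in as a parameter, the same function serves the `("cer", "dur")` and `("cer", "co2")` variants.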
@@ -691,8 +75,8 @@ def _build_jinja_env():
     Autoescape désactivé : le comportement est équivalent à celui du
     ``_HTML_TEMPLATE.format()`` historique. Les variables injectées
     (JSON embarqué, SVG généré, synthèse narrative issue de templates
-    internes) sont toutes produites par le code Picarones et ne nécessitent
-    pas d'échappement HTML.
     """
     from jinja2 import Environment, FileSystemLoader
     env = Environment(
@@ -834,174 +218,158 @@ class ReportGenerator:
         glossary = load_glossary(self.lang)
         glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
-        # Sprint 37 — section inter-moteurs (matrice de divergence + oracle)
-        # rendue côté serveur. Vide si moins de 2 moteurs ou taxonomie absente.
         from picarones.report.inter_engine_render import (
             build_divergence_matrix_html,
             build_oracle_gap_html,
         )
-        divergence_matrix_html = build_divergence_matrix_html(
-            report_data.get("inter_engine_analysis"),
-            labels=labels,
-        )
-        oracle_gap_html = build_oracle_gap_html(
-            report_data.get("inter_engine_analysis"),
-            labels=labels,
-        )
-        # Sprint 41 — section NER (résumé F1 par moteur + heatmap par
-        # catégorie). Vide si aucun moteur n'a de aggregated_ner.
         from picarones.report.ner_render import (
             build_ner_per_category_html,
             build_ner_summary_html,
         )
-        ner_summary_html = build_ner_summary_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-        ner_per_category_html = build_ner_per_category_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
         # Sprint 43 — section calibration (tableau ECE/MCE + grille de
-        # reliability diagrams par moteur). Vide si aucun moteur n'a
-        # de aggregated_calibration.
         from picarones.report.calibration_render import (
             build_calibration_summary_html,
             build_reliability_diagrams_grid_html,
         )
-        calibration_summary_html = build_calibration_summary_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-        reliability_diagrams_html = build_reliability_diagrams_grid_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-        # Sprint 46 — section stratifiée (tableau par strate). Vide si
-        # aucune strate disponible.
         from picarones.report.stratification_render import (
             build_stratified_ranking_html,
         )
-        stratified_ranking_html = build_stratified_ranking_html(
-            report_data.get("stratified_ranking"),
-            report_data.get("available_strata"),
-            report_data.get("corpus_homogeneity"),
-            labels=labels,
-        )
-        # Sprint 62 — profil philologique (6 sections adaptive sur les
-        # modules philologiques Sprints 55-60). Vide si aucun moteur
-        # n'a de aggregated_philological.
         from picarones.report.philological_render import (
             build_philological_profile_html,
         )
-        philological_profile_html = build_philological_profile_html(
-            report_data.get("engines", []),
-            labels=labels,
-        )
-        # Sprint 86 — A.II.5 : recherchabilité fuzzy +
-        # séquences numériques. Adaptive : "" si aucun signal.
         from picarones.report.searchability_render import (
             build_searchability_summary_html,
         )
         from picarones.report.numerical_sequences_render import (
             build_numerical_sequences_html,
         )
-        searchability_html = build_searchability_summary_html(
-            report_data.get("engines", []), labels=labels,
-        )
-        numerical_sequences_html = build_numerical_sequences_html(
-            report_data.get("engines", []), labels=labels,
-        )
         # Sprint 87 — A.II.2 : lisibilité (delta Flesch).
-        # Adaptive : "" si aucun moteur n'a de signal.
         from picarones.report.readability_render import (
             build_readability_summary_html,
         )
-        readability_html = build_readability_summary_html(
-            report_data.get("engines", []), labels=labels,
-        )
         # Sprint 89 — A.II.8b : spécialisation inter-moteurs.
-        # Adaptive : "" si moins de 2 moteurs avec taxonomie.
         from picarones.report.specialization_render import (
             build_specialization_html,
         )
-        # Construit une map {engine: counts} depuis les
-        # ``aggregated_taxonomy`` ; un moteur sans taxonomie
-        # est exclu.
-        _taxos: dict = {}
-        for eng in report_data.get("engines", []):
             tax = eng.get("aggregated_taxonomy")
             if isinstance(tax, dict):
                 counts = tax.get("counts") if "counts" in tax else tax
                 if isinstance(counts, dict) and counts:
-                    _taxos[eng.get("name", "?")] = {
                         k: float(v) for k, v in counts.items()
                         if isinstance(v, (int, float))
                     }
-        specialization_html = build_specialization_html(
-            _taxos, labels=labels,
-        )
-        # Chantier 3 (post-Sprint 97) — 3 nouvelles vues thématiques
-        # qui regroupent les renderers orphelins en sections
-        # collapsibles. Adaptive : retourne "" si aucune sous-section
-        # n'a de signal, donc la carte du template est masquée.
-        from picarones.report.views import (
-            build_advanced_taxonomy_view_html,
-            build_diagnostics_view_html,
-            build_economics_view_html,
-        )
-        economics_view_html = build_economics_view_html(
-            report_data, labels=labels,
-            engine_reports=self.benchmark.engine_reports,
-        )
-        advanced_taxonomy_view_html = build_advanced_taxonomy_view_html(
-            report_data, labels=labels,
-        )
-        diagnostics_view_html = build_diagnostics_view_html(
-            report_data, labels=labels,
-        )
-        env = _build_jinja_env()
-        template = env.get_template("base.html.j2")
-        html = template.render(
-            corpus_name=self.benchmark.corpus_name,
-            picarones_version=self.benchmark.picarones_version,
-            report_data_json=report_json,
-            i18n_json=i18n_json,
-            html_lang=labels.get("html_lang", "fr"),
-            chartjs_inline=chartjs_js,
-            critical_difference_svg=cdd_svg,
-            friedman=report_data.get("statistics", {}).get("friedman", {}),
-            synthesis=synthesis,
-            glossary_json=glossary_json,
-            divergence_matrix_html=divergence_matrix_html,
-            oracle_gap_html=oracle_gap_html,
-            ner_summary_html=ner_summary_html,
-            ner_per_category_html=ner_per_category_html,
-            calibration_summary_html=calibration_summary_html,
-            reliability_diagrams_html=reliability_diagrams_html,
-            stratified_ranking_html=stratified_ranking_html,
-            philological_profile_html=philological_profile_html,
-            searchability_html=searchability_html,
-            numerical_sequences_html=numerical_sequences_html,
-            readability_html=readability_html,
-            specialization_html=specialization_html,
             # Chantier 3 — vues thématiques composées
-            economics_view_html=economics_view_html,
-            advanced_taxonomy_view_html=advanced_taxonomy_view_html,
-            diagnostics_view_html=diagnostics_view_html,
-        )
-        output_path.write_text(html, encoding="utf-8")
-        return output_path.resolve()
     @classmethod
     def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":

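In the removed `_build_report_data`, the `bootstrap_cis` loop calls `bootstrap_ci(vals)` and unpacks a `(lo, hi)` pair. That implementation lives elsewhere and is not shown here. For readers unfamiliar with the technique, here is a percentile-bootstrap sketch of such a function. The name `bootstrap_ci_sketch`, its `n_resamples`, `confidence` and `seed` parameters, and the `(0.0, 0.0)` result for an empty list are all assumptions, not Picarones' actual defaults.

```python
import random
from statistics import fmean


def bootstrap_ci_sketch(
    values: list[float],
    n_resamples: int = 2000,
    confidence: float = 0.95,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean (hypothetical)."""
    if not values:
        return 0.0, 0.0  # assumed convention, mirroring ``mean_v = ... else 0.0``
    rng = random.Random(seed)  # seeded so that reports are reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - confidence) / 2.0
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(int((1.0 - alpha) * n_resamples), n_resamples - 1)
    return means[lo_idx], means[hi_idx]


cer_values = [0.02, 0.05, 0.08, 0.10, 0.30]  # made-up per-document CERs
print(bootstrap_ci_sketch(cer_values))
```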
 2. Galerie     — grille d'images avec badge CER coloré
 3. Document    — image zoomable + diff coloré GT / OCR par moteur
 4. Analyses    — histogramme CER + graphique radar
+Architecture
+------------
+Ce module est l'**orchestrateur**. Les responsabilités lourdes sont
+découpées en sous-modules :
+- :mod:`picarones.report.assets` — chargement vendor.js, encodage
+  base64 d'images, externalisation lazy.
+- :mod:`picarones.report.report_data` — construction du dict JSON
+  passé au template (engines, documents, statistiques, Pareto, etc.).
+- :mod:`picarones.report.render_helpers` — couleurs / SVG mutualisés.
+Les noms ``_build_report_data``, ``_cer_color``, ``_cer_bg``,
+``_externalize_images_to_dir``, ``_encode_image_b64``,
+``_encode_images_b64_from_result``, ``_load_vendor_js``, ``_pct``,
+``_safe`` sont conservés en alias rétrocompat — plusieurs tests les
+importent directement.
 """
 from __future__ import annotations
 import json
 import logging
 from pathlib import Path
 from typing import Any, Optional
 from picarones.core.results import BenchmarkResult
+from picarones.measurements.statistics import build_critical_difference_svg
+from picarones.report.assets import (
+    encode_image_b64 as _encode_image_b64,
+    encode_images_b64_from_result as _encode_images_b64_from_result,
+    externalize_images_to_dir as _externalize_images_to_dir,
+    load_vendor_js as _load_vendor_js,
+)
+from picarones.report.render_helpers import (
+    cer_step_bg as _cer_bg,
+    cer_step_color as _cer_color,
+)
+from picarones.report.report_data import build_report_data as _build_report_data
+from picarones.report.report_data._helpers import (
+    percent_string as _pct,
+    safe_round as _safe,
 )
+logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
     Autoescape désactivé : le comportement est équivalent à celui du
     ``_HTML_TEMPLATE.format()`` historique. Les variables injectées
     (JSON embarqué, SVG généré, synthèse narrative issue de templates
+    internes) sont toutes produites par le code Picarones et ne
+    nécessitent pas d'échappement HTML.
     """
     from jinja2 import Environment, FileSystemLoader
     env = Environment(
         glossary = load_glossary(self.lang)
         glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
+        section_html = self._build_section_html(report_data, labels)
+        env = _build_jinja_env()
+        template = env.get_template("base.html.j2")
+        html = template.render(
+            corpus_name=self.benchmark.corpus_name,
+            picarones_version=self.benchmark.picarones_version,
+            report_data_json=report_json,
+            i18n_json=i18n_json,
+            html_lang=labels.get("html_lang", "fr"),
+            chartjs_inline=chartjs_js,
+            critical_difference_svg=cdd_svg,
+            friedman=report_data.get("statistics", {}).get("friedman", {}),
+            synthesis=synthesis,
+            glossary_json=glossary_json,
+            **section_html,
+        )
+        output_path.write_text(html, encoding="utf-8")
+        return output_path.resolve()
+    def _build_section_html(
+        self, report_data: dict, labels: dict[str, str],
+    ) -> dict[str, str]:
+        """Construit toutes les sections HTML conditionnelles du rapport.
+        Chaque renderer (NER, calibration, philologie, etc.) est appelé
+        de manière indépendante. Une section retourne ``""`` si aucun
+        moteur n'a de signal pour elle — le template gère l'affichage
+        conditionnel.
+        Returns
+        -------
+        dict[str, str]
+            Map ``{nom_de_section: html}`` à splatter dans
+            ``template.render(**section_html)``.
+        """
+        engines = report_data.get("engines", [])
+        # Sprint 37 — section inter-moteurs (matrice de divergence + oracle).
         from picarones.report.inter_engine_render import (
             build_divergence_matrix_html,
             build_oracle_gap_html,
         )
+        # Sprint 41 — section NER (résumé F1 par moteur + heatmap par catégorie).
         from picarones.report.ner_render import (
             build_ner_per_category_html,
             build_ner_summary_html,
         )
         # Sprint 43 — section calibration (tableau ECE/MCE + grille de
+        # reliability diagrams par moteur).
         from picarones.report.calibration_render import (
             build_calibration_summary_html,
             build_reliability_diagrams_grid_html,
         )
+        # Sprint 46 — section stratifiée (tableau par strate).
         from picarones.report.stratification_render import (
             build_stratified_ranking_html,
         )
+        # Sprint 62 — profil philologique (6 sections adaptive).
         from picarones.report.philological_render import (
             build_philological_profile_html,
         )
+        # Sprint 86 — A.II.5 : recherchabilité fuzzy + séquences numériques.
         from picarones.report.searchability_render import (
             build_searchability_summary_html,
         )
         from picarones.report.numerical_sequences_render import (
             build_numerical_sequences_html,
         )
         # Sprint 87 — A.II.2 : lisibilité (delta Flesch).
         from picarones.report.readability_render import (
             build_readability_summary_html,
         )
         # Sprint 89 — A.II.8b : spécialisation inter-moteurs.
         from picarones.report.specialization_render import (
             build_specialization_html,
         )
+        # Chantier 3 (post-Sprint 97) — 3 vues thématiques composées.
+        from picarones.report.views import (
+            build_advanced_taxonomy_view_html,
+            build_diagnostics_view_html,
+            build_economics_view_html,
+        )
+        # Spécialisation : construit une map {engine: counts} depuis les
+        # ``aggregated_taxonomy`` ; un moteur sans taxonomie est exclu.
+        taxos: dict = {}
+        for eng in engines:
             tax = eng.get("aggregated_taxonomy")
             if isinstance(tax, dict):
                 counts = tax.get("counts") if "counts" in tax else tax
                 if isinstance(counts, dict) and counts:
+                    taxos[eng.get("name", "?")] = {
                         k: float(v) for k, v in counts.items()
                         if isinstance(v, (int, float))
                     }
+        return {
+            # Sprint 37
+            "divergence_matrix_html": build_divergence_matrix_html(
+                report_data.get("inter_engine_analysis"), labels=labels,
+            ),
+            "oracle_gap_html": build_oracle_gap_html(
+                report_data.get("inter_engine_analysis"), labels=labels,
+            ),
+            # Sprint 41
+            "ner_summary_html": build_ner_summary_html(engines, labels=labels),
+            "ner_per_category_html": build_ner_per_category_html(engines, labels=labels),
+            # Sprint 43
+            "calibration_summary_html": build_calibration_summary_html(
+                engines, labels=labels,
+            ),
+            "reliability_diagrams_html": build_reliability_diagrams_grid_html(
+                engines, labels=labels,
+            ),
+            # Sprint 46
+            "stratified_ranking_html": build_stratified_ranking_html(
+                report_data.get("stratified_ranking"),
+                report_data.get("available_strata"),
+                report_data.get("corpus_homogeneity"),
+                labels=labels,
+            ),
+            # Sprint 62
+            "philological_profile_html": build_philological_profile_html(
+                engines, labels=labels,
+            ),
+            # Sprint 86
+            "searchability_html": build_searchability_summary_html(
+                engines, labels=labels,
+            ),
+            "numerical_sequences_html": build_numerical_sequences_html(
+                engines, labels=labels,
+            ),
+            # Sprint 87
+            "readability_html": build_readability_summary_html(
+                engines, labels=labels,
+            ),
+            # Sprint 89
+            "specialization_html": build_specialization_html(taxos, labels=labels),
             # Chantier 3 — vues thématiques composées
+            "economics_view_html": build_economics_view_html(
+                report_data, labels=labels,
+                engine_reports=self.benchmark.engine_reports,
+            ),
+            "advanced_taxonomy_view_html": build_advanced_taxonomy_view_html(
+                report_data, labels=labels,
+            ),
+            "diagnostics_view_html": build_diagnostics_view_html(
+                report_data, labels=labels,
+            ),
+        }
     @classmethod
     def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":

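The new `_build_section_html` returns a `{section_name: html}` dict that `generate()` splats into `template.render(**section_html)`. An empty string means "no signal, hide the card". The pattern can be sketched with the standard library alone. In the sketch below, `string.Template` stands in for Jinja, and the two builders with their `aggregated_ner` / `aggregated_calibration` checks are simplified, hypothetical versions of the real renderers.

```python
from string import Template
from typing import Callable

# Hypothetical builder signature: report_data -> HTML fragment ("" = no signal).
SectionBuilder = Callable[[dict], str]


def ner_summary_html(report_data: dict) -> str:
    engines = [e for e in report_data.get("engines", []) if e.get("aggregated_ner")]
    if not engines:
        return ""  # no engine has NER data, so the section disappears
    items = "".join(f"<li>{e['name']}</li>" for e in engines)
    return f"<section id='ner'><ul>{items}</ul></section>"


def calibration_summary_html(report_data: dict) -> str:
    engines = [e for e in report_data.get("engines", []) if e.get("aggregated_calibration")]
    if not engines:
        return ""
    return f"<section id='calibration'>{len(engines)} engine(s)</section>"


BUILDERS: dict[str, SectionBuilder] = {
    "ner_summary_html": ner_summary_html,
    "calibration_summary_html": calibration_summary_html,
}


def build_section_html(report_data: dict) -> dict[str, str]:
    # Each builder only reads report_data and never depends on another's output.
    return {name: build(report_data) for name, build in BUILDERS.items()}


PAGE = Template("<main>$ner_summary_html$calibration_summary_html</main>")

report_data = {"engines": [{"name": "engine_a", "aggregated_ner": {"f1": 0.9}}]}
print(PAGE.substitute(**build_section_html(report_data)))
# <main><section id='ner'><ul><li>engine_a</li></ul></section></main>
```

A name-to-builder table like `BUILDERS` keeps template keys and builders in one place. The actual method instead calls each renderer explicitly, which keeps their differing signatures visible: some take `engines`, others the whole `report_data` or `engine_reports`.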
picarones/report/render_helpers.py CHANGED

@@ -187,6 +187,60 @@ def text_color_for_bg(intensity: float, *, threshold: float = 0.55) -> str:
     return "#fff" if intensity > threshold else "#222"
 # ──────────────────────────────────────────────────────────────────
 # API publique : grille SVG
 # ──────────────────────────────────────────────────────────────────
@@ -328,6 +382,8 @@ __all__ = [
     "DIVERGING_NEGATIVE_RGB",
     "DIVERGING_NEUTRAL_RGB",
     "DIVERGING_POSITIVE_RGB",
     "color_traffic_light",
     "color_single_gradient",
     "color_diverging",

     return "#fff" if intensity > threshold else "#222"
+# ──────────────────────────────────────────────────────────────────
+# API publique : barème CER par paliers (badges du rapport)
+# ──────────────────────────────────────────────────────────────────
+#
+# Les badges de qualité du rapport (galerie, tableau de classement)
+# n'utilisent pas un dégradé continu mais un barème discret à 4
+# paliers calibrés sur les seuils éditoriaux usuels :
+#
+#   < 5 %  : vert    (qualité publication directe)
+#   < 15 % : jaune   (relecture humaine légère)
+#   < 30 % : orange  (relecture humaine systématique)
+#   ≥ 30 % : rouge   (catastrophique, à reprendre)
+#
+# Les couleurs sont importées de :mod:`picarones.report.colors`
+# (palette Okabe-Ito daltonien-friendly active par défaut).
+def cer_step_color(cer: float) -> str:
+    """Couleur de texte CSS pour un score CER, par paliers.
+    Voir le barème dans le bloc de documentation ci-dessus.
+    """
+    from picarones.report.colors import (
+        COLOR_GREEN,
+        COLOR_ORANGE,
+        COLOR_RED,
+        COLOR_YELLOW,
+    )
+    if cer < 0.05:
+        return COLOR_GREEN
+    if cer < 0.15:
+        return COLOR_YELLOW
+    if cer < 0.30:
+        return COLOR_ORANGE
+    return COLOR_RED
+def cer_step_bg(cer: float) -> str:
+    """Couleur de fond CSS associée à :func:`cer_step_color`."""
+    from picarones.report.colors import (
+        BG_GREEN,
+        BG_ORANGE,
+        BG_RED,
+        BG_YELLOW,
+    )
+    if cer < 0.05:
+        return BG_GREEN
+    if cer < 0.15:
+        return BG_YELLOW
+    if cer < 0.30:
+        return BG_ORANGE
+    return BG_RED
 # ──────────────────────────────────────────────────────────────────
 # API publique : grille SVG
 # ──────────────────────────────────────────────────────────────────
     "DIVERGING_NEGATIVE_RGB",
     "DIVERGING_NEUTRAL_RGB",
     "DIVERGING_POSITIVE_RGB",
+    "cer_step_color",
+    "cer_step_bg",
     "color_traffic_light",
     "color_single_gradient",
     "color_diverging",

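`cer_step_color` and `cer_step_bg` repeat the same three thresholds (0.05, 0.15, 0.30). One way to keep the two in sync is a single threshold table searched with `bisect_right`. The sketch below is only an illustration, not code from this commit. The palette entries are placeholder strings standing in for the `COLOR_*` / `BG_*` constants of `picarones.report.colors`.

```python
import math
from bisect import bisect_right

# Exclusive upper bounds of the green, yellow and orange tiers.
CER_THRESHOLDS = (0.05, 0.15, 0.30)
# Placeholders for the picarones.report.colors constants.
TEXT_PALETTE = ("COLOR_GREEN", "COLOR_YELLOW", "COLOR_ORANGE", "COLOR_RED")
BG_PALETTE = ("BG_GREEN", "BG_YELLOW", "BG_ORANGE", "BG_RED")


def cer_tier(cer: float) -> int:
    """Tier index: 0 green, 1 yellow, 2 orange, 3 red."""
    return bisect_right(CER_THRESHOLDS, cer)


def cer_step_color_sketch(cer: float) -> str:
    return TEXT_PALETTE[cer_tier(cer)]


def cer_step_bg_sketch(cer: float) -> str:
    return BG_PALETTE[cer_tier(cer)]


for value in (0.0, 0.05, 0.149, 0.15, 0.30, math.nan):
    print(value, cer_step_color_sketch(value), cer_step_bg_sketch(value))
```

With `bisect_right`, a CER exactly on a threshold falls in the upper tier, which matches the strict `<` comparisons in the committed functions. NaN lands in the red tier in both versions because every comparison with NaN is false.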
picarones/report/report_data/__init__.py ADDED

@@ -0,0 +1,102 @@

+"""Construction du dict de données consommé par le template Jinja.
+Avant le découpage, ``picarones.report.generator._build_report_data``
+faisait 463 lignes pour transformer un :class:`BenchmarkResult` en
+dict prêt pour Jinja. Cette fonction empilait par sprint des blocs
+indépendants — engines, documents, statistiques, scatter plots,
+front Pareto, etc.
+Ce sous-package éclate la construction en modules thématiques :
+- :mod:`engines` — résumé par moteur (``engines_summary``).
+- :mod:`documents` — vue galerie + détail + difficulté Sprint 7.
+- :mod:`statistics` — Wilcoxon, Friedman, Nemenyi, bootstrap CIs,
+  reliability curves, Venn, error clusters, corrélations.
+- :mod:`scatter` — Sprint 10 : Gini vs CER, ratio vs anchor.
+- :mod:`pareto` — Sprint 19 : 3 fronts Pareto + métadonnées pricing.
+L'API publique :func:`build_report_data` orchestre ces modules dans
+le bon ordre (les coûts du module Pareto enrichissent en place le
+``engines_summary`` produit par :mod:`engines`).
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+from picarones.report.report_data.documents import (
+    annotate_documents_with_difficulty,
+    build_documents,
+)
+from picarones.report.report_data.engines import build_engines_summary
+from picarones.report.report_data.pareto import build_pareto_section
+from picarones.report.report_data.scatter import (
+    build_gini_vs_cer,
+    build_ratio_vs_anchor,
+)
+from picarones.report.report_data.statistics import (
+    build_bootstrap_cis,
+    build_correlation_per_engine,
+    build_error_clusters,
+    build_friedman_and_nemenyi,
+    build_pairwise_wilcoxon,
+    build_reliability_curves,
+    build_venn_data,
+)
+def build_report_data(
+    benchmark: "BenchmarkResult", images_b64: dict[str, str],
+) -> dict:
+    """Transforme un :class:`BenchmarkResult` en dict pour le rapport HTML.
+    L'ordre est important : :mod:`pareto` lit et enrichit en place
+    le ``engines_summary`` produit par :mod:`engines`.
+    """
+    engines_summary = build_engines_summary(benchmark)
+    documents = build_documents(benchmark, images_b64)
+    annotate_documents_with_difficulty(benchmark, documents)
+    pareto_data = build_pareto_section(engines_summary, benchmark)
+    return {
+        "meta": {
+            "corpus_name": benchmark.corpus_name,
+            "corpus_source": benchmark.corpus_source,
+            "document_count": benchmark.document_count,
+            "run_date": benchmark.run_date,
+            "picarones_version": benchmark.picarones_version,
+            "metadata": benchmark.metadata,
+        },
+        "ranking": benchmark.ranking(),
+        "engines": engines_summary,
+        "documents": documents,
+        # Sprint 7
+        "statistics": {
+            "pairwise_wilcoxon": build_pairwise_wilcoxon(benchmark),
+            "bootstrap_cis": build_bootstrap_cis(benchmark),
+            **build_friedman_and_nemenyi(benchmark),
+        },
+        "reliability_curves": build_reliability_curves(benchmark),
+        "venn_data": build_venn_data(benchmark),
+        "error_clusters": build_error_clusters(benchmark),
+        "correlation_per_engine": build_correlation_per_engine(benchmark),
+        # Sprint 10
+        "gini_vs_cer": build_gini_vs_cer(benchmark),
+        "ratio_vs_anchor": build_ratio_vs_anchor(benchmark),
+        # Sprint 19 — vue Pareto coût/qualité avec variantes d'axe
+        "pareto": pareto_data,
+        # Sprint 36 — analyse inter-moteurs (divergence taxonomique +
+        # complémentarité / oracle).  ``None`` si moins de 2 moteurs.
+        "inter_engine_analysis": benchmark.inter_engine_analysis,
+        # Sprint 45-46 — stratification par script_type
+        "available_strata": benchmark.available_strata(),
+        "stratified_ranking": benchmark.stratified_ranking() or None,
+        "corpus_homogeneity": benchmark.corpus_homogeneity(),
+    }
+__all__ = ["build_report_data"]

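The ordering constraint in `build_report_data`'s docstring comes from aliasing. `build_pareto_section` mutates the same dicts that end up stored under `"engines"`. The toy version below uses hypothetical `*_sketch` builders that only mimic the duration annotation of the removed code. It shows what the constraint means in practice: a consumer that copies or reads `engines_summary` before the Pareto step will not see `mean_duration_seconds`.

```python
import copy


def build_engines_summary_sketch(cers: dict[str, float]) -> list[dict]:
    return [{"name": name, "cer": cer} for name, cer in cers.items()]


def build_pareto_section_sketch(
    engines_summary: list[dict], durations: dict[str, float],
) -> dict:
    # Side effect, as documented for the real module: annotate entries in place.
    for entry in engines_summary:
        name = entry["name"]
        entry["mean_duration_seconds"] = (
            round(durations[name], 4) if name in durations else None
        )
    points = [
        {"engine": e["name"], "cer": e["cer"], "dur": e["mean_duration_seconds"]}
        for e in engines_summary
        if e["mean_duration_seconds"] is not None
    ]
    return {"speed": {"points": points}}


engines = build_engines_summary_sketch({"engine_a": 0.08, "engine_b": 0.12})
snapshot = copy.deepcopy(engines)  # taken before the Pareto step
pareto = build_pareto_section_sketch(engines, {"engine_a": 1.23456})

print(engines[0]["mean_duration_seconds"])     # 1.2346
print("mean_duration_seconds" in snapshot[0])  # False
```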
picarones/report/report_data/_helpers.py ADDED

@@ -0,0 +1,30 @@

+"""Internal numeric helpers for the report_data sub-package.
+Small utility functions shared by all section builders
+(engines, documents, statistics, scatter, pareto). Do not import
+them from outside the sub-package: these helpers are specific to
+the conventions of the JSON dict consumed by the template.
+"""
+from __future__ import annotations
+from typing import Optional
+def safe_round(v: Optional[float], decimals: int = 4) -> float:
+    """Round an optional float; ``None`` becomes ``0.0``."""
+    return round(v or 0.0, decimals)
+def percent_string(v: Optional[float], decimals: int = 2) -> str:
+    """Format a ratio in [0, 1] as a percentage string: ``0.4723 → "47.23 %"``.
+    ``None`` → ``"—"``. Kept for backward compatibility with possible
+    external callers (historical Sprint 7 code).
+    """
+    if v is None:
+        return "—"
+    return f"{v * 100:.{decimals}f} %"
+__all__ = ["safe_round", "percent_string"]
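`safe_round` maps a missing value (`None`) to `0.0`, which is the best possible score for an error rate. For example, `engines_summary` stores `"cer": safe_round(...)`, so the `cer is None` guards in `pareto.py` can never trigger. The sketch below reproduces the two helpers from the diff. It contrasts them with `round_or_none`, a hypothetical variant that is not part of Picarones and keeps `None` distinguishable. The builders already apply this pattern inline as `safe_round(x) if x is not None else None`.

```python
from typing import Optional


def safe_round(v: Optional[float], decimals: int = 4) -> float:
    # Same logic as the helper in the diff: None (and 0) both become 0.0.
    return round(v or 0.0, decimals)


def percent_string(v: Optional[float], decimals: int = 2) -> str:
    # Same logic as the helper in the diff: ratio in [0, 1] -> "xx.xx %".
    if v is None:
        return "\u2014"  # dash placeholder, as in the original helper
    return f"{v * 100:.{decimals}f} %"


def round_or_none(v: Optional[float], decimals: int = 4) -> Optional[float]:
    # Hypothetical variant: propagates None instead of faking a perfect score.
    return None if v is None else round(v, decimals)


print(safe_round(None))        # 0.0
print(round_or_none(None))     # None
print(percent_string(0.4723))  # 47.23 %
```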

picarones/report/report_data/documents.py ADDED

	@@ -0,0 +1,167 @@

+"""Build the ``documents`` list (gallery view + detail view).
+For each document in the corpus, gathers every engine's hypothesis
+with its metrics, the character-level diff, and the fields specific
+to OCR+LLM pipelines (intermediate output, mode,
+over-normalization).
+:func:`annotate_documents_with_difficulty` then enriches each
+document with its intrinsic difficulty score (Sprint 7).
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+from picarones.core.diff_utils import compute_char_diff, compute_word_diff
+from picarones.measurements.difficulty import (
+    compute_all_difficulties,
+    difficulty_label,
+)
+from picarones.report.report_data._helpers import safe_round
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+def build_documents(
+    benchmark: "BenchmarkResult", images_b64: dict[str, str],
+) -> list[dict]:
+    """Return the ordered list of documents, ready for the template.
+    Documents keep their order of first appearance: first engine
+    first, then any extra documents contributed by later engines when
+    some documents are not covered by every engine.
+    """
+    seen_doc_ids: set[str] = set()
+    doc_ids_ordered: list[str] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            if dr.doc_id not in seen_doc_ids:
+                seen_doc_ids.add(dr.doc_id)
+                doc_ids_ordered.append(dr.doc_id)
+    # Cross index: doc_id → {engine_name → DocumentResult}
+    doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
+    documents: list[dict] = []
+    engine_names = [r.engine_name for r in benchmark.engine_reports]
+    for doc_id in doc_ids_ordered:
+        engine_results: list[dict] = []
+        gt = ""
+        image_path = ""
+        for engine_name in engine_names:
+            dr = doc_engine_map[doc_id].get(engine_name)
+            if dr is None:
+                continue
+            gt = dr.ground_truth
+            image_path = dr.image_path
+            er_entry = _build_engine_result_entry(engine_name, dr)
+            engine_results.append(er_entry)
+        # Mean CER on this document (for the gallery badge)
+        cer_values = [er["cer"] for er in engine_results if er["error"] is None]
+        mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
+        best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
+        # Script type (read from the first engine's image_quality dict, if available)
+        script_type = ""
+        first_engine = engine_names[0] if engine_names else None
+        first_dr = doc_engine_map[doc_id].get(first_engine)
+        if first_dr and first_dr.image_quality:
+            script_type = first_dr.image_quality.get("script_type", "")
+        documents.append({
+            "doc_id": doc_id,
+            "image_path": image_path,
+            "image_b64": images_b64.get(doc_id, ""),
+            "ground_truth": gt,
+            "mean_cer": safe_round(mean_cer),
+            "best_engine": best_engine["engine"] if best_engine else "",
+            "engine_results": engine_results,
+            "script_type": script_type,
+        })
+    return documents
+def _build_engine_result_entry(engine_name: str, dr) -> dict:
+    """Build one engine entry for a given document (extracted for readability)."""
+    diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
+    er_entry: dict = {
+        "engine": engine_name,
+        "hypothesis": dr.hypothesis,
+        "cer": safe_round(dr.metrics.cer),
+        "cer_diplomatic": safe_round(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
+        "wer": safe_round(dr.metrics.wer),
+        "mer": safe_round(dr.metrics.mer),
+        "wil": safe_round(dr.metrics.wil),
+        "duration": dr.duration_seconds,
+        "error": dr.engine_error,
+        "diff": diff_ops,
+    }
+    # Fields specific to OCR+LLM pipelines
+    if dr.ocr_intermediate is not None:
+        er_entry["ocr_intermediate"] = dr.ocr_intermediate
+        er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
+        er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
+    if dr.pipeline_metadata:
+        on = dr.pipeline_metadata.get("over_normalization")
+        if on is not None:
+            er_entry["over_normalization"] = on
+        er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
+    # Sprint 5: advanced per-document metrics
+    if dr.char_scores is not None:
+        er_entry["ligature_score"] = safe_round(dr.char_scores.get("ligature", {}).get("score"))
+        er_entry["diacritic_score"] = safe_round(dr.char_scores.get("diacritic", {}).get("score"))
+    if dr.taxonomy is not None:
+        er_entry["taxonomy"] = dr.taxonomy
+    if dr.structure is not None:
+        er_entry["structure"] = dr.structure
+    if dr.image_quality is not None:
+        er_entry["image_quality"] = dr.image_quality
+    # Sprint 10
+    if dr.line_metrics is not None:
+        er_entry["line_metrics"] = dr.line_metrics
+    if dr.hallucination_metrics is not None:
+        er_entry["hallucination_metrics"] = dr.hallucination_metrics
+    return er_entry
+def annotate_documents_with_difficulty(
+    benchmark: "BenchmarkResult", documents: list[dict],
+) -> None:
+    """Annotate each document dict with its difficulty score (Sprint 7).
+    Modifies ``documents`` in place. The defaults ``0.5`` /
+    ``"Modéré"`` are assigned when the difficulty could not be
+    computed (for example, a degenerate corpus).
+    """
+    doc_ids_ordered = [d["doc_id"] for d in documents]
+    gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
+    cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
+    iq_map: dict[str, float] = {}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = safe_round(dr.metrics.cer)
+            if dr.image_quality and "quality_score" in dr.image_quality:
+                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
+    difficulty_scores = compute_all_difficulties(
+        doc_ids=doc_ids_ordered,
+        ground_truths=gt_map,
+        cer_map=cer_map,
+        image_quality_map=iq_map or None,
+    )
+    for doc in documents:
+        ds = difficulty_scores.get(doc["doc_id"])
+        if ds:
+            doc["difficulty_score"] = safe_round(ds.score)
+            doc["difficulty_label"] = difficulty_label(ds.score)
+        else:
+            doc["difficulty_score"] = 0.5
+            doc["difficulty_label"] = "Modéré"
+__all__ = ["build_documents", "annotate_documents_with_difficulty"]
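`build_documents` first collects doc IDs in order of first appearance, then builds a `doc_id → engine → result` index in a second loop. The same result can be obtained in a single pass. The sketch below assumes minimal hypothetical dataclasses instead of the real `DocumentResult` / `EngineReport`.

```python
from dataclasses import dataclass


@dataclass
class DocResult:
    # Hypothetical minimal stand-in for a Picarones DocumentResult.
    doc_id: str
    cer: float


@dataclass
class EngineReport:
    # Hypothetical minimal stand-in for a Picarones EngineReport.
    engine_name: str
    document_results: list[DocResult]


def order_and_index(
    reports: list[EngineReport],
) -> tuple[list[str], dict[str, dict[str, DocResult]]]:
    """First-seen document order across engines, plus doc_id -> engine -> result."""
    ordered: list[str] = []
    index: dict[str, dict[str, DocResult]] = {}
    for report in reports:
        for dr in report.document_results:
            if dr.doc_id not in index:  # first time this document is seen
                ordered.append(dr.doc_id)
                index[dr.doc_id] = {}
            index[dr.doc_id][report.engine_name] = dr
    return ordered, index


reports = [
    EngineReport("a", [DocResult("p1", 0.1), DocResult("p2", 0.2)]),
    EngineReport("b", [DocResult("p3", 0.5), DocResult("p1", 0.05)]),
]
ordered, index = order_and_index(reports)
print(ordered)              # ['p1', 'p2', 'p3']
print(sorted(index["p1"]))  # ['a', 'b']
```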

picarones/report/report_data/engines.py ADDED

	@@ -0,0 +1,103 @@

+"""Build the per-engine summary (``engines_summary``).
+For each ``EngineReport``, collects aggregated metrics (CER, WER,
+MER, WIL), the CER distribution for the histogram, advanced
+heritage-document metrics (Sprint 5), error distribution (Sprint 10),
+NER (Sprint 41), calibration (Sprint 43), philological profile
+(Sprint 62), searchability + numerical sequences (Sprint 86),
+readability (Sprint 87) and OCR+LLM pipeline indicators.
+Costs (mean duration, price per 1k pages, CO₂) are added later by
+:mod:`picarones.report.report_data.pareto`, which needs them to
+compute the fronts.
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+from picarones.report.report_data._helpers import safe_round
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+def build_engines_summary(benchmark: "BenchmarkResult") -> list[dict]:
+    """Return the list of engine dicts, one entry per ``EngineReport``."""
+    engines_summary: list[dict] = []
+    for report in benchmark.engine_reports:
+        agg = report.aggregated_metrics
+        diplo_agg = agg.get("cer_diplomatic", {})
+        line_metrics = report.aggregated_line_metrics
+        halluc = report.aggregated_hallucination
+        entry: dict = {
+            "name": report.engine_name,
+            "version": report.engine_version,
+            "cer":  safe_round(agg.get("cer", {}).get("mean")),
+            "wer":  safe_round(agg.get("wer", {}).get("mean")),
+            "mer":  safe_round(agg.get("mer", {}).get("mean")),
+            "wil":  safe_round(agg.get("wil", {}).get("mean")),
+            "cer_median": safe_round(agg.get("cer", {}).get("median")),
+            "cer_min":    safe_round(agg.get("cer", {}).get("min")),
+            "cer_max":    safe_round(agg.get("cer", {}).get("max")),
+            "doc_count":  agg.get("document_count", 0),
+            "failed":     agg.get("failed_count", 0),
+            # Diplomatic CER (after historical normalization: ſ=s, u=v, i=j…)
+            "cer_diplomatic": safe_round(diplo_agg.get("mean")) if diplo_agg else None,
+            "cer_diplomatic_profile": diplo_agg.get("profile"),
+            # Distribution for the histogram: list of per-document CERs
+            "cer_values": [
+                safe_round(dr.metrics.cer)
+                for dr in report.document_results
+                if dr.metrics.error is None
+            ],
+            "cer_diplomatic_values": [
+                safe_round(dr.metrics.cer_diplomatic)
+                for dr in report.document_results
+                if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
+            ],
+            # OCR+LLM pipeline fields (empty for OCR-only engines)
+            "is_pipeline": report.is_pipeline,
+            "pipeline_info": report.pipeline_info,
+            # Sprint 5: advanced heritage-document metrics
+            "ligature_score": safe_round(report.ligature_score) if report.ligature_score is not None else None,
+            "diacritic_score": safe_round(report.diacritic_score) if report.diacritic_score is not None else None,
+            "aggregated_confusion": report.aggregated_confusion,
+            "aggregated_taxonomy": report.aggregated_taxonomy,
+            "aggregated_structure": report.aggregated_structure,
+            "aggregated_image_quality": report.aggregated_image_quality,
+            # Sprint 10: error distribution + VLM hallucinations
+            "gini": safe_round(line_metrics.get("gini_mean")) if line_metrics else None,
+            "cer_p90": safe_round(line_metrics.get("percentiles", {}).get("p90")) if line_metrics else None,
+            "cer_p99": safe_round(line_metrics.get("percentiles", {}).get("p99")) if line_metrics else None,
+            "catastrophic_rate_30": safe_round(line_metrics.get("catastrophic_rate", {}).get("0.3")) if line_metrics else None,
+            "aggregated_line_metrics": line_metrics,
+            "anchor_score": safe_round(halluc.get("anchor_score_mean")) if halluc else None,
+            "length_ratio": safe_round(halluc.get("length_ratio_mean")) if halluc else None,
+            "hallucinating_doc_rate": safe_round(halluc.get("hallucinating_doc_rate")) if halluc else None,
+            "aggregated_hallucination": halluc,
+            # Sprint 41: aggregated NER (None if not computed)
+            "aggregated_ner": report.aggregated_ner,
+            # Sprint 43: aggregated calibration (None if the engine
+            # exposed no confidence scores on this corpus)
+            "aggregated_calibration": report.aggregated_calibration,
+            # Sprint 62: aggregated philological profile (None if there
+            # is no philological signal on the corpus for this engine)
+            "aggregated_philological": report.aggregated_philological,
+            # Sprint 86: A.II.5 (fuzzy searchability + numerical
+            # sequences). None if no document has a signal.
+            "aggregated_searchability": report.aggregated_searchability,
+            "aggregated_numerical_sequences": (
+                report.aggregated_numerical_sequences
+            ),
+            # Sprint 87: A.II.2 (aggregated Flesch delta)
+            "aggregated_readability": report.aggregated_readability,
+            "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
+        }
+        engines_summary.append(entry)
+    return engines_summary
+__all__ = ["build_engines_summary"]
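`build_engines_summary` reads `report.aggregated_metrics` through `.get()` calls. This implies a shape such as `{"cer": {"mean", "median", "min", "max"}, "document_count", "failed_count"}`, but the producer of that dict is not part of this diff. Here is a hypothetical standard-library aggregator that yields that inferred shape. Marking a failed document with `None` is an assumption made for this sketch only.

```python
import statistics
from typing import Optional


def aggregate_metric(values: list[float]) -> dict[str, Optional[float]]:
    """Hypothetical aggregator producing the keys read by build_engines_summary."""
    if not values:
        return {"mean": None, "median": None, "min": None, "max": None}
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def summarize(cers: list[Optional[float]]) -> dict:
    # Assumption: None marks a document on which the engine failed.
    valid = [c for c in cers if c is not None]
    return {
        "cer": aggregate_metric(valid),
        "document_count": len(cers),
        "failed_count": len(cers) - len(valid),
    }


agg = summarize([0.1, 0.3, None, 0.2])
print(agg["cer"]["median"], agg["failed_count"])  # 0.2 1
```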

picarones/report/report_data/pareto.py ADDED

	@@ -0,0 +1,123 @@

+"""Cost/quality Pareto front (Sprint 19).
+Builds three Pareto fronts with alternative axes:
+- ``cost``: CER vs cost in € per 1000 pages.
+- ``speed``: CER vs mean duration per page.
+- ``co2``: CER vs carbon footprint (g CO₂ per 1000 pages, experimental).
+**Side effect**: :func:`build_pareto_section` enriches, in place,
+the ``engines_summary`` passed as argument with the fields
+``mean_duration_seconds`` and ``cost`` (cost per 1000 pages + pricing
+breakdown). This shared responsibility is documented in the
+sub-package's ``__init__.py`` module.
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+from picarones.measurements.pricing import (
+    build_costs_for_benchmark,
+    load_pricing_database,
+)
+from picarones.measurements.statistics import compute_pareto_front
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+def build_pareto_section(
+    engines_summary: list[dict], benchmark: "BenchmarkResult",
+) -> dict:
+    """Build the ``pareto`` block of the report dict.
+    Annotates each entry of ``engines_summary`` in place with
+    ``mean_duration_seconds`` and ``cost``.
+    """
+    durations_by_engine: dict[str, float] = {}
+    for report in benchmark.engine_reports:
+        durs = [
+            dr.duration_seconds
+            for dr in report.document_results
+            if dr.duration_seconds is not None
+        ]
+        if durs:
+            durations_by_engine[report.engine_name] = sum(durs) / len(durs)
+    pricing_defaults, _ = load_pricing_database()
+    costs_by_engine = build_costs_for_benchmark(
+        engines_summary, durations_by_engine,
+    )
+    # Annotate each engine summary in place with its cost and duration.
+    for entry in engines_summary:
+        name = entry["name"]
+        entry["mean_duration_seconds"] = (
+            round(durations_by_engine.get(name, 0.0), 4)
+            if name in durations_by_engine else None
+        )
+        entry["cost"] = costs_by_engine.get(name)
+    pareto_points = []
+    for entry in engines_summary:
+        cer = entry.get("cer")
+        cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
+        if cer is None or cost is None:
+            continue
+        pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
+    pareto_front_engines = compute_pareto_front(
+        pareto_points, objectives=("cer", "cost"),
+    )
+    pareto_speed_points = []
+    for entry in engines_summary:
+        cer = entry.get("cer")
+        dur = entry.get("mean_duration_seconds")
+        if cer is None or dur is None:
+            continue
+        pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
+    pareto_front_speed = compute_pareto_front(
+        pareto_speed_points, objectives=("cer", "dur"),
+    )
+    pareto_co2_points = []
+    for entry in engines_summary:
+        cer = entry.get("cer")
+        co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
+        if cer is None or co2 is None:
+            continue
+        pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
+    pareto_front_co2 = compute_pareto_front(
+        pareto_co2_points, objectives=("cer", "co2"),
+    )
+    return {
+        "cost": {
+            "points": pareto_points,
+            "front": pareto_front_engines,
+            "axis_label": "Coût (€ / 1000 pages)",
+        },
+        "speed": {
+            "points": pareto_speed_points,
+            "front": pareto_front_speed,
+            "axis_label": "Temps moyen (s / page)",
+        },
+        "co2": {
+            "points": pareto_co2_points,
+            "front": pareto_front_co2,
+            "axis_label": (
+                "Empreinte carbone (g CO₂ / 1000 pages, expérimental)"
+            ),
+        },
+        "pricing_meta": {
+            "last_updated": pricing_defaults.last_updated,
+            "currency": pricing_defaults.currency,
+            "hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
+            "hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
+            "grid_intensity_local": pricing_defaults.grid_intensity_local,
+            "grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
+        },
+    }
+__all__ = ["build_pareto_section"]
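`compute_pareto_front` is imported from `picarones.measurements.statistics`, and its return type is not visible in this diff. To illustrate what a two-objective front means, here is a hypothetical standalone `pareto_front`, not the Picarones function. It treats every objective as "lower is better" and returns the names of non-dominated engines. The quadratic scan is fine for a handful of engines.

```python
def pareto_front(points: list[dict], objectives: tuple[str, str]) -> list[str]:
    """Return the names of non-dominated engines, all objectives minimized.

    q dominates p if q is <= p on every objective and < p on at least one.
    """
    front: list[str] = []
    for p in points:
        dominated = any(
            all(q[k] <= p[k] for k in objectives)
            and any(q[k] < p[k] for k in objectives)
            for q in points
            if q is not p
        )
        if not dominated:
            front.append(p["engine"])
    return front


points = [
    {"engine": "a", "cer": 0.05, "cost": 12.0},
    {"engine": "b", "cer": 0.08, "cost": 1.5},
    {"engine": "c", "cer": 0.09, "cost": 3.0},  # worse than b on both axes
]
print(pareto_front(points, objectives=("cer", "cost")))  # ['a', 'b']
```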

picarones/report/report_data/scatter.py ADDED

	@@ -0,0 +1,56 @@

+"""Report scatter plots (Sprint 10).
+- ``gini_vs_cer``: Gini coefficient (error concentration)
+  vs mean CER, per engine.
+- ``ratio_vs_anchor``: OCR/GT length ratio vs anchor score,
+  per engine (reveals VLM hallucinations).
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING
+from picarones.report.report_data._helpers import safe_round
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+def build_gini_vs_cer(benchmark: "BenchmarkResult") -> list[dict]:
+    """Scatter of the error-distribution Gini coefficient vs mean CER."""
+    gini_vs_cer: list[dict] = []
+    for report in benchmark.engine_reports:
+        line_metrics = report.aggregated_line_metrics
+        gini_val = line_metrics.get("gini_mean") if line_metrics else None
+        cer_val = report.mean_cer
+        if gini_val is not None and cer_val is not None:
+            gini_vs_cer.append({
+                "engine": report.engine_name,
+                "cer": safe_round(cer_val),
+                "gini": safe_round(gini_val),
+                "is_pipeline": report.is_pipeline,
+            })
+    return gini_vs_cer
+def build_ratio_vs_anchor(benchmark: "BenchmarkResult") -> list[dict]:
+    """Scatter of length ratio vs anchor score (VLM detection)."""
+    ratio_vs_anchor: list[dict] = []
+    for report in benchmark.engine_reports:
+        halluc = report.aggregated_hallucination
+        if not halluc:
+            continue
+        ratio_vs_anchor.append({
+            "engine": report.engine_name,
+            "length_ratio": safe_round(halluc.get("length_ratio_mean", 1.0)),
+            "anchor_score": safe_round(halluc.get("anchor_score_mean", 1.0)),
+            "hallucinating_rate": safe_round(halluc.get("hallucinating_doc_rate", 0.0)),
+            "is_vlm": (
+                report.pipeline_info.get("is_vlm", False)
+                if report.pipeline_info else False
+            ),
+        })
+    return ratio_vs_anchor
+__all__ = ["build_gini_vs_cer", "build_ratio_vs_anchor"]
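The `gini_vs_cer` scatter plots a Gini coefficient of error concentration. It is precomputed upstream as `gini_mean` in `aggregated_line_metrics`, and its exact definition is not part of this diff. Assuming the standard Gini formula over per-line error counts, 0 means errors are spread evenly across lines. Values approaching 1 mean a few lines carry most errors; with n lines the maximum is (n − 1)/n.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative values (0 = even, towards 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no errors at all: treat as perfectly even
    # Standard rank-weighted form on ascending values (ranks start at 1).
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


print(gini([2, 2, 2, 2]))  # 0.0
print(gini([0, 0, 0, 8]))  # 0.75
```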

picarones/report/report_data/statistics.py ADDED

	@@ -0,0 +1,216 @@

+"""Statistical sections of the report (Sprint 7 + Sprint 17).
+Builds the blocks:
+- ``pairwise_wilcoxon``: Wilcoxon tests for each pair of engines.
+- ``bootstrap_cis``: bootstrap confidence intervals per engine.
+- ``friedman`` + ``nemenyi``: Sprint 17, multi-engine.
+- ``reliability_curves``: reliability curves per engine.
+- ``venn_data``: Venn diagram of shared/exclusive errors.
+- ``error_clusters``: clustering of error patterns.
+- ``correlation_per_engine``: correlation matrix per engine.
+"""
+from __future__ import annotations
+from typing import TYPE_CHECKING, Optional
+from picarones.core.diff_utils import compute_word_diff
+from picarones.measurements.statistics import (
+    bootstrap_ci,
+    cluster_errors,
+    compute_correlation_matrix,
+    compute_pairwise_stats,
+    compute_reliability_curve,
+    compute_venn_data,
+    friedman_test,
+    nemenyi_posthoc,
+)
+from picarones.report.report_data._helpers import safe_round
+if TYPE_CHECKING:
+    from picarones.core.results import BenchmarkResult
+def _engine_cer_values(benchmark: "BenchmarkResult") -> dict[str, list[float]]:
+    """Map ``engine_name → [valid per-document CERs]``."""
+    out: dict[str, list[float]] = {}
+    for report in benchmark.engine_reports:
+        vals = [
+            safe_round(dr.metrics.cer)
+            for dr in report.document_results
+            if dr.metrics.error is None
+        ]
+        if vals:
+            out[report.engine_name] = vals
+    return out
+def build_pairwise_wilcoxon(benchmark: "BenchmarkResult") -> list[dict]:
+    """Pairwise Wilcoxon tests between engines (Sprint 7)."""
+    return compute_pairwise_stats(_engine_cer_values(benchmark))
+def build_bootstrap_cis(benchmark: "BenchmarkResult") -> list[dict]:
+    """Bootstrap confidence intervals per engine (Sprint 7)."""
+    bootstrap_cis: list[dict] = []
+    for engine_name, vals in _engine_cer_values(benchmark).items():
+        lo, hi = bootstrap_ci(vals)
+        mean_v = sum(vals) / len(vals) if vals else 0.0
+        bootstrap_cis.append({
+            "engine": engine_name,
+            "mean": safe_round(mean_v),
+            "ci_lower": safe_round(lo),
+            "ci_upper": safe_round(hi),
+        })
+    return bootstrap_cis
+def build_friedman_and_nemenyi(benchmark: "BenchmarkResult") -> dict:
+    """Friedman test + Nemenyi post-hoc (Sprint 17, multi-engine).
+    Strict alignment on the same document order: the map is rebuilt
+    from the documents that every engine processed without error;
+    otherwise Friedman is not applicable.
+    Returns
+    -------
+    dict
+        ``{"friedman": {...}, "nemenyi": {...}}``, to be merged into
+        the ``statistics`` section of the report.
+    """
+    # Ordered list of doc_ids, by order of first appearance.
+    seen: set[str] = set()
+    doc_ids_ordered: list[str] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            if dr.doc_id not in seen:
+                seen.add(dr.doc_id)
+                doc_ids_ordered.append(dr.doc_id)
+    common_doc_ids: Optional[set[str]] = None
+    for report in benchmark.engine_reports:
+        doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
+        common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids
+    engine_cer_aligned: dict[str, list[float]] = {}
+    if common_doc_ids:
+        ordered_common = [d for d in doc_ids_ordered if d in common_doc_ids]
+        for report in benchmark.engine_reports:
+            dr_by_id = {dr.doc_id: dr for dr in report.document_results}
+            engine_cer_aligned[report.engine_name] = [
+                safe_round(dr_by_id[d].metrics.cer) for d in ordered_common
+            ]
+    if engine_cer_aligned:
+        friedman = friedman_test(engine_cer_aligned)
+        nemenyi = nemenyi_posthoc(engine_cer_aligned)
+    else:
+        friedman = {
+            "statistic": 0.0, "p_value": 1.0, "significant": False,
+            "df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
+            "interpretation": "Test de Friedman non calculé — aucun document commun.",
+            "error": "no_common_documents",
+        }
+        nemenyi = {
+            "alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
+            "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
+            "engines_sorted": [], "significant_matrix": [], "tied_groups": [],
+            "error": "no_common_documents",
+        }
+    return {"friedman": friedman, "nemenyi": nemenyi}
+def build_reliability_curves(benchmark: "BenchmarkResult") -> list[dict]:
+    """Reliability curves per engine (Sprint 7)."""
+    reliability_curves: list[dict] = []
+    for report in benchmark.engine_reports:
+        vals = [
+            safe_round(dr.metrics.cer)
+            for dr in report.document_results
+            if dr.metrics.error is None
+        ]
+        curve = compute_reliability_curve(vals)
+        reliability_curves.append({
+            "engine": report.engine_name,
+            "points": curve,
+        })
+    return reliability_curves
+def build_venn_data(benchmark: "BenchmarkResult") -> dict:
+    """Venn diagram of shared / exclusive errors (Sprint 7).
+    Builds the per-engine error sets:
+    ``{engine → set("doc_id:gt_tok:hyp_tok")}``.
+    """
+    venn_error_sets: dict[str, set[str]] = {}
+    for report in benchmark.engine_reports:
+        error_set: set[str] = set()
+        for dr in report.document_results:
+            ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
+            for op in ops:
+                if op["op"] in ("replace", "delete", "insert"):
+                    key = (
+                        f"{dr.doc_id}:"
+                        f"{op.get('old', op.get('text', ''))}:"
+                        f"{op.get('new', op.get('text', ''))}"
+                    )
+                    error_set.add(key)
+        venn_error_sets[report.engine_name] = error_set
+    return compute_venn_data(venn_error_sets)
+def build_error_clusters(benchmark: "BenchmarkResult") -> list[dict]:
+    """Clustering of error patterns (Sprint 7)."""
+    error_data_all: list[dict] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            error_data_all.append({
+                "engine": report.engine_name,
+                "gt": dr.ground_truth,
+                "hypothesis": dr.hypothesis,
+            })
+    error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
+    return [c.as_dict() for c in error_clusters_raw]
+def build_correlation_per_engine(benchmark: "BenchmarkResult") -> list[dict]:
+    """Per-engine correlation matrix between domain metrics (Sprint 7)."""
+    correlation_per_engine: list[dict] = []
+    for report in benchmark.engine_reports:
+        metrics_list: list[dict[str, float]] = []
+        for dr in report.document_results:
+            if dr.metrics.error is not None:
+                continue
+            entry: dict[str, float] = {
+                "cer": safe_round(dr.metrics.cer),
+                "wer": safe_round(dr.metrics.wer),
+                "mer": safe_round(dr.metrics.mer),
+                "wil": safe_round(dr.metrics.wil),
+            }
+            if dr.image_quality:
+                entry["quality_score"] = safe_round(dr.image_quality.get("quality_score", 0.5))
+                entry["sharpness"] = safe_round(dr.image_quality.get("sharpness_score", 0.5))
+            if dr.char_scores:
+                entry["ligature"] = safe_round(dr.char_scores.get("ligature", {}).get("score", 0.5))
+                entry["diacritic"] = safe_round(dr.char_scores.get("diacritic", {}).get("score", 0.5))
+            metrics_list.append(entry)
+        if metrics_list:
+            corr = compute_correlation_matrix(metrics_list)
+            correlation_per_engine.append({
+                "engine": report.engine_name,
+                **corr,
+            })
+    return correlation_per_engine
+__all__ = [
+    "build_pairwise_wilcoxon",
+    "build_bootstrap_cis",
+    "build_friedman_and_nemenyi",
+    "build_reliability_curves",
+    "build_venn_data",
+    "build_error_clusters",
+    "build_correlation_per_engine",
+]
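`bootstrap_ci` is imported from `picarones.measurements.statistics`, and its internals are not part of this diff. For readers unfamiliar with the technique, a percentile bootstrap of the mean CER looks like the sketch below. The function name, resample count, `alpha` and seeded RNG are illustrative assumptions, not Picarones defaults.

```python
import random
import statistics


def bootstrap_ci_mean(
    values: list[float], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    # Resample with replacement, same size as the input, and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lo_idx = int(n_boot * alpha / 2)  # e.g. index 50 of 2000 for a 95% CI
    hi_idx = n_boot - 1 - lo_idx      # symmetric upper index
    return means[lo_idx], means[hi_idx]


cers = [0.02, 0.05, 0.04, 0.10, 0.03, 0.07, 0.06, 0.01]
lo, hi = bootstrap_ci_mean(cers)
print(f"mean={statistics.fmean(cers):.4f}  95% CI=[{lo:.4f}, {hi:.4f}]")
```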

tests/architecture/test_file_budgets.py CHANGED

@@ -36,13 +36,18 @@ FILE_BUDGETS: dict[str, int] = {
     # --- God modules: current budget + 15% margin.
     # Shrinking them will be the subject of a dedicated refactor sprint.
     "picarones/measurements/statistics.py": 1300,         # currently 1128
-    "picarones/report/generator.py": 1250,                # currently 1063
     "picarones/measurements/runner.py": 1200,             # currently 1019
+    # --- Refactor (sprint "splitting generator.py"): went from
+    # 1063 to 431 lines by extracting code into picarones/report/assets.py
+    # and the picarones/report/report_data/ sub-package. Budget tightened
+    # to 500 to lock in the gain; any growth past 500 is a signal to
+    # split the file again.
+    "picarones/report/generator.py": 500,                 # currently 431
     # --- Large domain files.
     "picarones/measurements/robustness.py": 850,          # currently 731
-    "picarones/report/pipeline_render.py": 825,           # currently 717
+    "picarones/report/pipeline_render.py": 815,           # currently 707 (shrunk)
     "picarones/core/results.py": 750,                     # currently 636
-    "picarones/report/philological_render.py": 725,       # currently 615
+    "picarones/report/philological_render.py": 700,       # currently 595 (shrunk)
     "picarones/measurements/history.py": 725,             # currently 615
     "picarones/measurements/modern_archives.py": 700,     # currently 599
     "picarones/measurements/builtin_hooks.py": 700,       # currently 590

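This diff shows only a fragment of `FILE_BUDGETS`, not the test that enforces it. Assuming the test compares each listed file's line count to its budget, a minimal equivalent check could look like this. `over_budget` and the skip-when-missing behaviour are hypothetical choices, not taken from the repository.

```python
import tempfile
from pathlib import Path

# Hypothetical budget table in the same shape as FILE_BUDGETS: path -> max lines.
FILE_BUDGETS: dict[str, int] = {"picarones/report/generator.py": 500}


def over_budget(root: Path, budgets: dict[str, int]) -> list[str]:
    """Return one message per file whose line count exceeds its budget."""
    failures: list[str] = []
    for rel_path, limit in budgets.items():
        path = root / rel_path
        if not path.exists():
            continue  # assumption: missing files are skipped, not failed
        n_lines = len(path.read_text(encoding="utf-8").splitlines())
        if n_lines > limit:
            failures.append(f"{rel_path}: {n_lines} lines > budget {limit}")
    return failures


# Usage: a throwaway tree whose generator.py is one line over budget.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "picarones/report/generator.py"
    target.parent.mkdir(parents=True)
    target.write_text("x = 1\n" * 501, encoding="utf-8")
    problems = over_budget(root, FILE_BUDGETS)
print(problems)
```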
tests/report/test_views.py CHANGED

@@ -333,7 +333,12 @@ class TestDetailsShell:
 class TestGeneratorWiring:
     def test_generator_imports_three_views(self):
         """generator.py must import the 3 automatic views (economics,
-        advanced_taxonomy, diagnostics) to pass them to the template."""
+        advanced_taxonomy, diagnostics) to pass them to the template.
+        Accepts both wiring conventions: a keyword argument
+        ``economics_view_html=...`` or a dict key ``"economics_view_html"``
+        splatted via ``**section_html`` (see ``_build_section_html``).
+        """
         from pathlib import Path
         gen_src = (
@@ -343,10 +348,21 @@ class TestGeneratorWiring:
         assert "build_economics_view_html" in gen_src
         assert "build_advanced_taxonomy_view_html" in gen_src
         assert "build_diagnostics_view_html" in gen_src
-        # And the 3 variables must be passed to the template
-        assert "economics_view_html=" in gen_src
-        assert "advanced_taxonomy_view_html=" in gen_src
-        assert "diagnostics_view_html=" in gen_src
+        # And the 3 variables must be wired to the template, either as
+        # an explicit argument (``var=...``) or as a splatted dict key
+        # (``"var": ...``).
+        for name in (
+            "economics_view_html",
+            "advanced_taxonomy_view_html",
+            "diagnostics_view_html",
+        ):
+            assert (
+                f"{name}=" in gen_src
+                or f'"{name}"' in gen_src
+            ), (
+                f"variable {name!r} found neither as a keyword argument "
+                "nor as a dict key in generator.py"
+            )
     def test_template_uses_three_views(self):
         from pathlib import Path
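The substring checks accept either wiring convention, but they would also pass if a name only appeared in a comment. Here is a stricter alternative built on the standard `ast` module. It counts only keyword arguments and string dict keys. The helper `wired_names` and the sample source are hypothetical and not taken from `generator.py`.

```python
import ast


def wired_names(source: str) -> set[str]:
    """Collect names wired to a template via keyword args or string dict keys."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.keyword) and node.arg is not None:
            # render(economics_view_html=...); arg is None for **mapping
            found.add(node.arg)
        elif isinstance(node, ast.Dict):
            for key in node.keys:
                # key is None for ``**mapping`` entries inside a dict literal
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    found.add(key.value)  # {"economics_view_html": ...}
    return found


SAMPLE = '''
def _build_section_html(data):
    return {"economics_view_html": "<div/>", "diagnostics_view_html": ""}

def generate(template, data):
    section_html = _build_section_html(data)
    return template.render(advanced_taxonomy_view_html="", **section_html)
'''

names = wired_names(SAMPLE)
print(sorted(n for n in names if n.endswith("_view_html")))
# ['advanced_taxonomy_view_html', 'diagnostics_view_html', 'economics_view_html']
```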