feat(migration): Phase 5.E — generator + comparison + snapshot + data + templates + i18n
Phase 5.E completes Phase 5 by migrating the last ``report/``
components to ``reports_v2/html/``.
Migrations performed
--------------------
| Legacy source (line count) | Canonical destination |
|------------------------------------------------|----------------------------------------------------|
| ``report/generator.py`` (466) | ``reports_v2/html/generator.py`` |
| ``report/comparison.py`` (409) | ``reports_v2/html/comparison.py`` |
| ``report/snapshot.py`` (266) | ``reports_v2/html/snapshot.py`` |
| ``report/report_data/`` (8 files, 1135 lines) | ``reports_v2/html/data/`` |
| ``report/templates/`` (13 files) | ``reports_v2/html/templates/`` |
| ``picarones/i18n.py`` (124) | ``picarones/reports_v2/i18n/__init__.py`` |
| ``report/__init__.py`` (3) | re-export shim |
Total: ~2400 lines relocated, plus 13 Jinja2 templates and the
i18n loader. 12 new minimal shims (< 25 lines each) emit a
``DeprecationWarning``.
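Each shim re-exports the canonical names and calls ``warnings.warn(..., DeprecationWarning, stacklevel=2)`` at import time. A minimal sketch of how such a shim can be checked, assuming a throwaway package ``demo_pkg`` (hypothetical, not part of Picarones) written to a temporary directory, with ``new_home`` standing in for the canonical module and ``old_home`` for the shim:

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical canonical module: holds the real code.
CANONICAL = textwrap.dedent('''
    def greet(name: str) -> str:
        return f"hello {name}"
''')

# Hypothetical shim: re-exports the canonical name and warns on import.
SHIM = textwrap.dedent('''
    import warnings

    from demo_pkg.new_home import greet  # noqa: F401

    warnings.warn(
        "demo_pkg.old_home is deprecated; import from demo_pkg.new_home instead.",
        DeprecationWarning,
        stacklevel=2,
    )
''')


def shim_warns_on_import() -> bool:
    """Return True if importing the shim emits a DeprecationWarning."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "new_home.py").write_text(CANONICAL)
        (pkg / "old_home.py").write_text(SHIM)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")  # record even if filtered by default
                shim = importlib.import_module("demo_pkg.old_home")
            # The shim must still expose the canonical API...
            assert shim.greet("x") == "hello x"
            # ...and must have emitted a deprecation warning while importing.
            return any(issubclass(w.category, DeprecationWarning) for w in caught)
        finally:
            # Undo the import so the module-level warning can fire again next time.
            sys.path.remove(tmp)
            for mod in ("demo_pkg.old_home", "demo_pkg.new_home", "demo_pkg"):
                sys.modules.pop(mod, None)


print(shim_warns_on_import())  # True
```

Because the warning is raised at module level, it fires only on the first import of a given module object; a test therefore has to drop the module from ``sys.modules`` (as above) or run in a fresh interpreter to observe it again.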
Cross-cutting adjustments
-------------------------
- ``reports_v2/html/snapshot.py``: ``importlib.metadata``
replaces ``picarones.__version__`` (forbidden by layer-deps).
- ``reports_v2/html/snapshot.py``: the ``pricing`` import is redirected
to ``evaluation/metrics/pricing``.
- ``reports_v2/html/generator.py``: ~30 internal imports
redirected to ``reports_v2/html/{data,renderers,views,
snapshot}`` and ``evaluation/{statistics,metric_result,
benchmark_result}``.
- ``reports_v2/html/data/``: 7 imports from ``measurements/``
redirected to ``evaluation/`` or ``evaluation/metrics/``.
- ``reports_v2/html/views/``: 6 imports from ``measurements/``
redirected to ``evaluation/metrics/``.
- ``test_module_coverage.py::TEST_ONLY_BASELINE`` extended with
``statistics``, ``pricing``, ``difficulty``.
- ``test_file_budgets.py``: 2 legacy entries removed.
- 28+ template paths in the tests redirected to
``reports_v2/html/templates/``.
- Tests: ``from picarones import i18n`` → ``from picarones.reports_v2 import i18n``.
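The first adjustment swaps ``picarones.__version__`` for ``importlib.metadata``, so that ``snapshot.py`` never has to import the package root. A minimal sketch of that pattern with a fallback; the helper name ``package_version`` and the ``"0+unknown"`` fallback string are assumptions, not the code actually shipped in ``snapshot.py``:

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str = "picarones", fallback: str = "0+unknown") -> str:
    """Return the installed version of *dist_name*, or *fallback*.

    The version is read from the installed distribution metadata rather
    than from ``picarones.__version__``, so a lower layer does not need
    to import the top-level package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # e.g. running from a source checkout that was never pip-installed
        return fallback


print(package_version("surely-not-an-installed-dist-xyz"))  # 0+unknown
```

Note that the metadata lookup uses the *distribution* name (what ``pip install`` sees), which is not always identical to the import name.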
Final state of ``picarones/report/``
------------------------------------
The ``picarones/report/`` directory now contains **only
shims** (~30 files). No module with real content
remains. The canonical code lives entirely in
``picarones/reports_v2/html/`` (generator + renderers + views +
data + templates + comparison + snapshot).
Acceptance: Phase 5.E + Phase 5 as a whole
------------------------------------------
5019 tests pass, lint is green, and the architecture checks pass
(anti-cycles, file budgets, module coverage).
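The "anti-cycles" check amounts to finding a cycle in the module import graph. A minimal sketch of that technique (three-colour depth-first search), assuming the graph has already been extracted as a mapping from module name to imported modules; the function and the module names in the example are hypothetical:

```python
from __future__ import annotations

WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / finished


def find_import_cycle(graph: dict[str, list[str]]) -> list[str] | None:
    """Return one import cycle as a list of modules, or None if acyclic."""
    color = {node: WHITE for node in graph}
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the current path from `dep` onwards.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


layers = {
    "reports_v2.html.generator": ["reports_v2.html.data", "evaluation.statistics"],
    "reports_v2.html.data": ["evaluation.metrics"],
    "evaluation.statistics": [],
    "evaluation.metrics": [],
}
print(find_import_cycle(layers))  # None

layers["evaluation.metrics"] = ["reports_v2.html.generator"]  # introduce a back edge
print(find_import_cycle(layers))
# ['evaluation.metrics', 'reports_v2.html.generator', 'reports_v2.html.data', 'evaluation.metrics']
```

The recursion is fine for a codebase-sized graph; a very deep graph would call for an explicit stack instead.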
Phase 5 is complete. Remaining steps before the legacy code is retired:
Phase 6 (``pipelines/``), Phase 7 (``modules/``),
Phase 8 (``extras/importers/``), Phase 9 (``web/``),
Phase 10 (``cli/``), Phase 11 (final removal + 2.0).
https://claude.ai/code/session_011XQZNitg1rCgia8ZD1a2hP
- docs/migration/legacy-retirement-plan.md +78 -4
- picarones/cli/__init__.py +2 -2
- picarones/cli/_workflows.py +1 -1
- picarones/i18n.py +19 -120
- picarones/report/__init__.py +17 -2
- picarones/report/comparison.py +11 -402
- picarones/report/generator.py +11 -459
- picarones/report/report_data/__init__.py +13 -124
- picarones/report/report_data/_helpers.py +11 -23
- picarones/report/report_data/documents.py +11 -160
- picarones/report/report_data/engines.py +11 -96
- picarones/report/report_data/extra_metrics.py +11 -265
- picarones/report/report_data/pareto.py +11 -152
- picarones/report/report_data/scatter.py +11 -49
- picarones/report/report_data/statistics.py +11 -209
- picarones/report/snapshot.py +11 -259
- picarones/report/templates/_critical_difference.html +0 -39
- picarones/report/templates/_footer.html +0 -6
- picarones/report/templates/_header.html +0 -35
- picarones/report/templates/_narrative_summary.html +0 -22
- picarones/report/templates/_side_panels.html +0 -76
- picarones/report/templates/view_analyses.html +0 -326
- picarones/report/templates/view_characters.html +0 -32
- picarones/report/templates/view_document.html +0 -83
- picarones/report/templates/view_gallery.html +0 -35
- picarones/report/templates/view_ranking.html +0 -91
- picarones/reports_v2/html/__init__.py +2 -1
- picarones/reports_v2/html/comparison.py +414 -0
- picarones/reports_v2/html/data/__init__.py +132 -0
- picarones/reports_v2/html/data/_helpers.py +30 -0
- picarones/reports_v2/html/data/documents.py +167 -0
- picarones/reports_v2/html/data/engines.py +103 -0
- picarones/reports_v2/html/data/extra_metrics.py +272 -0
- picarones/reports_v2/html/data/pareto.py +159 -0
- picarones/reports_v2/html/data/scatter.py +56 -0
- picarones/reports_v2/html/data/statistics.py +216 -0
- picarones/reports_v2/html/generator.py +471 -0
- picarones/reports_v2/html/snapshot.py +281 -0
- picarones/{report → reports_v2/html}/templates/_app.js +0 -0
- picarones/{report → reports_v2/html}/templates/_styles.css +0 -0
- picarones/{report → reports_v2/html}/templates/base.html.j2 +0 -0
- picarones/reports_v2/i18n/__init__.py +132 -0
- picarones/web/benchmark_utils.py +2 -2
- tests/architecture/test_file_budgets.py +6 -2
- tests/architecture/test_module_coverage.py +7 -0
- tests/core/test_sprint14_robust_filtering.py +1 -1
- tests/engines/test_sprint3_llm_pipelines.py +2 -2
- tests/engines/test_sprint4_normalization_iiif.py +3 -3
- tests/integration/test_sprint11_i18n_english.py +11 -11
- tests/integration/test_sprint30_polish_a11y_dx.py +8 -8
@@ -701,11 +701,12 @@ architecture verified.
 ``pipeline_benchmark``, ``pipeline_comparison``,
 ``core/pipeline``) then 2 renderers
 (``numerical_sequences``, ``pipeline``).
-- Phase 5.D
-- Phase 5.E
-``snapshot.py``, ``report_data/``, templates
+- Phase 5.D ✅ — 5 views (``views/*.py``).
+- Phase 5.E ✅ — ``generator.py``, ``comparison.py``,
+``snapshot.py``, ``report_data/`` (8 files), Jinja2
+templates (13 files), ``picarones/i18n.py``.

-
+Phase 5 is **complete**.

 #### Phase 5.C.batch2 — Batch 2: 5 medium-sized renderers (2026-05)

@@ -989,6 +990,79 @@ Total: ~1114 lines relocated. 6 new minimal shims
 **Acceptance Phase 5.D**: 5019 tests pass, lint green,
 architecture verified.

+#### Phase 5.E — Migrating generator + comparison + snapshot + report_data + templates + i18n (2026-05)
+
+Phase 5.E completes Phase 5 by migrating the last
+``report/`` components:
+
+**Migrations performed**:
+
+| Legacy source | Canonical destination |
+|------------------------------------------------|----------------------------------------------------|
+| ``report/generator.py`` (466) | ``reports_v2/html/generator.py`` |
+| ``report/comparison.py`` (409) | ``reports_v2/html/comparison.py`` |
+| ``report/snapshot.py`` (266) | ``reports_v2/html/snapshot.py`` |
+| ``report/report_data/__init__.py`` (132) | ``reports_v2/html/data/__init__.py`` |
+| ``report/report_data/_helpers.py`` (30) | ``reports_v2/html/data/_helpers.py`` |
+| ``report/report_data/documents.py`` (167) | ``reports_v2/html/data/documents.py`` |
+| ``report/report_data/engines.py`` (103) | ``reports_v2/html/data/engines.py`` |
+| ``report/report_data/extra_metrics.py`` (272) | ``reports_v2/html/data/extra_metrics.py`` |
+| ``report/report_data/pareto.py`` (159) | ``reports_v2/html/data/pareto.py`` |
+| ``report/report_data/scatter.py`` (56) | ``reports_v2/html/data/scatter.py`` |
+| ``report/report_data/statistics.py`` (216) | ``reports_v2/html/data/statistics.py`` |
+| ``report/templates/`` (13 files) | ``reports_v2/html/templates/`` (13 files) |
+| ``picarones/i18n.py`` (124) | ``picarones/reports_v2/i18n/__init__.py`` |
+| ``report/__init__.py`` (3) | re-export shim |
+
+Total: ~2400 lines relocated + 13 Jinja2 templates + the
+i18n loader. In all, **12 new minimal shims** (< 25
+lines) with ``DeprecationWarning``.
+
+**Cross-cutting adjustments**:
+
+- ``reports_v2/html/snapshot.py`` cannot import
+``picarones.__version__`` (forbidden by layer-deps): it uses
+``importlib.metadata`` with a fallback (same as in Phase 4-ter).
+- ``reports_v2/html/snapshot.py``: ``pricing`` import redirected
+to the canonical ``evaluation/metrics/pricing``.
+- ``reports_v2/html/generator.py``: all ~30 internal
+imports redirected to ``reports_v2/html/{data,renderers,
+views,snapshot}`` and ``evaluation/{statistics,metric_result,
+benchmark_result}``.
+- ``reports_v2/html/data/``: 7 imports from
+``measurements/{statistics,difficulty,pricing,marginal_cost,
+rare_tokens,taxonomy_cooccurrence,taxonomy_intra_doc}``
+redirected to ``evaluation/{statistics,metrics/...}``.
+- ``reports_v2/html/views/``: 6 imports from
+``measurements/{taxonomy_comparison,incremental_comparison,
+levers,image_predictive,worst_lines,throughput}`` redirected
+to ``evaluation/metrics/...``.
+- ``picarones/reports_v2/__init__.py``: new loader
+``from picarones.reports_v2.html.generator import ReportGenerator``.
+- ``test_module_coverage.py::TEST_ONLY_BASELINE`` extended with 3
+modules: ``statistics``, ``pricing``, ``difficulty``.
+- ``test_file_budgets.py``: 2 legacy entries removed and
+replaced with the canonical paths; the templates dir is
+referenced via ``reports_v2/html/templates/``.
+- 28+ template paths in the tests redirected to
+``reports_v2/html/templates/``.
+- Tests that used ``from picarones import i18n`` now use
+``from picarones.reports_v2 import i18n`` (the shim does not
+re-export ``_get_labels_cached``, which is private).
+
+Final state of ``picarones/report/``
+------------------------------------
+
+The ``picarones/report/`` directory now contains
+**only shims** (~30 files). No module with real
+content remains. The canonical code lives entirely
+in ``picarones/reports_v2/html/`` (generator + renderers
++ views + data + templates + comparison + snapshot).
+
+**Acceptance Phase 5.E + all of Phase 5**: 5019 tests
+pass, lint green, architecture verified (anti-cycles,
+file budgets, module coverage).
+
 ### Phase 6 — OCR+LLM pipelines (`pipelines/`)

 **Modules**: `pipelines/base.OCRLLMPipeline` (3 modes), `pipelines/
@@ -223,7 +223,7 @@ def report_cmd(results: str, output: str, lazy_images: bool, verbose: bool) -> N
     """
     _setup_logging(verbose)

-    from picarones.
+    from picarones.reports_v2.html.generator import ReportGenerator

     click.echo(f"Chargement des résultats : {results}")
     try:
@@ -303,7 +303,7 @@ def demo_cmd(
         picarones demo --with-history --with-robustness --docs 8
     """
     from picarones.fixtures import generate_sample_benchmark
-    from picarones.
+    from picarones.reports_v2.html.generator import ReportGenerator

     click.echo(f"Génération des données fictives ({docs} documents, 3 moteurs)…")
     benchmark = generate_sample_benchmark(n_docs=docs)
@@ -479,7 +479,7 @@ def compare_cmd(
     """
     _setup_logging(verbose)

-    from picarones.
+    from picarones.reports_v2.html.comparison import (
         compare_benchmarks,
         detect_regressions,
         render_comparison_html,
@@ -1,125 +1,24 @@
-"""
+"""``picarones.i18n`` — re-export shim (deprecated, removed in 2.0).

-
-
-- ``"fr"``: French (default)
-- ``"en"``: heritage English
-
-Since Sprint 17, translations are stored in JSON files
-and loaded on first access. Phase 5 of the legacy retirement
-(2026-05): the files were moved from
-``picarones/report/i18n/{lang}.json`` to
-``picarones/reports_v2/i18n/{lang}.json``. No functional
-change for consumers of ``get_labels``.
-
-``TRANSLATIONS`` remains exposed as a dict for backward compatibility.
-
-Sprint 30 — hardening
----------------------
-- Lazy, thread-safe loading through an explicit lock; web servers
-  under concurrent load can no longer initialize twice.
-- ``reload_translations()`` is exposed for tests that modify the
-  JSON files on the fly.
-- ``get_labels()`` is memoized via ``functools.lru_cache`` to absorb
-  the ``lang → fr`` fallback without re-reading the dict on every call.
+Canonical: :mod:`picarones.reports_v2.i18n`. Phase 5.E of the legacy
+retirement.
 """

 from __future__ import annotations

-import
-
-import
-from
-
-
-
-
-
-
-
-
-
-
-
-
-
-    A ``{lang}.json`` file defines the labels for language ``lang``.
-    Always returns a non-empty dict, even if the directory is missing
-    (in that case, the dict is empty and ``get_labels`` uses a fallback).
-    """
-    translations: dict[str, dict[str, str]] = {}
-    if not _I18N_DIR.is_dir():
-        return translations
-    for path in sorted(_I18N_DIR.glob("*.json")):
-        lang = path.stem
-        try:
-            with path.open(encoding="utf-8") as fh:
-                translations[lang] = json.load(fh)
-        except (OSError, json.JSONDecodeError) as e:
-            logger.warning("[i18n] fichier '%s' ignoré : %s", path, e)
-    return translations
-
-
-def _get_translations() -> dict[str, dict[str, str]]:
-    """Return the translations cache, initialized only once.
-
-    Thread-safe: two threads calling it simultaneously at startup
-    trigger only one disk read.
-    """
-    global _TRANSLATIONS_CACHE
-    if _TRANSLATIONS_CACHE is not None:
-        return _TRANSLATIONS_CACHE
-    with _LOAD_LOCK:
-        if _TRANSLATIONS_CACHE is None:
-            _TRANSLATIONS_CACHE = _load_translations()
-    return _TRANSLATIONS_CACHE
-
-
-def reload_translations() -> None:
-    """Force the JSON files to be re-read on the next ``get_labels`` call.
-
-    Useful for tests that modify ``reports_v2/i18n/*.json`` on the fly.
-    """
-    global _TRANSLATIONS_CACHE
-    with _LOAD_LOCK:
-        _TRANSLATIONS_CACHE = None
-    _get_labels_cached.cache_clear()
-
-
-@lru_cache(maxsize=None)
-def _get_labels_cached(lang: str) -> tuple[tuple[str, str], ...]:
-    """Memoized cache: ``lang -> ordered tuple of pairs``.
-
-    Returning a tuple lets ``lru_cache`` memoize without
-    hashability constraints, and it is trivially converted to a dict
-    by ``get_labels`` on each call (O(n) cost).
-    """
-    translations = _get_translations()
-    labels = translations.get(lang) or translations.get("fr") or {}
-    return tuple(labels.items())
-
-
-def get_labels(lang: str = "fr") -> dict[str, str]:
-    """Return the label dictionary for the given language.
-
-    Parameters
-    ----------
-    lang:
-        Language code: ``"fr"`` (default) or ``"en"``.
-
-    Returns
-    -------
-    dict
-        Translated labels. Always valid: falls back to ``"fr"`` if lang is unknown.
-        If ``"fr"`` itself is missing, returns an empty dict (degraded
-        but non-blocking behavior).
-    """
-    return dict(_get_labels_cached(lang))
-
-
-# ``TRANSLATIONS`` remains accessible as a module attribute for
-# external consumers that read it directly. Initialized
-# lazily at import; does **not** trigger a read if the
-# module is never used.
-TRANSLATIONS: dict[str, dict[str, str]] = _get_translations()
-SUPPORTED_LANGS: list[str] = list(TRANSLATIONS.keys())
+import warnings
+
+from picarones.reports_v2.i18n import *  # noqa: F401, F403
+from picarones.reports_v2.i18n import (  # noqa: F401
+    TRANSLATIONS,
+    SUPPORTED_LANGS,
+    get_labels,
+    reload_translations,
+)
+
+warnings.warn(
+    "picarones.i18n is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.i18n instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
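The i18n shim pairs ``from picarones.reports_v2.i18n import *`` with an explicit import list. When the source module does not define ``__all__``, a star import copies only names that do not start with an underscore, which is consistent with the plan's note that the shim does not re-export the private ``_get_labels_cached``. A small self-contained demonstration, using a hypothetical in-memory module ``demo_canonical`` in place of the real one:

```python
import sys
import types

# Hypothetical stand-in for the canonical module (no __all__ defined).
canonical = types.ModuleType("demo_canonical")
exec(
    "def get_labels(lang='fr'):\n"
    "    return {'lang': lang}\n"
    "def _get_labels_cached(lang):\n"
    "    return ()\n",
    canonical.__dict__,
)
sys.modules["demo_canonical"] = canonical

# What a shim does: star-import the canonical module into its own namespace.
shim_namespace: dict = {}
exec("from demo_canonical import *", shim_namespace)

print("get_labels" in shim_namespace)          # True
print("_get_labels_cached" in shim_namespace)  # False
```

Tests that need a private helper therefore have to import it from the canonical module directly, as the migrated tests now do.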
@@ -1,5 +1,20 @@
-"""
+"""``picarones.report`` — re-export shim (deprecated, removed in 2.0).

-
+Canonical: :mod:`picarones.reports_v2.html`. Phase 5.E of the legacy
+retirement.
+"""
+
+from __future__ import annotations
+
+import warnings
+
+from picarones.reports_v2.html import ReportGenerator  # noqa: F401
+
+warnings.warn(
+    "picarones.report is deprecated and will be removed in 2.0. "
+    "Import ReportGenerator from picarones.reports_v2.html instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)

 __all__ = ["ReportGenerator"]
@@ -1,409 +1,18 @@
-"""

-
-
-no tool exposed the **comparison** of two runs on the report side:
-a researcher iterating over 8 prompts could not see at a glance
-*"Tesseract → GPT-4o version V2 regressed by 0.8 pp in mean CER
-on the parish-register stratum compared with V1"*.
-
-This module provides:
-
-- ``load_benchmark_json(path)``: loads the JSON produced by
-  ``BenchmarkResult.as_dict()`` or ``picarones run -o results.json``.
-- ``compare_benchmarks(a, b)``: computes the per-engine deltas
-  (mean CER, mean WER, processed/failed document counts) and
-  per stratum when the metadata is present.
-- ``detect_regressions(diff, threshold)``: lists the engines that
-  regressed (CER delta > threshold) and that improved
-  (CER delta < -threshold).
-- ``render_comparison_html(diff, output_path)``: minimal
-  self-contained HTML rendering via Jinja2, for sharing.
-
-Conventions
------------
-- Deltas are computed as ``b - a`` (so positive = ``b`` is worse).
-- An engine present in only one run appears in ``only_in_a`` /
-  ``only_in_b``, never in ``deltas``.
-- An engine whose ``mean_cer`` is ``None`` (total failure) is
-  flagged but does not produce a numeric delta.
-- ``threshold`` is absolute (CER as a fraction, not %). Default
-  0.005 = 0.5 pp.
 """

 from __future__ import annotations

-import
-import logging
-from dataclasses import dataclass, field
-from pathlib import Path
-from typing import Any, Optional
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Models
-# ---------------------------------------------------------------------------
-
-@dataclass
-class EngineDelta:
-    """Difference ``b - a`` for a given engine."""
-    engine: str
-    cer_a: Optional[float]
-    cer_b: Optional[float]
-    delta_cer: Optional[float]
-    wer_a: Optional[float]
-    wer_b: Optional[float]
-    delta_wer: Optional[float]
-    docs_a: int
-    docs_b: int
-    failed_a: int
-    failed_b: int
-    is_regression: bool = False
-    is_improvement: bool = False
-
-    def as_dict(self) -> dict[str, Any]:
-        return {
-            "engine": self.engine,
-            "cer_a": self.cer_a,
-            "cer_b": self.cer_b,
-            "delta_cer": self.delta_cer,
-            "wer_a": self.wer_a,
-            "wer_b": self.wer_b,
-            "delta_wer": self.delta_wer,
-            "docs_a": self.docs_a,
-            "docs_b": self.docs_b,
-            "failed_a": self.failed_a,
-            "failed_b": self.failed_b,
-            "is_regression": self.is_regression,
-            "is_improvement": self.is_improvement,
-        }
-
-
-@dataclass
-class ComparisonResult:
-    """Result of a ``b - a`` comparison between two runs."""
-    label_a: str
-    label_b: str
-    run_date_a: Optional[str]
-    run_date_b: Optional[str]
-    corpus_a: Optional[str]
-    corpus_b: Optional[str]
-    deltas: list[EngineDelta] = field(default_factory=list)
-    only_in_a: list[str] = field(default_factory=list)
-    only_in_b: list[str] = field(default_factory=list)
-    threshold: float = 0.005
-
-    def as_dict(self) -> dict[str, Any]:
-        return {
-            "label_a": self.label_a,
-            "label_b": self.label_b,
-            "run_date_a": self.run_date_a,
-            "run_date_b": self.run_date_b,
-            "corpus_a": self.corpus_a,
-            "corpus_b": self.corpus_b,
-            "threshold": self.threshold,
-            "deltas": [d.as_dict() for d in self.deltas],
-            "only_in_a": list(self.only_in_a),
-            "only_in_b": list(self.only_in_b),
-            "regressions": [d.as_dict() for d in self.deltas if d.is_regression],
-            "improvements": [d.as_dict() for d in self.deltas if d.is_improvement],
-        }
-
-
-# ---------------------------------------------------------------------------
-# Loading
-# ---------------------------------------------------------------------------
-
-def load_benchmark_json(path: str | Path) -> dict[str, Any]:
-    """Load a benchmark JSON file from disk.
-
-    Accepts:
-    - the ``BenchmarkResult.as_dict()`` format (key ``ranking``,
-      ``engine_reports`` or ``engines``);
-    - an already-parsed dict; in that case, ``path`` may be a dict.
-    """
-    if isinstance(path, dict):
-        return path
-    p = Path(path)
-    if not p.exists():
-        raise FileNotFoundError(f"Fichier benchmark introuvable : {p}")
-    with p.open(encoding="utf-8") as fh:
-        data = json.load(fh)
-    if not isinstance(data, dict):
-        raise ValueError(f"Le JSON {p} doit être un dict.")
-    return data
-
-
-# ---------------------------------------------------------------------------
-# Comparison
-# ---------------------------------------------------------------------------
-
-def _ranking_index(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
-    """Index ``ranking`` by engine name; robust to both formats.
-
-    A ``BenchmarkResult.as_dict()`` exposes ``ranking`` directly
-    (keys ``engine``, ``mean_cer``, …). The alternative ``engines`` format
-    exposes the same content under slightly different keys;
-    we normalize it to the ``ranking`` format.
-    """
-    ranking = data.get("ranking")
-    if isinstance(ranking, list) and ranking:
-        return {
-            r["engine"]: {
-                "engine": r["engine"],
-                "mean_cer": r.get("mean_cer"),
-                "mean_wer": r.get("mean_wer"),
-                "documents": int(r.get("documents") or 0),
-                "failed": int(r.get("failed") or 0),
-            }
-            for r in ranking
-            if isinstance(r, dict) and r.get("engine")
-        }
-    # Fallback: ``engines`` (report_data format)
-    engines = data.get("engines") or []
-    out: dict[str, dict[str, Any]] = {}
-    if isinstance(engines, list):
-        for e in engines:
-            if not isinstance(e, dict):
-                continue
-            name = e.get("name") or e.get("engine")
-            if not name:
-                continue
-            out[name] = {
-                "engine": name,
-                "mean_cer": e.get("cer"),
-                "mean_wer": e.get("wer"),
-                "documents": int(e.get("documents") or 0),
-                "failed": int(e.get("failed") or 0),
-            }
-    return out
| 182 |
-
|
| 183 |
-
|
| 184 |
-
def _label_of(data: dict[str, Any], default: str) -> str:
|
| 185 |
-
meta = data.get("meta") or {}
|
| 186 |
-
return (
|
| 187 |
-
meta.get("corpus_name")
|
| 188 |
-
or (data.get("corpus") or {}).get("name")
|
| 189 |
-
or default
|
| 190 |
-
)
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
def _run_date_of(data: dict[str, Any]) -> Optional[str]:
|
| 194 |
-
return (
|
| 195 |
-
data.get("run_date")
|
| 196 |
-
or (data.get("meta") or {}).get("run_date")
|
| 197 |
-
)
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
def _corpus_of(data: dict[str, Any]) -> Optional[str]:
|
| 201 |
-
meta = data.get("meta") or {}
|
| 202 |
-
return (
|
| 203 |
-
meta.get("corpus_source")
|
| 204 |
-
or (data.get("corpus") or {}).get("source")
|
| 205 |
-
or meta.get("corpus_name")
|
| 206 |
-
)
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
def _safe_delta(a: Optional[float], b: Optional[float]) -> Optional[float]:
|
| 210 |
-
if a is None or b is None:
|
| 211 |
-
return None
|
| 212 |
-
return float(b) - float(a)
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
def compare_benchmarks(
|
| 216 |
-
a: str | Path | dict[str, Any],
|
| 217 |
-
b: str | Path | dict[str, Any],
|
| 218 |
-
*,
|
| 219 |
-
threshold: float = 0.005,
|
| 220 |
-
label_a: str = "A",
|
| 221 |
-
label_b: str = "B",
|
| 222 |
-
) -> ComparisonResult:
|
| 223 |
-
"""Compare deux runs et retourne les deltas par moteur.
|
| 224 |
-
|
| 225 |
-
Convention : un delta CER positif signifie que ``b`` est *moins bon*
|
| 226 |
-
que ``a`` (régression). Un seuil ``threshold`` strictement positif
|
| 227 |
-
(en fraction, ex. 0,005 = 0,5 pp) discrimine régression / bruit.
|
| 228 |
-
"""
|
| 229 |
-
-    da = load_benchmark_json(a) if not isinstance(a, dict) else a
-    db = load_benchmark_json(b) if not isinstance(b, dict) else b
-
-    idx_a = _ranking_index(da)
-    idx_b = _ranking_index(db)
-
-    common = sorted(set(idx_a) & set(idx_b))
-    only_a = sorted(set(idx_a) - set(idx_b))
-    only_b = sorted(set(idx_b) - set(idx_a))
-
-    deltas: list[EngineDelta] = []
-    for name in common:
-        ea = idx_a[name]
-        eb = idx_b[name]
-        delta_cer = _safe_delta(ea["mean_cer"], eb["mean_cer"])
-        delta_wer = _safe_delta(ea["mean_wer"], eb["mean_wer"])
-        regression = bool(delta_cer is not None and delta_cer > threshold)
-        improvement = bool(delta_cer is not None and delta_cer < -threshold)
-        deltas.append(
-            EngineDelta(
-                engine=name,
-                cer_a=ea["mean_cer"],
-                cer_b=eb["mean_cer"],
-                delta_cer=delta_cer,
-                wer_a=ea["mean_wer"],
-                wer_b=eb["mean_wer"],
-                delta_wer=delta_wer,
-                docs_a=int(ea["documents"]),
-                docs_b=int(eb["documents"]),
-                failed_a=int(ea["failed"]),
-                failed_b=int(eb["failed"]),
-                is_regression=regression,
-                is_improvement=improvement,
-            )
-        )
-
-    # Sort: regressions (descending delta), then improvements (ascending delta).
-    deltas.sort(key=lambda d: (
-        not d.is_regression,
-        -(d.delta_cer if d.delta_cer is not None else 0.0),
-    ))
-
-    return ComparisonResult(
-        label_a=label_a,
-        label_b=label_b,
-        run_date_a=_run_date_of(da),
-        run_date_b=_run_date_of(db),
-        corpus_a=_corpus_of(da),
-        corpus_b=_corpus_of(db),
-        deltas=deltas,
-        only_in_a=only_a,
-        only_in_b=only_b,
-        threshold=float(threshold),
-    )
-
-
-def detect_regressions(
-    diff: ComparisonResult,
-) -> list[EngineDelta]:
-    """Return only the engines flagged as regressions in ``diff``."""
-    return [d for d in diff.deltas if d.is_regression]
-
-
-# ---------------------------------------------------------------------------
-# HTML rendering
-# ---------------------------------------------------------------------------
-
-_COMPARISON_TEMPLATE = """<!DOCTYPE html>
-<html lang="fr">
-<head>
-<meta charset="UTF-8">
-<title>Picarones — Comparaison de runs</title>
-<style>
-  body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
-         max-width: 980px; margin: 2em auto; padding: 0 1em; color: #111; }
-  h1 { border-bottom: 2px solid #333; padding-bottom: .4em; }
-  h2 { margin-top: 1.6em; color: #333; }
-  table { width: 100%; border-collapse: collapse; margin: 1em 0; }
-  th, td { padding: .5em .8em; text-align: left; border-bottom: 1px solid #ddd; }
-  th { background: #f3f3f3; }
-  td.num, th.num { text-align: right; font-variant-numeric: tabular-nums; }
-  tr.regression td { background: #fef0f0; }
-  tr.improvement td { background: #f0fef2; }
-  .delta-pos { color: #b0322a; font-weight: 600; }
-  .delta-neg { color: #1b8a3a; font-weight: 600; }
-  .badge { display: inline-block; padding: .15em .55em; border-radius: 4px;
-           font-size: .8em; font-weight: 600; }
-  .badge.reg { background: #fde2e0; color: #8a1c14; }
-  .badge.imp { background: #e0f8e6; color: #0a5e22; }
-  .meta { color: #666; font-size: .9em; }
-  .empty { color: #999; font-style: italic; }
-</style>
-</head>
-<body>
-<h1>Comparaison : {{ diff.label_a }} → {{ diff.label_b }}</h1>
-<p class="meta">
-  Run A : {{ diff.run_date_a or "?" }} · corpus {{ diff.corpus_a or "?" }}<br>
-  Run B : {{ diff.run_date_b or "?" }} · corpus {{ diff.corpus_b or "?" }}<br>
-  Seuil régression / amélioration : {{ "%.3f"|format(diff.threshold) }}
-  ({{ "%.1f"|format(diff.threshold * 100) }} pp de CER absolu).
-</p>
-
-<h2>Moteurs comparés ({{ diff.deltas|length }})</h2>
-{% if not diff.deltas %}
-  <p class="empty">Aucun moteur commun aux deux runs.</p>
-{% else %}
-<table>
-  <thead>
-    <tr>
-      <th scope=\"col\">Moteur</th>
-      <th scope=\"col\" class="num">CER A</th>
-      <th scope=\"col\" class="num">CER B</th>
-      <th scope=\"col\" class="num">Δ CER</th>
-      <th scope=\"col\" class="num">Docs A → B</th>
-      <th scope=\"col\">État</th>
-    </tr>
-  </thead>
-  <tbody>
-  {% for d in diff.deltas %}
-    <tr class="{% if d.is_regression %}regression{% elif d.is_improvement %}improvement{% endif %}">
-      <td>{{ d.engine }}</td>
-      <td class="num">{{ "%.3f"|format(d.cer_a) if d.cer_a is not none else "—" }}</td>
-      <td class="num">{{ "%.3f"|format(d.cer_b) if d.cer_b is not none else "—" }}</td>
-      <td class="num">
-        {% if d.delta_cer is none %}—
-        {% elif d.delta_cer > 0 %}<span class="delta-pos">+{{ "%.3f"|format(d.delta_cer) }}</span>
-        {% else %}<span class="delta-neg">{{ "%.3f"|format(d.delta_cer) }}</span>
-        {% endif %}
-      </td>
-      <td class="num">{{ d.docs_a }} → {{ d.docs_b }}</td>
-      <td>
-        {% if d.is_regression %}<span class="badge reg">régression</span>
-        {% elif d.is_improvement %}<span class="badge imp">amélioration</span>
-        {% else %}<span class="meta">stable</span>{% endif %}
-      </td>
-    </tr>
-  {% endfor %}
-  </tbody>
-</table>
-{% endif %}
-
-{% if diff.only_in_a %}
-<h2>Présents uniquement dans A</h2>
-<ul>{% for n in diff.only_in_a %}<li>{{ n }}</li>{% endfor %}</ul>
-{% endif %}
-
-{% if diff.only_in_b %}
-<h2>Présents uniquement dans B</h2>
-<ul>{% for n in diff.only_in_b %}<li>{{ n }}</li>{% endfor %}</ul>
-{% endif %}
-
-<p class="meta">Picarones — Sprint 28 · rapport de comparaison de runs.</p>
-</body>
-</html>
-"""
-
-
-def render_comparison_html(
-    diff: ComparisonResult,
-    output_path: str | Path,
-) -> Path:
-    """Serialise a ``ComparisonResult`` into a self-contained HTML report."""
-    from jinja2 import Environment, select_autoescape
-
-    env = Environment(autoescape=select_autoescape(["html", "j2"]))
-    template = env.from_string(_COMPARISON_TEMPLATE)
-    html = template.render(diff=diff)
-    out = Path(output_path)
-    out.parent.mkdir(parents=True, exist_ok=True)
-    out.write_text(html, encoding="utf-8")
-    return out
 
 
-
-    "
-    "
-
-
-
-    "render_comparison_html",
-]
+"""``picarones.report.comparison``: re-export shim (deprecated, to be removed in 2.0).
 
+Canonical: :mod:`picarones.reports_v2.html.comparison`. Phase 5.E
+of the legacy removal.
 """
 
 from __future__ import annotations
 
+import warnings
 
+from picarones.reports_v2.html.comparison import *  # noqa: F401, F403
 
+warnings.warn(
+    "picarones.report.comparison is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.comparison instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
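The comparison code above splits engines into three sets (common, A only, B only) and tests each common engine against the CER threshold. Below is a stdlib-only sketch of that classification and of the sort key, runnable without Picarones.

`Delta`, `safe_delta`, `classify` and `sort_deltas` are hypothetical names. `safe_delta` assumes that `_safe_delta(a, b)` returns `b - a`, or `None` when a value is missing. That reading matches how the code treats a positive delta as a regression, but `_safe_delta` itself is not shown in this excerpt.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def safe_delta(a: Optional[float], b: Optional[float]) -> Optional[float]:
    # Assumed semantics of _safe_delta: run B minus run A, None if either is missing.
    if a is None or b is None:
        return None
    return b - a


@dataclass
class Delta:
    # Trimmed-down, hypothetical stand-in for EngineDelta.
    engine: str
    delta_cer: Optional[float]
    is_regression: bool
    is_improvement: bool


def classify(engine: str, cer_a: Optional[float], cer_b: Optional[float],
             threshold: float) -> Delta:
    d = safe_delta(cer_a, cer_b)
    # Higher CER in B is worse: above +threshold is a regression, below
    # -threshold an improvement, anything in between counts as stable.
    return Delta(
        engine=engine,
        delta_cer=d,
        is_regression=d is not None and d > threshold,
        is_improvement=d is not None and d < -threshold,
    )


def sort_deltas(deltas: list[Delta]) -> list[Delta]:
    # Same key as the removed code: regressions first, then descending delta.
    return sorted(deltas, key=lambda d: (
        not d.is_regression,
        -(d.delta_cer if d.delta_cer is not None else 0.0),
    ))


# Mean CER per engine for two hypothetical runs.
run_a = {"alpha": 0.10, "beta": 0.20, "gamma": 0.25, "delta": 0.15, "epsilon": 0.30}
run_b = {"alpha": 0.18, "beta": 0.12, "gamma": 0.20, "delta": 0.151, "zeta": 0.09}

common = sorted(set(run_a) & set(run_b))
only_a = sorted(set(run_a) - set(run_b))
only_b = sorted(set(run_b) - set(run_a))

ordered = sort_deltas([classify(n, run_a[n], run_b[n], threshold=0.01) for n in common])
print([d.engine for d in ordered])  # ['alpha', 'delta', 'gamma', 'beta']
print(only_a, only_b)               # ['epsilon'] ['zeta']
```

With this key, stable engines sit between the regressions and the improvements. Improvements then run from the smallest to the largest CER drop, so the biggest improvement (`beta` here) is listed last.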
@@ -1,466 +1,18 @@
-"""
 
-
-
-- Chart.js and diff2html (from cdnjs)
-- Application CSS and JavaScript
-
-Available views
----------------
-1. Ranking: table sortable by column (CER, WER, MER, WIL)
-2. Gallery: image grid with a colour-coded CER badge
-3. Document: zoomable image + colour-coded GT / OCR diff per engine
-4. Analyses: CER histogram + radar chart
-
-Architecture
--------------
-This module is the **orchestrator**. The heavy lifting is
-split into sub-modules:
-
-- :mod:`picarones.report.assets`: vendor.js loading, base64
-  encoding of images, lazy externalisation.
-- :mod:`picarones.report.report_data`: builds the JSON dict
-  passed to the template (engines, documents, statistics, Pareto, etc.).
-- :mod:`picarones.report.render_helpers`: shared colour / SVG helpers.
-
-Backward compatibility
------------------------
-Two historical names are **still imported by tests** under
-their ``_`` prefix and must be preserved:
-
-- ``_build_report_data`` (imported by 14 test files).
-- ``_cer_color`` (imported by ``tests/report/test_report.py``).
-
-The other names ``_pct``, ``_safe``, ``_cer_bg``, ``_encode_image_b64``,
-``_encode_images_b64_from_result``, ``_externalize_images_to_dir``,
-``_load_vendor_js`` are either used internally (the last three,
-see :meth:`ReportGenerator.generate`) or reachable under their
-canonical name in :mod:`picarones.report.assets` or
-:mod:`picarones.report.render_helpers`.
 """
 
 from __future__ import annotations
 
-import
-import logging
-from pathlib import Path
-from typing import Any, Optional
-
-from picarones.evaluation.benchmark_result import BenchmarkResult
-from picarones.measurements.statistics import build_critical_difference_svg
-from picarones.reports_v2._helpers.assets import (
-    encode_images_b64_from_result as _encode_images_b64_from_result,
-    externalize_images_to_dir as _externalize_images_to_dir,
-    load_vendor_js as _load_vendor_js,
-)
-
-# Backward-compatible re-exports consumed by external tests (see the
-# module docstring). The end-of-line directive documents the re-export
-# intent and stops ruff from flagging the import as unused.
-from picarones.reports_v2._helpers.render_helpers import cer_step_color as _cer_color  # noqa: F401
-from picarones.report.report_data import build_report_data as _build_report_data  # noqa: F401
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Jinja2 rendering
-# ---------------------------------------------------------------------------
-
-# Since Sprint 16, the ~3100-line monolithic template has been split into
-# external files under ``picarones/report/templates/`` (CSS, JS, HTML views).
-# ``base.html.j2`` assembles everything via ``{% include %}``.
-
-_TEMPLATES_DIR = Path(__file__).parent / "templates"
-
-
-def _build_jinja_env():
-    """Build the Jinja2 Environment for the report.
-
-    Autoescape is disabled: the behaviour matches the historical
-    ``_HTML_TEMPLATE.format()``. The injected variables
-    (embedded JSON, generated SVG, narrative synthesis built from
-    internal templates) are all produced by Picarones code and do
-    not need HTML escaping.
-    """
-    from jinja2 import Environment, FileSystemLoader
-    env = Environment(
-        loader=FileSystemLoader(str(_TEMPLATES_DIR)),
-        autoescape=False,
-        keep_trailing_newline=True,
-    )
-    return env
-
-
-# ---------------------------------------------------------------------------
-# Main class
-# ---------------------------------------------------------------------------
-
-class ReportGenerator:
-    """Generate an interactive HTML report from a BenchmarkResult.
-
-    Usage
-    -----
-    >>> from picarones.report import ReportGenerator
-    >>> gen = ReportGenerator(benchmark_result)
-    >>> path = gen.generate("rapport.html")
-    >>> # English report:
-    >>> gen_en = ReportGenerator(benchmark_result, lang="en")
-    >>> path_en = gen_en.generate("report.html")
-    """
-
-    def __init__(
-        self,
-        benchmark: BenchmarkResult,
-        images_b64: Optional[dict[str, str]] = None,
-        lang: str = "fr",
-        normalization_profile: Any = None,
-        lazy_images: bool = False,
-    ) -> None:
-        """
-        Parameters
-        ----------
-        benchmark:
-            Benchmark result to visualise.
-        images_b64:
-            Image dictionary {doc_id: base64 data-URI OR relative URL}.
-            If None, the generator looks in ``benchmark.metadata["_images_b64"]``.
-            If ``lazy_images=True``, the expected value is a relative URL
-            such as ``"report-assets/<doc>.png"``.
-        lang:
-            Report language code: ``"fr"`` (default) or ``"en"``.
-        normalization_profile:
-            Normalization profile actually used (Sprint 27, for the
-            reproducibility snapshot). ``None`` falls back to the
-            profile named in ``benchmark.metadata["normalization_profile"]``
-            if present; otherwise no snapshot is available.
-        lazy_images:
-            Sprint A5 (M-16): if ``True``, images are written as
-            PNG/JPEG files in ``<output_dir>/report-assets/`` next to
-            the HTML and referenced via ``<img loading="lazy">``.
-            The report stays self-contained as long as the assets folder
-            is copied with it. Useful for corpora > 50 documents (a
-            monolithic base64 report of 1,000 docs exceeds 200 MB and
-            slows the browser down). For single-doc or demo runs, leave
-            it ``False`` to get a single portable HTML file.
-        """
-        self.benchmark = benchmark
-        self.images_b64: dict[str, str] = images_b64 or {}
-        self.lang = lang
-        self.normalization_profile = normalization_profile
-        self.lazy_images = lazy_images
-
-        # Pick up images embedded in the metadata (fixtures)
-        if not self.images_b64:
-            self.images_b64 = benchmark.metadata.get("_images_b64", {})  # type: ignore[assignment]
-
-        # Sprint 27: fall back to the normalization profile from the metadata
-        if self.normalization_profile is None:
-            self.normalization_profile = benchmark.metadata.get("normalization_profile")
-
-    def generate(self, output_path: str | Path) -> Path:
-        """Generate the HTML file and save it to disk.
 
-
-        ----------
-        output_path:
-            Path of the HTML file to write.
 
-
-
-
-
-
-
-
-        output_path = Path(output_path)
-        output_path.parent.mkdir(parents=True, exist_ok=True)
-
-        # Sprint A5 (M-16): externalise the images when lazy_images=True,
-        # otherwise auto-encode them as base64. Both modes feed the same
-        # ``images_b64`` variable (the name is kept for backward compatibility;
-        # in lazy mode the value is a relative URL instead of a data-URI).
-        # In lazy mode, externalisation is **forced** even if self.images_b64
-        # is pre-filled (by fixtures, metadata, etc.); otherwise the
-        # report would still contain huge data-URIs.
-        if self.lazy_images:
-            images_b64 = _externalize_images_to_dir(
-                self.benchmark, output_path.parent,
-            )
-        else:
-            images_b64 = self.images_b64
-            if not images_b64:
-                images_b64 = _encode_images_b64_from_result(self.benchmark)
-
-        labels = get_labels(self.lang)
-        report_data = _build_report_data(self.benchmark, images_b64)
-
-        # Sprint 27: reproducibility snapshots (pricing, glossary,
-        # normalization profile, environment). Embedded in the report JSON
-        # so that a reader can regenerate the synthesis, the Pareto front
-        # and the glossary without access to the source code.
-        from picarones.report.snapshot import snapshot_all
-        report_data["snapshots"] = snapshot_all(
-            lang=self.lang,
-            normalization_profile=self.normalization_profile,
-        )
-
-        report_json = json.dumps(report_data, ensure_ascii=False, separators=(",", ":"))
-        i18n_json = json.dumps(labels, ensure_ascii=False, separators=(",", ":"))
-        chartjs_js = _load_vendor_js("chart.umd.min.js")
-
-        # Sprint 17: server-side SVG rendering of the CDD (static, no JS)
-        cdd_svg = build_critical_difference_svg(
-            report_data.get("statistics", {}).get("nemenyi", {}),
-        )
-
-        # Sprint 18: factual narrative synthesis (deterministic, no LLM)
-        from picarones.measurements.narrative import build_synthesis
-        synthesis = build_synthesis(report_data, lang=self.lang)
-
-        # Sprint 20: contextual glossary loaded from YAML
-        from picarones.reports_v2.glossary import load_glossary
-        glossary = load_glossary(self.lang)
-        glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
-
-        section_html = self._build_section_html(report_data, labels)
-
-        env = _build_jinja_env()
-        template = env.get_template("base.html.j2")
-        html = template.render(
-            corpus_name=self.benchmark.corpus_name,
-            picarones_version=self.benchmark.picarones_version,
-            report_data_json=report_json,
-            i18n_json=i18n_json,
-            html_lang=labels.get("html_lang", "fr"),
-            chartjs_inline=chartjs_js,
-            critical_difference_svg=cdd_svg,
-            friedman=report_data.get("statistics", {}).get("friedman", {}),
-            synthesis=synthesis,
-            glossary_json=glossary_json,
-            **section_html,
-        )
-
-        output_path.write_text(html, encoding="utf-8")
-        return output_path.resolve()
-
-    def _build_section_html(
-        self, report_data: dict, labels: dict[str, str],
-    ) -> dict[str, str]:
-        """Build every conditional HTML section of the report.
-
-        Each renderer (NER, calibration, philology, etc.) is called
-        independently. A section returns ``""`` when no engine has
-        any signal for it; the template handles the conditional
-        display.
-
-        Returns
-        -------
-        dict[str, str]
-            Map ``{section_name: html}`` to splat into
-            ``template.render(**section_html)``.
-        """
-        engines = report_data.get("engines", [])
-
-        # Sprint 37: inter-engine section (divergence matrix + oracle).
-        from picarones.reports_v2.html.renderers.inter_engine import (
-            build_divergence_matrix_html,
-            build_oracle_gap_html,
-        )
-        # Sprint 41: NER section (per-engine F1 summary + per-category heatmap).
-        from picarones.reports_v2.html.renderers.ner import (
-            build_ner_per_category_html,
-            build_ner_summary_html,
-        )
-        # Sprint 43: calibration section (ECE/MCE table + grid of
-        # per-engine reliability diagrams).
-        from picarones.reports_v2.html.renderers.calibration import (
-            build_calibration_summary_html,
-            build_reliability_diagrams_grid_html,
-        )
-        # Sprint 46: stratified section (per-stratum table).
-        from picarones.reports_v2.html.renderers.stratification import (
-            build_stratified_ranking_html,
-        )
-        # Sprint 62: philological profile (6 adaptive sections).
-        from picarones.reports_v2.html.renderers.philological import (
-            build_philological_profile_html,
-        )
-        # Sprint 86, A.II.5: fuzzy searchability + numerical sequences.
-        from picarones.reports_v2.html.renderers.searchability import (
-            build_searchability_summary_html,
-        )
-        from picarones.reports_v2.html.renderers.numerical_sequences import (
-            build_numerical_sequences_html,
-        )
-        # Sprint 87, A.II.2: readability (Flesch delta).
-        from picarones.reports_v2.html.renderers.readability import (
-            build_readability_summary_html,
-        )
-        # Sprint 89, A.II.8b: inter-engine specialization.
-        from picarones.reports_v2.html.renderers.specialization import (
-            build_specialization_html,
-        )
-        # Chantier 3 (post-Sprint 97): 3 composite thematic views.
-        from picarones.reports_v2.html.views import (
-            build_advanced_taxonomy_view_html,
-            build_diagnostics_view_html,
-            build_economics_view_html,
-        )
-        # "Test-only module wiring" sprint (May 2026): sections
-        # that consume the new metrics computed in
-        # ``report_data.extra_metrics``.
-        from picarones.reports_v2.html.renderers.marginal_cost import (
-            build_marginal_cost_html,
-        )
-        from picarones.reports_v2.html.renderers.rare_token_recall import (
-            build_rare_token_recall_html,
-        )
-        from picarones.reports_v2.html.renderers.taxonomy_cooccurrence import (
-            build_taxonomy_cooccurrence_html,
-        )
-        from picarones.reports_v2.html.renderers.taxonomy_intra_doc import (
-            build_taxonomy_intra_doc_html,
-        )
-
-        # Specialization: build an {engine: counts} map from the
-        # ``aggregated_taxonomy``; an engine without a taxonomy is skipped.
-        taxos: dict = {}
-        for eng in engines:
-            tax = eng.get("aggregated_taxonomy")
-            if isinstance(tax, dict):
-                counts = tax.get("counts") if "counts" in tax else tax
-                if isinstance(counts, dict) and counts:
-                    taxos[eng.get("name", "?")] = {
-                        k: float(v) for k, v in counts.items()
-                        if isinstance(v, (int, float))
-                    }
-
-        return {
-            # Sprint 37
-            "divergence_matrix_html": build_divergence_matrix_html(
-                report_data.get("inter_engine_analysis"), labels=labels,
-            ),
-            "oracle_gap_html": build_oracle_gap_html(
-                report_data.get("inter_engine_analysis"), labels=labels,
-            ),
-            # Sprint 41
-            "ner_summary_html": build_ner_summary_html(engines, labels=labels),
-            "ner_per_category_html": build_ner_per_category_html(engines, labels=labels),
-            # Sprint 43
-            "calibration_summary_html": build_calibration_summary_html(
-                engines, labels=labels,
-            ),
-            "reliability_diagrams_html": build_reliability_diagrams_grid_html(
-                engines, labels=labels,
-            ),
-            # Sprint 46
-            "stratified_ranking_html": build_stratified_ranking_html(
-                report_data.get("stratified_ranking"),
-                report_data.get("available_strata"),
-                report_data.get("corpus_homogeneity"),
-                labels=labels,
-            ),
-            # Sprint 62
-            "philological_profile_html": build_philological_profile_html(
-                engines, labels=labels,
-            ),
-            # Sprint 86
-            "searchability_html": build_searchability_summary_html(
-                engines, labels=labels,
-            ),
-            "numerical_sequences_html": build_numerical_sequences_html(
-                engines, labels=labels,
-            ),
-            # Sprint 87
-            "readability_html": build_readability_summary_html(
-                engines, labels=labels,
-            ),
-            # Sprint 89
-            "specialization_html": build_specialization_html(taxos, labels=labels),
-            # Chantier 3: composite thematic views
-            "economics_view_html": build_economics_view_html(
-                report_data, labels=labels,
-                engine_reports=self.benchmark.engine_reports,
-            ),
-            "advanced_taxonomy_view_html": build_advanced_taxonomy_view_html(
-                report_data, labels=labels,
-            ),
-            "diagnostics_view_html": build_diagnostics_view_html(
-                report_data, labels=labels,
-            ),
-            # "Test-only module wiring" sprint (May 2026):
-            # 4 new sections for the modules wired into
-            # ``report_data.extra_metrics``. Adaptive: "" when there is no signal.
-            "taxonomy_cooccurrence_html": build_taxonomy_cooccurrence_html(
-                report_data.get("taxonomy_cooccurrence"), labels=labels,
-            ),
-            "taxonomy_intra_doc_html": build_taxonomy_intra_doc_html(
-                report_data.get("taxonomy_intra_doc"), labels=labels,
-            ),
-            "rare_token_recall_html": build_rare_token_recall_html(
-                report_data.get("rare_token_recall"), labels=labels,
-            ),
-            "marginal_cost_html": build_marginal_cost_html(
-                report_data.get("marginal_cost"), labels=labels,
-            ),
-        }
-
-    @classmethod
-    def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":
-        """Create a generator from a results JSON file.
-
-        Compatible with files produced by ``BenchmarkResult.to_json()``.
-        Base64 images must be passed via ``kwargs["images_b64"]``
-        if they are not in the JSON.
-        """
-        import json as _json
-
-        data = _json.loads(Path(json_path).read_text(encoding="utf-8"))
-
-        # Minimal reconstruction of a BenchmarkResult from the dict
-        from picarones.measurements.metrics import MetricsResult
-        from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
-
-        engine_reports = []
-        for er_data in data.get("engine_reports", []):
-            doc_results = []
-            for dr_data in er_data.get("document_results", []):
-                m = dr_data["metrics"]
-                metrics = MetricsResult(
-                    cer=m["cer"], cer_nfc=m["cer_nfc"], cer_caseless=m["cer_caseless"],
-                    wer=m["wer"], wer_normalized=m["wer_normalized"],
-                    mer=m["mer"], wil=m["wil"],
-                    reference_length=m["reference_length"],
-                    hypothesis_length=m["hypothesis_length"],
-                    error=m.get("error"),
-                )
-                doc_results.append(DocumentResult(
-                    doc_id=dr_data["doc_id"],
-                    image_path=dr_data["image_path"],
-                    ground_truth=dr_data["ground_truth"],
-                    hypothesis=dr_data["hypothesis"],
-                    metrics=metrics,
-                    duration_seconds=dr_data.get("duration_seconds", 0.0),
-                    engine_error=dr_data.get("engine_error"),
-                ))
-            engine_reports.append(EngineReport(
-                engine_name=er_data["engine_name"],
-                engine_version=er_data.get("engine_version", "unknown"),
-                engine_config=er_data.get("engine_config", {}),
-                document_results=doc_results,
-            ))
-
-        corpus_info = data.get("corpus", {})
-        bm = BenchmarkResult(
-            corpus_name=corpus_info.get("name", "Corpus"),
-            corpus_source=corpus_info.get("source"),
-            document_count=corpus_info.get("document_count", 0),
-            engine_reports=engine_reports,
-            run_date=data.get("run_date", ""),
-            picarones_version=data.get("picarones_version", ""),
-            metadata=data.get("metadata", {}),
-        )
-
-        images_b64 = kwargs.pop("images_b64", {})
-        return cls(bm, images_b64=images_b64, **kwargs)
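The `lazy_images` docstring in the removed `ReportGenerator.__init__` describes two shapes for the same `images_b64` mapping. In the default mode the values are base64 data-URIs; in lazy mode they are relative `report-assets/...` URLs. Below is a stdlib-only sketch of both shapes.

`to_data_uri` and `externalize` are hypothetical helpers. They are not the `encode_images_b64_from_result` / `externalize_images_to_dir` functions imported from `reports_v2._helpers.assets`, whose internals this diff does not show.

```python
from __future__ import annotations

import base64
import mimetypes
import shutil
import tempfile
from pathlib import Path


def to_data_uri(image: Path) -> str:
    # Inline mode: the file is embedded in the HTML (base64 adds about a third).
    mime = mimetypes.guess_type(image.name)[0] or "application/octet-stream"
    payload = base64.b64encode(image.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def externalize(images: dict[str, Path], output_dir: Path) -> dict[str, str]:
    # Lazy mode: copy each image next to the report and return relative URLs,
    # suitable for <img loading="lazy" src="...">.
    assets = output_dir / "report-assets"
    assets.mkdir(parents=True, exist_ok=True)
    urls: dict[str, str] = {}
    for doc_id, src in images.items():
        dest = assets / f"{doc_id}{src.suffix}"
        shutil.copyfile(src, dest)
        urls[doc_id] = f"report-assets/{dest.name}"
    return urls


work = Path(tempfile.mkdtemp())
page = work / "page1.png"
page.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 32)  # fake PNG bytes for the demo

inline = {"page1": to_data_uri(page)}
lazy = externalize({"page1": page}, work / "out")
print(inline["page1"][:22])  # data:image/png;base64,
print(lazy["page1"])         # report-assets/page1.png
```

Inline mode yields one portable file. Lazy mode keeps the HTML small, but the `report-assets/` folder must travel with it; this is the trade-off the docstring describes for large corpora.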
+"""``picarones.report.generator``: re-export shim (deprecated, to be removed in 2.0).
 
+Canonical: :mod:`picarones.reports_v2.html.generator`. Phase 5.E
+of the legacy removal.
 """
 
 from __future__ import annotations
 
+import warnings
 
+from picarones.reports_v2.html.generator import *  # noqa: F401, F403
 
+warnings.warn(
+    "picarones.report.generator is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.generator instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
@@ -1,132 +1,21 @@
-"""
+"""``picarones.report.report_data``: re-export shim (deprecated, removal in 2.0).

-
-
-dict ready for Jinja. Sprint after sprint, this function had piled up
-independent blocks: engines, documents, statistics, scatter plots,
-Pareto front, etc.
-
-This sub-package splits the construction into thematic modules:
-
-- :mod:`engines`: per-engine summary (``engines_summary``).
-- :mod:`documents`: gallery view + detail + Sprint 7 difficulty.
-- :mod:`statistics`: Wilcoxon, Friedman, Nemenyi, bootstrap CIs,
-  reliability curves, Venn, error clusters, correlations.
-- :mod:`scatter`: Sprint 10, Gini vs CER, ratio vs anchor.
-- :mod:`pareto`: Sprint 19, 3 Pareto fronts + pricing metadata.
-  Exposes two separate functions: :func:`attach_engine_costs`
-  (mutating) and :func:`build_pareto_section` (pure).
-
-The public API :func:`build_report_data` orchestrates these modules in
-the right order. The two-step Pareto sequence
-(``attach_engine_costs`` → ``build_pareto_section``) makes the
-mutation explicit: the sub-package's ``build_*`` functions
-are pure, except ``attach_engine_costs``, as its name says.
+Canonical: :mod:`picarones.reports_v2.html.data`. Phase 5.E of
+the legacy removal.
 """

 from __future__ import annotations

-
-
-if TYPE_CHECKING:
-    from picarones.evaluation.benchmark_result import BenchmarkResult
+import warnings

-from picarones.
-
-
-)
-from picarones.report.report_data.engines import build_engines_summary
-from picarones.report.report_data.extra_metrics import (
-    compute_marginal_cost_section,
-    compute_rare_token_recall_per_engine,
-    compute_taxonomy_cooccurrence_section,
-    compute_taxonomy_intra_doc_section,
-)
-from picarones.report.report_data.pareto import (
-    attach_engine_costs,
-    build_pareto_section,
-)
-from picarones.report.report_data.scatter import (
-    build_gini_vs_cer,
-    build_ratio_vs_anchor,
-)
-from picarones.report.report_data.statistics import (
-    build_bootstrap_cis,
-    build_correlation_per_engine,
-    build_error_clusters,
-    build_friedman_and_nemenyi,
-    build_pairwise_wilcoxon,
-    build_reliability_curves,
-    build_venn_data,
+from picarones.reports_v2.html.data import *  # noqa: F401, F403
+from picarones.reports_v2.html.data import (  # noqa: F401
+    build_report_data,
 )

-
-
-
-
-
-
-    Critical order:
-
-    1. Build ``engines_summary`` (pure).
-    2. Build ``documents``, then annotate them with difficulty (mutates
-       ``documents``).
-    3. **Attach** costs to ``engines_summary`` (mutating; the name is
-       explicit).
-    4. **Build** the Pareto block (pure; reads the attached costs).
-    """
-    engines_summary = build_engines_summary(benchmark)
-    documents = build_documents(benchmark, images_b64)
-    annotate_documents_with_difficulty(benchmark, documents)
-
-    attach_engine_costs(engines_summary, benchmark)
-    pareto_data = build_pareto_section(engines_summary)
-
-    return {
-        "meta": {
-            "corpus_name": benchmark.corpus_name,
-            "corpus_source": benchmark.corpus_source,
-            "document_count": benchmark.document_count,
-            "run_date": benchmark.run_date,
-            "picarones_version": benchmark.picarones_version,
-            "metadata": benchmark.metadata,
-        },
-        "ranking": benchmark.ranking(),
-        "engines": engines_summary,
-        "documents": documents,
-        # Sprint 7
-        "statistics": {
-            "pairwise_wilcoxon": build_pairwise_wilcoxon(benchmark),
-            "bootstrap_cis": build_bootstrap_cis(benchmark),
-            **build_friedman_and_nemenyi(benchmark),
-        },
-        "reliability_curves": build_reliability_curves(benchmark),
-        "venn_data": build_venn_data(benchmark),
-        "error_clusters": build_error_clusters(benchmark),
-        "correlation_per_engine": build_correlation_per_engine(benchmark),
-        # Sprint 10
-        "gini_vs_cer": build_gini_vs_cer(benchmark),
-        "ratio_vs_anchor": build_ratio_vs_anchor(benchmark),
-        # Sprint 19: cost/quality Pareto view with axis variants
-        "pareto": pareto_data,
-        # Sprint 36: inter-engine analysis (taxonomic divergence +
-        # complementarity / oracle). ``None`` if fewer than 2 engines.
-        "inter_engine_analysis": benchmark.inter_engine_analysis,
-        # Sprint 45-46: stratification by script_type
-        "available_strata": benchmark.available_strata(),
-        "stratified_ranking": benchmark.stratified_ranking() or None,
-        "corpus_homogeneity": benchmark.corpus_homogeneity(),
-        # "Wiring of test-only modules" sprint (May 2026): corpus-wide
-        # metrics that had not been surfaced in the report until then.
-        # Sprint 71 (A.I.1): recall on rare tokens (hapax + dis legomena).
-        "rare_token_recall": compute_rare_token_recall_per_engine(benchmark),
-        # Sprint 75 (A.I.4): inter-class taxonomic co-occurrence.
-        "taxonomy_cooccurrence": compute_taxonomy_cooccurrence_section(benchmark),
-        # Sprint 76 (A.I.4): class × position heatmap (intra-document).
-        "taxonomy_intra_doc": compute_taxonomy_intra_doc_section(benchmark),
-        # Sprint 91 (A.II.6): marginal-cost matrix between engine pairs.
-        "marginal_cost": compute_marginal_cost_section(engines_summary),
-    }
-
-
-__all__ = ["build_report_data"]
+warnings.warn(
+    "picarones.report.report_data is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
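The removed `build_report_data` docstring requires a fixed order: costs are attached first (a mutating step), and the Pareto block is built afterwards (a pure step). The sketch below illustrates that split with hypothetical, simplified versions of both functions. It makes three simplifications, all of them assumptions:

- costs come from a plain dict rather than from the benchmark;
- the field name `cost_per_1k_pages` is invented;
- a single (cost, CER) front replaces the three fronts and the pricing metadata of the real `pareto` module.

```python
def attach_engine_costs(engines_summary: list[dict], costs: dict[str, float]) -> None:
    """Mutating step (hypothetical): copy each engine's cost onto its summary entry."""
    for entry in engines_summary:
        entry["cost_per_1k_pages"] = costs.get(entry["name"])


def build_pareto_section(engines_summary: list[dict]) -> list[str]:
    """Pure step (hypothetical): engines not dominated on (cost, CER), both minimised."""
    points = [
        (e["name"], e["cost_per_1k_pages"], e["cer"])
        for e in engines_summary
        if e.get("cost_per_1k_pages") is not None and e.get("cer") is not None
    ]
    front: list[str] = []
    for name, cost, cer in points:
        # A point is dominated if another one is no worse on both axes
        # and strictly better on at least one.
        dominated = any(
            c <= cost and r <= cer and (c < cost or r < cer)
            for _, c, r in points
        )
        if not dominated:
            front.append(name)
    return front


engines = [
    {"name": "a", "cer": 0.05},
    {"name": "b", "cer": 0.10},
    {"name": "c", "cer": 0.12},
]
attach_engine_costs(engines, {"a": 10.0, "b": 1.0, "c": 2.0})
print(build_pareto_section(engines))  # ['a', 'b']: 'c' is dominated by 'b'
```

Because `build_pareto_section` only reads what `attach_engine_costs` wrote, calling it twice yields the same front and leaves the summaries unchanged.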
@@ -1,30 +1,18 @@
-"""
+"""``picarones.report.report_data._helpers``: re-export shim (deprecated, removal in 2.0).

-
-
-import from outside the sub-package: these helpers are
-specific to the conventions of the JSON dict consumed by the template.
+Canonical: :mod:`picarones.reports_v2.html.data._helpers`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

-
+import warnings

+from picarones.reports_v2.html.data._helpers import *  # noqa: F401, F403

-
-"
-
-
-
-
-    """Format a ratio ∈ [0, 1] as a percentage string: ``0.4723 → "47.23 %"``.
-
-    ``None`` → ``"—"``. Kept for backward compatibility with possible
-    external callers (historical Sprint 7).
-    """
-    if v is None:
-        return "—"
-    return f"{v * 100:.{decimals}f} %"
-
-
-__all__ = ["safe_round", "percent_string"]
+warnings.warn(
+    "picarones.report.report_data._helpers is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data._helpers instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
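Only the body of `percent_string` survives in the removed `_helpers` listing above. `safe_round` is exported in `__all__`, but its body is not visible. The sketch below reconstructs both under explicit assumptions:

- the `decimals=2` default is inferred from the docstring example (`0.4723 → "47.23 %"`);
- the four-digit precision of `safe_round` is a guess, not the project's actual setting.

```python
from typing import Optional


def safe_round(v: Optional[float], ndigits: int = 4) -> Optional[float]:
    """Round ``v`` and let ``None`` pass through (assumed behaviour; 4 digits is a guess)."""
    return None if v is None else round(v, ndigits)


def percent_string(v: Optional[float], decimals: int = 2) -> str:
    """Format a ratio in [0, 1] as a percentage; ``None`` renders as a placeholder dash."""
    if v is None:
        return "\u2014"  # same placeholder character as the removed helper
    return f"{v * 100:.{decimals}f} %"


print(percent_string(0.4723))      # 47.23 %
print(percent_string(None))
print(safe_round(0.123456))        # 0.1235
```

Letting `None` pass through both helpers is what allows the template to render a placeholder for metrics that were not computed, instead of failing.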
@@ -1,167 +1,18 @@
-"""
+"""``picarones.report.report_data.documents``: re-export shim (deprecated, removal in 2.0).

-
-
-the fields specific to OCR+LLM pipelines (intermediate output, mode,
-over-normalization).
-
-:func:`annotate_documents_with_difficulty` then enriches each
-document with its intrinsic difficulty score (Sprint 7).
+Canonical: :mod:`picarones.reports_v2.html.data.documents`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

-
-
-from picarones.core.diff_utils import compute_char_diff, compute_word_diff
-from picarones.measurements.difficulty import (
-    compute_all_difficulties,
-    difficulty_label,
-)
-from picarones.report.report_data._helpers import safe_round
-
-if TYPE_CHECKING:
-    from picarones.evaluation.benchmark_result import BenchmarkResult
-
-
-def build_documents(
-    benchmark: "BenchmarkResult", images_b64: dict[str, str],
-) -> list[dict]:
-    """Return the ordered list of documents, ready for the template.
-
-    Document order preserves order of appearance (first engine
-    first, then any additions from later engines when some
-    documents are not covered by every engine).
-    """
-    seen_doc_ids: set[str] = set()
-    doc_ids_ordered: list[str] = []
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            if dr.doc_id not in seen_doc_ids:
-                seen_doc_ids.add(dr.doc_id)
-                doc_ids_ordered.append(dr.doc_id)
-
-    # Cross index: doc_id → {engine_name → DocumentResult}
-    doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
-
-    documents: list[dict] = []
-    engine_names = [r.engine_name for r in benchmark.engine_reports]
-    for doc_id in doc_ids_ordered:
-        engine_results: list[dict] = []
-        gt = ""
-        image_path = ""
-        for engine_name in engine_names:
-            dr = doc_engine_map[doc_id].get(engine_name)
-            if dr is None:
-                continue
-            gt = dr.ground_truth
-            image_path = dr.image_path
-            er_entry = _build_engine_result_entry(engine_name, dr)
-            engine_results.append(er_entry)
+import warnings

-
-        cer_values = [er["cer"] for er in engine_results if er["error"] is None]
-        mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
-        best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
+from picarones.reports_v2.html.data.documents import *  # noqa: F401, F403

-
-
-
-
-
-
-
-        documents.append({
-            "doc_id": doc_id,
-            "image_path": image_path,
-            "image_b64": images_b64.get(doc_id, ""),
-            "ground_truth": gt,
-            "mean_cer": safe_round(mean_cer),
-            "best_engine": best_engine["engine"] if best_engine else "",
-            "engine_results": engine_results,
-            "script_type": script_type,
-        })
-    return documents
-
-
-def _build_engine_result_entry(engine_name: str, dr) -> dict:
-    """Build one engine entry for a given document (extracted for readability)."""
-    diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
-    er_entry: dict = {
-        "engine": engine_name,
-        "hypothesis": dr.hypothesis,
-        "cer": safe_round(dr.metrics.cer),
-        "cer_diplomatic": safe_round(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
-        "wer": safe_round(dr.metrics.wer),
-        "mer": safe_round(dr.metrics.mer),
-        "wil": safe_round(dr.metrics.wil),
-        "duration": dr.duration_seconds,
-        "error": dr.engine_error,
-        "diff": diff_ops,
-    }
-    # Fields specific to OCR+LLM pipelines
-    if dr.ocr_intermediate is not None:
-        er_entry["ocr_intermediate"] = dr.ocr_intermediate
-        er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
-        er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
-    if dr.pipeline_metadata:
-        on = dr.pipeline_metadata.get("over_normalization")
-        if on is not None:
-            er_entry["over_normalization"] = on
-        er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
-    # Sprint 5: advanced per-document metrics
-    if dr.char_scores is not None:
-        er_entry["ligature_score"] = safe_round(dr.char_scores.get("ligature", {}).get("score"))
-        er_entry["diacritic_score"] = safe_round(dr.char_scores.get("diacritic", {}).get("score"))
-    if dr.taxonomy is not None:
-        er_entry["taxonomy"] = dr.taxonomy
-    if dr.structure is not None:
-        er_entry["structure"] = dr.structure
-    if dr.image_quality is not None:
-        er_entry["image_quality"] = dr.image_quality
-    # Sprint 10
-    if dr.line_metrics is not None:
-        er_entry["line_metrics"] = dr.line_metrics
-    if dr.hallucination_metrics is not None:
-        er_entry["hallucination_metrics"] = dr.hallucination_metrics
-    return er_entry
-
-
-def annotate_documents_with_difficulty(
-    benchmark: "BenchmarkResult", documents: list[dict],
-) -> None:
-    """Annotate each document in the dict with its difficulty score (Sprint 7).
-
-    Modifies ``documents`` in place. The defaults ``0.5`` /
-    ``"Modéré"`` are used when the difficulty could not be
-    computed (for example, a degenerate corpus).
-    """
-    doc_ids_ordered = [d["doc_id"] for d in documents]
-    gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
-    cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
-    iq_map: dict[str, float] = {}
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = safe_round(dr.metrics.cer)
-            if dr.image_quality and "quality_score" in dr.image_quality:
-                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
-    difficulty_scores = compute_all_difficulties(
-        doc_ids=doc_ids_ordered,
-        ground_truths=gt_map,
-        cer_map=cer_map,
-        image_quality_map=iq_map or None,
-    )
-    for doc in documents:
-        ds = difficulty_scores.get(doc["doc_id"])
-        if ds:
-            doc["difficulty_score"] = safe_round(ds.score)
-            doc["difficulty_label"] = difficulty_label(ds.score)
-        else:
-            doc["difficulty_score"] = 0.5
-            doc["difficulty_label"] = "Modéré"
-
-
-__all__ = ["build_documents", "annotate_documents_with_difficulty"]
+warnings.warn(
+    "picarones.report.report_data.documents is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data.documents instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
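The removed `build_documents` relies on two passes over the engine reports:

- an order-preserving union of document ids, listing the first engine's documents first and then any documents that only later engines cover;
- a `doc_id → {engine_name → result}` cross index.

The sketch below isolates those two passes and adds a lowest-CER lookup similar to the `best_engine` field. It runs on hypothetical dataclasses that stand in for `EngineReport` and `DocumentResult`, and models only the fields used here.

```python
from dataclasses import dataclass, field


@dataclass
class DocResult:  # hypothetical stand-in for DocumentResult
    doc_id: str
    cer: float


@dataclass
class EngineReport:  # hypothetical stand-in for EngineReport
    engine_name: str
    document_results: list[DocResult] = field(default_factory=list)


def ordered_doc_ids(reports: list[EngineReport]) -> list[str]:
    """Union of doc ids, in order of first appearance across engines."""
    seen: set[str] = set()
    ordered: list[str] = []
    for report in reports:
        for dr in report.document_results:
            if dr.doc_id not in seen:
                seen.add(dr.doc_id)
                ordered.append(dr.doc_id)
    return ordered


def doc_engine_index(reports: list[EngineReport]) -> dict[str, dict[str, DocResult]]:
    """Cross index doc_id -> {engine_name -> result}."""
    index: dict[str, dict[str, DocResult]] = {}
    for report in reports:
        for dr in report.document_results:
            index.setdefault(dr.doc_id, {})[report.engine_name] = dr
    return index


def best_engine(index: dict[str, dict[str, DocResult]], doc_id: str) -> str:
    """Engine with the lowest CER on ``doc_id``, or "" if no engine covered it."""
    results = index.get(doc_id, {})
    if not results:
        return ""
    return min(results.items(), key=lambda item: item[1].cer)[0]


reports = [
    EngineReport("a", [DocResult("d1", 0.10), DocResult("d2", 0.20)]),
    EngineReport("b", [DocResult("d2", 0.05), DocResult("d3", 0.30)]),
]
print(ordered_doc_ids(reports))                      # ['d1', 'd2', 'd3']
print(best_engine(doc_engine_index(reports), "d2"))  # b
```

A list paired with a set keeps the ordering deterministic while keeping membership tests O(1). Iterating over the set alone would not preserve first-appearance order.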
@@ -1,103 +1,18 @@
-"""
+"""``picarones.report.report_data.engines``: re-export shim (deprecated, removal in 2.0).

-
-
-heritage-specific (Sprint 5), error distribution (Sprint 10), NER
-(Sprint 41), calibration (Sprint 43), philological profile (Sprint
-62), searchability + numerical sequences (Sprint 86), readability
-(Sprint 87) and OCR+LLM pipeline indicators.
-
-Costs (mean duration, price per 1k pages, CO₂) are added
-later by :mod:`picarones.report.report_data.pareto`, which
-needs them to compute the fronts.
+Canonical: :mod:`picarones.reports_v2.html.data.engines`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

-
-
-from picarones.report.report_data._helpers import safe_round
-
-if TYPE_CHECKING:
-    from picarones.evaluation.benchmark_result import BenchmarkResult
-
-
-def build_engines_summary(benchmark: "BenchmarkResult") -> list[dict]:
-    """Return the list of engine dicts, one entry per ``EngineReport``."""
-    engines_summary: list[dict] = []
-    for report in benchmark.engine_reports:
-        agg = report.aggregated_metrics
-        diplo_agg = agg.get("cer_diplomatic", {})
-
-        line_metrics = report.aggregated_line_metrics
-        halluc = report.aggregated_hallucination
-
-        entry: dict = {
-            "name": report.engine_name,
-            "version": report.engine_version,
-            "cer": safe_round(agg.get("cer", {}).get("mean")),
-            "wer": safe_round(agg.get("wer", {}).get("mean")),
-            "mer": safe_round(agg.get("mer", {}).get("mean")),
-            "wil": safe_round(agg.get("wil", {}).get("mean")),
-            "cer_median": safe_round(agg.get("cer", {}).get("median")),
-            "cer_min": safe_round(agg.get("cer", {}).get("min")),
-            "cer_max": safe_round(agg.get("cer", {}).get("max")),
-            "doc_count": agg.get("document_count", 0),
-            "failed": agg.get("failed_count", 0),
-            # Diplomatic CER (after historical normalization: ſ=s, u=v, i=j…)
-            "cer_diplomatic": safe_round(diplo_agg.get("mean")) if diplo_agg else None,
-            "cer_diplomatic_profile": diplo_agg.get("profile"),
-            # Histogram input: list of per-document CERs
-            "cer_values": [
-                safe_round(dr.metrics.cer)
-                for dr in report.document_results
-                if dr.metrics.error is None
-            ],
-            "cer_diplomatic_values": [
-                safe_round(dr.metrics.cer_diplomatic)
-                for dr in report.document_results
-                if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
-            ],
-            # OCR+LLM pipeline fields (empty for OCR-only engines)
-            "is_pipeline": report.is_pipeline,
-            "pipeline_info": report.pipeline_info,
-            # Sprint 5: advanced heritage metrics
-            "ligature_score": safe_round(report.ligature_score) if report.ligature_score is not None else None,
-            "diacritic_score": safe_round(report.diacritic_score) if report.diacritic_score is not None else None,
-            "aggregated_confusion": report.aggregated_confusion,
-            "aggregated_taxonomy": report.aggregated_taxonomy,
-            "aggregated_structure": report.aggregated_structure,
-            "aggregated_image_quality": report.aggregated_image_quality,
-            # Sprint 10: error distribution + VLM hallucinations
-            "gini": safe_round(line_metrics.get("gini_mean")) if line_metrics else None,
-            "cer_p90": safe_round(line_metrics.get("percentiles", {}).get("p90")) if line_metrics else None,
-            "cer_p99": safe_round(line_metrics.get("percentiles", {}).get("p99")) if line_metrics else None,
-            "catastrophic_rate_30": safe_round(line_metrics.get("catastrophic_rate", {}).get("0.3")) if line_metrics else None,
-            "aggregated_line_metrics": line_metrics,
-            "anchor_score": safe_round(halluc.get("anchor_score_mean")) if halluc else None,
-            "length_ratio": safe_round(halluc.get("length_ratio_mean")) if halluc else None,
-            "hallucinating_doc_rate": safe_round(halluc.get("hallucinating_doc_rate")) if halluc else None,
-            "aggregated_hallucination": halluc,
-            # Sprint 41: aggregated NER (None if it was not computed)
-            "aggregated_ner": report.aggregated_ner,
-            # Sprint 43: aggregated calibration (None if the engine exposed
-            # no confidence on this corpus)
-            "aggregated_calibration": report.aggregated_calibration,
-            # Sprint 62: aggregated philological profile (None if there is
-            # no philological signal on the corpus for this engine)
-            "aggregated_philological": report.aggregated_philological,
-            # Sprint 86: A.II.5 (fuzzy searchability + numerical
-            # sequences). None if no document has a signal.
-            "aggregated_searchability": report.aggregated_searchability,
-            "aggregated_numerical_sequences": (
-                report.aggregated_numerical_sequences
-            ),
-            # Sprint 87: A.II.2 (aggregated Flesch delta)
-            "aggregated_readability": report.aggregated_readability,
-            "is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
-        }
-        engines_summary.append(entry)
-    return engines_summary
+import warnings

+from picarones.reports_v2.html.data.engines import *  # noqa: F401, F403

-
+warnings.warn(
+    "picarones.report.report_data.engines is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data.engines instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
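The removed `build_engines_summary` reads optional, nested aggregates such as `agg.get("cer", {}).get("mean")` and keeps `None` wherever a metric was not computed. This lets the template hide empty sections. Below is a minimal sketch of that pattern. It rests on three assumptions:

- `aggregated_metrics` is a plain dict;
- per-document results are reduced to `(cer, error)` pairs;
- a hypothetical `safe_round` rounds to four digits.

Only a handful of the real keys are shown.

```python
from typing import Any, Optional


def safe_round(v: Optional[float], ndigits: int = 4) -> Optional[float]:
    # Hypothetical helper: the rounding precision is an assumption.
    return None if v is None else round(v, ndigits)


def summarize_engine(
    name: str,
    agg: dict[str, Any],
    per_doc: list[tuple[float, Optional[str]]],
) -> dict:
    """Build a reduced engine entry; aggregates that are missing stay ``None``."""
    cer = agg.get("cer", {})
    diplo = agg.get("cer_diplomatic", {})
    return {
        "name": name,
        "cer": safe_round(cer.get("mean")),
        "cer_median": safe_round(cer.get("median")),
        "cer_diplomatic": safe_round(diplo.get("mean")) if diplo else None,
        "doc_count": agg.get("document_count", 0),
        "failed": agg.get("failed_count", 0),
        # Histogram input: per-document CERs, skipping documents that errored.
        "cer_values": [safe_round(c) for c, error in per_doc if error is None],
    }


entry = summarize_engine(
    "engine-x",
    {"cer": {"mean": 0.123456, "median": 0.1}, "document_count": 3, "failed_count": 1},
    [(0.1, None), (0.2, None), (1.0, "timeout")],
)
print(entry["cer"], entry["cer_diplomatic"], entry["cer_values"])
```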
@@ -1,272 +1,18 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
n'étaient appelés par aucun consommateur en production. Concrètement :
|
| 6 |
-
|
| 7 |
-
- :func:`compute_rare_token_recall_per_engine` — Sprint 71 (A.I.1) :
|
| 8 |
-
recall sur tokens rares (hapax + dis legomena) corpus-wide. Discrimine
|
| 9 |
-
un OCR qui rate les noms propres rares (critique pour l'indexation
|
| 10 |
-
prosopographique).
|
| 11 |
-
- :func:`compute_taxonomy_cooccurrence_section` — Sprint 75 (A.I.4
|
| 12 |
-
chantier 1) : indice de Jaccard inter-classes au niveau document.
|
| 13 |
-
- :func:`compute_taxonomy_intra_doc_section` — Sprint 76 (A.I.4
|
| 14 |
-
chantier 2) : heatmap class × position pour repérer les zones
|
| 15 |
-
concentrées d'erreur.
|
| 16 |
-
- :func:`compute_marginal_cost_section` — Sprint 91 (A.II.6) : coût
|
| 17 |
-
marginal d'un moteur B vs A par erreur évitée.
|
| 18 |
-
|
| 19 |
-
Toutes les fonctions sont **pures** (pas de mutation in-place) et
|
| 20 |
-
retournent ``None`` ou un dict vide quand les pré-requis ne sont pas
|
| 21 |
-
réunis (corpus vide, taxonomy absente, etc.) — pattern adaptive masking.
|
| 22 |
"""
|
| 23 |
|
| 24 |
from __future__ import annotations
|
| 25 |
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
from picarones.measurements.marginal_cost import compute_marginal_cost_matrix
|
| 29 |
-
from picarones.measurements.rare_tokens import (
|
| 30 |
-
compute_rare_token_recall,
|
| 31 |
-
extract_rare_tokens,
|
| 32 |
-
)
|
| 33 |
-
from picarones.measurements.taxonomy_cooccurrence import (
|
| 34 |
-
compute_taxonomy_cooccurrence,
|
| 35 |
-
)
|
| 36 |
-
from picarones.measurements.taxonomy_intra_doc import (
|
| 37 |
-
compute_taxonomy_position_heatmap,
|
| 38 |
-
)
|
| 39 |
-
|
| 40 |
-
if TYPE_CHECKING:
|
| 41 |
-
from picarones.evaluation.benchmark_result import BenchmarkResult
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
# ──────────────────────────────────────────────────────────────────
|
| 45 |
-
# Rare-token recall (Sprint 71)
|
| 46 |
-
# ──────────────────────────────────────────────────────────────────
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
def compute_rare_token_recall_per_engine(
|
| 50 |
-
benchmark: "BenchmarkResult",
|
| 51 |
-
max_freq: int = 2,
|
| 52 |
-
) -> dict[str, dict]:
|
| 53 |
-
"""Recall corpus-wide sur les tokens rares pour chaque moteur.
|
| 54 |
-
|
| 55 |
-
Étapes :
|
| 56 |
-
1. Extraire les tokens rares du corpus (apparaissent ≤ ``max_freq``
|
| 57 |
-
fois dans toutes les GT).
|
| 58 |
-
2. Pour chaque moteur, calculer le recall moyen pondéré par doc.
|
| 59 |
-
|
| 60 |
-
Retour : ``{engine_name: {n_rare_tokens, n_recalled, recall, n_docs}}``,
|
| 61 |
-
vide si aucun moteur ou aucun token rare détecté.
|
| 62 |
-
"""
|
| 63 |
-
if not benchmark.engine_reports:
|
| 64 |
-
return {}
|
| 65 |
-
# Liste des GT du corpus (premier moteur fait foi).
|
| 66 |
-
gts = [
|
| 67 |
-
dr.ground_truth
|
| 68 |
-
for dr in benchmark.engine_reports[0].document_results
|
| 69 |
-
if dr.ground_truth
|
| 70 |
-
]
|
| 71 |
-
if not gts:
|
| 72 |
-
return {}
|
| 73 |
-
rare_tokens = extract_rare_tokens(gts, max_freq=max_freq)
|
| 74 |
-
if not rare_tokens:
|
| 75 |
-
return {}
|
| 76 |
-
|
| 77 |
-
out: dict[str, dict] = {}
|
| 78 |
-
for report in benchmark.engine_reports:
|
| 79 |
-
n_total_rare = 0
|
| 80 |
-
n_total_recalled = 0
|
| 81 |
-
n_docs = 0
|
| 82 |
-
for dr in report.document_results:
|
| 83 |
-
if dr.metrics.error is not None:
|
| 84 |
-
continue
|
| 85 |
-
metrics = compute_rare_token_recall(
|
| 86 |
-
dr.ground_truth, dr.hypothesis, rare_tokens,
|
| 87 |
-
)
|
| 88 |
-
n_total_rare += metrics["n_rare_tokens_in_reference"]
|
| 89 |
-
n_total_recalled += metrics["n_rare_tokens_recalled"]
|
| 90 |
-
n_docs += 1
|
| 91 |
-
recall = (
|
| 92 |
-
n_total_recalled / n_total_rare if n_total_rare > 0 else None
|
| 93 |
-
)
|
| 94 |
-
out[report.engine_name] = {
|
| 95 |
-
"n_rare_tokens": n_total_rare,
|
| 96 |
-
"n_recalled": n_total_recalled,
|
| 97 |
-
"recall": recall,
|
| 98 |
-
"n_docs": n_docs,
|
| 99 |
-
"max_freq": max_freq,
|
| 100 |
-
}
|
| 101 |
-
return out
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
-# ──────────────────────────────────────────────────────────────────
-# Taxonomic co-occurrence (Sprint 75)
-# ──────────────────────────────────────────────────────────────────
-
-
-def compute_taxonomy_cooccurrence_section(
-    benchmark: "BenchmarkResult",
-) -> Optional[dict]:
-    """Compute the corpus-wide taxonomic co-occurrence matrix.
-
-    For each document, collect the union of the error classes that
-    appeared on that document across all engines, then compute the
-    Jaccard index between pairs of classes at corpus level.

-
-    :func:`picarones.measurements.taxonomy_cooccurrence.compute_taxonomy_cooccurrence`,
-    or ``None`` if no taxonomic classification is available.
-    """
-    # Map doc_id → index into per_doc_classes, so that the classes of
-    # additional engines evaluating the same doc are merged correctly.
-    # **Bug avoided**: do NOT use a set to look up the index. A set
-    # has no guaranteed order, so ``list(set).index(x)`` returns an
-    # index that does not match the position in the parallel list.
-    doc_id_to_idx: dict[str, int] = {}
-    per_doc_classes: list[set[str]] = []

-
-
-
-
-
-
-                for cls, count in (dr.taxonomy.get("counts") or {}).items()
-                if count > 0
-            }
-            if not classes:
-                continue
-            idx = doc_id_to_idx.get(dr.doc_id)
-            if idx is None:
-                doc_id_to_idx[dr.doc_id] = len(per_doc_classes)
-                per_doc_classes.append(classes)
-            else:
-                # Doc already seen (another engine): merge the classes.
-                per_doc_classes[idx] |= classes
-
-    if not per_doc_classes:
-        return None
-    return compute_taxonomy_cooccurrence(per_doc_classes)
-
-
-# ──────────────────────────────────────────────────────────────────
-# Intra-document class × position heatmap (Sprint 76)
-# ──────────────────────────────────────────────────────────────────
-
-
-def compute_taxonomy_intra_doc_section(
-    benchmark: "BenchmarkResult",
-    n_bins: int = 10,
-) -> Optional[dict]:
-    """Class × binned-position heatmap aggregated over the whole corpus.
-
-    For each unique doc we keep the heatmap computed by the **first**
-    engine (deduplication: a doc evaluated by N engines counts only
-    once). Counts are then summed per class and per position bin.
-
-    Returns a dict compatible with
-    :func:`picarones.report.taxonomy_intra_doc_render.build_taxonomy_intra_doc_html`
-    (keys ``n_bins``, ``per_class``, ``total_errors``, ``n_words_gt``).
-    Returns ``None`` if no document has a usable signal.
-    """
-    aggregated: dict[str, list[int]] = {}
-    seen_doc_ids: set[str] = set()
-    total_errors = 0
-    n_words_gt = 0
-
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            if dr.doc_id in seen_doc_ids:
-                continue  # deduplication: never count a doc twice
-            if dr.metrics.error is not None or not dr.ground_truth:
-                continue
-            heatmap = compute_taxonomy_position_heatmap(
-                dr.ground_truth, dr.hypothesis, n_bins=n_bins,
-            )
-            if heatmap is None:
-                continue
-            seen_doc_ids.add(dr.doc_id)
-            n_words_gt += len(dr.ground_truth.split())
-            per_class = heatmap.get("per_class", {})
-            for cls, counts in per_class.items():
-                cls_total = sum(counts)
-                if cls_total == 0:
-                    continue
-                total_errors += cls_total
-                if cls not in aggregated:
-                    aggregated[cls] = [0] * n_bins
-                for i in range(n_bins):
-                    aggregated[cls][i] += counts[i] if i < len(counts) else 0
-
-    if not aggregated:
-        return None
-    return {
-        "n_bins": n_bins,
-        "n_docs_with_data": len(seen_doc_ids),
-        "total_errors": total_errors,
-        "n_words_gt": n_words_gt,
-        "per_class": aggregated,
-    }
-
-
-# ──────────────────────────────────────────────────────────────────
-# Inter-engine marginal cost (Sprint 91)
-# ──────────────────────────────────────────────────────────────────
-
-
-def compute_marginal_cost_section(
-    engines_summary: list[dict],
-) -> Optional[list[dict]]:
-    """Marginal-cost matrix between pairs of engines.
-
-    Reads ``cost`` (attached by :func:`attach_engine_costs`) and estimates
-    the number of errors. For each pair ``A → B``, computes the
-    additional cost per avoided error.
-
-    **Estimation note**: the number of errors is derived from
-    ``cer × n_corpus_characters`` when the mean document length
-    is available, otherwise it falls back to ``cer × 1000`` (a proxy for
-    1000 standardised characters). The displayed marginal costs are
-    pessimistic estimates: for a homogeneous corpus the ordering
-    is reliable; for a mix of document types, interpret them
-    with caution.
-
-    Returns: a list of dicts (the ``["pairs"]`` output of
-    :func:`compute_marginal_cost_matrix`) sorted by increasing marginal
-    cost, or ``None`` if fewer than 2 engines have usable cost and
-    error data.
-    """
-    per_engine: dict[str, dict] = {}
-    for entry in engines_summary:
-        cost = entry.get("cost") or {}
-        cost_per_1k = cost.get("cost_per_1k_pages_eur")
-        cer = entry.get("cer")
-        doc_count = entry.get("doc_count") or 0
-        if cost_per_1k is None or cer is None or doc_count == 0:
-            continue
-        # Proxy: cer × 1000 characters / page (a stable scale, consistent
-        # with ``cost_per_1k_pages_eur``).
-        estimated_errors = cer * 1000.0
-        per_engine[entry["name"]] = {
-            "cost": cost_per_1k,
-            "errors": estimated_errors,
-        }
-    if len(per_engine) < 2:
-        return None
-    result = compute_marginal_cost_matrix(per_engine)
-    if not result:
-        return None
-    # ``compute_marginal_cost_matrix`` returns ``{"pairs": [...]}``.
-    # Expose the ``pairs`` list so that the renderer receives an
-    # iterable of dicts (not a wrapper).
-    return result.get("pairs") or None
-
-
-__all__ = [
-    "compute_rare_token_recall_per_engine",
-    "compute_taxonomy_cooccurrence_section",
-    "compute_taxonomy_intra_doc_section",
-    "compute_marginal_cost_section",
-]
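The docstring of ``compute_taxonomy_cooccurrence_section`` describes two steps: merge each document's error classes across engines (locating the document through a ``doc_id → index`` dict, as the bug note explains), then compute a corpus-level Jaccard index for each pair of classes, i.e. the number of documents showing both classes divided by the number showing either. The sketch below illustrates both steps on plain data. It is not ``compute_taxonomy_cooccurrence``: the helper names, the ``(doc_id, classes)`` input and the ``{(class_a, class_b): jaccard}`` output are assumptions chosen for illustration.

```python
from itertools import combinations


def merge_classes_per_doc(results: list[tuple[str, set[str]]]) -> list[set[str]]:
    """Hypothetical helper: union the error classes of every engine per doc.

    A dict maps doc_id -> position in the parallel list, so the lookup always
    agrees with the list (unlike ``list(some_set).index(x)``).
    """
    doc_id_to_idx: dict[str, int] = {}
    per_doc_classes: list[set[str]] = []
    for doc_id, classes in results:
        if not classes:
            continue  # no classified error on this doc for this engine
        idx = doc_id_to_idx.get(doc_id)
        if idx is None:
            doc_id_to_idx[doc_id] = len(per_doc_classes)
            per_doc_classes.append(set(classes))  # copy: later merges mutate it
        else:
            per_doc_classes[idx] |= classes
    return per_doc_classes


def jaccard_cooccurrence(per_doc_classes: list[set[str]]) -> dict[tuple[str, str], float]:
    """Hypothetical helper: docs with both classes / docs with either, per pair."""
    docs_with: dict[str, set[int]] = {}
    for doc_index, classes in enumerate(per_doc_classes):
        for cls in classes:
            docs_with.setdefault(cls, set()).add(doc_index)
    scores: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(docs_with), 2):
        both = docs_with[a] & docs_with[b]
        either = docs_with[a] | docs_with[b]
        scores[(a, b)] = len(both) / len(either)
    return scores


results = [
    ("doc1", {"diacritic"}),          # engine A
    ("doc1", {"ligature"}),           # engine B on the same doc: merged
    ("doc2", {"diacritic", "case"}),
    ("doc3", set()),                  # nothing classified: skipped
]
per_doc = merge_classes_per_doc(results)
print(jaccard_cooccurrence(per_doc))
# → {('case', 'diacritic'): 0.5, ('case', 'ligature'): 0.0, ('diacritic', 'ligature'): 0.5}
```

Copying each incoming set before storing it keeps the caller's data untouched when a later engine's classes are merged in with ``|=``.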
+"""``picarones.report.report_data.extra_metrics``: re-export shim (deprecated, removed in 2.0).

+Canonical: :mod:`picarones.reports_v2.html.data.extra_metrics`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

+import warnings

+from picarones.reports_v2.html.data.extra_metrics import *  # noqa: F401, F403

+warnings.warn(
+    "picarones.report.report_data.extra_metrics is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data.extra_metrics instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
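``compute_marginal_cost_section`` turns each engine into ``{"cost": cost_per_1k_pages_eur, "errors": cer × 1000}`` and delegates the pairwise computation to ``compute_marginal_cost_matrix``, whose formula is not part of this diff. A minimal sketch, assuming the natural reading of "additional cost per avoided error", ``(cost_B - cost_A) / (errors_A - errors_B)``, computed only for pairs where B makes fewer errors than A and sorted by increasing marginal cost. ``marginal_cost_pairs`` and its output keys are hypothetical, and the figures are made up.

```python
def marginal_cost_pairs(per_engine: dict[str, dict]) -> list[dict]:
    """Hypothetical helper: extra cost per avoided error for each pair A -> B.

    ``per_engine`` maps an engine name to {"cost": ..., "errors": ...}, the
    shape that compute_marginal_cost_section builds. A pair is kept only when
    B makes strictly fewer (estimated) errors than A.
    """
    pairs = []
    for a, ea in per_engine.items():
        for b, eb in per_engine.items():
            avoided = ea["errors"] - eb["errors"]
            if a == b or avoided <= 0:
                continue
            extra_cost = eb["cost"] - ea["cost"]
            pairs.append({
                "from": a,
                "to": b,
                "errors_avoided": avoided,
                "extra_cost": extra_cost,
                "marginal_cost": extra_cost / avoided,
            })
    # Cheapest improvement first, as the section's docstring describes.
    return sorted(pairs, key=lambda p: p["marginal_cost"])


# Made-up figures; errors use the section's proxy: cer × 1000 characters.
per_engine = {
    "cheap": {"cost": 0.5, "errors": 0.08 * 1000.0},
    "premium": {"cost": 12.0, "errors": 0.03 * 1000.0},
    "mid": {"cost": 1.0, "errors": 0.05 * 1000.0},
}
for p in marginal_cost_pairs(per_engine):
    print(f'{p["from"]} -> {p["to"]}: {p["marginal_cost"]:.4f} per avoided error')
```

A negative marginal cost would mean that B is both cheaper and more accurate than A, so A is dominated on the cost axis.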
@@ -1,159 +1,18 @@
-"""

-
-
-- ``cost``: CER vs cost in € / 1000 pages.
-- ``speed``: CER vs mean duration per page.
-- ``co2``: CER vs carbon footprint (g CO₂ / 1000 pages, experimental).
-
-API
----
-Two separate functions, to make the contract explicit:
-
-1. :func:`attach_engine_costs`: **mutates** ``engines_summary`` **in place**
-   by adding ``mean_duration_seconds`` and ``cost`` (taken from the
-   benchmark and the pricing table). The name makes the mutation
-   explicit.
-2. :func:`build_pareto_section`: **pure function**; reads the costs
-   already attached to ``engines_summary`` and returns the ``pareto``
-   dict, ready for the template.
-
-The orchestrator (``__init__.py``) calls both, in that order.
-This separation makes it possible to:
-
-- Test :func:`build_pareto_section` independently with a
-  pre-built ``engines_summary``.
-- Reuse the attached costs without recomputing Pareto.
 """

 from __future__ import annotations

-
-
-from picarones.measurements.pricing import (
-    build_costs_for_benchmark,
-    load_pricing_database,
-)
-from picarones.measurements.statistics import compute_pareto_front
-
-if TYPE_CHECKING:
-    from picarones.evaluation.benchmark_result import BenchmarkResult
-
-
-def attach_engine_costs(
-    engines_summary: list[dict], benchmark: "BenchmarkResult",
-) -> None:
-    """Annotate each entry of ``engines_summary`` with its cost.
-
-    **Mutates in place**: adds two fields to each engine dict:
-
-    - ``mean_duration_seconds`` (float, or ``None`` if there is no duration).
-    - ``cost``: dict of the form ``{cost_per_1k_pages_eur: ...,
-      co2_per_1k_pages_g: ..., ...}``, or ``None`` if pricing is
-      unavailable.

-
-    these two fields.
-    """
-    durations_by_engine: dict[str, float] = {}
-    for report in benchmark.engine_reports:
-        durs = [
-            dr.duration_seconds
-            for dr in report.document_results
-            if dr.duration_seconds is not None
-        ]
-        if durs:
-            durations_by_engine[report.engine_name] = sum(durs) / len(durs)

-
-
-
-
-
-
-            round(durations_by_engine.get(name, 0.0), 4)
-            if name in durations_by_engine else None
-        )
-        entry["cost"] = costs_by_engine.get(name)
-
-
-def build_pareto_section(engines_summary: list[dict]) -> dict:
-    """Build the ``pareto`` block of the report dict.
-
-    **Pure function**: mutates nothing. Reads ``mean_duration_seconds``
-    and ``cost``, which must have been attached upstream by
-    :func:`attach_engine_costs`. If these fields are missing, the
-    engine is silently left out of the front (consistent with an
-    engine that has no known price).
-
-    Returns
-    -------
-    dict
-        Three Pareto fronts (``cost``, ``speed``, ``co2``) plus
-        ``pricing_meta`` (the pricing table used).
-    """
-    pricing_defaults, _ = load_pricing_database()
-
-    pareto_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
-        if cer is None or cost is None:
-            continue
-        pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
-    pareto_front_engines = compute_pareto_front(
-        pareto_points, objectives=("cer", "cost"),
-    )
-
-    pareto_speed_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        dur = entry.get("mean_duration_seconds")
-        if cer is None or dur is None:
-            continue
-        pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
-    pareto_front_speed = compute_pareto_front(
-        pareto_speed_points, objectives=("cer", "dur"),
-    )
-
-    pareto_co2_points = []
-    for entry in engines_summary:
-        cer = entry.get("cer")
-        co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
-        if cer is None or co2 is None:
-            continue
-        pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
-    pareto_front_co2 = compute_pareto_front(
-        pareto_co2_points, objectives=("cer", "co2"),
-    )
-
-    return {
-        "cost": {
-            "points": pareto_points,
-            "front": pareto_front_engines,
-            "axis_label": "Coût (€ / 1000 pages)",
-        },
-        "speed": {
-            "points": pareto_speed_points,
-            "front": pareto_front_speed,
-            "axis_label": "Temps moyen (s / page)",
-        },
-        "co2": {
-            "points": pareto_co2_points,
-            "front": pareto_front_co2,
-            "axis_label": (
-                "Empreinte carbone (g CO₂ / 1000 pages, expérimental)"
-            ),
-        },
-        "pricing_meta": {
-            "last_updated": pricing_defaults.last_updated,
-            "currency": pricing_defaults.currency,
-            "hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
-            "hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
-            "grid_intensity_local": pricing_defaults.grid_intensity_local,
-            "grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
-        },
-    }
-
-
-__all__ = ["attach_engine_costs", "build_pareto_section"]
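``build_pareto_section`` collects ``{"engine", "cer", <objective>}`` points and passes them to ``compute_pareto_front(points, objectives=(...))``, which is not shown in this diff. Assuming every objective is minimised (lower CER, lower cost, shorter duration, less CO₂), a point belongs to the front when no other point is at least as good on every objective and strictly better on at least one. The O(n²) filter below sketches that rule; ``pareto_front`` is a hypothetical name and returns engine names, which may not match the real helper's return shape.

```python
def pareto_front(points: list[dict], objectives: tuple[str, ...]) -> list[str]:
    """Hypothetical helper: names of the non-dominated points (all objectives minimised)."""

    def dominates(p: dict, q: dict) -> bool:
        # p dominates q: no worse on every objective, strictly better on one.
        no_worse = all(p[k] <= q[k] for k in objectives)
        better = any(p[k] < q[k] for k in objectives)
        return no_worse and better

    return [
        q["engine"]
        for q in points
        if not any(dominates(p, q) for p in points if p is not q)
    ]


points = [
    {"engine": "A", "cer": 0.08, "cost": 0.5},
    {"engine": "B", "cer": 0.03, "cost": 12.0},
    {"engine": "C", "cer": 0.09, "cost": 2.0},   # worse than A on both axes
    {"engine": "D", "cer": 0.05, "cost": 1.0},
]
print(pareto_front(points, objectives=("cer", "cost")))  # → ['A', 'B', 'D']
```

Engine C is dropped because A has both a lower CER and a lower cost; A, B and D each win on at least one axis, so none of them is dominated.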
+"""``picarones.report.report_data.pareto``: re-export shim (deprecated, removed in 2.0).

+Canonical: :mod:`picarones.reports_v2.html.data.pareto`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

+import warnings

+from picarones.reports_v2.html.data.pareto import *  # noqa: F401, F403

+warnings.warn(
+    "picarones.report.report_data.pareto is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data.pareto instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
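Each legacy module in this commit becomes an 18-line shim: a star re-export of the canonical module followed by a module-level ``warnings.warn(..., DeprecationWarning, stacklevel=2)``. Two properties of that pattern are worth testing: the warning fires only on the first import (the module object is then cached in ``sys.modules``, so its body never runs again), and ``DeprecationWarning`` is hidden by Python's default filters unless it is attributed to ``__main__``, so a test has to enable it explicitly. The self-contained sketch below reproduces the pattern with two throwaway modules, ``new_home`` and ``old_home`` (hypothetical names), written to a temporary directory.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

# Hypothetical canonical module (stands in for picarones.reports_v2.html.data.*).
NEW_MODULE = '__all__ = ["answer"]\nanswer = 42\n'

# Hypothetical legacy module, same shape as the Phase 5.E shims:
# star re-export first, then a module-level DeprecationWarning.
OLD_MODULE = '''\
import warnings

from new_home import *  # noqa: F401, F403

warnings.warn(
    "old_home is deprecated and will be removed in 2.0. "
    "Import from new_home instead.",
    DeprecationWarning,
    stacklevel=2,
)
'''

tmp = Path(tempfile.mkdtemp())
(tmp / "new_home.py").write_text(NEW_MODULE, encoding="utf-8")
(tmp / "old_home.py").write_text(OLD_MODULE, encoding="utf-8")
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()


def deprecations(records):
    """Messages of the DeprecationWarnings captured by catch_warnings."""
    return [str(w.message) for w in records if issubclass(w.category, DeprecationWarning)]


with warnings.catch_warnings(record=True) as first_import:
    warnings.simplefilter("always")  # DeprecationWarning is hidden by default
    old_home = importlib.import_module("old_home")

with warnings.catch_warnings(record=True) as second_import:
    warnings.simplefilter("always")
    importlib.import_module("old_home")  # cached in sys.modules: body not re-run

print(old_home.answer, len(deprecations(first_import)), len(deprecations(second_import)))
# → 42 1 0
```

One caveat of ``import *``: it re-exports only the names listed in the canonical module's ``__all__`` (or, without ``__all__``, its names that do not start with an underscore), so private helpers such as ``_engine_cer_values`` are not reachable through the legacy path.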
@@ -1,56 +1,18 @@
-"""

-
-
-- ``ratio_vs_anchor``: OCR/GT length ratio vs anchoring score,
-  per engine (reveals VLM hallucinations).
 """

 from __future__ import annotations

-

-from picarones.

-
-
-
-
-
-
-    gini_vs_cer: list[dict] = []
-    for report in benchmark.engine_reports:
-        line_metrics = report.aggregated_line_metrics
-        gini_val = line_metrics.get("gini_mean") if line_metrics else None
-        cer_val = report.mean_cer
-        if gini_val is not None and cer_val is not None:
-            gini_vs_cer.append({
-                "engine": report.engine_name,
-                "cer": safe_round(cer_val),
-                "gini": safe_round(gini_val),
-                "is_pipeline": report.is_pipeline,
-            })
-    return gini_vs_cer
-
-
-def build_ratio_vs_anchor(benchmark: "BenchmarkResult") -> list[dict]:
-    """Length-ratio vs anchoring-score scatter (VLM detection)."""
-    ratio_vs_anchor: list[dict] = []
-    for report in benchmark.engine_reports:
-        halluc = report.aggregated_hallucination
-        if not halluc:
-            continue
-        ratio_vs_anchor.append({
-            "engine": report.engine_name,
-            "length_ratio": safe_round(halluc.get("length_ratio_mean", 1.0)),
-            "anchor_score": safe_round(halluc.get("anchor_score_mean", 1.0)),
-            "hallucinating_rate": safe_round(halluc.get("hallucinating_doc_rate", 0.0)),
-            "is_vlm": (
-                report.pipeline_info.get("is_vlm", False)
-                if report.pipeline_info else False
-            ),
-        })
-    return ratio_vs_anchor
-
-
-__all__ = ["build_gini_vs_cer", "build_ratio_vs_anchor"]
+"""``picarones.report.report_data.scatter``: re-export shim (deprecated, removed in 2.0).

+Canonical: :mod:`picarones.reports_v2.html.data.scatter`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

+import warnings

+from picarones.reports_v2.html.data.scatter import *  # noqa: F401, F403

+warnings.warn(
+    "picarones.report.report_data.scatter is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data.scatter instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
@@ -1,216 +1,18 @@
-"""

-
-
-- ``pairwise_wilcoxon``: pairwise Wilcoxon tests between engines.
-- ``bootstrap_cis``: bootstrap confidence intervals per engine.
-- ``friedman`` + ``nemenyi``: Sprint 17, multi-engine.
-- ``reliability_curves``: reliability curves per engine.
-- ``venn_data``: Venn diagram of shared / exclusive errors.
-- ``error_clusters``: clustering of error patterns.
-- ``correlation_per_engine``: correlation matrix per engine.
 """

 from __future__ import annotations

-
-
-from picarones.core.diff_utils import compute_word_diff
-from picarones.measurements.statistics import (
-    bootstrap_ci,
-    cluster_errors,
-    compute_correlation_matrix,
-    compute_pairwise_stats,
-    compute_reliability_curve,
-    compute_venn_data,
-    friedman_test,
-    nemenyi_posthoc,
-)
-from picarones.report.report_data._helpers import safe_round
-
-if TYPE_CHECKING:
-    from picarones.evaluation.benchmark_result import BenchmarkResult
-
-
-def _engine_cer_values(benchmark: "BenchmarkResult") -> dict[str, list[float]]:
-    """Map ``engine_name → [valid per-document CERs]``."""
-    out: dict[str, list[float]] = {}
-    for report in benchmark.engine_reports:
-        vals = [
-            safe_round(dr.metrics.cer)
-            for dr in report.document_results
-            if dr.metrics.error is None
-        ]
-        if vals:
-            out[report.engine_name] = vals
-    return out
-
-
-def build_pairwise_wilcoxon(benchmark: "BenchmarkResult") -> list[dict]:
-    """Pairwise Wilcoxon tests between engines (Sprint 7)."""
-    return compute_pairwise_stats(_engine_cer_values(benchmark))
-
-
-def build_bootstrap_cis(benchmark: "BenchmarkResult") -> list[dict]:
-    """Bootstrap confidence intervals per engine (Sprint 7)."""
-    bootstrap_cis: list[dict] = []
-    for engine_name, vals in _engine_cer_values(benchmark).items():
-        lo, hi = bootstrap_ci(vals)
-        mean_v = sum(vals) / len(vals) if vals else 0.0
-        bootstrap_cis.append({
-            "engine": engine_name,
-            "mean": safe_round(mean_v),
-            "ci_lower": safe_round(lo),
-            "ci_upper": safe_round(hi),
-        })
-    return bootstrap_cis
-
-
-def build_friedman_and_nemenyi(benchmark: "BenchmarkResult") -> dict:
-    """Friedman test + Nemenyi post-hoc (Sprint 17, multi-engine).
-
-    Strict alignment on the same document order: the map is rebuilt
-    from the documents common to all engines; otherwise Friedman
-    is not applicable.
-
-    Returns
-    -------
-    dict
-        ``{"friedman": {...}, "nemenyi": {...}}``, to be merged into
-        the ``statistics`` section of the report.
-    """
-    # Ordered list of doc_ids, by order of first appearance.
-    seen: set[str] = set()
-    doc_ids_ordered: list[str] = []
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            if dr.doc_id not in seen:
-                seen.add(dr.doc_id)
-                doc_ids_ordered.append(dr.doc_id)

-
-    for report in benchmark.engine_reports:
-        doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
-        common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids

-
-
-
-
-
-
-            safe_round(dr_by_id[d].metrics.cer) for d in ordered_common
-        ]
-
-    if engine_cer_aligned:
-        friedman = friedman_test(engine_cer_aligned)
-        nemenyi = nemenyi_posthoc(engine_cer_aligned)
-    else:
-        friedman = {
-            "statistic": 0.0, "p_value": 1.0, "significant": False,
-            "df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
-            "interpretation": "Test de Friedman non calculé — aucun document commun.",
-            "error": "no_common_documents",
-        }
-        nemenyi = {
-            "alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
-            "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
-            "engines_sorted": [], "significant_matrix": [], "tied_groups": [],
-            "error": "no_common_documents",
-        }
-    return {"friedman": friedman, "nemenyi": nemenyi}
-
-
-def build_reliability_curves(benchmark: "BenchmarkResult") -> list[dict]:
-    """Reliability curves per engine (Sprint 7)."""
-    reliability_curves: list[dict] = []
-    for report in benchmark.engine_reports:
-        vals = [
-            safe_round(dr.metrics.cer)
-            for dr in report.document_results
-            if dr.metrics.error is None
-        ]
-        curve = compute_reliability_curve(vals)
-        reliability_curves.append({
-            "engine": report.engine_name,
-            "points": curve,
-        })
-    return reliability_curves
-
-
-def build_venn_data(benchmark: "BenchmarkResult") -> dict:
-    """Venn diagram of shared / exclusive errors (Sprint 7).
-
-    Builds the error sets per engine:
-    ``{engine → set("doc_id:gt_tok:hyp_tok")}``.
-    """
-    venn_error_sets: dict[str, set[str]] = {}
-    for report in benchmark.engine_reports:
-        error_set: set[str] = set()
-        for dr in report.document_results:
-            ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
-            for op in ops:
-                if op["op"] in ("replace", "delete", "insert"):
-                    key = (
-                        f"{dr.doc_id}:"
-                        f"{op.get('old', op.get('text', ''))}:"
-                        f"{op.get('new', op.get('text', ''))}"
-                    )
-                    error_set.add(key)
-        venn_error_sets[report.engine_name] = error_set
-    return compute_venn_data(venn_error_sets)
-
-
-def build_error_clusters(benchmark: "BenchmarkResult") -> list[dict]:
-    """Clustering of error patterns (Sprint 7)."""
-    error_data_all: list[dict] = []
-    for report in benchmark.engine_reports:
-        for dr in report.document_results:
-            error_data_all.append({
-                "engine": report.engine_name,
-                "gt": dr.ground_truth,
-                "hypothesis": dr.hypothesis,
-            })
-    error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
-    return [c.as_dict() for c in error_clusters_raw]
-
-
-def build_correlation_per_engine(benchmark: "BenchmarkResult") -> list[dict]:
-    """Per-engine correlation matrix between domain metrics (Sprint 7)."""
-    correlation_per_engine: list[dict] = []
-    for report in benchmark.engine_reports:
-        metrics_list: list[dict[str, float]] = []
-        for dr in report.document_results:
-            if dr.metrics.error is not None:
-                continue
-            entry: dict[str, float] = {
-                "cer": safe_round(dr.metrics.cer),
-                "wer": safe_round(dr.metrics.wer),
-                "mer": safe_round(dr.metrics.mer),
-                "wil": safe_round(dr.metrics.wil),
-            }
-            if dr.image_quality:
-                entry["quality_score"] = safe_round(dr.image_quality.get("quality_score", 0.5))
-                entry["sharpness"] = safe_round(dr.image_quality.get("sharpness_score", 0.5))
-            if dr.char_scores:
-                entry["ligature"] = safe_round(dr.char_scores.get("ligature", {}).get("score", 0.5))
-                entry["diacritic"] = safe_round(dr.char_scores.get("diacritic", {}).get("score", 0.5))
-            metrics_list.append(entry)
-        if metrics_list:
-            corr = compute_correlation_matrix(metrics_list)
-            correlation_per_engine.append({
-                "engine": report.engine_name,
-                **corr,
-            })
-    return correlation_per_engine
-
-
-__all__ = [
-    "build_pairwise_wilcoxon",
-    "build_bootstrap_cis",
-    "build_friedman_and_nemenyi",
-    "build_reliability_curves",
-    "build_venn_data",
-    "build_error_clusters",
-    "build_correlation_per_engine",
-]
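``build_friedman_and_nemenyi`` only feeds Friedman the documents that every engine processed without error, in order of first appearance; several lines of that alignment are not visible in the diff above. The sketch below illustrates the alignment on plain dicts (``{engine: {doc_id: cer}}`` is an assumed input shape, not the ``BenchmarkResult`` API) and then computes the textbook Friedman statistic, χ² = 12 / (n·k·(k+1)) · Σ R_j² - 3·n·(k+1), where R_j is engine j's rank sum over n documents and k engines, with average ranks for ties and no tie correction. Picarones' own ``friedman_test`` may differ (tie correction, p-value, extra keys); both helper names here are hypothetical.

```python
def align_common_docs(cer_by_engine: dict[str, dict[str, float]]) -> dict[str, list[float]]:
    """Hypothetical helper: keep docs scored by every engine, in first-seen order."""
    ordered: list[str] = []
    seen: set[str] = set()
    for per_doc in cer_by_engine.values():
        for doc_id in per_doc:
            if doc_id not in seen:
                seen.add(doc_id)
                ordered.append(doc_id)
    common = set.intersection(*(set(per_doc) for per_doc in cer_by_engine.values()))
    ordered_common = [d for d in ordered if d in common]
    return {
        engine: [per_doc[d] for d in ordered_common]
        for engine, per_doc in cer_by_engine.items()
    }


def friedman_statistic(aligned: dict[str, list[float]]) -> tuple[float, dict[str, float]]:
    """Hypothetical helper: Friedman chi-square (no tie correction), mean ranks (1 = lowest CER)."""
    engines = list(aligned)
    k = len(engines)
    n = len(aligned[engines[0]])
    if n == 0:
        raise ValueError("no common documents")
    rank_sums = dict.fromkeys(engines, 0.0)
    for i in range(n):  # one block per document
        block = sorted(engines, key=lambda e: aligned[e][i])
        pos = 0
        while pos < k:
            # Tied CERs share the average of the ranks they span.
            end = pos
            while end + 1 < k and aligned[block[end + 1]][i] == aligned[block[pos]][i]:
                end += 1
            for e in block[pos:end + 1]:
                rank_sums[e] += (pos + end) / 2 + 1
            pos = end + 1
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums.values()) - 3 * n * (k + 1)
    return chi2, {e: r / n for e, r in rank_sums.items()}


cer_by_engine = {
    "A": {"d1": 0.10, "d2": 0.20, "d3": 0.05, "d4": 0.30},
    "B": {"d1": 0.05, "d2": 0.10, "d3": 0.05},  # d4 failed for B: dropped for everyone
    "C": {"d1": 0.20, "d2": 0.25, "d3": 0.10, "d4": 0.01},
}
aligned = align_common_docs(cer_by_engine)
chi2, mean_ranks = friedman_statistic(aligned)
print(aligned["A"], round(chi2, 4), {e: round(r, 3) for e, r in mean_ranks.items()})
```

Dropping ``d4`` for every engine, not only for B, is what keeps each block complete: Friedman ranks the engines within each document, so a document missing one engine cannot be ranked.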
+"""``picarones.report.report_data.statistics``: re-export shim (deprecated, removed in 2.0).

+Canonical: :mod:`picarones.reports_v2.html.data.statistics`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

+import warnings

+from picarones.reports_v2.html.data.statistics import *  # noqa: F401, F403

+warnings.warn(
+    "picarones.report.report_data.statistics is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.data.statistics instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
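According to the commit message, the relocated ``reports_v2/html/snapshot.py`` reads the package version through ``importlib.metadata`` instead of importing ``picarones.__version__``, which the layer-dependency rules forbid. The sketch below builds an ``environment`` snapshot that way on the standard library alone, following the guarantees listed in the legacy module docstring that follows: sorted lists, no timestamps, and a degraded value instead of an exception when the package is not installed. ``environment_snapshot`` and its keys are assumptions, not the real snapshot format, and the git commit lookup is left out.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, distributions, version
from typing import Any


def environment_snapshot(package: str = "picarones") -> dict[str, Any]:
    """Hypothetical helper: deterministic environment description (no timestamps)."""
    try:
        package_version = version(package)
    except PackageNotFoundError:
        # Degrade instead of raising (e.g. running from an uninstalled source tree).
        package_version = None

    deps: set[str] = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata rather than failing
            deps.add(f"{name}=={dist.version}")

    return {
        "package": package,
        "package_version": package_version,
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "platform": platform.platform(),
        # Total order: names differing only in case cannot swap between runs.
        "dependencies": sorted(deps, key=lambda s: (s.lower(), s)),
    }


snap = environment_snapshot()
print(snap["package_version"], snap["python"], len(snap["dependencies"]))
```

Collecting the dependencies into a set first removes duplicates when the same distribution is visible on several ``sys.path`` entries, and the explicit sort key makes the list identical from one run to the next.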
@@ -1,266 +1,18 @@
-"""

-
-
-2026 must be able to understand exactly which pricing table, which
-metric definition, which normalisation profile and which
-version of Picarones produced the figures displayed.
-
-Before Sprint 27, the report embedded only
-``pareto.pricing_meta.last_updated``, a bare update date
-that said nothing about the contents of the table. If someone edited
-``picarones/data/pricing.yaml`` after generation, it was impossible
-to reconstruct what the reader of the report had seen.
-
-This module produces four snapshots, which are embedded in
-``report_data.snapshots``:
-
-- ``pricing``: the full raw YAML of the pricing table.
-- ``glossary``: glossary entries for the report's language.
-- ``normalization``: the normalisation profile actually applied.
-- ``environment``: Picarones version, Python, platform, git commit
-  if available, frozen list of installed dependencies.
-
-Guarantees
-----------
-- **Determinism**: on identical inputs, ``snapshot_all()`` produces
-  a bit-for-bit identical dict. Lists are sorted and timestamps
-  are absent.
-- **No side effects**: the module modifies no global state;
-  YAML paths are only ever read, never written.
-- **Non-blocking degradation**: if pyyaml is missing, if ``pricing.yaml``
-  does not exist, or if git is not installed, the snapshot returns a
-  ``{"available": False, "reason": "..."}`` dict instead of raising.
 """

 from __future__ import annotations

-import
-import platform
-import subprocess
-import sys
-from importlib.metadata import distributions
-from pathlib import Path
-from typing import Any, Optional
-
-from picarones import __version__
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Pricing snapshot
-# ---------------------------------------------------------------------------
-
-def pricing_snapshot(pricing_path: Optional[Path] = None) -> dict[str, Any]:
-    """Return the raw YAML + parsed dict of the pricing table in use.
-
-    If ``pricing_path`` is not provided, uses the default path from
-    ``picarones.measurements.pricing._DEFAULT_PRICING_PATH``.
-    """
-    if pricing_path is None:
-        try:
-            from picarones.measurements.pricing import _DEFAULT_PRICING_PATH
-            pricing_path = _DEFAULT_PRICING_PATH
-        except ImportError:
-            return {"available": False, "reason": "module pricing introuvable"}
-
-    pricing_path = Path(pricing_path)
-    if not pricing_path.exists():
-        return {
-            "available": False,
-            "reason": f"pricing.yaml introuvable : {pricing_path}",
-            "expected_path": str(pricing_path),
-        }
-
-    try:
-        raw = pricing_path.read_text(encoding="utf-8")
-    except OSError as exc:
-        return {
-            "available": False,
-            "reason": f"lecture impossible : {exc}",
-            "expected_path": str(pricing_path),
-        }
-
-    try:
-        import yaml
-        data = yaml.safe_load(raw) or {}
-    except (ImportError, Exception) as exc:
-        # No yaml, or parsing failed: keep the raw text anyway.
-        logger.warning("[snapshot] parsing pricing.yaml échoué : %s", exc)
-        data = {}
-
-    return {
-        "available": True,
-        "source_path": str(pricing_path),
-        "filename": pricing_path.name,
-        "size_bytes": len(raw.encode("utf-8")),
-        "raw_yaml": raw,
-        "data": data,
-    }
-
-
-# ---------------------------------------------------------------------------
-# Glossary snapshot
-# ---------------------------------------------------------------------------
-
-def glossary_snapshot(
-    lang: str = "fr",
-    used_keys: Optional[list[str] | set[str]] = None,
-) -> dict[str, Any]:
-    """Return the glossary entries that appear in the report.
-
-    ``used_keys`` restricts the snapshot to the terms actually
-    referenced (smaller output). ``None`` → every entry for the
-    language (conservative mode).
-    """
-    try:
-        from picarones.reports_v2.glossary import load_glossary, SUPPORTED_LANGS
-    except ImportError:
-        return {"available": False, "reason": "glossary module not found"}
-
-    full = load_glossary(lang) or {}
-    if not full:
-        return {
-            "available": False,
-            "reason": f"no entries for lang={lang!r}",
-            "supported_langs": SUPPORTED_LANGS,
-        }
-
-    if used_keys is not None:
-        keys = set(used_keys)
-        entries = {k: v for k, v in full.items() if k in keys}
-    else:
-        entries = dict(full)
-
-    # Sort for bit-for-bit reproducibility.
-    entries_sorted = {k: entries[k] for k in sorted(entries)}
-
-    return {
-        "available": True,
-        "lang": lang,
-        "entry_count": len(entries_sorted),
-        "entries": entries_sorted,
-    }
-
-
-# ---------------------------------------------------------------------------
-# Normalization profile snapshot
-# ---------------------------------------------------------------------------
-
-def normalization_snapshot(profile: Any) -> dict[str, Any]:
-    """Serialize a ``NormalizationProfile``.
-
-    Covers the built-in profiles (``medieval_french``, ``nfc``, …) and
-    custom YAML profiles loaded at runtime. The goal is that a reader
-    of the report can regenerate exactly the same
-    normalization from this snapshot.
-    """
-    if profile is None:
-        return {"available": False, "reason": "no profile provided"}
-
-    # NormalizationProfile is a dataclass: fields are accessed by name
-    # rather than through ``asdict`` to keep tight control over the format.
-    try:
-        return {
-            "available": True,
-            "name": getattr(profile, "name", "unknown"),
-            "nfc": bool(getattr(profile, "nfc", True)),
-            "caseless": bool(getattr(profile, "caseless", False)),
-            "diplomatic_table": dict(getattr(profile, "diplomatic_table", {}) or {}),
-            "exclude_chars": sorted(getattr(profile, "exclude_chars", set()) or set()),
-            "description": getattr(profile, "description", ""),
-        }
-    except Exception as exc:
-        return {"available": False, "reason": f"serialization failed: {exc}"}
-
-
-# ---------------------------------------------------------------------------
-# Environment snapshot
-# ---------------------------------------------------------------------------
-
-def _git_commit(repo_path: Optional[Path] = None) -> Optional[str]:
-    """Return the short git commit (12 chars) when inside a repo, otherwise None."""
-    cwd = repo_path or Path(__file__).resolve().parents[2]
-    try:
-        out = subprocess.check_output(
-            ["git", "rev-parse", "HEAD"],
-            cwd=str(cwd),
-            stderr=subprocess.DEVNULL,
-            text=True,
-            timeout=2,
-        ).strip()
-        return out[:12] if out else None
-    except (subprocess.CalledProcessError, FileNotFoundError, subprocess.TimeoutExpired):
-        return None
-
-
-def _installed_packages(limit: int = 200) -> list[str]:
-    """Frozen list of installed packages, formatted as ``name==version``.
-
-    Sorted by name (case-insensitive) for reproducibility. Capped at
-    ``limit`` packages to keep the report size in check.
-    """
-    try:
-        pkgs: list[str] = []
-        seen: set[str] = set()
-        for d in distributions():
-            try:
-                name = (d.metadata.get("Name") or "").strip()
-                version = (d.version or "").strip()
-            except Exception:
-                continue
-            if not name or name.lower() in seen:
-                continue
-            seen.add(name.lower())
-            pkgs.append(f"{name}=={version}")
-        pkgs.sort(key=str.lower)
-        return pkgs[:limit]
-    except Exception as exc:  # pragma: no cover (defense in depth)
-        logger.warning("[snapshot] dependency enumeration failed: %s", exc)
-        return []
-
-
-def environment_snapshot(repo_path: Optional[Path] = None) -> dict[str, Any]:
-    """Return the Picarones version, Python, platform, git commit and frozen deps."""
-    return {
-        "available": True,
-        "picarones_version": __version__,
-        "python_version": platform.python_version(),
-        "python_implementation": platform.python_implementation(),
-        "platform": platform.platform(),
-        "executable": sys.executable,
-        "git_commit": _git_commit(repo_path),
-        "installed_packages": _installed_packages(),
-    }
-
-
-# ---------------------------------------------------------------------------
-# Aggregated API
-# ---------------------------------------------------------------------------
-
-def snapshot_all(
-    *,
-    lang: str = "fr",
-    glossary_used_keys: Optional[list[str] | set[str]] = None,
-    pricing_path: Optional[Path] = None,
-    normalization_profile: Any = None,
-    repo_path: Optional[Path] = None,
-) -> dict[str, Any]:
-    """Build the ``snapshots`` block to embed in ``report_data``."""
-    return {
-        "pricing": pricing_snapshot(pricing_path=pricing_path),
-        "glossary": glossary_snapshot(lang=lang, used_keys=glossary_used_keys),
-        "normalization": normalization_snapshot(normalization_profile),
-        "environment": environment_snapshot(repo_path=repo_path),
-        "schema_version": 1,
-    }
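The determinism guarantee in the module docstring (same inputs, bit-for-bit identical dict) can be checked by hashing a canonical JSON dump of the snapshot. This is a minimal sketch, assuming the snapshot only holds JSON-serializable values; `snapshot_fingerprint` is a hypothetical helper, not part of Picarones.

```python
import hashlib
import json
from typing import Any


def snapshot_fingerprint(snapshot: dict[str, Any]) -> str:
    """Hypothetical helper: SHA-256 of a canonical JSON dump of a snapshot.

    sort_keys=True makes dict key order irrelevant (recursively), compact
    separators remove whitespace variation, and ensure_ascii=False keeps
    accented text as UTF-8 instead of \\u escapes.
    """
    canonical = json.dumps(
        snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different insertion order at both nesting levels.
a = {"schema_version": 1, "glossary": {"entries": {"CER": "...", "WER": "..."}}}
b = {"glossary": {"entries": {"WER": "...", "CER": "..."}}, "schema_version": 1}
print(snapshot_fingerprint(a) == snapshot_fingerprint(b))  # True
```

`sort_keys` only normalizes dict keys, not list order, which is why the module itself sorts lists such as `installed_packages` and `exclude_chars`.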
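Because every section degrades to `{"available": False, "reason": ...}` instead of raising, a consumer can report what is missing without any try/except. A sketch assuming the block has the shape built by `snapshot_all()`; `missing_sections` is a hypothetical helper, and non-dict entries such as `schema_version` are skipped as metadata.

```python
from typing import Any


def missing_sections(snapshots: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: map each unavailable section to its reason."""
    missing: dict[str, str] = {}
    for name, section in sorted(snapshots.items()):
        # "schema_version" is an int, not a section dict.
        if not isinstance(section, dict):
            continue
        if not section.get("available", False):
            missing[name] = section.get("reason", "no reason given")
    return missing


snapshots = {
    "pricing": {"available": False, "reason": "pricing.yaml not found: /tmp/pricing.yaml"},
    "glossary": {"available": True, "lang": "fr", "entry_count": 0, "entries": {}},
    "normalization": {"available": False, "reason": "no profile provided"},
    "environment": {"available": True},
    "schema_version": 1,
}
print(missing_sections(snapshots))
# {'normalization': 'no profile provided', 'pricing': 'pricing.yaml not found: /tmp/pricing.yaml'}
```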
+"""``picarones.report.snapshot``: re-export shim (deprecated, removed in 2.0).

+Canonical module: :mod:`picarones.reports_v2.html.snapshot`. Phase 5.E
+of the legacy removal.
 """

 from __future__ import annotations

+import warnings

+from picarones.reports_v2.html.snapshot import *  # noqa: F401, F403

+warnings.warn(
+    "picarones.report.snapshot is deprecated and will be removed in 2.0. "
+    "Import from picarones.reports_v2.html.snapshot instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
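The shim warns once, at import time. One way to check that in a test is sketched below with a throwaway package written to a temporary directory, so it runs without Picarones installed; `legacy_pkg`, `old_mod` and `new_mod` are hypothetical stand-ins for `picarones.report.snapshot` and its canonical target.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical shim, same shape as the one above.
SHIM = textwrap.dedent('''
    import warnings
    from legacy_pkg.new_mod import *  # noqa: F401, F403
    warnings.warn(
        "legacy_pkg.old_mod is deprecated; import from legacy_pkg.new_mod instead.",
        DeprecationWarning,
        stacklevel=2,
    )
''')


def import_shim_and_capture() -> list[warnings.WarningMessage]:
    """Write a tiny package with a shim, import it and return recorded warnings."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "legacy_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "new_mod.py").write_text("def snapshot_all():\n    return {'schema_version': 1}\n")
        (pkg / "old_mod.py").write_text(SHIM)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                mod = importlib.import_module("legacy_pkg.old_mod")
            # The star import re-exports the public API unchanged.
            assert mod.snapshot_all() == {"schema_version": 1}
            return caught
        finally:
            sys.path.remove(tmp)
            # Modules are cached in sys.modules; drop them so a rerun warns again.
            for name in ("legacy_pkg.old_mod", "legacy_pkg.new_mod", "legacy_pkg"):
                sys.modules.pop(name, None)
```

Since Python caches imported modules, the warning only fires on the first import in a process, which is why the sketch evicts the modules afterwards.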
@@ -1,39 +0,0 @@
-<!-- ── Critical Difference Diagram (Sprint 17) ─────────────────────── -->
-<section class="cdd-card" aria-labelledby="cdd-title">
-  <header class="cdd-header">
-    <h2 id="cdd-title" data-i18n="cdd_title">Multi-engine test: Friedman & Nemenyi</h2>
-    <button type="button" class="cdd-info-btn" aria-label="Comment lire ce diagramme"
-            onclick="toggleCDDHelp()" title="Aide">?</button>
-  </header>
-
-  {% if friedman.error %}
-  <p class="cdd-note cdd-note-muted">{{ friedman.interpretation }}</p>
-  {% else %}
-  <p class="cdd-friedman">
-    <strong>Friedman</strong>:
-    Q = {{ "%.3f"|format(friedman.statistic) }},
-    df = {{ friedman.df }},
-    <em>n</em> = {{ friedman.n_blocks }} documents,
-    <em>k</em> = {{ friedman.n_engines }} engines,
-    p = {{ "%.4f"|format(friedman.p_value) }}
-    {% if friedman.significant %}
-    <span class="cdd-badge cdd-badge-sig" title="Différence globale significative">p &lt; 0.05</span>
-    {% else %}
-    <span class="cdd-badge cdd-badge-nsig" title="Pas de différence globale détectée">p ≥ 0.05</span>
-    {% endif %}
-  </p>
-  <div class="cdd-svg-wrapper">
-    {{ critical_difference_svg | safe }}
-  </div>
-  {% endif %}
-
-  <div id="cdd-help" class="cdd-help" hidden>
-    <p><strong data-i18n="cdd_help_title">How to read this diagram</strong></p>
-    <ul>
-      <li><span data-i18n="cdd_help_axis">The horizontal axis shows each engine's mean rank (1 = best, k = worst).</span></li>
-      <li><span data-i18n="cdd_help_bars">The thick horizontal bars connect engines that are statistically <em>indistinguishable</em> at α = 0.05 (Nemenyi post-hoc test).</span></li>
-      <li><span data-i18n="cdd_help_cd">The red <em>CD</em> bar at the top left gives the reference critical distance: two engines whose mean ranks differ by less than <em>CD</em> cannot be told apart.</span></li>
-      <li><span data-i18n="cdd_help_ref">Reference: Demšar (2006), <em>Statistical Comparisons of Classifiers over Multiple Data Sets</em>, JMLR 7:1-30.</span></li>
-    </ul>
-  </div>
-</section>
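The Friedman line in the template reports Q, df = k − 1, n documents and k engines. A minimal sketch of how those quantities relate, assuming a documents × engines CER matrix (lower is better, ties get average ranks) and the textbook statistic without tie correction, Q = 12n/(k(k+1))·ΣR̄ⱼ² − 3n(k+1) (Demšar 2006). It is not necessarily the exact computation used by Picarones; both function names are hypothetical.

```python
def average_ranks(row: list[float]) -> list[float]:
    """Rank one document's CERs in ascending order (1 = best), averaging ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman(cer: list[list[float]]) -> tuple[float, int, list[float]]:
    """Friedman Q (no tie correction), degrees of freedom and mean ranks.

    cer[d][e] is the CER of engine e on document d (n documents, k engines).
    """
    n, k = len(cer), len(cer[0])
    rank_rows = [average_ranks(row) for row in cer]
    mean_ranks = [sum(r[e] for r in rank_rows) / n for e in range(k)]
    q = 12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks) - 3 * n * (k + 1)
    return q, k - 1, mean_ranks


# 4 documents x 3 engines; engine 0 is best on every document.
cer = [[0.02, 0.05, 0.09], [0.03, 0.04, 0.10], [0.01, 0.06, 0.08], [0.02, 0.07, 0.11]]
q, df, ranks = friedman(cer)
print(round(q, 3), df, ranks)  # 8.0 2 [1.0, 2.0, 3.0]
```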
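The CDD help text defines CD as the gap below which two mean ranks cannot be distinguished. For the Nemenyi test this is CD = q_α·√(k(k+1)/(6n)) (Demšar 2006). The sketch hard-codes the α = 0.05 critical values from Demšar's table for k = 2..10 and lists the indistinguishable engine pairs; function and engine names are hypothetical.

```python
import math
from itertools import combinations

# Two-tailed Nemenyi critical values q_0.05 for k = 2..10 (Demšar 2006, Table 5a).
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def critical_difference(k: int, n: int) -> float:
    """Nemenyi critical difference for k engines ranked on n documents."""
    return Q_005[k] * math.sqrt(k * (k + 1) / (6 * n))


def indistinguishable_pairs(mean_ranks: dict[str, float], n: int) -> list[tuple[str, str]]:
    """Engine pairs whose mean ranks differ by less than CD."""
    cd = critical_difference(len(mean_ranks), n)
    return [
        (a, b)
        for a, b in combinations(sorted(mean_ranks), 2)
        if abs(mean_ranks[a] - mean_ranks[b]) < cd
    ]


ranks = {"engine_a": 1.4, "engine_b": 1.9, "engine_c": 2.7}
print(round(critical_difference(3, 20), 3))  # 0.741
print(indistinguishable_pairs(ranks, 20))    # [('engine_a', 'engine_b')]
```

The thick bars in the diagram correspond to groups of such pairs.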
@@ -1,6 +0,0 @@
-</main>
-
-<footer>
-  <span data-i18n="footer_by">by Picarones</span> v{{ picarones_version }}
-  — <span id="footer-date"></span>
-</footer>
@@ -1,35 +0,0 @@
-
-<!-- ── Skip-to-content (Sprint A6, B-10) ───────────────────────────────
-     WCAG 2.4.1 link (Bypass Blocks): first tabbable child of the body,
-     visible only on focus; it lets keyboard and screen-reader users
-     skip the nav and the banner. -->
-<a class="skip-link" href="#main" data-i18n="skip_to_content">Skip to content</a>
-
-<!-- ── Navigation ─────────────────────────────────────────────────── -->
-<nav>
-  <div class="brand">
-    Picarones
-    <span data-i18n="nav_report">| OCR report</span>
-  </div>
-  <div class="tabs">
-    <button class="tab-btn active" onclick="showView('ranking')" data-i18n="tab_ranking">Ranking</button>
-    <button class="tab-btn" onclick="showView('gallery')" data-i18n="tab_gallery">Gallery</button>
-    <button class="tab-btn" onclick="showView('document')" data-i18n="tab_document">Document</button>
-    <button class="tab-btn" onclick="showView('characters')" data-i18n="tab_characters">Characters</button>
-    <button class="tab-btn" onclick="showView('analyses')" data-i18n="tab_analyses">Analyses</button>
-  </div>
-  <div class="meta" id="nav-meta">—</div>
-  <button class="btn-export-csv" onclick="exportCSV()" title="⬇ CSV">⬇ CSV</button>
-  <button class="btn-customize" id="btn-customize" onclick="openCustomize()"
-          title="Mode avancé" data-i18n="btn_customize">⚙ Advanced</button>
-  <button class="btn-present" id="btn-present" onclick="togglePresentMode()" data-i18n="btn_present">⊞ Presentation</button>
-</nav>
-
-<!-- ── Global exclusion banner ─────────────────────────────────────── -->
-<div id="global-exclusion-banner" style="display:none;background:#fef3c7;border-bottom:2px solid #f59e0b;padding:.5rem 1.5rem;font-size:.85rem;font-weight:600;color:#92400e;text-align:center">
-  <span id="global-exclusion-text"></span>
-  <button onclick="resetAllExclusions()" data-i18n="reset_all" style="margin-left:1rem;font-size:.75rem;padding:.15rem .5rem;border:1px solid #d97706;background:#fff;border-radius:.25rem;cursor:pointer">Reset</button>
-</div>
-
-<!-- ── Main (Sprint A6, B-10: id=main for the skip-link) ──────────── -->
-<main id="main" role="main">
@@ -1,22 +0,0 @@
-<!-- ── Factual summary (Sprint 18) ─────────────────────────────── -->
-{% if synthesis and synthesis.sentences %}
-<section class="synth-card" aria-labelledby="synth-title">
-  <header class="synth-header">
-    <h2 id="synth-title" data-i18n="synth_title">Factual summary</h2>
-    <span class="synth-hint" data-i18n="synth_hint">
-      Generated mechanically from the results: no LLM, reproducible.
-    </span>
-  </header>
-  <ul class="synth-list">
-    {% for sentence in synthesis.sentences %}
-    <li>{{ sentence }}</li>
-    {% endfor %}
-  </ul>
-  <p class="synth-cases-link" data-i18n="synth_cases_link">
-    To understand how other teams have reasoned about similar problems,
-    see
-    <a href="https://github.com/maribakulj/Picarones/tree/main/docs/case-studies"
-       target="_blank" rel="noopener">the case studies</a>.
-  </p>
-</section>
-{% endif %}
@@ -1,76 +0,0 @@
-<!-- ── Glossary side panel (Sprint 20) ─────────────────────── -->
-<aside id="glossary-panel" class="side-panel" hidden aria-hidden="true"
-       aria-labelledby="glossary-panel-title">
-  <header class="side-panel-header">
-    <h3 id="glossary-panel-title" class="side-panel-title">Glossary</h3>
-    <button type="button" class="side-panel-close" aria-label="Fermer"
-            onclick="closeGlossary()">×</button>
-  </header>
-  <div id="glossary-panel-body" class="side-panel-body"></div>
-</aside>
-
-<!-- ── Customization side panel (Sprint 20) ──────────────── -->
-<aside id="customize-panel" class="side-panel" hidden aria-hidden="true"
-       aria-labelledby="customize-panel-title">
-  <header class="side-panel-header">
-    <h3 id="customize-panel-title" class="side-panel-title"
-        data-i18n="customize_title">Advanced mode: customization</h3>
-    <button type="button" class="side-panel-close" aria-label="Fermer"
-            onclick="closeCustomize()">×</button>
-  </header>
-  <div class="side-panel-body">
-
-    <section class="custom-section">
-      <h4 data-i18n="customize_columns">Visible columns</h4>
-      <div id="customize-columns-list" class="custom-col-list"></div>
-    </section>
-
-    <section class="custom-section">
-      <h4 data-i18n="customize_filters">Filters by stratum</h4>
-      <div id="customize-filters-list" class="custom-filters-list">
-        <p class="custom-note" data-i18n="customize_filters_empty">
-          No stratum detected in the corpus metadata.
-        </p>
-      </div>
-    </section>
-
-    <section class="custom-section">
-      <h4>
-        <span data-i18n="customize_weights">Personal composite score</span>
-        <button type="button" class="custom-weights-toggle" id="custom-weights-toggle"
-                onclick="toggleCustomWeights()" data-i18n="customize_weights_enable">
-          Enable
-        </button>
-      </h4>
-      <p class="custom-warning" data-i18n="customize_weights_warning">
-        These weights reflect your use case. No weighting is universally
-        valid, so Picarones does not suggest any default weighting.
-      </p>
-      <div id="custom-weights-controls" hidden>
-        <div id="custom-weights-list"></div>
-        <div class="custom-formula" id="custom-formula"></div>
-      </div>
-    </section>
-
-    <!-- Sprint A7 (m-5): color-blind-friendly palette toggle. -->
-    <section class="custom-section">
-      <h4 data-i18n="palette_toggle">Color-blind-friendly mode</h4>
-      <p class="custom-warning" data-i18n="palette_toggle_help">
-        Switches the report palette to Okabe-Ito (a WCAG AA palette
-        recommended for color vision deficiency).
-      </p>
-      <label class="palette-toggle-row">
-        <input type="checkbox" id="palette-toggle-cb"
-               onchange="togglePalette(this.checked)"
-               aria-describedby="palette-toggle-desc">
-        <span id="palette-toggle-desc" data-i18n="palette_toggle">Color-blind-friendly mode</span>
-      </label>
-    </section>
-
-    <section class="custom-section">
-      <button type="button" class="custom-reset" onclick="resetCustomization()"
-              data-i18n="customize_reset">Reset view</button>
-    </section>
-
-  </div>
-</aside>
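The reliability note in the analyses template defines the curve as the cumulative mean CER over the X% easiest documents, sorted by increasing CER. A sketch following that wording for one engine; the `steps` resolution and the nearest-count rounding are assumptions, and the function name is hypothetical.

```python
def reliability_curve(cer_per_doc: list[float], steps: int = 10) -> list[tuple[float, float]]:
    """(fraction of documents, mean CER over the easiest fraction) pairs.

    Documents are sorted by increasing CER, so the first point covers the
    easiest 1/steps of the corpus and the last point covers all of it.
    """
    ordered = sorted(cer_per_doc)
    n = len(ordered)
    points = []
    for s in range(1, steps + 1):
        frac = s / steps
        m = max(1, round(frac * n))  # nearest document count, at least one
        points.append((frac, sum(ordered[:m]) / m))
    return points


curve = reliability_curve([0.30, 0.02, 0.10, 0.05], steps=4)
print(curve)
```

Because the documents are sorted first, each curve is non-decreasing and its last point equals the engine's overall mean CER.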
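The bootstrap card reports a 95% CI on each engine's mean CER from 1000 iterations. A sketch using the percentile method with a fixed seed; the percentile method, the seed and the function name are assumptions, since the resampling scheme Picarones uses is not shown here.

```python
import random
import statistics


def bootstrap_ci(values: list[float], n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed keeps the report reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[round((alpha / 2) * (n_boot - 1))]
    hi = means[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


vals = [0.02, 0.05, 0.03, 0.10, 0.04, 0.06, 0.08, 0.01]
print(bootstrap_ci(vals))
```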
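The Gini vs mean CER scatter plots the Gini coefficient on the Y axis, where an ideal engine has rare and evenly spread errors. A sketch assuming the Gini is computed over one engine's per-document CER values, using the sorted-form identity G = 2·Σ i·x₍ᵢ₎ / (n·Σx) − (n+1)/n; the input choice and function name are assumptions.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no errors at all: treat as perfectly even
    # Ranks i start at 1 on the ascending-sorted values.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([0.0, 0.0, 0.0, 1.0]))  # 0.75: all errors on a single document
```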
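The Pareto note describes frontier engines as those for which no other engine is better on both CER and cost. The sketch uses standard Pareto dominance (no worse on both axes, strictly better on at least one), which is a slightly stricter reading than the note's wording; engine names and figures are made up.

```python
def pareto_front(engines: dict[str, tuple[float, float]]) -> list[str]:
    """Names of non-dominated engines; each value is (cer, cost), lower is better."""
    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        # a is no worse than b on both axes and differs on at least one.
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return sorted(
        name for name, point in engines.items()
        if not any(dominates(other, point) for other in engines.values())
    )


engines = {
    "engine_a": (0.12, 0.0),
    "engine_b": (0.05, 15.0),
    "engine_c": (0.06, 20.0),  # worse than engine_b on both axes
    "engine_d": (0.03, 40.0),
}
print(pareto_front(engines))  # ['engine_a', 'engine_b', 'engine_d']
```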
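The calibration note describes ECE as the weighted mean of |confidence − accuracy| per bin. A sketch with equal-width bins, assuming per-token (or per-word) confidences paired with correctness flags; bin count and input granularity are assumptions.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """ECE over equal-width confidence bins, each weighted by its size."""
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # min() puts confidence 1.0 into the last bin instead of overflowing.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    n = len(confidences)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece


# Overconfident engine: says 0.95 but is right half the time.
print(expected_calibration_error([0.95] * 4, [True, True, False, False]))
```

Replacing the weighted sum with a maximum over non-empty bins gives the MCE mentioned in the template comment.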
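The NER note states that F1 is computed by aligning spans at IoU ≥ 0.5 with case-insensitive labels. A sketch using greedy one-to-one matching over (start, end, label) spans with exclusive ends; the greedy strategy and the span representation are assumptions.

```python
Span = tuple[int, int, str]  # (start, end, label), end exclusive


def iou(a: Span, b: Span) -> float:
    """Intersection over union of two character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def ner_f1(gold: list[Span], pred: list[Span], threshold: float = 0.5) -> float:
    """F1 with greedy one-to-one matching at IoU >= threshold."""
    used: set[int] = set()
    tp = 0
    for p in pred:
        for j, g in enumerate(gold):
            if j in used or p[2].lower() != g[2].lower():
                continue
            if iou(p, g) >= threshold:
                used.add(j)  # each gold span can be matched only once
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [(0, 10, "PER"), (20, 30, "LOC")]
pred = [(0, 8, "per"), (40, 45, "DATE")]
print(ner_f1(gold, pred))  # 0.5
```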
@@ -1,326 +0,0 @@
-
-<!-- ════ View 4: Analyses ══════════════════════════════════════════ -->
-<div id="view-analyses" class="view">
-  <div class="charts-grid">
-
-    <div class="chart-card">
-      <h3 data-i18n="h_cer_dist">CER distribution per engine</h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-cer-hist" role="img" aria-label="Distribution des CER par moteur" data-a11y-label="Distribution des CER par moteur"></canvas>
-      </div>
-    </div>
-
-    <div class="chart-card">
-      <h3 data-i18n="h_radar">Engine profile (radar)</h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-radar" role="img" aria-label="Profil radar par moteur" data-a11y-label="Profil radar par moteur"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.5rem" data-i18n="radar_note">
-        Radar axes: CER, WER, MER, WIL, with inverted values (the higher, the better the engine).
-      </div>
-    </div>
-
-    <div class="chart-card">
-      <h3 data-i18n="h_cer_doc">CER per document (all engines)</h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-cer-doc" role="img" aria-label="CER par document" data-a11y-label="CER par document"></canvas>
-      </div>
-    </div>
-
-    <div class="chart-card">
-      <h3 data-i18n="h_duration">Mean execution time (seconds/document)</h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-duration" role="img" aria-label="Durée d'inférence par moteur" data-a11y-label="Durée d'inférence par moteur"></canvas>
-      </div>
-    </div>
-
-    <div class="chart-card">
-      <h3 data-i18n="h_quality_cer">Image quality ↔ CER (scatter plot)</h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-quality-cer" role="img" aria-label="Corrélation qualité d'image / CER" data-a11y-label="Corrélation qualité d'image / CER"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="quality_cer_note">
-        Each point = one document. X axis = image quality score [0–1]. Y axis = CER. A negative correlation is expected.
-      </div>
-    </div>
-
-    <div class="chart-card" style="grid-column:1/-1">
-      <h3 data-i18n="h_taxonomy">Error taxonomy per engine</h3>
-      <div class="chart-canvas-wrap" style="max-height:300px">
-        <canvas id="chart-taxonomy" role="img" aria-label="Taxonomie d'erreurs par moteur" data-a11y-label="Taxonomie d'erreurs par moteur"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="taxonomy_note">
-        Distribution of error classes (classes 1–9 of the Picarones taxonomy).
-      </div>
-    </div>
-
-    <!-- Sprint 7: reliability curve -->
-    <div class="chart-card" style="grid-column:1/-1">
-      <h3 data-i18n="h_reliability">Reliability curves</h3>
-      <div class="chart-canvas-wrap" style="max-height:300px">
-        <canvas id="chart-reliability" role="img" aria-label="Diagramme de fiabilité (calibration)" data-a11y-label="Diagramme de fiabilité (calibration)"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="reliability_note">
-        For the X% easiest documents (sorted by increasing CER), what is the cumulative mean CER?
-        A low curve = an engine that performs well even on the easy documents.
-      </div>
-    </div>
-
-    <!-- Sprint 7: confidence intervals -->
-    <div class="chart-card">
-      <h3 data-i18n="h_bootstrap">95% confidence intervals (bootstrap)</h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-bootstrap-ci" role="img" aria-label="Intervalles de confiance bootstrap" data-a11y-label="Intervalles de confiance bootstrap"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="bootstrap_note">
-        95% CI on the mean CER per engine (1000 bootstrap iterations).
-      </div>
-    </div>
-
-    <!-- Sprint 7: Venn diagram -->
-    <div class="chart-card">
-      <h3 data-i18n="h_venn">Shared / exclusive errors (Venn)</h3>
-      <div id="venn-container" style="min-height:260px;display:flex;align-items:center;justify-content:center"></div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem technical" data-i18n="venn_note">
-        Intersection of the error sets of the top 2 or 3 competitors.
-        Shared errors = common segments.
-      </div>
-    </div>
-
-    <!-- Sprint 7: Wilcoxon tests -->
-    <div class="chart-card technical">
-      <h3 data-i18n="h_pairwise">Wilcoxon tests: pairwise comparisons</h3>
-      <div id="wilcoxon-table-container" style="overflow-x:auto"></div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="pairwise_note">
-        Wilcoxon signed-rank test (non-parametric). Threshold α = 0.05.
-      </div>
-    </div>
-
-    <!-- Sprint 7: error clustering -->
-    <div class="chart-card" style="grid-column:1/-1">
-      <h3 data-i18n="h_clusters">Clustering of error patterns</h3>
-      <div id="error-clusters-container"></div>
-    </div>
-
-    <!-- Sprint 10: Gini vs mean CER scatter -->
-    <div class="chart-card">
-      <h3 data-i18n="h_gini_cer">Gini vs mean CER <span style="font-size:.72rem;font-weight:400;color:var(--text-muted)" data-i18n="gini_cer_ideal">(ideal: bottom left)</span></h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-gini-cer" role="img" aria-label="Gini vs CER" data-a11y-label="Gini vs CER"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="gini_cer_note">
-        X axis = mean CER, Y axis = Gini coefficient. An ideal engine has a low CER AND a low Gini (rare, evenly spread errors).
-      </div>
-    </div>
-
-    <!-- Sprint 10: length ratio vs anchoring scatter -->
-    <div class="chart-card">
-      <h3 data-i18n="h_ratio_anchor">Length ratio vs anchoring <span style="font-size:.72rem;font-weight:400;color:var(--text-muted)" data-i18n="ratio_anchor_subtitle">(VLM hallucinations)</span></h3>
-      <div class="chart-canvas-wrap">
-        <canvas id="chart-ratio-anchor" role="img" aria-label="Score d'ancrage par moteur" data-a11y-label="Score d'ancrage par moteur"></canvas>
-      </div>
-      <div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="ratio_anchor_note">
-        X axis = trigram anchoring score [0–1]. Y axis = output/GT length ratio.
-        ⚠️ zone: anchoring &lt; 0.5 or ratio &gt; 1.2 → probable hallucinations.
-      </div>
-    </div>
-
-    <!-- Sprint 19: cost/quality Pareto view ────────────────────────── -->
-    <div class="chart-card pareto-card" style="grid-column:1/-1">
-      <h3 data-i18n="h_pareto">Quality / cost trade-off</h3>
-      <div class="pareto-toolbar">
-        <button class="pareto-toggle active" data-axis="cost" onclick="setParetoAxis('cost')"
-                data-i18n="pareto_axis_cost">Cost € / 1000 pages</button>
-        <button class="pareto-toggle" data-axis="speed" onclick="setParetoAxis('speed')"
-                data-i18n="pareto_axis_speed">Speed (s / page)</button>
-        <button class="pareto-toggle pareto-experimental" data-axis="co2"
-                onclick="setParetoAxis('co2')" data-i18n="pareto_axis_co2"
-                title="Estimation expérimentale">Carbon (g CO₂)</button>
-      </div>
-      <div class="chart-canvas-wrap"><canvas id="pareto-chart" role="img" aria-label="Front Pareto coût/qualité" data-a11y-label="Front Pareto coût/qualité"></canvas></div>
-      <div id="pareto-method-note" class="pareto-note" data-i18n="pareto_note">
-        Engines on the Pareto frontier (highlighted) are those for
-        which no other engine offers both a better CER AND
-        a better cost. Prices are indicative (internal, dated table). The
-        carbon mode is experimental.
-      </div>
-      <details class="pareto-assumptions">
-        <summary data-i18n="pareto_assumptions_summary">Detailed per-engine assumptions</summary>
-        <ul id="pareto-assumptions-list"></ul>
-      </details>
-    </div>
-
| 153 |
-
<!-- Sprint 43 — Calibration des moteurs (ECE, MCE, reliability diagram) -->
|
| 154 |
-
{% if calibration_summary_html or reliability_diagrams_html %}
|
| 155 |
-
<div class="chart-card" style="grid-column:1/-1">
|
| 156 |
-
<h3 data-i18n="h_calibration">Calibration des moteurs</h3>
|
| 157 |
-
<div class="calibration-grid"
|
| 158 |
-
style="display:grid;gap:1.2rem;align-items:start">
|
| 159 |
-
{% if calibration_summary_html %}
|
| 160 |
-
<div>{{ calibration_summary_html }}</div>
|
| 161 |
-
{% endif %}
|
| 162 |
-
{% if reliability_diagrams_html %}
|
| 163 |
-
<div>{{ reliability_diagrams_html }}</div>
|
| 164 |
-
{% endif %}
|
| 165 |
-
</div>
|
| 166 |
-
<div style="font-size:.72rem;color:var(--text-muted);margin-top:.6rem"
|
| 167 |
-
data-i18n="calibration_note">
|
| 168 |
-
ECE (Expected Calibration Error) : moyenne pondérée des écarts
|
| 169 |
-
|confiance − précision| par bin. Plus l'ECE est bas, plus le
|
| 170 |
-
moteur est honnête sur sa fiabilité — la diagonale du diagramme
|
| 171 |
-
représente la calibration parfaite. Un ECE élevé signale qu'on
|
| 172 |
-
ne peut pas se fier au score de confiance pour cibler la
|
| 173 |
-
relecture humaine.
|
| 174 |
-
</div>
|
| 175 |
-
</div>
|
| 176 |
-
{% endif %}
-
-<!-- Sprint 41: named-entity (NER) accuracy -->
-{% if ner_summary_html or ner_per_category_html %}
-<div class="chart-card" style="grid-column:1/-1">
-<h3 data-i18n="h_ner">Précision sur entités nommées</h3>
-<div class="ner-grid"
-style="display:grid;gap:1.2rem;align-items:start">
-{% if ner_summary_html %}
-<div>{{ ner_summary_html }}</div>
-{% endif %}
-{% if ner_per_category_html %}
-<div>{{ ner_per_category_html }}</div>
-{% endif %}
-</div>
-<div style="font-size:.72rem;color:var(--text-muted);margin-top:.6rem"
-data-i18n="ner_note">
-F1 calculé par alignement IoU ≥ 0,5 sur les spans (labels
-case-insensitive). Plus le F1 est haut, plus le moteur restitue
-fidèlement les entités nommées (personnes, lieux, dates) — ce
-qui prédit l'utilité aval pour l'indexation prosopographique.
-Cette métrique mesure conjointement OCR + extracteur NER ; le
-modèle d'extraction lui-même peut halluciner.
-</div>
-</div>
-{% endif %}
-
-<!-- Sprint 62: philological profile (6 sections: unicode_blocks,
-abbreviations, mufi, early_modern, modern_archives, roman_numerals).
-Adaptive: shown only if at least one module has signal. -->
-{% if philological_profile_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ philological_profile_html }}
-</div>
-{% endif %}
-
-<!-- Sprint 86 (A.II.5): fuzzy searchability + accuracy on
-numeric sequences. Adaptive: shown only if at least
-one engine has signal. -->
-{% if searchability_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ searchability_html }}
-</div>
-{% endif %}
-{% if numerical_sequences_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ numerical_sequences_html }}
-</div>
-{% endif %}
-
-<!-- Sprint 87 (A.II.2): readability (Flesch delta). Adaptive:
-shown only if at least one engine has signal. -->
-{% if readability_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ readability_html }}
-</div>
-{% endif %}
-
-<!-- Sprint 89 (A.II.8b): inter-engine specialization.
-Adaptive: shown only if ≥ 2 engines have a taxonomy. -->
-{% if specialization_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ specialization_html }}
-</div>
-{% endif %}
-
-<!-- Sprint 37: inter-engine analysis (taxonomic divergence + oracle gap) -->
-{% if divergence_matrix_html or oracle_gap_html %}
-<div class="chart-card" style="grid-column:1/-1">
-<h3 data-i18n="h_inter_engine">Analyse inter-moteurs</h3>
-<div class="inter-engine-grid"
-style="display:grid;grid-template-columns:1fr 1fr;gap:1.2rem;align-items:start">
-{% if divergence_matrix_html %}
-<div>{{ divergence_matrix_html }}</div>
-{% endif %}
-{% if oracle_gap_html %}
-<div>{{ oracle_gap_html }}</div>
-{% endif %}
-</div>
-<div style="font-size:.72rem;color:var(--text-muted);margin-top:.6rem"
-data-i18n="inter_engine_note">
-Plus la divergence est élevée, plus deux moteurs se trompent sur des
-classes d'erreurs différentes — ils sont alors candidats à un voting
-ensemble. L'oracle est la borne supérieure du recall token-level
-atteignable par ce voting (proxy bag-of-words).
-</div>
-</div>
-{% endif %}
-
-<!-- Chantier 3 (post-Sprint 97): composite thematic views
-that group orphan renderers into
-collapsible sections. Adaptive: shown only if the view
-returns content (at least one sub-section with signal). -->
-{% if economics_view_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ economics_view_html }}
-</div>
-{% endif %}
-{% if advanced_taxonomy_view_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ advanced_taxonomy_view_html }}
-</div>
-{% endif %}
-{% if diagnostics_view_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ diagnostics_view_html }}
-</div>
-{% endif %}
-
-<!-- "Wiring up the test-only modules" sprint (May 2026):
-4 sections from ``report_data.extra_metrics``.
-Adaptive: shown only if the computation returned signal. -->
-{% if rare_token_recall_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ rare_token_recall_html }}
-</div>
-{% endif %}
-{% if taxonomy_cooccurrence_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ taxonomy_cooccurrence_html }}
-</div>
-{% endif %}
-{% if taxonomy_intra_doc_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ taxonomy_intra_doc_html }}
-</div>
-{% endif %}
-{% if marginal_cost_html %}
-<div class="chart-card" style="grid-column:1/-1">
-{{ marginal_cost_html }}
-</div>
-{% endif %}
-
-<!-- Sprint 7: correlation matrix -->
-<div class="chart-card technical" style="grid-column:1/-1">
-<h3 data-i18n="h_correlation">Matrice de corrélation entre métriques</h3>
-<div style="margin-bottom:.5rem">
-<label style="font-size:.82rem;font-weight:600"><span data-i18n="corr_engine_label">Moteur :</span>
-<select id="corr-engine-select" onchange="renderCorrelationMatrix()"
-style="padding:.25rem .5rem;border-radius:6px;border:1px solid var(--border);margin-left:.25rem"></select>
-</label>
-</div>
-<div id="corr-matrix-container" style="overflow-x:auto"></div>
-<div style="font-size:.72rem;color:var(--text-muted);margin-top:.4rem" data-i18n="corr_note">
-Coefficient de Pearson entre les métriques CER, WER, qualité image, ligatures, diacritiques.
-Vert = corrélation positive, Rouge = corrélation négative.
-</div>
-</div>
-
-</div>
-</div>
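The `calibration_note` text in the template above defines ECE as the population-weighted mean of the per-bin gap |confidence − accuracy|, and the section comment also mentions MCE. The following is a minimal sketch of both metrics. It assumes confidences are scores in [0, 1], correctness is a boolean flag per item, and there are 10 equal-width bins. `calibration_errors` is a hypothetical helper, not the Picarones renderer's code.

```python
from typing import Sequence


def calibration_errors(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> tuple[float, float]:
    """Return (ECE, MCE) over equal-width confidence bins on [0, 1].

    Hypothetical helper: the binning scheme and n_bins=10 are assumptions,
    not necessarily what the report's calibration renderer uses.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0, 0.0
    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    acc_sum = [0.0] * n_bins    # number of correct items per bin
    count = [0] * n_bins        # population of each bin
    for c, ok in zip(confidences, correct):
        # Clamp so that c == 1.0 lands in the last bin and c < 0 in the first.
        b = min(max(int(c * n_bins), 0), n_bins - 1)
        conf_sum[b] += c
        acc_sum[b] += 1.0 if ok else 0.0
        count[b] += 1
    ece = 0.0
    mce = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        gap = abs(conf_sum[b] / count[b] - acc_sum[b] / count[b])
        ece += (count[b] / n) * gap  # weighted by the share of items in the bin
        mce = max(mce, gap)          # worst-calibrated bin
    return ece, mce
```

A perfectly calibrated engine, whose per-bin accuracy equals its mean confidence, gets an ECE of 0. That case is the diagonal of the reliability diagram. MCE reports only the worst bin, so it is more sensitive to sparsely populated bins.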
@@ -1,32 +0,0 @@
-<!-- ════ View 5: Characters ════════════════════════════════════════ -->
-<div id="view-characters" class="view">
-<div class="card">
-<h2 data-i18n="h_characters">Analyse des caractères</h2>
-
-<!-- Engine selector -->
-<div class="stat-row" style="margin-bottom:1rem">
-<label for="char-engine-select" style="font-weight:600;margin-right:.5rem" data-i18n="char_engine_label">Moteur :</label>
-<select id="char-engine-select" onchange="renderCharView()"
-style="padding:.35rem .7rem;border-radius:6px;border:1px solid var(--border)"></select>
-</div>
-
-<!-- Ligature / diacritic scores -->
-<div class="stat-row" id="char-scores-row" style="gap:1.5rem;margin-bottom:1.5rem"></div>
-
-<!-- Unicode confusion matrix -->
-<h3 style="margin-bottom:.75rem">Matrice de confusion unicode
-<span style="font-size:.75rem;font-weight:400;color:var(--text-muted)">
-— substitutions les plus fréquentes (caractère GT → caractère OCR)
-</span>
-</h3>
-<div id="confusion-heatmap" style="overflow-x:auto;margin-bottom:1.5rem"></div>
-
-<!-- Ligature detail by type -->
-<h3 style="margin-bottom:.75rem">Reconnaissance des ligatures</h3>
-<div id="ligature-detail" style="margin-bottom:1.5rem"></div>
-
-<!-- Detailed taxonomy -->
-<h3 style="margin-bottom:.75rem">Distribution taxonomique des erreurs</h3>
-<div id="taxonomy-detail"></div>
-</div>
-</div>
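The characters view's confusion matrix lists the most frequent substitutions (GT character → OCR character). One way to collect such pairs is sketched below. It uses the standard library's `difflib.SequenceMatcher` as the aligner, which is an assumption: the report's own alignment code is not shown in this diff. `substitution_counts` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable


def substitution_counts(pairs: Iterable[tuple[str, str]]) -> Counter:
    """Count (GT char, OCR char) substitutions over (gt, ocr) text pairs.

    Sketch only: SequenceMatcher is a heuristic aligner, not a minimal
    edit-distance alignment. Only 'replace' blocks of equal length are split
    into position-wise character pairs; unequal blocks are skipped.
    """
    counts: Counter = Counter()
    for gt, ocr in pairs:
        matcher = SequenceMatcher(a=gt, b=ocr, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                # Pair characters position by position inside the block.
                for g, o in zip(gt[i1:i2], ocr[j1:j2]):
                    counts[(g, o)] += 1
    return counts
```

Calling `.most_common(10)` on the result yields the top substitutions that a heatmap like `#confusion-heatmap` would display. A typical example is long s (`ſ`) read as `f`.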
@@ -1,83 +0,0 @@
-
-<!-- ════ View 3: Document ══════════════════════════════════════════ -->
-<div id="view-document" class="view">
-<div class="doc-layout">
-<!-- Sidebar -->
-<aside class="doc-sidebar">
-<div class="doc-sidebar-header" data-i18n="doc_sidebar_header">Documents</div>
-<div id="doc-list"></div>
-</aside>
-
-<!-- Main content -->
-<div>
-<div class="card" id="doc-detail-header">
-<div style="display:flex; align-items:baseline; justify-content:space-between; flex-wrap:wrap; gap:.5rem">
-<h2 id="doc-detail-title" data-i18n="doc_title_default">Sélectionner un document</h2>
-<div class="stat-row" id="doc-detail-metrics"></div>
-</div>
-</div>
-
-<!-- Zoomable image -->
-<div class="card">
-<h3 data-i18n="h_image">Image originale</h3>
-<div class="doc-image-wrap" id="doc-image-wrap"
-onwheel="handleZoom(event)"
-onmousedown="startDrag(event)"
-onmousemove="doDrag(event)"
-onmouseup="endDrag()"
-onmouseleave="endDrag()">
-<div class="doc-image-placeholder" id="doc-image-placeholder">
-<span style="font-size:2rem">🖼</span>
-<span>Sélectionnez un document</span>
-</div>
-<img id="doc-image" src="" alt="Image du document" style="display:none">
-<div class="zoom-controls">
-<button class="zoom-btn" onclick="zoom(1.25)" title="Zoom +">+</button>
-<button class="zoom-btn" onclick="zoom(0.8)" title="Zoom −">−</button>
-<button class="zoom-btn" onclick="resetZoom()" title="Réinitialiser">↺</button>
-</div>
-</div>
-</div>
-
-<!-- Side-by-side GT / OCR diff -->
-<div class="card" id="doc-sidebyside-card">
-<div class="sbs-header">
-<h3 data-i18n="h_diff">Comparaison GT / OCR</h3>
-<div class="sbs-engine-select" id="sbs-engine-select" style="display:none">
-<label data-i18n="sbs_engine_label">Concurrent :</label>
-<select id="sbs-engine-dropdown" onchange="renderSideBySide(currentDocId)"></select>
-</div>
-</div>
-<div class="sbs-columns" id="sbs-columns">
-<div class="sbs-col sbs-col-gt">
-<div class="sbs-col-header sbs-gt-header">
-<span>✓ Vérité terrain (GT)</span>
-</div>
-<div class="sbs-col-body" id="sbs-gt-body">—</div>
-</div>
-<div class="sbs-col sbs-col-ocr">
-<div class="sbs-col-header sbs-ocr-header" id="sbs-ocr-header">
-<span id="sbs-ocr-engine-name">OCR</span>
-<span class="cer-badge" id="sbs-ocr-cer" style="display:none"></span>
-</div>
-<div class="sbs-col-body" id="sbs-ocr-body">—</div>
-</div>
-</div>
-<!-- Pipeline triple-diff (shown below when applicable) -->
-<div id="sbs-triple-diff" style="display:none"></div>
-</div>
-
-<!-- Sprint 10: per-line CER distribution -->
-<div class="card" id="doc-line-metrics-card" style="display:none">
-<h3 data-i18n="h_line_metrics">Distribution des erreurs par ligne</h3>
-<div id="doc-line-metrics-content"></div>
-</div>
-
-<!-- Sprint 10: detected hallucinations -->
-<div class="card" id="doc-hallucination-card" style="display:none">
-<h3 data-i18n="h_hallucination">Analyse des hallucinations</h3>
-<div id="doc-hallucination-content"></div>
-</div>
-</div>
-</div>
-</div>
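The document view's Sprint 10 card shows the per-line CER distribution. Below is a minimal sketch of per-line CER. It assumes GT and OCR lines are paired by index, and that CER is the Levenshtein distance divided by the GT line length. `levenshtein` and `per_line_cer` are hypothetical helpers, not Picarones functions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertion, deletion, substitution all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def per_line_cer(gt: str, ocr: str) -> list[float]:
    """CER of each GT line, assuming GT and OCR lines are paired by index.

    Missing OCR lines count as empty; extra OCR lines are ignored. An empty
    GT line scores 0.0 if its OCR line is empty too, else 1.0 (a convention
    chosen for this sketch).
    """
    gt_lines = gt.splitlines()
    ocr_lines = ocr.splitlines()
    out: list[float] = []
    for k, g in enumerate(gt_lines):
        o = ocr_lines[k] if k < len(ocr_lines) else ""
        if not g:
            out.append(0.0 if not o else 1.0)
        else:
            out.append(levenshtein(g, o) / len(g))
    return out
```

Index pairing breaks down when an engine merges or splits lines. A real implementation would first need a line alignment step, which is not shown in this diff.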
@@ -1,35 +0,0 @@
-
-<!-- ════ View 2: Gallery ═══════════════════════════════════════════ -->
-<div id="view-gallery" class="view">
-<div class="card">
-<h2 data-i18n="h_gallery">Galerie des documents</h2>
-<div class="gallery-controls">
-<label><span data-i18n="gallery_sort_label">Trier par :</span>
-<select id="gallery-sort" onchange="renderGallery()">
-<option value="doc_id" data-i18n-opt="gallery_sort_id">Identifiant</option>
-<option value="mean_cer" data-i18n-opt="gallery_sort_cer">CER moyen</option>
-<option value="difficulty_score" data-i18n-opt="gallery_sort_difficulty">Difficulté</option>
-<option value="best_engine" data-i18n-opt="gallery_sort_best">Meilleur moteur</option>
-</select>
-</label>
-<label><span data-i18n="gallery_filter_cer_label">Filtrer CER ></span>
-<input type="number" id="gallery-filter-cer" min="0" max="100" value="0" step="1"
-style="width:60px" onchange="renderGallery()"> %
-</label>
-<label><span data-i18n="gallery_filter_engine_label">Moteur :</span>
-<select id="gallery-engine-select" onchange="renderGallery()">
-<option value="" data-i18n-opt="gallery_filter_all">Tous</option>
-</select>
-</label>
-<button class="btn-secondary" onclick="resetGalleryExclusions()" id="gallery-reset-btn"
-title="Réinitialiser toutes les exclusions manuelles" style="display:none">
-↺ Réinitialiser exclusions
-</button>
-</div>
-<div id="gallery-exclusion-info" style="font-size:.82rem;color:var(--text-muted);margin:.4rem 0;display:none"></div>
-<div id="gallery-grid" class="gallery-grid"></div>
-<div id="gallery-empty" class="empty-state" style="display:none" data-i18n="gallery_empty">
-Aucun document ne correspond aux filtres.
-</div>
-</div>
-</div>
@@ -1,91 +0,0 @@
-
-<!-- ════ View 1: Ranking ════════════════════════════════════════ -->
-<div id="view-ranking" class="view active">
-<div class="card">
-<h2 data-i18n="h_ranking">Classement des moteurs</h2>
-<div class="stat-row" id="ranking-stats"></div>
-<div class="table-wrap">
-<table id="ranking-table">
-<thead>
-<tr>
-<th scope="col" data-col="rank" class="sortable sorted" data-dir="asc" data-i18n="col_rank">#<i class="sort-icon">↑</i></th>
-<th scope="col" data-col="name" class="sortable" data-i18n="col_engine">Concurrent<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="cer" class="sortable" data-glossary-key="cer" data-i18n="col_cer">CER exact<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="cer_diplomatic" class="sortable" id="th-cer-diplo" data-glossary-key="cer_diplomatic" data-i18n="col_cer_diplo">CER diplo.<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="wer" class="sortable" data-glossary-key="wer" data-i18n="col_wer">WER<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="mer" class="sortable" data-glossary-key="mer" data-i18n="col_mer">MER<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="wil" class="sortable" data-glossary-key="wil" data-i18n="col_wil">WIL<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="ligature_score" class="sortable" id="th-ligatures" data-glossary-key="ligature_score" data-i18n="col_ligatures">Ligatures<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="diacritic_score" class="sortable" id="th-diacritics" data-glossary-key="diacritic_score" data-i18n="col_diacritics">Diacritiques<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="gini" class="sortable" id="th-gini" data-glossary-key="gini" data-i18n="col_gini">Gini<i class="sort-icon">↕</i></th>
-<th scope="col" data-col="anchor_score" class="sortable" id="th-anchor" data-glossary-key="anchor_score" data-i18n="col_anchor">Ancrage<i class="sort-icon">↕</i></th>
-<th scope="col" data-i18n="col_cer_median">CER médian</th>
-<th scope="col" data-i18n="col_cer_min">CER min</th>
-<th scope="col" data-i18n="col_cer_max">CER max</th>
-<th scope="col" id="th-overnorm" data-i18n="col_overnorm">Sur-norm.</th>
-<th scope="col" data-i18n="col_docs">Docs</th>
-</tr>
-</thead>
-<tbody id="ranking-tbody"></tbody>
-</table>
-</div>
-<div class="stat-row" style="margin-top:.75rem">
-<div class="legend-row">
-<span class="legend-dot" style="background:#16a34a"></span>CER < 5 %
-</div>
-<div class="legend-row">
-<span class="legend-dot" style="background:#ca8a04"></span>5–15 %
-</div>
-<div class="legend-row">
-<span class="legend-dot" style="background:#ea580c"></span>15–30 %
-</div>
-<div class="legend-row">
-<span class="legend-dot" style="background:#dc2626"></span>> 30 %
-</div>
-</div>
-
-<!-- Sprint 46: ranking stratified by script_type (adaptive report:
-section omitted when no stratum is available) -->
-{% if stratified_ranking_html %}
-{{ stratified_ranking_html }}
-{% endif %}
-</div>
-
-<!-- ── Robust metrics ────────────────────────────────────── -->
-<div class="card" id="robust-metrics-card">
-<h2 data-i18n="h_robust">Analyse robuste (sans hallucinations)</h2>
-<p style="font-size:.82rem;color:var(--text-muted);margin-bottom:.75rem" data-i18n="robust_desc">
-Recalcule CER, WER, MER, WIL, Gini et ancrage en excluant les documents détectés comme hallucinés ou problématiques.
-Cochez/décochez des documents dans la Galerie pour les exclure manuellement.
-</p>
-<div class="robust-controls">
-<label>
-<button class="robust-toggle" id="robust-cer-toggle" data-active="true"
-onclick="toggleRobustCriterion('cer',this)">✓</button>
-<span data-i18n="robust_cer_label">CER > seuil :</span>
-<input type="range" id="robust-cer" min="0" max="100" step="1" value="100"
-oninput="document.getElementById('robust-cer-val').textContent=parseInt(this.value)+'%';_computeHallucinationExclusions();recalculateAll()">
-<span id="robust-cer-val" class="slider-val">100%</span>
-</label>
-<label>
-<button class="robust-toggle" id="robust-anchor-toggle" data-active="true"
-onclick="toggleRobustCriterion('anchor',this)">✓</button>
-<span data-i18n="robust_anchor_label">Ancrage < seuil :</span>
-<input type="range" id="robust-anchor" min="0" max="1" step="0.05" value="0.5"
-oninput="document.getElementById('robust-anchor-val').textContent=parseFloat(this.value).toFixed(2);_computeHallucinationExclusions();recalculateAll()">
-<span id="robust-anchor-val" class="slider-val">0.50</span>
-</label>
-<label>
-<button class="robust-toggle" id="robust-ratio-toggle" data-active="true"
-onclick="toggleRobustCriterion('ratio',this)">✓</button>
-<span data-i18n="robust_ratio_label">Ratio longueur > seuil :</span>
-<input type="range" id="robust-ratio" min="1" max="3" step="0.1" value="1.5"
-oninput="document.getElementById('robust-ratio-val').textContent=parseFloat(this.value).toFixed(1);_computeHallucinationExclusions();recalculateAll()">
-<span id="robust-ratio-val" class="slider-val">1.5</span>
-</label>
-</div>
-<div id="robust-summary" style="font-size:.85rem;font-weight:600;margin:.75rem 0;padding:.5rem .75rem;background:var(--bg);border-radius:.4rem;border:1px solid var(--border)"></div>
-<div id="robust-table-wrap" class="table-wrap"></div>
-<div id="robust-excluded-docs" style="margin-top:.75rem;font-size:.82rem"></div>
-</div>
-</div>
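The robust-metrics card above has three toggleable criteria, each with a threshold slider: CER > threshold (default 100 %), anchoring < threshold (default 0.50) and length ratio > threshold (default 1.5). It also supports manual exclusions from the Gallery. The actual logic lives in the JavaScript functions `_computeHallucinationExclusions()` and `recalculateAll()`, which this diff does not show. The Python sketch below is therefore an assumption throughout:

- a document is excluded if *any* active criterion fires;
- `length_ratio` means len(OCR) / len(GT);
- `None` stands for a toggled-off criterion.

`DocResult`, `robust_exclusions` and `robust_mean_cer` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class DocResult:
    doc_id: str
    cer: float            # fraction, e.g. 0.12 for 12 %
    anchor_score: float   # 0..1, lower = less anchored in the GT
    length_ratio: float   # assumed: len(OCR) / len(GT)


def robust_exclusions(
    docs: list[DocResult],
    cer_pct: Optional[float] = 100.0,   # slider is in percent; None = toggled off
    anchor_min: Optional[float] = 0.5,
    ratio_max: Optional[float] = 1.5,
    manual: frozenset[str] = frozenset(),
) -> set[str]:
    """IDs of documents flagged by any active criterion, plus manual exclusions."""
    excluded = set(manual)
    for d in docs:
        if cer_pct is not None and d.cer * 100 > cer_pct:
            excluded.add(d.doc_id)
        elif anchor_min is not None and d.anchor_score < anchor_min:
            excluded.add(d.doc_id)
        elif ratio_max is not None and d.length_ratio > ratio_max:
            excluded.add(d.doc_id)
    return excluded


def robust_mean_cer(docs: list[DocResult], excluded: set[str]) -> Optional[float]:
    """Mean CER over the documents that survive the exclusions (None if none do)."""
    kept = [d.cer for d in docs if d.doc_id not in excluded]
    return mean(kept) if kept else None
```

With the default CER threshold of 100 %, the CER criterion only fires for documents whose CER exceeds 1.0. That can happen when an engine hallucinates long insertions.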
@@ -21,6 +21,7 @@ Usage
 
 from __future__ import annotations
 
+from picarones.reports_v2.html.generator import ReportGenerator
 from picarones.reports_v2.html.render import HtmlReportRenderer
 
-__all__ = ["HtmlReportRenderer"]
+__all__ = ["HtmlReportRenderer", "ReportGenerator"]
@@ -0,0 +1,414 @@
+"""Comparison of two benchmark runs (Sprint 28).
+
+Phase 5.E: module relocated from ``picarones.report.comparison``
+to ``picarones.reports_v2.html.comparison``. The legacy path
+remains available through a shim that emits ``DeprecationWarning``;
+removal is planned for 2.0.
+
+Sprint 8 delivered longitudinal persistence via SQLite
+(``picarones.measurements.history``) and a CLI regression detector. But
+no tool exposed a **comparison** of two runs on the report side:
+a researcher iterating over 8 prompts could not see at a glance that
+*"Tesseract → GPT-4o version V2 regressed by 0.8 pp in mean CER
+on the parish-register stratum compared with V1"*.
+
+This module provides:
+
+- ``load_benchmark_json(path)``: loads the JSON produced by
+  ``BenchmarkResult.as_dict()`` or ``picarones run -o results.json``.
+- ``compare_benchmarks(a, b)``: computes per-engine deltas
+  (mean CER, mean WER, counts of processed/failed documents) and
+  per-stratum deltas when the metadata is present.
+- ``detect_regressions(diff, threshold)``: lists engines that
+  regressed (CER delta > threshold) and engines that improved
+  (CER delta < -threshold).
+- ``render_comparison_html(diff, output_path)``: minimal
+  self-contained HTML rendering via Jinja2, for sharing.
+
+Conventions
+-----------
+- Deltas are computed as ``b - a`` (so positive = ``b`` is worse).
+- An engine present in only one run appears in ``only_in_a`` /
+  ``only_in_b``, never in ``deltas``.
+- An engine whose ``mean_cer`` is ``None`` (total failure) is
+  reported but produces no numeric delta.
+- ``threshold`` is absolute (CER as a fraction, not a percentage).
+  Default 0.005 = 0.5 pp.
+"""
|
| 38 |
+
|
| 39 |
+
from __future__ import annotations
|
| 40 |
+
|
| 41 |
+
import json
|
| 42 |
+
import logging
|
| 43 |
+
from dataclasses import dataclass, field
|
| 44 |
+
from pathlib import Path
|
| 45 |
+
from typing import Any, Optional
|
| 46 |
+
|
| 47 |
+
logger = logging.getLogger(__name__)
|
| 48 |
+
|
| 49 |
+
|
| 50 |
+
# ---------------------------------------------------------------------------
|
| 51 |
+
# Modèles
|
| 52 |
+
# ---------------------------------------------------------------------------
|
| 53 |
+
|
| 54 |
+
@dataclass
|
| 55 |
+
class EngineDelta:
|
| 56 |
+
"""Différence ``b - a`` pour un moteur donné."""
|
| 57 |
+
engine: str
|
| 58 |
+
cer_a: Optional[float]
|
| 59 |
+
cer_b: Optional[float]
|
| 60 |
+
delta_cer: Optional[float]
|
| 61 |
+
wer_a: Optional[float]
|
| 62 |
+
wer_b: Optional[float]
|
| 63 |
+
delta_wer: Optional[float]
|
| 64 |
+
docs_a: int
|
| 65 |
+
docs_b: int
|
| 66 |
+
failed_a: int
|
| 67 |
+
failed_b: int
|
| 68 |
+
is_regression: bool = False
|
| 69 |
+
is_improvement: bool = False
|
| 70 |
+
|
| 71 |
+
def as_dict(self) -> dict[str, Any]:
|
| 72 |
+
return {
|
| 73 |
+
"engine": self.engine,
|
| 74 |
+
"cer_a": self.cer_a,
|
| 75 |
+
"cer_b": self.cer_b,
|
| 76 |
+
"delta_cer": self.delta_cer,
|
| 77 |
+
"wer_a": self.wer_a,
|
| 78 |
+
"wer_b": self.wer_b,
|
| 79 |
+
"delta_wer": self.delta_wer,
|
| 80 |
+
"docs_a": self.docs_a,
|
| 81 |
+
"docs_b": self.docs_b,
|
| 82 |
+
"failed_a": self.failed_a,
|
| 83 |
+
"failed_b": self.failed_b,
|
| 84 |
+
"is_regression": self.is_regression,
|
| 85 |
+
"is_improvement": self.is_improvement,
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
|
| 89 |
+
@dataclass
|
| 90 |
+
class ComparisonResult:
|
| 91 |
+
"""Résultat d'une comparaison ``b - a`` entre deux runs."""
|
| 92 |
+
label_a: str
|
| 93 |
+
label_b: str
|
| 94 |
+
run_date_a: Optional[str]
|
| 95 |
+
run_date_b: Optional[str]
|
| 96 |
+
corpus_a: Optional[str]
|
| 97 |
+
corpus_b: Optional[str]
|
| 98 |
+
deltas: list[EngineDelta] = field(default_factory=list)
|
| 99 |
+
only_in_a: list[str] = field(default_factory=list)
|
| 100 |
+
only_in_b: list[str] = field(default_factory=list)
|
| 101 |
+
threshold: float = 0.005
|
| 102 |
+
|
| 103 |
+
def as_dict(self) -> dict[str, Any]:
|
| 104 |
+
return {
|
| 105 |
+
"label_a": self.label_a,
|
| 106 |
+
"label_b": self.label_b,
|
| 107 |
+
"run_date_a": self.run_date_a,
|
| 108 |
+
"run_date_b": self.run_date_b,
|
| 109 |
+
"corpus_a": self.corpus_a,
|
| 110 |
+
"corpus_b": self.corpus_b,
|
| 111 |
+
"threshold": self.threshold,
|
| 112 |
+
"deltas": [d.as_dict() for d in self.deltas],
|
| 113 |
+
"only_in_a": list(self.only_in_a),
|
| 114 |
+
"only_in_b": list(self.only_in_b),
|
| 115 |
+
"regressions": [d.as_dict() for d in self.deltas if d.is_regression],
|
| 116 |
+
"improvements": [d.as_dict() for d in self.deltas if d.is_improvement],
|
| 117 |
+
}
|
+
+
+# ---------------------------------------------------------------------------
+# Loading
+# ---------------------------------------------------------------------------
+
+def load_benchmark_json(path: str | Path | dict[str, Any]) -> dict[str, Any]:
+    """Load a benchmark JSON file from disk.
+
+    Accepts:
+    - the ``BenchmarkResult.as_dict()`` format (key ``ranking``,
+      ``engine_reports`` or ``engines``);
+    - an already-parsed dict, in which case ``path`` may itself be a dict
+      and is returned unchanged.
+    """
+    if isinstance(path, dict):
+        return path
+    p = Path(path)
+    if not p.exists():
+        raise FileNotFoundError(f"Fichier benchmark introuvable : {p}")
+    with p.open(encoding="utf-8") as fh:
+        data = json.load(fh)
+    if not isinstance(data, dict):
+        raise ValueError(f"Le JSON {p} doit être un dict.")
+    return data
+
+
+# ---------------------------------------------------------------------------
+# Comparison
+# ---------------------------------------------------------------------------
+
+def _ranking_index(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
+    """Index ``ranking`` by engine name, robust to both formats.
+
+    A ``BenchmarkResult.as_dict()`` exposes ``ranking`` directly
+    (keys ``engine``, ``mean_cer``, etc.). The alternative ``engines``
+    format exposes the same content under slightly different keys
+    (``name``, ``cer``, ``wer``); it is normalized to the ``ranking`` format.
+    """
+    ranking = data.get("ranking")
+    if isinstance(ranking, list) and ranking:
+        return {
+            r["engine"]: {
+                "engine": r["engine"],
+                "mean_cer": r.get("mean_cer"),
+                "mean_wer": r.get("mean_wer"),
+                "documents": int(r.get("documents") or 0),
+                "failed": int(r.get("failed") or 0),
+            }
+            for r in ranking
+            if isinstance(r, dict) and r.get("engine")
+        }
+    # Fallback: ``engines`` (report_data format)
+    engines = data.get("engines") or []
+    out: dict[str, dict[str, Any]] = {}
+    if isinstance(engines, list):
+        for e in engines:
+            if not isinstance(e, dict):
+                continue
+            name = e.get("name") or e.get("engine")
+            if not name:
+                continue
+            out[name] = {
+                "engine": name,
+                "mean_cer": e.get("cer"),
+                "mean_wer": e.get("wer"),
+                "documents": int(e.get("documents") or 0),
+                "failed": int(e.get("failed") or 0),
+            }
+    return out
+
+
+def _label_of(data: dict[str, Any], default: str) -> str:
+    meta = data.get("meta") or {}
+    return (
+        meta.get("corpus_name")
+        or (data.get("corpus") or {}).get("name")
+        or default
+    )
+
+
+def _run_date_of(data: dict[str, Any]) -> Optional[str]:
+    return (
+        data.get("run_date")
+        or (data.get("meta") or {}).get("run_date")
+    )
+
+
+def _corpus_of(data: dict[str, Any]) -> Optional[str]:
+    meta = data.get("meta") or {}
+    return (
+        meta.get("corpus_source")
+        or (data.get("corpus") or {}).get("source")
+        or meta.get("corpus_name")
+    )
+
+
+def _safe_delta(a: Optional[float], b: Optional[float]) -> Optional[float]:
+    if a is None or b is None:
+        return None
+    return float(b) - float(a)
+
+
+def compare_benchmarks(
+    a: str | Path | dict[str, Any],
+    b: str | Path | dict[str, Any],
+    *,
+    threshold: float = 0.005,
+    label_a: str = "A",
+    label_b: str = "B",
+) -> ComparisonResult:
+    """Compare two runs and return the per-engine deltas.
+
+    Convention: a positive CER delta means ``b`` is *worse* than ``a``
+    (regression). A strictly positive ``threshold`` (as a fraction,
+    e.g. 0.005 = 0.5 pp) separates regressions from noise: an engine is
+    a regression if ``delta_cer > threshold``, an improvement if
+    ``delta_cer < -threshold``, and stable otherwise.
+    """
+    da = load_benchmark_json(a) if not isinstance(a, dict) else a
+    db = load_benchmark_json(b) if not isinstance(b, dict) else b
+
+    idx_a = _ranking_index(da)
+    idx_b = _ranking_index(db)
+
+    common = sorted(set(idx_a) & set(idx_b))
+    only_a = sorted(set(idx_a) - set(idx_b))
+    only_b = sorted(set(idx_b) - set(idx_a))
+
+    deltas: list[EngineDelta] = []
+    for name in common:
+        ea = idx_a[name]
+        eb = idx_b[name]
+        delta_cer = _safe_delta(ea["mean_cer"], eb["mean_cer"])
+        delta_wer = _safe_delta(ea["mean_wer"], eb["mean_wer"])
+        regression = bool(delta_cer is not None and delta_cer > threshold)
+        improvement = bool(delta_cer is not None and delta_cer < -threshold)
+        deltas.append(
+            EngineDelta(
+                engine=name,
+                cer_a=ea["mean_cer"],
+                cer_b=eb["mean_cer"],
+                delta_cer=delta_cer,
+                wer_a=ea["mean_wer"],
+                wer_b=eb["mean_wer"],
+                delta_wer=delta_wer,
+                docs_a=int(ea["documents"]),
+                docs_b=int(eb["documents"]),
+                failed_a=int(ea["failed"]),
+                failed_b=int(eb["failed"]),
+                is_regression=regression,
+                is_improvement=improvement,
+            )
+        )
+
+    # Sort: regressions first (largest CER increase first), then every other
+    # engine by decreasing delta, so stable engines come before improvements
+    # and the largest improvement comes last. A missing delta counts as 0.
+    deltas.sort(key=lambda d: (
+        not d.is_regression,
+        -(d.delta_cer if d.delta_cer is not None else 0.0),
+    ))
+
+    return ComparisonResult(
+        label_a=label_a,
+        label_b=label_b,
+        run_date_a=_run_date_of(da),
+        run_date_b=_run_date_of(db),
+        corpus_a=_corpus_of(da),
+        corpus_b=_corpus_of(db),
+        deltas=deltas,
+        only_in_a=only_a,
+        only_in_b=only_b,
+        threshold=float(threshold),
+    )
+
+
+def detect_regressions(
+    diff: ComparisonResult,
+) -> list[EngineDelta]:
+    """Return only the engines flagged as regressions in ``diff``."""
+    return [d for d in diff.deltas if d.is_regression]
+
+
+# ---------------------------------------------------------------------------
+# HTML rendering
+# ---------------------------------------------------------------------------
+
+_COMPARISON_TEMPLATE = """<!DOCTYPE html>
+<html lang="fr">
+<head>
+<meta charset="UTF-8">
+<title>Picarones — Comparaison de runs</title>
+<style>
+  body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
+         max-width: 980px; margin: 2em auto; padding: 0 1em; color: #111; }
+  h1 { border-bottom: 2px solid #333; padding-bottom: .4em; }
+  h2 { margin-top: 1.6em; color: #333; }
+  table { width: 100%; border-collapse: collapse; margin: 1em 0; }
+  th, td { padding: .5em .8em; text-align: left; border-bottom: 1px solid #ddd; }
+  th { background: #f3f3f3; }
+  td.num, th.num { text-align: right; font-variant-numeric: tabular-nums; }
+  tr.regression td { background: #fef0f0; }
+  tr.improvement td { background: #f0fef2; }
+  .delta-pos { color: #b0322a; font-weight: 600; }
+  .delta-neg { color: #1b8a3a; font-weight: 600; }
+  .badge { display: inline-block; padding: .15em .55em; border-radius: 4px;
+           font-size: .8em; font-weight: 600; }
+  .badge.reg { background: #fde2e0; color: #8a1c14; }
+  .badge.imp { background: #e0f8e6; color: #0a5e22; }
+  .meta { color: #666; font-size: .9em; }
+  .empty { color: #999; font-style: italic; }
+</style>
+</head>
+<body>
+<h1>Comparaison : {{ diff.label_a }} → {{ diff.label_b }}</h1>
+<p class="meta">
+  Run A : {{ diff.run_date_a or "?" }} · corpus {{ diff.corpus_a or "?" }}<br>
+  Run B : {{ diff.run_date_b or "?" }} · corpus {{ diff.corpus_b or "?" }}<br>
+  Seuil régression / amélioration : {{ "%.3f"|format(diff.threshold) }}
+  ({{ "%.1f"|format(diff.threshold * 100) }} pp de CER absolu).
+</p>
+
+<h2>Moteurs comparés ({{ diff.deltas|length }})</h2>
+{% if not diff.deltas %}
+<p class="empty">Aucun moteur commun aux deux runs.</p>
+{% else %}
+<table>
+  <thead>
+    <tr>
+      <th scope="col">Moteur</th>
+      <th scope="col" class="num">CER A</th>
+      <th scope="col" class="num">CER B</th>
+      <th scope="col" class="num">Δ CER</th>
+      <th scope="col" class="num">Docs A → B</th>
+      <th scope="col">État</th>
+    </tr>
+  </thead>
+  <tbody>
+  {% for d in diff.deltas %}
+    <tr class="{% if d.is_regression %}regression{% elif d.is_improvement %}improvement{% endif %}">
+      <td>{{ d.engine }}</td>
+      <td class="num">{{ "%.3f"|format(d.cer_a) if d.cer_a is not none else "—" }}</td>
+      <td class="num">{{ "%.3f"|format(d.cer_b) if d.cer_b is not none else "—" }}</td>
+      <td class="num">
+        {% if d.delta_cer is none %}—
+        {% elif d.delta_cer > 0 %}<span class="delta-pos">+{{ "%.3f"|format(d.delta_cer) }}</span>
+        {% else %}<span class="delta-neg">{{ "%.3f"|format(d.delta_cer) }}</span>
+        {% endif %}
+      </td>
+      <td class="num">{{ d.docs_a }} → {{ d.docs_b }}</td>
+      <td>
+        {% if d.is_regression %}<span class="badge reg">régression</span>
+        {% elif d.is_improvement %}<span class="badge imp">amélioration</span>
+        {% else %}<span class="meta">stable</span>{% endif %}
+      </td>
+    </tr>
+  {% endfor %}
+  </tbody>
+</table>
+{% endif %}
+
+{% if diff.only_in_a %}
+<h2>Présents uniquement dans A</h2>
+<ul>{% for n in diff.only_in_a %}<li>{{ n }}</li>{% endfor %}</ul>
+{% endif %}
+
+{% if diff.only_in_b %}
+<h2>Présents uniquement dans B</h2>
+<ul>{% for n in diff.only_in_b %}<li>{{ n }}</li>{% endfor %}</ul>
+{% endif %}
+
+<p class="meta">Picarones — Sprint 28 · rapport de comparaison de runs.</p>
+</body>
+</html>
+"""
+
+
+def render_comparison_html(
+    diff: ComparisonResult,
+    output_path: str | Path,
+) -> Path:
+    """Serialize a ``ComparisonResult`` into a self-contained HTML report."""
+    from jinja2 import Environment, select_autoescape
+
+    env = Environment(autoescape=select_autoescape(["html", "j2"]))
+    template = env.from_string(_COMPARISON_TEMPLATE)
+    html = template.render(diff=diff)
+    out = Path(output_path)
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(html, encoding="utf-8")
+    return out
+
+
+__all__ = [
+    "EngineDelta",
+    "ComparisonResult",
+    "load_benchmark_json",
+    "compare_benchmarks",
+    "detect_regressions",
+    "render_comparison_html",
+]
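The classification rule in ``compare_benchmarks`` can be reproduced without Picarones. The snippet below is a minimal, hypothetical re-implementation (``Delta`` and ``classify_and_sort`` are illustrative names, not part of the package, and the CER values are made up) that applies the same threshold test and the same sort key to plain ``(engine, cer_a, cer_b)`` tuples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    """Hypothetical stand-in for ``EngineDelta``, reduced to the CER fields."""
    engine: str
    delta_cer: Optional[float]
    is_regression: bool
    is_improvement: bool


def classify_and_sort(
    rows: list[tuple[str, Optional[float], Optional[float]]],
    threshold: float = 0.005,
) -> list[Delta]:
    """Flag each engine as regression / improvement / stable, then sort.

    delta = cer_b - cer_a; a regression needs delta > threshold and an
    improvement needs delta < -threshold.
    """
    out: list[Delta] = []
    for engine, cer_a, cer_b in rows:
        # A missing CER on either side yields no delta (never flagged).
        delta = None if cer_a is None or cer_b is None else cer_b - cer_a
        out.append(Delta(
            engine=engine,
            delta_cer=delta,
            is_regression=delta is not None and delta > threshold,
            is_improvement=delta is not None and delta < -threshold,
        ))
    # Same key as compare_benchmarks: regressions first, then decreasing delta.
    out.sort(key=lambda d: (
        not d.is_regression,
        -(d.delta_cer if d.delta_cer is not None else 0.0),
    ))
    return out


rows = [
    ("tesseract", 0.080, 0.120),  # +0.040: regression
    ("kraken", 0.050, 0.052),     # +0.002: stable (within 0.005)
    ("vlm", 0.100, 0.030),        # -0.070: improvement
    ("llm", 0.060, None),         # missing CER: no delta
]
# Prints tesseract, kraken, llm, vlm in that order.
for d in classify_and_sort(rows):
    print(d.engine, d.is_regression, d.is_improvement)
```

Note that a ``+0.002`` shift stays "stable": the threshold is what keeps run-to-run noise out of the regression list.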
@@ -0,0 +1,132 @@
+"""Build the data dict consumed by the Jinja template.
+
+Before the split, ``picarones.report.generator._build_report_data``
+was a 463-line function turning a :class:`BenchmarkResult` into a
+Jinja-ready dict. Sprint after sprint it stacked independent blocks:
+engines, documents, statistics, scatter plots, Pareto front, etc.
+
+This subpackage splits the construction into thematic modules:
+
+- :mod:`engines`: per-engine summary (``engines_summary``).
+- :mod:`documents`: gallery view + detail view + Sprint 7 difficulty.
+- :mod:`statistics`: Wilcoxon, Friedman, Nemenyi, bootstrap CIs,
+  reliability curves, Venn, error clusters, correlations.
+- :mod:`scatter`: Sprint 10, Gini vs CER, ratio vs anchor.
+- :mod:`pareto`: Sprint 19, three Pareto fronts + pricing metadata.
+  Exposes two separate functions: :func:`attach_engine_costs`
+  (mutating) and :func:`build_pareto_section` (pure).
+- :mod:`extra_metrics`: corpus-wide metrics (rare-token recall,
+  taxonomy co-occurrence, class × position heatmap, marginal cost).
+
+The public API :func:`build_report_data` runs these modules in the
+right order. The two-step Pareto sequence
+(``attach_engine_costs`` → ``build_pareto_section``) makes the
+mutation explicit: the ``build_*`` functions of the subpackage are
+pure, and the helpers that mutate their input
+(``annotate_documents_with_difficulty``, ``attach_engine_costs``)
+say so in their names.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from picarones.evaluation.benchmark_result import BenchmarkResult
+
+from picarones.reports_v2.html.data.documents import (
+    annotate_documents_with_difficulty,
+    build_documents,
+)
+from picarones.reports_v2.html.data.engines import build_engines_summary
+from picarones.reports_v2.html.data.extra_metrics import (
+    compute_marginal_cost_section,
+    compute_rare_token_recall_per_engine,
+    compute_taxonomy_cooccurrence_section,
+    compute_taxonomy_intra_doc_section,
+)
+from picarones.reports_v2.html.data.pareto import (
+    attach_engine_costs,
+    build_pareto_section,
+)
+from picarones.reports_v2.html.data.scatter import (
+    build_gini_vs_cer,
+    build_ratio_vs_anchor,
+)
+from picarones.reports_v2.html.data.statistics import (
+    build_bootstrap_cis,
+    build_correlation_per_engine,
+    build_error_clusters,
+    build_friedman_and_nemenyi,
+    build_pairwise_wilcoxon,
+    build_reliability_curves,
+    build_venn_data,
+)
+
+
+def build_report_data(
+    benchmark: "BenchmarkResult", images_b64: dict[str, str],
+) -> dict:
+    """Turn a :class:`BenchmarkResult` into a dict for the HTML report.
+
+    The call order matters:
+
+    1. Build ``engines_summary`` (pure).
+    2. Build ``documents``, then annotate them with difficulty (mutates
+       ``documents``).
+    3. **Attach** costs to ``engines_summary`` (mutating, as the name
+       says).
+    4. **Build** the Pareto block (pure; reads the attached costs, so it
+       must run after step 3).
+    """
+    engines_summary = build_engines_summary(benchmark)
+    documents = build_documents(benchmark, images_b64)
+    annotate_documents_with_difficulty(benchmark, documents)
+
+    attach_engine_costs(engines_summary, benchmark)
+    pareto_data = build_pareto_section(engines_summary)
+
+    return {
+        "meta": {
+            "corpus_name": benchmark.corpus_name,
+            "corpus_source": benchmark.corpus_source,
+            "document_count": benchmark.document_count,
+            "run_date": benchmark.run_date,
+            "picarones_version": benchmark.picarones_version,
+            "metadata": benchmark.metadata,
+        },
+        "ranking": benchmark.ranking(),
+        "engines": engines_summary,
+        "documents": documents,
+        # Sprint 7
+        "statistics": {
+            "pairwise_wilcoxon": build_pairwise_wilcoxon(benchmark),
+            "bootstrap_cis": build_bootstrap_cis(benchmark),
+            **build_friedman_and_nemenyi(benchmark),
+        },
+        "reliability_curves": build_reliability_curves(benchmark),
+        "venn_data": build_venn_data(benchmark),
+        "error_clusters": build_error_clusters(benchmark),
+        "correlation_per_engine": build_correlation_per_engine(benchmark),
+        # Sprint 10
+        "gini_vs_cer": build_gini_vs_cer(benchmark),
+        "ratio_vs_anchor": build_ratio_vs_anchor(benchmark),
+        # Sprint 19: cost/quality Pareto view with axis variants
+        "pareto": pareto_data,
+        # Sprint 36: cross-engine analysis (taxonomic divergence +
+        # complementarity / oracle). ``None`` with fewer than 2 engines.
+        "inter_engine_analysis": benchmark.inter_engine_analysis,
+        # Sprints 45-46: stratification by script_type
+        "available_strata": benchmark.available_strata(),
+        "stratified_ranking": benchmark.stratified_ranking() or None,
+        "corpus_homogeneity": benchmark.corpus_homogeneity(),
+        # "Wiring the test-only modules" sprint (May 2026): corpus-wide
+        # metrics that were not surfaced in the report until then.
+        # Sprint 71 (A.I.1): recall on rare tokens (hapax + dis legomena).
+        "rare_token_recall": compute_rare_token_recall_per_engine(benchmark),
+        # Sprint 75 (A.I.4): cross-class taxonomic co-occurrence.
+        "taxonomy_cooccurrence": compute_taxonomy_cooccurrence_section(benchmark),
+        # Sprint 76 (A.I.4): class × position heatmap (intra-document).
+        "taxonomy_intra_doc": compute_taxonomy_intra_doc_section(benchmark),
+        # Sprint 91 (A.II.6): marginal-cost matrix between engine pairs.
+        "marginal_cost": compute_marginal_cost_section(engines_summary),
+    }
+
+
+__all__ = ["build_report_data"]
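The attach-then-build ordering that ``build_report_data`` relies on can be illustrated with a generic sketch. The names ``attach_costs`` and ``pareto_front`` and the ``cost`` / ``cer`` keys are hypothetical: the real ``attach_engine_costs`` and ``build_pareto_section`` are not shown in this diff, and the sketch computes a single 2-D front (both cost and CER minimized) rather than Picarones' three fronts.

```python
from typing import Any


def attach_costs(engines: list[dict[str, Any]], costs: dict[str, float]) -> None:
    """Mutating step: write a ``cost`` key into each engine dict, in place."""
    for e in engines:
        e["cost"] = costs.get(e["name"])


def pareto_front(engines: list[dict[str, Any]]) -> list[str]:
    """Pure step: names of engines not dominated on (cost, cer), both minimized.

    Reads the ``cost`` key written by ``attach_costs``; engines without a
    cost are left out, so calling this first silently yields an empty front.
    """
    pts = [e for e in engines if e.get("cost") is not None]
    front = []
    for e in pts:
        dominated = any(
            o["cost"] <= e["cost"] and o["cer"] <= e["cer"]
            and (o["cost"] < e["cost"] or o["cer"] < e["cer"])
            for o in pts
        )
        if not dominated:
            front.append(e["name"])
    return front


engines = [
    {"name": "a", "cer": 0.10},
    {"name": "b", "cer": 0.05},
    {"name": "c", "cer": 0.12},
]
attach_costs(engines, {"a": 1.0, "b": 5.0, "c": 2.0})  # must run first
print(pareto_front(engines))  # a and b survive; c is dominated by a
```

Keeping the mutation in a separately named step makes the dependency visible at the call site: swapping the two calls would not raise, it would just produce an empty front.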
@@ -0,0 +1,30 @@
+"""Internal numeric helpers for the ``data`` subpackage (formerly ``report_data``).
+
+Small utility functions shared by all section builders (engines,
+documents, statistics, scatter, pareto). Do not import them from
+outside the subpackage: these helpers are specific to the conventions
+of the JSON dict consumed by the template.
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+
+def safe_round(v: Optional[float], decimals: int = 4) -> float:
+    """Round an optional float; ``None`` becomes ``0.0``."""
+    return round(v or 0.0, decimals)
+
+
+def percent_string(v: Optional[float], decimals: int = 2) -> str:
+    """Format a ratio in [0, 1] as a percentage string: ``0.4723 → "47.23 %"``.
+
+    ``None`` → ``"—"``. Kept for backward compatibility with possible
+    external callers (historical, Sprint 7).
+    """
+    if v is None:
+        return "—"
+    return f"{v * 100:.{decimals}f} %"
+
+
+__all__ = ["safe_round", "percent_string"]
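Both helpers are small enough to exercise directly. The snippet copies them from the hunk above and prints their edge cases. Note that ``safe_round(None)`` returns ``0.0``, indistinguishable from a perfect score; that is presumably why builders such as ``_build_engine_result_entry`` guard optional metrics with ``... if x is not None else None`` before calling it.

```python
from typing import Optional


def safe_round(v: Optional[float], decimals: int = 4) -> float:
    """Round an optional float; ``None`` becomes ``0.0``."""
    return round(v or 0.0, decimals)


def percent_string(v: Optional[float], decimals: int = 2) -> str:
    """Format a ratio in [0, 1] as a percentage string."""
    if v is None:
        return "—"
    return f"{v * 100:.{decimals}f} %"


print(safe_round(0.123456))    # 0.1235
print(safe_round(None))        # 0.0, same as a perfect score
print(percent_string(0.4723))  # 47.23 %
```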
@@ -0,0 +1,167 @@
+"""Build the ``documents`` list (gallery view + detail view).
+
+For each corpus document, gathers the hypotheses of every engine
+with their metrics, the character-level diff, and the fields
+specific to OCR+LLM pipelines (intermediate output, mode,
+over-normalization).
+
+:func:`annotate_documents_with_difficulty` then enriches each
+document with its intrinsic difficulty score (Sprint 7).
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from picarones.evaluation import compute_char_diff, compute_word_diff
+from picarones.evaluation.metrics.difficulty import (
+    compute_all_difficulties,
+    difficulty_label,
+)
+from picarones.reports_v2.html.data._helpers import safe_round
+
+if TYPE_CHECKING:
+    from picarones.evaluation.benchmark_result import BenchmarkResult
+
+
+def build_documents(
+    benchmark: "BenchmarkResult", images_b64: dict[str, str],
+) -> list[dict]:
+    """Return the ordered list of documents, ready for the template.
+
+    Documents keep their order of first appearance: the first engine's
+    documents come first, then documents missing from it are appended
+    from the following engines (when not every engine covers every
+    document).
+    """
+    seen_doc_ids: set[str] = set()
+    doc_ids_ordered: list[str] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            if dr.doc_id not in seen_doc_ids:
+                seen_doc_ids.add(dr.doc_id)
+                doc_ids_ordered.append(dr.doc_id)
+
+    # Cross index: doc_id → {engine_name → DocumentResult}
+    doc_engine_map: dict[str, dict] = {did: {} for did in doc_ids_ordered}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            doc_engine_map.setdefault(dr.doc_id, {})[report.engine_name] = dr
+
+    documents: list[dict] = []
+    engine_names = [r.engine_name for r in benchmark.engine_reports]
+    for doc_id in doc_ids_ordered:
+        engine_results: list[dict] = []
+        gt = ""
+        image_path = ""
+        for engine_name in engine_names:
+            dr = doc_engine_map[doc_id].get(engine_name)
+            if dr is None:
+                continue
+            gt = dr.ground_truth
+            image_path = dr.image_path
+            er_entry = _build_engine_result_entry(engine_name, dr)
+            engine_results.append(er_entry)
+
+        # Mean CER on this document (for the gallery badge); engines that
+        # errored are skipped, and the value falls back to 1.0 if none succeeded.
+        cer_values = [er["cer"] for er in engine_results if er["error"] is None]
+        mean_cer = sum(cer_values) / len(cer_values) if cer_values else 1.0
+        best_engine = min(engine_results, key=lambda x: x["cer"], default=None)
+
+        # Script type, read from the first engine's image_quality metadata
+        # when available
+        script_type = ""
+        first_engine = engine_names[0] if engine_names else None
+        first_dr = doc_engine_map[doc_id].get(first_engine)
+        if first_dr and first_dr.image_quality:
+            script_type = first_dr.image_quality.get("script_type", "")
+
+        documents.append({
+            "doc_id": doc_id,
+            "image_path": image_path,
+            "image_b64": images_b64.get(doc_id, ""),
+            "ground_truth": gt,
+            "mean_cer": safe_round(mean_cer),
+            "best_engine": best_engine["engine"] if best_engine else "",
+            "engine_results": engine_results,
+            "script_type": script_type,
+        })
+    return documents
+
+
+def _build_engine_result_entry(engine_name: str, dr) -> dict:
+    """Build one engine entry for a given document (extracted for readability)."""
+    diff_ops = compute_char_diff(dr.ground_truth, dr.hypothesis)
+    er_entry: dict = {
+        "engine": engine_name,
+        "hypothesis": dr.hypothesis,
+        "cer": safe_round(dr.metrics.cer),
+        "cer_diplomatic": safe_round(dr.metrics.cer_diplomatic) if dr.metrics.cer_diplomatic is not None else None,
+        "wer": safe_round(dr.metrics.wer),
+        "mer": safe_round(dr.metrics.mer),
+        "wil": safe_round(dr.metrics.wil),
+        "duration": dr.duration_seconds,
+        "error": dr.engine_error,
+        "diff": diff_ops,
+    }
+    # Fields specific to OCR+LLM pipelines
+    if dr.ocr_intermediate is not None:
+        er_entry["ocr_intermediate"] = dr.ocr_intermediate
+        er_entry["ocr_diff"] = compute_word_diff(dr.ground_truth, dr.ocr_intermediate)
+        er_entry["llm_correction_diff"] = compute_word_diff(dr.ocr_intermediate, dr.hypothesis)
+    if dr.pipeline_metadata:
+        on = dr.pipeline_metadata.get("over_normalization")
+        if on is not None:
+            er_entry["over_normalization"] = on
+        er_entry["pipeline_mode"] = dr.pipeline_metadata.get("pipeline_mode")
+    # Sprint 5: advanced per-document metrics
+    if dr.char_scores is not None:
+        er_entry["ligature_score"] = safe_round(dr.char_scores.get("ligature", {}).get("score"))
+        er_entry["diacritic_score"] = safe_round(dr.char_scores.get("diacritic", {}).get("score"))
+    if dr.taxonomy is not None:
+        er_entry["taxonomy"] = dr.taxonomy
+    if dr.structure is not None:
+        er_entry["structure"] = dr.structure
+    if dr.image_quality is not None:
+        er_entry["image_quality"] = dr.image_quality
+    # Sprint 10
+    if dr.line_metrics is not None:
+        er_entry["line_metrics"] = dr.line_metrics
+    if dr.hallucination_metrics is not None:
+        er_entry["hallucination_metrics"] = dr.hallucination_metrics
+    return er_entry
+
+
+def annotate_documents_with_difficulty(
+    benchmark: "BenchmarkResult", documents: list[dict],
+) -> None:
+    """Annotate each document dict with its difficulty score (Sprint 7).
+
+    Mutates ``documents`` in place. The defaults ``0.5`` / ``"Modéré"``
+    are assigned when the difficulty could not be computed (e.g. a
+    degenerate corpus).
+    """
+    doc_ids_ordered = [d["doc_id"] for d in documents]
+    gt_map = {d["doc_id"]: d["ground_truth"] for d in documents}
+    cer_map: dict[str, dict[str, float]] = {d["doc_id"]: {} for d in documents}
+    iq_map: dict[str, float] = {}
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            cer_map.setdefault(dr.doc_id, {})[report.engine_name] = safe_round(dr.metrics.cer)
+            if dr.image_quality and "quality_score" in dr.image_quality:
+                iq_map[dr.doc_id] = dr.image_quality["quality_score"]
+    difficulty_scores = compute_all_difficulties(
+        doc_ids=doc_ids_ordered,
+        ground_truths=gt_map,
+        cer_map=cer_map,
+        image_quality_map=iq_map or None,
+    )
+    for doc in documents:
+        ds = difficulty_scores.get(doc["doc_id"])
+        if ds:
+            doc["difficulty_score"] = safe_round(ds.score)
+            doc["difficulty_label"] = difficulty_label(ds.score)
+        else:
+            doc["difficulty_score"] = 0.5
+            doc["difficulty_label"] = "Modéré"
+
+
+__all__ = ["build_documents", "annotate_documents_with_difficulty"]
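``build_documents`` works in three steps: it fixes a first-seen document order across engines, builds a ``doc_id → {engine → result}`` cross index, then averages CER over the engines that did not error, falling back to ``1.0``. The sketch below reproduces those steps on plain data; the input shape (engine name mapped to a list of ``(doc_id, cer, error)`` tuples) and the function name ``order_and_average`` are hypothetical simplifications of ``EngineReport.document_results``.

```python
from typing import Optional

Row = tuple[str, float, Optional[str]]  # (doc_id, cer, error message or None)


def order_and_average(reports: dict[str, list[Row]]) -> list[tuple[str, float]]:
    """Return (doc_id, mean_cer) pairs in first-seen order across engines."""
    # 1. First-seen order: documents of the first engine, then any extras.
    seen: set[str] = set()
    order: list[str] = []
    for rows in reports.values():
        for doc_id, _, _ in rows:
            if doc_id not in seen:
                seen.add(doc_id)
                order.append(doc_id)
    # 2. Cross index: doc_id -> {engine -> (cer, error)}.
    index: dict[str, dict[str, tuple[float, Optional[str]]]] = {d: {} for d in order}
    for engine, rows in reports.items():
        for doc_id, cer, error in rows:
            index[doc_id][engine] = (cer, error)
    # 3. Mean CER over engines without error; 1.0 if none succeeded.
    out: list[tuple[str, float]] = []
    for doc_id in order:
        ok = [cer for cer, error in index[doc_id].values() if error is None]
        out.append((doc_id, sum(ok) / len(ok) if ok else 1.0))
    return out


reports = {
    "engine_a": [("p1", 0.10, None), ("p2", 0.30, None)],
    "engine_b": [("p2", 0.10, None), ("p3", 1.00, "timeout")],
}
# p1 and p2 come from engine_a, p3 is appended from engine_b; p3 only has
# a failed result, so its mean falls back to 1.0.
print(order_and_average(reports))
```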
@@ -0,0 +1,103 @@
+"""Build the per-engine summary (``engines_summary``).
+
+For each ``EngineReport``, gathers aggregated metrics (CER, WER,
+MER, WIL), the CER distribution for the histogram, advanced
+heritage-document metrics (Sprint 5), error distribution (Sprint 10),
+NER (Sprint 41), calibration (Sprint 43), philological profile
+(Sprint 62), searchability + numeric sequences (Sprint 86),
+readability (Sprint 87) and OCR+LLM pipeline indicators.
+
+Costs (mean duration, price per 1k pages, CO₂) are added later by
+:mod:`picarones.reports_v2.html.data.pareto`, which needs them to
+compute the fronts.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from picarones.reports_v2.html.data._helpers import safe_round
+
+if TYPE_CHECKING:
+    from picarones.evaluation.benchmark_result import BenchmarkResult
+
+
+def build_engines_summary(benchmark: "BenchmarkResult") -> list[dict]:
+    """Return the list of engine dicts, one entry per ``EngineReport``."""
+    engines_summary: list[dict] = []
+    for report in benchmark.engine_reports:
+        agg = report.aggregated_metrics
+        diplo_agg = agg.get("cer_diplomatic", {})
+
+        line_metrics = report.aggregated_line_metrics
+        halluc = report.aggregated_hallucination
+
+        entry: dict = {
+            "name": report.engine_name,
+            "version": report.engine_version,
+            "cer": safe_round(agg.get("cer", {}).get("mean")),
+            "wer": safe_round(agg.get("wer", {}).get("mean")),
+            "mer": safe_round(agg.get("mer", {}).get("mean")),
+            "wil": safe_round(agg.get("wil", {}).get("mean")),
+            "cer_median": safe_round(agg.get("cer", {}).get("median")),
+            "cer_min": safe_round(agg.get("cer", {}).get("min")),
+            "cer_max": safe_round(agg.get("cer", {}).get("max")),
+            "doc_count": agg.get("document_count", 0),
+            "failed": agg.get("failed_count", 0),
+            # Diplomatic CER (after historical normalization: ſ=s, u=v, i=j, etc.)
+            "cer_diplomatic": safe_round(diplo_agg.get("mean")) if diplo_agg else None,
+            "cer_diplomatic_profile": diplo_agg.get("profile"),
+            # Distribution for the histogram: list of per-document CERs
+            "cer_values": [
+                safe_round(dr.metrics.cer)
+                for dr in report.document_results
+                if dr.metrics.error is None
+            ],
+            "cer_diplomatic_values": [
+                safe_round(dr.metrics.cer_diplomatic)
+                for dr in report.document_results
+                if dr.metrics.error is None and dr.metrics.cer_diplomatic is not None
+            ],
+            # OCR+LLM pipeline fields (empty for OCR-only engines)
+            "is_pipeline": report.is_pipeline,
+            "pipeline_info": report.pipeline_info,
+            # Sprint 5: advanced heritage-document metrics
+            "ligature_score": safe_round(report.ligature_score) if report.ligature_score is not None else None,
+            "diacritic_score": safe_round(report.diacritic_score) if report.diacritic_score is not None else None,
+            "aggregated_confusion": report.aggregated_confusion,
+            "aggregated_taxonomy": report.aggregated_taxonomy,
+            "aggregated_structure": report.aggregated_structure,
+            "aggregated_image_quality": report.aggregated_image_quality,
+            # Sprint 10: error distribution + VLM hallucinations
+            "gini": safe_round(line_metrics.get("gini_mean")) if line_metrics else None,
+            "cer_p90": safe_round(line_metrics.get("percentiles", {}).get("p90")) if line_metrics else None,
+            "cer_p99": safe_round(line_metrics.get("percentiles", {}).get("p99")) if line_metrics else None,
|
| 75 |
+
"catastrophic_rate_30": safe_round(line_metrics.get("catastrophic_rate", {}).get("0.3")) if line_metrics else None,
|
| 76 |
+
"aggregated_line_metrics": line_metrics,
|
| 77 |
+
"anchor_score": safe_round(halluc.get("anchor_score_mean")) if halluc else None,
|
| 78 |
+
"length_ratio": safe_round(halluc.get("length_ratio_mean")) if halluc else None,
|
| 79 |
+
"hallucinating_doc_rate": safe_round(halluc.get("hallucinating_doc_rate")) if halluc else None,
|
| 80 |
+
"aggregated_hallucination": halluc,
|
| 81 |
+
# Sprint 41 — NER agrégé (None si aucun calcul effectué)
|
| 82 |
+
"aggregated_ner": report.aggregated_ner,
|
| 83 |
+
# Sprint 43 — calibration agrégée (None si aucune confidence
|
| 84 |
+
# n'a été exposée par le moteur sur ce corpus)
|
| 85 |
+
"aggregated_calibration": report.aggregated_calibration,
|
| 86 |
+
# Sprint 62 — profil philologique agrégé (None si aucun
|
| 87 |
+
# signal philologique sur le corpus pour ce moteur)
|
| 88 |
+
"aggregated_philological": report.aggregated_philological,
|
| 89 |
+
# Sprint 86 — A.II.5 (recherchabilité fuzzy + séquences
|
| 90 |
+
# numériques). None si aucun document n'a de signal.
|
| 91 |
+
"aggregated_searchability": report.aggregated_searchability,
|
| 92 |
+
"aggregated_numerical_sequences": (
|
| 93 |
+
report.aggregated_numerical_sequences
|
| 94 |
+
),
|
| 95 |
+
# Sprint 87 — A.II.2 (delta Flesch agrégé)
|
| 96 |
+
"aggregated_readability": report.aggregated_readability,
|
| 97 |
+
"is_vlm": report.pipeline_info.get("is_vlm", False) if report.pipeline_info else False,
|
| 98 |
+
}
|
| 99 |
+
engines_summary.append(entry)
|
| 100 |
+
return engines_summary
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
__all__ = ["build_engines_summary"]
|
|
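Every summary field relies on a `None`-tolerant lookup chain: `agg.get("cer", {}).get("mean")` yields `None` when a metric was never computed, and `safe_round` has to pass that `None` through instead of raising. A minimal sketch of the pattern follows; `safe_round` here is a hypothetical stand-in for the real helper in `_helpers`, and its 6-digit precision is an assumption.

```python
from typing import Optional


def safe_round(value: Optional[float], ndigits: int = 6) -> Optional[float]:
    """Hypothetical stand-in for ``_helpers.safe_round``: round floats, keep None."""
    return None if value is None else round(value, ndigits)


def summarize(agg: dict) -> dict:
    """Build a few summary fields from an ``aggregated_metrics``-like dict."""
    diplo_agg = agg.get("cer_diplomatic", {})
    return {
        # A missing metric falls through to {} and then None, never a KeyError.
        "cer": safe_round(agg.get("cer", {}).get("mean")),
        "wer": safe_round(agg.get("wer", {}).get("mean")),
        "doc_count": agg.get("document_count", 0),
        "cer_diplomatic": safe_round(diplo_agg.get("mean")) if diplo_agg else None,
    }


entry = summarize({"cer": {"mean": 0.1234567891}, "document_count": 3})
print(entry)  # {'cer': 0.123457, 'wer': None, 'doc_count': 3, 'cer_diplomatic': None}
```

The template can then test each field for `None` and hide the corresponding cell, which is what makes partially computed benchmarks renderable.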
@@ -0,0 +1,272 @@
"""Additional metrics consumed by the HTML report.

Sprint "wiring of test-only modules" (May 2026): brings into the
report-generation flow measurement modules that, until then, were not
called by any production consumer. Specifically:

- :func:`compute_rare_token_recall_per_engine`: Sprint 71 (A.I.1),
  recall on corpus-wide rare tokens (hapax + dis legomena). Singles
  out an OCR engine that misses rare proper names (critical for
  prosopographical indexing).
- :func:`compute_taxonomy_cooccurrence_section`: Sprint 75 (A.I.4,
  workstream 1), inter-class Jaccard index at document level.
- :func:`compute_taxonomy_intra_doc_section`: Sprint 76 (A.I.4,
  workstream 2), class × position heatmap to locate zones where
  errors concentrate.
- :func:`compute_marginal_cost_section`: Sprint 91 (A.II.6), marginal
  cost of engine B vs engine A per avoided error.

All functions are **pure** (no in-place mutation) and return ``None``
or an empty dict when their prerequisites are not met (empty corpus,
missing taxonomy, etc.), following the adaptive-masking pattern.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Optional

from picarones.evaluation.metrics.marginal_cost import compute_marginal_cost_matrix
from picarones.evaluation.metrics.rare_tokens import (
    compute_rare_token_recall,
    extract_rare_tokens,
)
from picarones.evaluation.metrics.taxonomy_cooccurrence import (
    compute_taxonomy_cooccurrence,
)
from picarones.evaluation.metrics.taxonomy_intra_doc import (
    compute_taxonomy_position_heatmap,
)

if TYPE_CHECKING:
    from picarones.evaluation.benchmark_result import BenchmarkResult


# ──────────────────────────────────────────────────────────────────
# Rare-token recall (Sprint 71)
# ──────────────────────────────────────────────────────────────────


def compute_rare_token_recall_per_engine(
    benchmark: "BenchmarkResult",
    max_freq: int = 2,
) -> dict[str, dict]:
    """Corpus-wide recall on rare tokens, for each engine.

    Steps:

    1. Extract the corpus's rare tokens (those appearing at most
       ``max_freq`` times across all GTs).
    2. For each engine, sum ``n_rare_tokens_in_reference`` and
       ``n_rare_tokens_recalled`` over its successfully processed
       documents, then divide once (micro-average: equivalent to the
       per-document recall weighted by each document's rare-token count).

    Returns ``{engine_name: {n_rare_tokens, n_recalled, recall, n_docs,
    max_freq}}``, or an empty dict if there is no engine or no rare token
    was detected. ``recall`` is ``None`` for an engine whose documents
    contain no rare token.
    """
    if not benchmark.engine_reports:
        return {}
    # Corpus GTs (the first engine's document list is authoritative).
    gts = [
        dr.ground_truth
        for dr in benchmark.engine_reports[0].document_results
        if dr.ground_truth
    ]
    if not gts:
        return {}
    rare_tokens = extract_rare_tokens(gts, max_freq=max_freq)
    if not rare_tokens:
        return {}

    out: dict[str, dict] = {}
    for report in benchmark.engine_reports:
        n_total_rare = 0
        n_total_recalled = 0
        n_docs = 0
        for dr in report.document_results:
            if dr.metrics.error is not None:
                continue
            metrics = compute_rare_token_recall(
                dr.ground_truth, dr.hypothesis, rare_tokens,
            )
            n_total_rare += metrics["n_rare_tokens_in_reference"]
            n_total_recalled += metrics["n_rare_tokens_recalled"]
            n_docs += 1
        recall = (
            n_total_recalled / n_total_rare if n_total_rare > 0 else None
        )
        out[report.engine_name] = {
            "n_rare_tokens": n_total_rare,
            "n_recalled": n_total_recalled,
            "recall": recall,
            "n_docs": n_docs,
            "max_freq": max_freq,
        }
    return out


# ──────────────────────────────────────────────────────────────────
# Taxonomic co-occurrence (Sprint 75)
# ──────────────────────────────────────────────────────────────────


def compute_taxonomy_cooccurrence_section(
    benchmark: "BenchmarkResult",
) -> Optional[dict]:
    """Compute the corpus-wide taxonomic co-occurrence matrix.

    For each document, collect the union of the error classes observed
    on that document across all engines, then compute the Jaccard
    index between pairs of classes at corpus level.

    Returns the output of
    :func:`picarones.evaluation.metrics.taxonomy_cooccurrence.compute_taxonomy_cooccurrence`,
    or ``None`` if no taxonomic classification is available.
    """
    # Map doc_id → index in per_doc_classes, so that the classes of
    # additional engines evaluating the same doc are merged correctly.
    # **Bug avoided**: do NOT use a set to look up the index. A set has
    # no guaranteed order, so ``list(set).index(x)`` returns an index
    # that does not match the position in the parallel list.
    doc_id_to_idx: dict[str, int] = {}
    per_doc_classes: list[set[str]] = []

    for report in benchmark.engine_reports:
        for dr in report.document_results:
            if dr.taxonomy is None:
                continue
            classes = {
                cls
                for cls, count in (dr.taxonomy.get("counts") or {}).items()
                if count > 0
            }
            if not classes:
                continue
            idx = doc_id_to_idx.get(dr.doc_id)
            if idx is None:
                doc_id_to_idx[dr.doc_id] = len(per_doc_classes)
                per_doc_classes.append(classes)
            else:
                # Doc already seen (another engine): merge the classes.
                per_doc_classes[idx] |= classes

    if not per_doc_classes:
        return None
    return compute_taxonomy_cooccurrence(per_doc_classes)


# ──────────────────────────────────────────────────────────────────
# Intra-document class × position heatmap (Sprint 76)
# ──────────────────────────────────────────────────────────────────


def compute_taxonomy_intra_doc_section(
    benchmark: "BenchmarkResult",
    n_bins: int = 10,
) -> Optional[dict]:
    """Class × binned-position heatmap aggregated over the whole corpus.

    For each unique document, keep the heatmap of the **first** engine
    that yields a usable one (deduplication: a document evaluated by N
    engines counts only once), then sum per class and position bin.

    Returns a dict compatible with
    :func:`picarones.report.taxonomy_intra_doc_render.build_taxonomy_intra_doc_html`
    (keys ``n_bins``, ``per_class``, ``total_errors``, ``n_words_gt``,
    plus ``n_docs_with_data``), or ``None`` if no document has a usable
    signal.
    """
    aggregated: dict[str, list[int]] = {}
    seen_doc_ids: set[str] = set()
    total_errors = 0
    n_words_gt = 0

    for report in benchmark.engine_reports:
        for dr in report.document_results:
            if dr.doc_id in seen_doc_ids:
                continue  # deduplication: never count a doc twice
            if dr.metrics.error is not None or not dr.ground_truth:
                continue
            heatmap = compute_taxonomy_position_heatmap(
                dr.ground_truth, dr.hypothesis, n_bins=n_bins,
            )
            if heatmap is None:
                continue
            seen_doc_ids.add(dr.doc_id)
            n_words_gt += len(dr.ground_truth.split())
            per_class = heatmap.get("per_class", {})
            for cls, counts in per_class.items():
                cls_total = sum(counts)
                if cls_total == 0:
                    continue
                total_errors += cls_total
                if cls not in aggregated:
                    aggregated[cls] = [0] * n_bins
                for i in range(n_bins):
                    aggregated[cls][i] += counts[i] if i < len(counts) else 0

    if not aggregated:
        return None
    return {
        "n_bins": n_bins,
        "n_docs_with_data": len(seen_doc_ids),
        "total_errors": total_errors,
        "n_words_gt": n_words_gt,
        "per_class": aggregated,
    }


# ──────────────────────────────────────────────────────────────────
# Inter-engine marginal cost (Sprint 91)
# ──────────────────────────────────────────────────────────────────


def compute_marginal_cost_section(
    engines_summary: list[dict],
) -> Optional[list[dict]]:
    """Marginal-cost matrix between pairs of engines.

    Reads ``cost`` (attached by :func:`attach_engine_costs`) and
    estimates the number of errors. For each pair ``A → B``, computes
    the additional cost per avoided error.

    **Estimation note**: the error count is always the proxy
    ``cer × 1000`` (errors per 1000 standardised characters), chosen to
    sit on the same per-1000 scale as ``cost_per_1k_pages_eur``; the
    actual corpus length is not used. The displayed marginal costs are
    therefore estimates: on a homogeneous corpus the ranking is
    reliable, on a mix of document types it should be interpreted with
    caution.

    Returns a list of dicts (the ``["pairs"]`` output of
    :func:`compute_marginal_cost_matrix`) sorted by increasing marginal
    cost, or ``None`` if fewer than 2 engines have usable cost + error
    data.
    """
    per_engine: dict[str, dict] = {}
    for entry in engines_summary:
        cost = entry.get("cost") or {}
        cost_per_1k = cost.get("cost_per_1k_pages_eur")
        cer = entry.get("cer")
        doc_count = entry.get("doc_count") or 0
        if cost_per_1k is None or cer is None or doc_count == 0:
            continue
        # Proxy: cer × 1000 characters / page (a stable scale consistent
        # with ``cost_per_1k_pages_eur``).
        estimated_errors = cer * 1000.0
        per_engine[entry["name"]] = {
            "cost": cost_per_1k,
            "errors": estimated_errors,
        }
    if len(per_engine) < 2:
        return None
    result = compute_marginal_cost_matrix(per_engine)
    if not result:
        return None
    # ``compute_marginal_cost_matrix`` returns ``{"pairs": [...]}``.
    # Expose the ``pairs`` list so that the renderer receives an
    # iterable of dicts (not a wrapper).
    return result.get("pairs") or None


__all__ = [
    "compute_rare_token_recall_per_engine",
    "compute_taxonomy_cooccurrence_section",
    "compute_taxonomy_intra_doc_section",
    "compute_marginal_cost_section",
]
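`compute_rare_token_recall_per_engine` pools the per-document counts before dividing, so a document with many rare tokens weighs more than one with a single rare token. The sketch below reproduces that micro-average with deliberately naive whitespace tokenisation; `extract_rare_tokens_sketch` and `rare_counts` are hypothetical illustrations, and the real `extract_rare_tokens` / `compute_rare_token_recall` may tokenise and match tokens differently.

```python
from collections import Counter
from typing import Optional


def extract_rare_tokens_sketch(gts: list[str], max_freq: int = 2) -> set[str]:
    """Tokens seen at most ``max_freq`` times over all GTs (hapax + dis legomena)."""
    counts = Counter(tok for gt in gts for tok in gt.split())
    return {tok for tok, n in counts.items() if n <= max_freq}


def rare_counts(gt: str, hyp: str, rare: set[str]) -> tuple[int, int]:
    """Return (rare tokens in the reference, how many the hypothesis recovers)."""
    available = Counter(hyp.split())
    in_ref = [tok for tok in gt.split() if tok in rare]
    recalled = 0
    for tok in in_ref:
        if available[tok] > 0:  # each hypothesis occurrence matches only once
            available[tok] -= 1
            recalled += 1
    return len(in_ref), recalled


def micro_recall(docs: list[tuple[str, str]], rare: set[str]) -> Optional[float]:
    """Pool counts over documents, then divide once (micro-average)."""
    total = hit = 0
    for gt, hyp in docs:
        n, r = rare_counts(gt, hyp, rare)
        total += n
        hit += r
    return hit / total if total else None


gts = ["le duc Guillaume signa", "le duc partit"]
rare = extract_rare_tokens_sketch(gts, max_freq=1)
docs = [
    ("le duc Guillaume signa", "le duc Guilaume signa"),  # the proper name is missed
    ("le duc partit", "le duc partit"),
]
print(micro_recall(docs, rare))  # 2 recalled out of 3 rare tokens
```

Averaging per-document recalls instead would give (0.5 + 1.0) / 2 = 0.75 here, which overstates how well the engine handles the rare name.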
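`compute_taxonomy_cooccurrence_section` first merges the error classes of every engine per `doc_id` (through an explicit `doc_id → index` map) and then measures, for each pair of classes, how often they hit the same documents. A self-contained sketch of both steps using the Jaccard index |docs(A) ∩ docs(B)| / |docs(A) ∪ docs(B)|; the function names and the `{(a, b): index}` output shape are illustrative assumptions, not the format returned by `compute_taxonomy_cooccurrence`.

```python
from itertools import combinations


def merge_classes_by_doc(rows: list[tuple[str, set[str]]]) -> list[set[str]]:
    """Union the error classes of all (doc_id, classes) rows sharing a doc_id."""
    doc_id_to_idx: dict[str, int] = {}
    per_doc_classes: list[set[str]] = []
    for doc_id, classes in rows:
        if not classes:
            continue
        idx = doc_id_to_idx.get(doc_id)
        if idx is None:
            doc_id_to_idx[doc_id] = len(per_doc_classes)
            per_doc_classes.append(set(classes))  # copy: never alias the caller's set
        else:
            per_doc_classes[idx] |= classes
    return per_doc_classes


def jaccard_pairs(per_doc_classes: list[set[str]]) -> dict[tuple[str, str], float]:
    """Document-level Jaccard index for every pair of error classes."""
    docs_by_class: dict[str, set[int]] = {}
    for i, classes in enumerate(per_doc_classes):
        for cls in classes:
            docs_by_class.setdefault(cls, set()).add(i)
    return {
        (a, b): len(docs_by_class[a] & docs_by_class[b])
        / len(docs_by_class[a] | docs_by_class[b])
        for a, b in combinations(sorted(docs_by_class), 2)
    }


rows = [
    ("d1", {"diacritic"}),
    ("d2", {"diacritic", "ligature"}),
    ("d1", {"ligature"}),  # same doc seen by a second engine: merged into d1
    ("d3", {"segmentation"}),
]
merged = merge_classes_by_doc(rows)
print(jaccard_pairs(merged))
```

Without the merge, `d1` would appear twice with disjoint classes and the diacritic/ligature index would drop from 1.0 to 1/3.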
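`compute_marginal_cost_section` hands `compute_marginal_cost_matrix` a dict of the shape `{engine: {"cost": …, "errors": …}}`, with `errors = cer × 1000`. The sketch below shows one plausible way to turn that into "extra euros per avoided error" for each move `A → B`. The pairing rule (only pairs where B makes fewer errors) and the ascending sort are assumptions about the real function, which is not shown here.

```python
def marginal_cost_pairs(per_engine: dict[str, dict]) -> list[dict]:
    """Extra cost per avoided error for every pair A → B in which B makes fewer errors."""
    pairs: list[dict] = []
    for a, da in per_engine.items():
        for b, db in per_engine.items():
            avoided = da["errors"] - db["errors"]
            if a == b or avoided <= 0:
                continue  # only moves that actually avoid errors are priced
            pairs.append({
                "from": a,
                "to": b,
                "errors_avoided": avoided,
                "marginal_cost": (db["cost"] - da["cost"]) / avoided,
            })
    return sorted(pairs, key=lambda p: p["marginal_cost"])


# Errors follow the section's proxy, cer × 1000 (e.g. CER 0.08 → 80 errors).
per_engine = {
    "A": {"cost": 0.5, "errors": 80.0},
    "B": {"cost": 12.0, "errors": 30.0},
    "C": {"cost": 2.0, "errors": 50.0},
}
for pair in marginal_cost_pairs(per_engine):
    print(pair["from"], "->", pair["to"], round(pair["marginal_cost"], 3))
```

A negative marginal cost would flag a B that is both cheaper and more accurate than A, i.e. A is dominated.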
@@ -0,0 +1,159 @@
"""Cost/quality Pareto front (Sprint 19).

Builds three Pareto fronts with alternative axes:

- ``cost``: CER vs cost in € / 1000 pages.
- ``speed``: CER vs mean duration per page.
- ``co2``: CER vs carbon footprint (g CO₂ / 1000 pages, experimental).

API
---
Two separate functions make the contract explicit:

1. :func:`attach_engine_costs` **mutates** ``engines_summary`` in
   place, adding ``mean_duration_seconds`` and ``cost`` (taken from the
   benchmark and the pricing table). The name makes the mutation
   obvious.
2. :func:`build_pareto_section` is a **pure function**: it reads the
   costs already attached to ``engines_summary`` and returns the
   ``pareto`` dict ready for the template.

The orchestrator (``__init__.py``) calls both, in that order.
This separation makes it possible to:

- test :func:`build_pareto_section` on its own with a pre-built
  ``engines_summary``;
- reuse the attached costs without recomputing the Pareto fronts.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

from picarones.evaluation.metrics.pricing import (
    build_costs_for_benchmark,
    load_pricing_database,
)
from picarones.evaluation.statistics import compute_pareto_front

if TYPE_CHECKING:
    from picarones.evaluation.benchmark_result import BenchmarkResult


def attach_engine_costs(
    engines_summary: list[dict], benchmark: "BenchmarkResult",
) -> None:
    """Annotate each ``engines_summary`` entry with its cost.

    **Mutates in place**: adds two fields to each engine dict:

    - ``mean_duration_seconds`` (float, or ``None`` if no duration was
      recorded).
    - ``cost``: a dict of the form ``{cost_per_1k_pages_eur: ...,
      co2_per_1k_pages_g: ..., ...}``, or ``None`` if pricing is
      unavailable.

    Must be called BEFORE :func:`build_pareto_section`, which reads
    these two fields.
    """
    durations_by_engine: dict[str, float] = {}
    for report in benchmark.engine_reports:
        durs = [
            dr.duration_seconds
            for dr in report.document_results
            if dr.duration_seconds is not None
        ]
        if durs:
            durations_by_engine[report.engine_name] = sum(durs) / len(durs)

    costs_by_engine = build_costs_for_benchmark(
        engines_summary, durations_by_engine,
    )
    for entry in engines_summary:
        name = entry["name"]
        entry["mean_duration_seconds"] = (
            round(durations_by_engine.get(name, 0.0), 4)
            if name in durations_by_engine else None
        )
        entry["cost"] = costs_by_engine.get(name)


def build_pareto_section(engines_summary: list[dict]) -> dict:
    """Build the ``pareto`` block of the report dict.

    **Pure function**: mutates nothing. Reads ``mean_duration_seconds``
    and ``cost``, which must have been attached upstream by
    :func:`attach_engine_costs`. If these fields are missing, the
    engine is silently left out of the corresponding front (consistent
    with an engine that has no known price).

    Returns
    -------
    dict
        Three Pareto fronts (``cost``, ``speed``, ``co2``) plus
        ``pricing_meta`` (the pricing table that was used).
    """
    pricing_defaults, _ = load_pricing_database()

    pareto_points = []
    for entry in engines_summary:
        cer = entry.get("cer")
        cost = (entry.get("cost") or {}).get("cost_per_1k_pages_eur")
        if cer is None or cost is None:
            continue
        pareto_points.append({"engine": entry["name"], "cer": cer, "cost": cost})
    pareto_front_engines = compute_pareto_front(
        pareto_points, objectives=("cer", "cost"),
    )

    pareto_speed_points = []
    for entry in engines_summary:
        cer = entry.get("cer")
        dur = entry.get("mean_duration_seconds")
        if cer is None or dur is None:
            continue
        pareto_speed_points.append({"engine": entry["name"], "cer": cer, "dur": dur})
    pareto_front_speed = compute_pareto_front(
        pareto_speed_points, objectives=("cer", "dur"),
    )

    pareto_co2_points = []
    for entry in engines_summary:
        cer = entry.get("cer")
        co2 = (entry.get("cost") or {}).get("co2_per_1k_pages_g")
        if cer is None or co2 is None:
            continue
        pareto_co2_points.append({"engine": entry["name"], "cer": cer, "co2": co2})
    pareto_front_co2 = compute_pareto_front(
        pareto_co2_points, objectives=("cer", "co2"),
    )

    return {
        "cost": {
            "points": pareto_points,
            "front": pareto_front_engines,
            "axis_label": "Coût (€ / 1000 pages)",
        },
        "speed": {
            "points": pareto_speed_points,
            "front": pareto_front_speed,
            "axis_label": "Temps moyen (s / page)",
        },
        "co2": {
            "points": pareto_co2_points,
            "front": pareto_front_co2,
            "axis_label": (
                "Empreinte carbone (g CO₂ / 1000 pages, expérimental)"
            ),
        },
        "pricing_meta": {
            "last_updated": pricing_defaults.last_updated,
            "currency": pricing_defaults.currency,
            "hourly_rate_local_cpu_eur": pricing_defaults.hourly_rate_local_cpu_eur,
            "hourly_rate_local_gpu_eur": pricing_defaults.hourly_rate_local_gpu_eur,
            "grid_intensity_local": pricing_defaults.grid_intensity_local,
            "grid_intensity_cloud": pricing_defaults.grid_intensity_cloud,
        },
    }


__all__ = ["attach_engine_costs", "build_pareto_section"]
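Each front keeps the engines that no other engine beats on both axes at once, where lower is better for CER, cost, duration and CO₂ alike. A minimal sketch of that dominance test on the same point shape that `build_pareto_section` builds (`{"engine", "cer", <objective>}`); `pareto_front` is a hypothetical stand-in, and whether the real `compute_pareto_front` returns engine names in input order is an assumption.

```python
def pareto_front(points: list[dict], objectives: tuple[str, str]) -> list[str]:
    """Names of the engines not dominated on the given objectives (all minimised)."""
    def dominates(p: dict, q: dict) -> bool:
        no_worse = all(p[o] <= q[o] for o in objectives)
        strictly_better = any(p[o] < q[o] for o in objectives)
        return no_worse and strictly_better

    return [
        q["engine"]
        for q in points
        if not any(dominates(p, q) for p in points if p is not q)
    ]


points = [
    {"engine": "A", "cer": 0.08, "cost": 0.5},
    {"engine": "B", "cer": 0.03, "cost": 12.0},
    {"engine": "C", "cer": 0.10, "cost": 2.0},  # worse and dearer than A: dominated
    {"engine": "D", "cer": 0.05, "cost": 3.0},
]
print(pareto_front(points, objectives=("cer", "cost")))  # ['A', 'B', 'D']
```

The quadratic pairwise check is adequate for the handful of engines a benchmark typically compares.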
@@ -0,0 +1,56 @@
"""Report scatter plots (Sprint 10).

- ``gini_vs_cer``: Gini coefficient (error concentration) vs mean CER,
  per engine.
- ``ratio_vs_anchor``: OCR/GT length ratio vs anchoring score, per
  engine (reveals VLM hallucinations).
"""

from __future__ import annotations

from typing import TYPE_CHECKING

from picarones.reports_v2.html.data._helpers import safe_round

if TYPE_CHECKING:
    from picarones.evaluation.benchmark_result import BenchmarkResult


def build_gini_vs_cer(benchmark: "BenchmarkResult") -> list[dict]:
    """Scatter of the error-distribution Gini coefficient vs mean CER."""
    gini_vs_cer: list[dict] = []
    for report in benchmark.engine_reports:
        line_metrics = report.aggregated_line_metrics
        gini_val = line_metrics.get("gini_mean") if line_metrics else None
        cer_val = report.mean_cer
        if gini_val is not None and cer_val is not None:
            gini_vs_cer.append({
                "engine": report.engine_name,
                "cer": safe_round(cer_val),
                "gini": safe_round(gini_val),
                "is_pipeline": report.is_pipeline,
            })
    return gini_vs_cer


def build_ratio_vs_anchor(benchmark: "BenchmarkResult") -> list[dict]:
    """Scatter of length ratio vs anchoring score (VLM detection)."""
    ratio_vs_anchor: list[dict] = []
    for report in benchmark.engine_reports:
        halluc = report.aggregated_hallucination
        if not halluc:
            continue
        ratio_vs_anchor.append({
            "engine": report.engine_name,
            "length_ratio": safe_round(halluc.get("length_ratio_mean", 1.0)),
            "anchor_score": safe_round(halluc.get("anchor_score_mean", 1.0)),
            "hallucinating_rate": safe_round(halluc.get("hallucinating_doc_rate", 0.0)),
            "is_vlm": (
                report.pipeline_info.get("is_vlm", False)
                if report.pipeline_info else False
            ),
        })
    return ratio_vs_anchor


__all__ = ["build_gini_vs_cer", "build_ratio_vs_anchor"]
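The `gini_vs_cer` scatter separates engines whose errors are spread evenly across lines from engines with the same mean CER whose errors pile up in a few catastrophic lines. As an illustration, here is the standard Gini coefficient computed on sorted values, G = 2·Σ i·x₍ᵢ₎ / (n·Σ x) − (n + 1)/n. Whether the upstream line-metrics module computes `gini_mean` with exactly this estimator, and on which per-line quantity, is an assumption.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative values: 0 = uniform, (n - 1) / n = fully concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no errors at all: treat as perfectly even
    weighted = sum(i * x for i, x in enumerate(xs, start=1))  # ranks start at 1
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # 0.0: every line carries the same error load
print(gini([0, 0, 0, 4]))  # 0.75: all errors on a single line
```

Both examples have the same mean, which is exactly the case the scatter plot is meant to tell apart.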
@@ -0,0 +1,216 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Sections statistiques du rapport (Sprint 7 + Sprint 17).
|
| 2 |
+
|
| 3 |
+
Construit les blocs :
|
| 4 |
+
|
| 5 |
+
- ``pairwise_wilcoxon`` — tests de Wilcoxon par paire de moteurs.
|
| 6 |
+
- ``bootstrap_cis`` — intervalles de confiance bootstrap par moteur.
|
| 7 |
+
- ``friedman`` + ``nemenyi`` — Sprint 17, multi-moteurs.
|
| 8 |
+
- ``reliability_curves`` — courbes de fiabilité par moteur.
|
| 9 |
+
- ``venn_data`` — diagramme de Venn des erreurs communes/exclusives.
|
| 10 |
+
- ``error_clusters`` — clustering des patterns d'erreurs.
|
| 11 |
+
- ``correlation_per_engine`` — matrice de corrélation par moteur.
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
from __future__ import annotations
|
| 15 |
+
|
| 16 |
+
from typing import TYPE_CHECKING, Optional
|
| 17 |
+
|
| 18 |
+
from picarones.evaluation import compute_word_diff
|
| 19 |
+
from picarones.evaluation.statistics import (
|
| 20 |
+
bootstrap_ci,
|
| 21 |
+
cluster_errors,
|
| 22 |
+
compute_correlation_matrix,
|
| 23 |
+
compute_pairwise_stats,
|
| 24 |
+
compute_reliability_curve,
|
| 25 |
+
compute_venn_data,
|
| 26 |
+
friedman_test,
|
| 27 |
+
nemenyi_posthoc,
|
| 28 |
+
)
|
| 29 |
+
from picarones.reports_v2.html.data._helpers import safe_round
|
| 30 |
+
|
| 31 |
+
if TYPE_CHECKING:
|
| 32 |
+
from picarones.evaluation.benchmark_result import BenchmarkResult
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
def _engine_cer_values(benchmark: "BenchmarkResult") -> dict[str, list[float]]:
|
| 36 |
+
"""Map ``engine_name → [cer_individuels valides]``."""
|
| 37 |
+
out: dict[str, list[float]] = {}
|
| 38 |
+
for report in benchmark.engine_reports:
|
| 39 |
+
vals = [
|
| 40 |
+
safe_round(dr.metrics.cer)
|
| 41 |
+
for dr in report.document_results
|
| 42 |
+
if dr.metrics.error is None
|
| 43 |
+
]
|
| 44 |
+
if vals:
|
| 45 |
+
out[report.engine_name] = vals
|
| 46 |
+
return out
+
+
+def build_pairwise_wilcoxon(benchmark: "BenchmarkResult") -> list[dict]:
+    """Pairwise Wilcoxon tests between engines (Sprint 7)."""
+    return compute_pairwise_stats(_engine_cer_values(benchmark))
+
+
+def build_bootstrap_cis(benchmark: "BenchmarkResult") -> list[dict]:
+    """Per-engine bootstrap confidence intervals (Sprint 7)."""
+    bootstrap_cis: list[dict] = []
+    for engine_name, vals in _engine_cer_values(benchmark).items():
+        lo, hi = bootstrap_ci(vals)
+        mean_v = sum(vals) / len(vals) if vals else 0.0
+        bootstrap_cis.append({
+            "engine": engine_name,
+            "mean": safe_round(mean_v),
+            "ci_lower": safe_round(lo),
+            "ci_upper": safe_round(hi),
+        })
+    return bootstrap_cis
+
+
+def build_friedman_and_nemenyi(benchmark: "BenchmarkResult") -> dict:
+    """Friedman test + Nemenyi post-hoc (Sprint 17, multi-engine).
+
+    Strict alignment on a single document order: the per-engine CER
+    lists are rebuilt from the documents that every engine scored
+    without error, in order of first appearance. Friedman needs one
+    value per engine for every document (block), so without common
+    documents the test is not applicable.
+
+    Returns
+    -------
+    dict
+        ``{"friedman": {...}, "nemenyi": {...}}``, to be merged into
+        the ``statistics`` section of the report.
+    """
+    # Ordered list of doc_ids, by order of first appearance.
+    seen: set[str] = set()
+    doc_ids_ordered: list[str] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            if dr.doc_id not in seen:
+                seen.add(dr.doc_id)
+                doc_ids_ordered.append(dr.doc_id)
+
+    common_doc_ids: Optional[set[str]] = None
+    for report in benchmark.engine_reports:
+        doc_ids = {dr.doc_id for dr in report.document_results if dr.metrics.error is None}
+        common_doc_ids = doc_ids if common_doc_ids is None else common_doc_ids & doc_ids
+
+    engine_cer_aligned: dict[str, list[float]] = {}
+    if common_doc_ids:
+        ordered_common = [d for d in doc_ids_ordered if d in common_doc_ids]
+        for report in benchmark.engine_reports:
+            dr_by_id = {dr.doc_id: dr for dr in report.document_results}
+            engine_cer_aligned[report.engine_name] = [
+                safe_round(dr_by_id[d].metrics.cer) for d in ordered_common
+            ]
+
+    if engine_cer_aligned:
+        friedman = friedman_test(engine_cer_aligned)
+        nemenyi = nemenyi_posthoc(engine_cer_aligned)
+    else:
+        friedman = {
+            "statistic": 0.0, "p_value": 1.0, "significant": False,
+            "df": 0, "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
+            "interpretation": "Test de Friedman non calculé — aucun document commun.",
+            "error": "no_common_documents",
+        }
+        nemenyi = {
+            "alpha": 0.05, "critical_distance": 0.0, "q_alpha": 0.0,
+            "n_blocks": 0, "n_engines": 0, "mean_ranks": {},
+            "engines_sorted": [], "significant_matrix": [], "tied_groups": [],
+            "error": "no_common_documents",
+        }
+    return {"friedman": friedman, "nemenyi": nemenyi}
+
+
+def build_reliability_curves(benchmark: "BenchmarkResult") -> list[dict]:
+    """Per-engine reliability curves (Sprint 7)."""
+    reliability_curves: list[dict] = []
+    for report in benchmark.engine_reports:
+        vals = [
+            safe_round(dr.metrics.cer)
+            for dr in report.document_results
+            if dr.metrics.error is None
+        ]
+        curve = compute_reliability_curve(vals)
+        reliability_curves.append({
+            "engine": report.engine_name,
+            "points": curve,
+        })
+    return reliability_curves
+
+
+def build_venn_data(benchmark: "BenchmarkResult") -> dict:
+    """Venn diagram of shared vs. engine-specific errors (Sprint 7).
+
+    Builds the per-engine error sets
+    ``{engine → set("doc_id:gt_tok:hyp_tok")}``, with one key per
+    word-level replace, delete or insert operation: an error counts as
+    shared only when two engines produce the same key on the same document.
+    """
+    venn_error_sets: dict[str, set[str]] = {}
+    for report in benchmark.engine_reports:
+        error_set: set[str] = set()
+        for dr in report.document_results:
+            ops = compute_word_diff(dr.ground_truth, dr.hypothesis)
+            for op in ops:
+                if op["op"] in ("replace", "delete", "insert"):
+                    key = (
+                        f"{dr.doc_id}:"
+                        f"{op.get('old', op.get('text', ''))}:"
+                        f"{op.get('new', op.get('text', ''))}"
+                    )
+                    error_set.add(key)
+        venn_error_sets[report.engine_name] = error_set
+    return compute_venn_data(venn_error_sets)
+
+
+def build_error_clusters(benchmark: "BenchmarkResult") -> list[dict]:
+    """Clustering of error patterns (Sprint 7)."""
+    error_data_all: list[dict] = []
+    for report in benchmark.engine_reports:
+        for dr in report.document_results:
+            error_data_all.append({
+                "engine": report.engine_name,
+                "gt": dr.ground_truth,
+                "hypothesis": dr.hypothesis,
+            })
+    error_clusters_raw = cluster_errors(error_data_all, max_clusters=8)
+    return [c.as_dict() for c in error_clusters_raw]
+
+
+def build_correlation_per_engine(benchmark: "BenchmarkResult") -> list[dict]:
+    """Per-engine correlation matrix between domain metrics (Sprint 7)."""
+    correlation_per_engine: list[dict] = []
+    for report in benchmark.engine_reports:
+        metrics_list: list[dict[str, float]] = []
+        for dr in report.document_results:
+            if dr.metrics.error is not None:
+                continue
+            entry: dict[str, float] = {
+                "cer": safe_round(dr.metrics.cer),
+                "wer": safe_round(dr.metrics.wer),
+                "mer": safe_round(dr.metrics.mer),
+                "wil": safe_round(dr.metrics.wil),
+            }
+            if dr.image_quality:
+                entry["quality_score"] = safe_round(dr.image_quality.get("quality_score", 0.5))
+                entry["sharpness"] = safe_round(dr.image_quality.get("sharpness_score", 0.5))
+            if dr.char_scores:
+                entry["ligature"] = safe_round(dr.char_scores.get("ligature", {}).get("score", 0.5))
+                entry["diacritic"] = safe_round(dr.char_scores.get("diacritic", {}).get("score", 0.5))
+            metrics_list.append(entry)
+        if metrics_list:
+            corr = compute_correlation_matrix(metrics_list)
+            correlation_per_engine.append({
+                "engine": report.engine_name,
+                **corr,
+            })
+    return correlation_per_engine
+
+
+__all__ = [
+    "build_pairwise_wilcoxon",
+    "build_bootstrap_cis",
+    "build_friedman_and_nemenyi",
+    "build_reliability_curves",
+    "build_venn_data",
+    "build_error_clusters",
+    "build_correlation_per_engine",
+]
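``build_friedman_and_nemenyi`` only runs the test on documents that every engine scored without error, kept in first-seen order. The standalone sketch below reproduces that alignment step and computes a textbook Friedman statistic on the aligned blocks, which can help check expected values by hand. It is not the Picarones ``friedman_test``: ``Doc``, ``align_common_docs`` and ``friedman_statistic`` are hypothetical names, ``Doc`` stands in for ``DocumentResult``, the lowest CER gets rank 1, ties get average ranks, and neither a tie correction nor a p-value is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for DocumentResult (only the fields used here)."""
    doc_id: str
    cer: float
    error: Optional[str] = None


def align_common_docs(per_engine: dict[str, list[Doc]]) -> dict[str, list[float]]:
    """Per-engine CER lists restricted to documents every engine scored without error."""
    # First-seen order of doc_ids across all engines.
    order: list[str] = []
    seen: set[str] = set()
    for docs in per_engine.values():
        for d in docs:
            if d.doc_id not in seen:
                seen.add(d.doc_id)
                order.append(d.doc_id)
    # Intersection of the error-free doc_ids of every engine.
    common: Optional[set[str]] = None
    for docs in per_engine.values():
        ok = {d.doc_id for d in docs if d.error is None}
        common = ok if common is None else common & ok
    if not common:
        return {}  # Friedman is not applicable without common documents.
    ordered = [doc_id for doc_id in order if doc_id in common]
    aligned: dict[str, list[float]] = {}
    for name, docs in per_engine.items():
        by_id = {d.doc_id: d for d in docs}
        aligned[name] = [by_id[doc_id].cer for doc_id in ordered]
    return aligned


def friedman_statistic(aligned: dict[str, list[float]]) -> tuple[float, dict[str, float]]:
    """Friedman chi-square (no tie correction) and mean ranks; needs at least one block."""
    engines = list(aligned)
    k = len(engines)
    n = len(aligned[engines[0]])
    rank_sums = {e: 0.0 for e in engines}
    for b in range(n):
        # Rank the engines within one document (block): lowest CER gets rank 1.
        row = sorted(engines, key=lambda e: aligned[e][b])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and aligned[row[j + 1]][b] == aligned[row[i]][b]:
                j += 1
            for e in row[i : j + 1]:
                rank_sums[e] += (i + j) / 2 + 1  # average rank over a run of ties
            i = j + 1
    mean_ranks = {e: s / n for e, s in rank_sums.items()}
    # chi2_F = 12N / (k(k+1)) * sum(mean_rank^2) - 3N(k+1)
    stat = 12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks.values()) - 3 * n * (k + 1)
    return stat, mean_ranks
```

With three engines whose CERs keep the same order on every common document, the mean ranks come out as 1, 2 and 3 and the statistic is 6.0 for three blocks; a document that one engine failed on, or never processed, drops out of every engine's list.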
@@ -0,0 +1,471 @@
+"""Self-contained interactive HTML report generator.
+
+Phase 5.E: module relocated from ``picarones.report.generator``
+to ``picarones.reports_v2.html.generator``. The legacy path remains
+available through a shim that emits a ``DeprecationWarning``;
+removal is planned for 2.0.
+
+The generated report is a single HTML file containing:
+- all the data (inline JSON);
+- Chart.js (inlined from the vendored ``chart.umd.min.js``) and diff2html (from cdnjs);
+- the application's CSS and JavaScript.
+
+Available views
+---------------
+1. Ranking: table sortable by column (CER, WER, MER, WIL)
+2. Gallery: image grid with a colour-coded CER badge
+3. Document: zoomable image + coloured GT / OCR diff per engine
+4. Analyses: CER histogram + radar chart
+
+Architecture
+------------
+This module is the **orchestrator**. The heavy lifting is split into
+sub-modules:
+
+- :mod:`picarones.reports_v2._helpers.assets`: vendor.js loading,
+  base64 encoding of images, lazy externalization.
+- :mod:`picarones.reports_v2.html.data`: construction of the JSON dict
+  passed to the template (engines, documents, statistics, Pareto, etc.).
+- :mod:`picarones.reports_v2._helpers.render_helpers`: shared colours / SVG.
+
+Backward compatibility
+----------------------
+Two historical names are **still imported by tests** under their
+``_`` prefix and must be preserved:
+
+- ``_build_report_data`` (imported by 14 test files).
+- ``_cer_color`` (imported by ``tests/report/test_report.py``).
+
+The other names ``_pct``, ``_safe``, ``_cer_bg``, ``_encode_image_b64``,
+``_encode_images_b64_from_result``, ``_externalize_images_to_dir`` and
+``_load_vendor_js`` are either used internally (the last 3, see
+:meth:`ReportGenerator.generate`) or reachable under their canonical
+name in :mod:`picarones.reports_v2._helpers.assets` or
+:mod:`picarones.reports_v2._helpers.render_helpers`.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from pathlib import Path
+from typing import Any, Optional
+
+from picarones.evaluation.benchmark_result import BenchmarkResult
+from picarones.evaluation.statistics import build_critical_difference_svg
+from picarones.reports_v2._helpers.assets import (
+    encode_images_b64_from_result as _encode_images_b64_from_result,
+    externalize_images_to_dir as _externalize_images_to_dir,
+    load_vendor_js as _load_vendor_js,
+)
+
+# Backward-compatible re-exports consumed by external tests (see the
+# module docstring). The end-of-line directive documents the re-export
+# intent and stops ruff from flagging the import as unused.
+from picarones.reports_v2._helpers.render_helpers import cer_step_color as _cer_color  # noqa: F401
+from picarones.reports_v2.html.data import build_report_data as _build_report_data  # noqa: F401
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Jinja2 rendering
+# ---------------------------------------------------------------------------
+
+# Since Sprint 16, the ~3,100-line monolithic template has been split into
+# external files under ``picarones/reports_v2/html/templates/`` (CSS, JS, HTML views).
+# ``base.html.j2`` assembles them with ``{% include %}``.
+
+_TEMPLATES_DIR = Path(__file__).parent / "templates"
+
+
+def _build_jinja_env():
+    """Build the Jinja2 ``Environment`` for the report.
+
+    Autoescaping is disabled: the behaviour matches the historical
+    ``_HTML_TEMPLATE.format()``. The injected variables (embedded JSON,
+    generated SVG, narrative synthesis built from internal templates)
+    are all produced by Picarones code and need no HTML escaping.
+    """
+    from jinja2 import Environment, FileSystemLoader
+    env = Environment(
+        loader=FileSystemLoader(str(_TEMPLATES_DIR)),
+        autoescape=False,
+        keep_trailing_newline=True,
+    )
+    return env
+
+
+# ---------------------------------------------------------------------------
+# Main class
+# ---------------------------------------------------------------------------
+
+class ReportGenerator:
+    """Generate an interactive HTML report from a BenchmarkResult.
+
+    Usage
+    -----
+    >>> from picarones.reports_v2.html import ReportGenerator
+    >>> gen = ReportGenerator(benchmark_result)
+    >>> path = gen.generate("rapport.html")
+    >>> # English report:
+    >>> gen_en = ReportGenerator(benchmark_result, lang="en")
+    >>> path_en = gen_en.generate("report.html")
+    """
+
+    def __init__(
+        self,
+        benchmark: BenchmarkResult,
+        images_b64: Optional[dict[str, str]] = None,
+        lang: str = "fr",
+        normalization_profile: Any = None,
+        lazy_images: bool = False,
+    ) -> None:
+        """
+        Parameters
+        ----------
+        benchmark:
+            Benchmark result to visualize.
+        images_b64:
+            Dictionary ``{doc_id: base64 data-URI}`` of the images.
+            If None or empty, the generator looks in ``benchmark.metadata["_images_b64"]``.
+            Ignored when ``lazy_images=True``: ``generate()`` then always externalizes
+            the images and uses relative URLs such as ``"report-assets/<doc>.png"``.
+        lang:
+            Report language code: ``"fr"`` (default) or ``"en"``.
+        normalization_profile:
+            Normalization profile actually used (Sprint 27, for the
+            reproducibility snapshot). ``None`` falls back to the
+            profile recorded in ``benchmark.metadata["normalization_profile"]``
+            if present; otherwise the snapshot is unavailable.
+        lazy_images:
+            Sprint A5 (M-16): if ``True``, images are written as
+            PNG/JPEG files in ``<output_dir>/report-assets/`` next to
+            the HTML and referenced through ``<img loading="lazy">``.
+            The report stays portable as long as the assets folder is
+            copied along with it. Useful for corpora of more than 50
+            documents (a monolithic base64 report of 1,000 documents
+            exceeds 200 MB and slows the browser down). For single-document
+            or demo runs, leave it ``False`` to get one portable HTML file.
+        """
+        self.benchmark = benchmark
+        self.images_b64: dict[str, str] = images_b64 or {}
+        self.lang = lang
+        self.normalization_profile = normalization_profile
+        self.lazy_images = lazy_images
+
+        # Fall back to the images embedded in the metadata (fixtures).
+        if not self.images_b64:
+            self.images_b64 = benchmark.metadata.get("_images_b64", {})  # type: ignore[assignment]
+
+        # Sprint 27: fall back to the normalization profile stored in the metadata.
+        if self.normalization_profile is None:
+            self.normalization_profile = benchmark.metadata.get("normalization_profile")
+
+    def generate(self, output_path: str | Path) -> Path:
+        """Generate the HTML file and write it to disk.
+
+        Parameters
+        ----------
+        output_path:
+            Path of the HTML file to write.
+
+        Returns
+        -------
+        Path
+            Absolute path of the generated file.
+        """
+        from picarones.reports_v2.i18n import get_labels
+
+        output_path = Path(output_path)
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+
+        # Sprint A5 (M-16): externalize the images if lazy_images=True,
+        # otherwise auto-encode them as base64. Both modes feed the same
+        # ``images_b64`` variable (the name is kept for backward compatibility;
+        # in lazy mode the value is a relative URL instead of a data-URI).
+        # In lazy mode, externalization is **forced** even if self.images_b64
+        # is pre-filled (by fixtures, metadata, etc.); otherwise the
+        # report would still contain huge data-URIs.
+        if self.lazy_images:
+            images_b64 = _externalize_images_to_dir(
+                self.benchmark, output_path.parent,
+            )
+        else:
+            images_b64 = self.images_b64
+            if not images_b64:
+                images_b64 = _encode_images_b64_from_result(self.benchmark)
+
+        labels = get_labels(self.lang)
+        report_data = _build_report_data(self.benchmark, images_b64)
+
+        # Sprint 27: reproducibility snapshots (pricing, glossary,
+        # normalization profile, environment). Embedded in the report JSON
+        # so that a reader can regenerate the synthesis, the Pareto front
+        # and the glossary without access to the source code.
+        from picarones.reports_v2.html.snapshot import snapshot_all
+        report_data["snapshots"] = snapshot_all(
+            lang=self.lang,
+            normalization_profile=self.normalization_profile,
+        )
+
+        report_json = json.dumps(report_data, ensure_ascii=False, separators=(",", ":"))
+        i18n_json = json.dumps(labels, ensure_ascii=False, separators=(",", ":"))
+        chartjs_js = _load_vendor_js("chart.umd.min.js")
+
+        # Sprint 17: server-side SVG rendering of the critical difference diagram (static, no JS).
+        cdd_svg = build_critical_difference_svg(
+            report_data.get("statistics", {}).get("nemenyi", {}),
+        )
+
+        # Sprint 18: factual narrative synthesis (deterministic, no LLM).
+        from picarones.reports_v2.narrative import build_synthesis
+        synthesis = build_synthesis(report_data, lang=self.lang)
+
+        # Sprint 20: contextual glossary loaded from YAML.
+        from picarones.reports_v2.glossary import load_glossary
+        glossary = load_glossary(self.lang)
+        glossary_json = json.dumps(glossary, ensure_ascii=False, separators=(",", ":"))
+
+        section_html = self._build_section_html(report_data, labels)
+
+        env = _build_jinja_env()
+        template = env.get_template("base.html.j2")
+        html = template.render(
+            corpus_name=self.benchmark.corpus_name,
+            picarones_version=self.benchmark.picarones_version,
+            report_data_json=report_json,
+            i18n_json=i18n_json,
+            html_lang=labels.get("html_lang", "fr"),
+            chartjs_inline=chartjs_js,
+            critical_difference_svg=cdd_svg,
+            friedman=report_data.get("statistics", {}).get("friedman", {}),
+            synthesis=synthesis,
+            glossary_json=glossary_json,
+            **section_html,
+        )
+
+        output_path.write_text(html, encoding="utf-8")
+        return output_path.resolve()
+
+    def _build_section_html(
+        self, report_data: dict, labels: dict[str, str],
+    ) -> dict[str, str]:
+        """Build every conditional HTML section of the report.
+
+        Each renderer (NER, calibration, philology, etc.) is called
+        independently. A section returns ``""`` when no engine has a
+        signal for it; the template handles the conditional display.
+
+        Returns
+        -------
+        dict[str, str]
+            Map ``{section_name: html}`` to unpack into
+            ``template.render(**section_html)``.
+        """
+        engines = report_data.get("engines", [])
+
+        # Sprint 37: inter-engine section (divergence matrix + oracle).
+        from picarones.reports_v2.html.renderers.inter_engine import (
+            build_divergence_matrix_html,
+            build_oracle_gap_html,
+        )
+        # Sprint 41: NER section (F1 summary per engine + per-category heatmap).
+        from picarones.reports_v2.html.renderers.ner import (
+            build_ner_per_category_html,
+            build_ner_summary_html,
+        )
+        # Sprint 43: calibration section (ECE/MCE table + grid of
+        # per-engine reliability diagrams).
+        from picarones.reports_v2.html.renderers.calibration import (
+            build_calibration_summary_html,
+            build_reliability_diagrams_grid_html,
+        )
+        # Sprint 46: stratified section (one table per stratum).
+        from picarones.reports_v2.html.renderers.stratification import (
+            build_stratified_ranking_html,
+        )
+        # Sprint 62: philological profile (6 adaptive sections).
+        from picarones.reports_v2.html.renderers.philological import (
+            build_philological_profile_html,
+        )
+        # Sprint 86, A.II.5: fuzzy searchability + numerical sequences.
+        from picarones.reports_v2.html.renderers.searchability import (
+            build_searchability_summary_html,
+        )
+        from picarones.reports_v2.html.renderers.numerical_sequences import (
+            build_numerical_sequences_html,
+        )
+        # Sprint 87, A.II.2: readability (Flesch delta).
+        from picarones.reports_v2.html.renderers.readability import (
+            build_readability_summary_html,
+        )
+        # Sprint 89, A.II.8b: inter-engine specialization.
+        from picarones.reports_v2.html.renderers.specialization import (
+            build_specialization_html,
+        )
+        # Workstream 3 (post-Sprint 97): 3 composite thematic views.
+        from picarones.reports_v2.html.views import (
+            build_advanced_taxonomy_view_html,
+            build_diagnostics_view_html,
+            build_economics_view_html,
+        )
+        # "Test-only module wiring" sprint (May 2026): sections
+        # that consume the new metrics computed in
+        # ``report_data.extra_metrics``.
+        from picarones.reports_v2.html.renderers.marginal_cost import (
+            build_marginal_cost_html,
+        )
+        from picarones.reports_v2.html.renderers.rare_token_recall import (
+            build_rare_token_recall_html,
+        )
+        from picarones.reports_v2.html.renderers.taxonomy_cooccurrence import (
+            build_taxonomy_cooccurrence_html,
+        )
+        from picarones.reports_v2.html.renderers.taxonomy_intra_doc import (
+            build_taxonomy_intra_doc_html,
+        )
+
+        # Specialization: build an {engine: counts} map from the
+        # ``aggregated_taxonomy`` entries; engines without a taxonomy are left out.
+        taxos: dict = {}
+        for eng in engines:
+            tax = eng.get("aggregated_taxonomy")
+            if isinstance(tax, dict):
+                counts = tax.get("counts") if "counts" in tax else tax
+                if isinstance(counts, dict) and counts:
+                    taxos[eng.get("name", "?")] = {
+                        k: float(v) for k, v in counts.items()
+                        if isinstance(v, (int, float))
+                    }
+
+        return {
+            # Sprint 37
+            "divergence_matrix_html": build_divergence_matrix_html(
+                report_data.get("inter_engine_analysis"), labels=labels,
+            ),
+            "oracle_gap_html": build_oracle_gap_html(
+                report_data.get("inter_engine_analysis"), labels=labels,
+            ),
+            # Sprint 41
+            "ner_summary_html": build_ner_summary_html(engines, labels=labels),
+            "ner_per_category_html": build_ner_per_category_html(engines, labels=labels),
+            # Sprint 43
+            "calibration_summary_html": build_calibration_summary_html(
+                engines, labels=labels,
+            ),
+            "reliability_diagrams_html": build_reliability_diagrams_grid_html(
+                engines, labels=labels,
+            ),
+            # Sprint 46
+            "stratified_ranking_html": build_stratified_ranking_html(
+                report_data.get("stratified_ranking"),
+                report_data.get("available_strata"),
+                report_data.get("corpus_homogeneity"),
+                labels=labels,
+            ),
+            # Sprint 62
+            "philological_profile_html": build_philological_profile_html(
+                engines, labels=labels,
+            ),
+            # Sprint 86
+            "searchability_html": build_searchability_summary_html(
+                engines, labels=labels,
+            ),
+            "numerical_sequences_html": build_numerical_sequences_html(
+                engines, labels=labels,
+            ),
+            # Sprint 87
+            "readability_html": build_readability_summary_html(
+                engines, labels=labels,
+            ),
+            # Sprint 89
+            "specialization_html": build_specialization_html(taxos, labels=labels),
+            # Workstream 3: composite thematic views
+            "economics_view_html": build_economics_view_html(
+                report_data, labels=labels,
+                engine_reports=self.benchmark.engine_reports,
+            ),
+            "advanced_taxonomy_view_html": build_advanced_taxonomy_view_html(
+                report_data, labels=labels,
+            ),
+            "diagnostics_view_html": build_diagnostics_view_html(
+                report_data, labels=labels,
+            ),
+            # "Test-only module wiring" sprint (May 2026):
+            # 4 new sections for the modules wired into
+            # ``report_data.extra_metrics``. Adaptive: "" when there is no signal.
+            "taxonomy_cooccurrence_html": build_taxonomy_cooccurrence_html(
+                report_data.get("taxonomy_cooccurrence"), labels=labels,
+            ),
+            "taxonomy_intra_doc_html": build_taxonomy_intra_doc_html(
+                report_data.get("taxonomy_intra_doc"), labels=labels,
+            ),
+            "rare_token_recall_html": build_rare_token_recall_html(
+                report_data.get("rare_token_recall"), labels=labels,
+            ),
+            "marginal_cost_html": build_marginal_cost_html(
+                report_data.get("marginal_cost"), labels=labels,
+            ),
+        }
+
+    @classmethod
+    def from_json(cls, json_path: str | Path, **kwargs) -> "ReportGenerator":
+        """Create a generator from a JSON results file.
+
+        Compatible with the files produced by ``BenchmarkResult.to_json()``.
+        Base64 images must be passed through ``kwargs["images_b64"]``
+        when they are not in the JSON.
+        """
+        # ``json`` is already imported at module level.
+        data = json.loads(Path(json_path).read_text(encoding="utf-8"))
+
+        # Minimal reconstruction of a BenchmarkResult from the dict.
+        from picarones.evaluation.metric_result import MetricsResult
+        from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
+
+        engine_reports = []
+        for er_data in data.get("engine_reports", []):
+            doc_results = []
+            for dr_data in er_data.get("document_results", []):
+                m = dr_data["metrics"]
+                metrics = MetricsResult(
+                    cer=m["cer"], cer_nfc=m["cer_nfc"], cer_caseless=m["cer_caseless"],
+                    wer=m["wer"], wer_normalized=m["wer_normalized"],
+                    mer=m["mer"], wil=m["wil"],
+                    reference_length=m["reference_length"],
+                    hypothesis_length=m["hypothesis_length"],
+                    error=m.get("error"),
+                )
+                doc_results.append(DocumentResult(
+                    doc_id=dr_data["doc_id"],
+                    image_path=dr_data["image_path"],
+                    ground_truth=dr_data["ground_truth"],
+                    hypothesis=dr_data["hypothesis"],
+                    metrics=metrics,
+                    duration_seconds=dr_data.get("duration_seconds", 0.0),
+                    engine_error=dr_data.get("engine_error"),
+                ))
+            engine_reports.append(EngineReport(
+                engine_name=er_data["engine_name"],
+                engine_version=er_data.get("engine_version", "unknown"),
+                engine_config=er_data.get("engine_config", {}),
+                document_results=doc_results,
+            ))
+
+        corpus_info = data.get("corpus", {})
+        bm = BenchmarkResult(
+            corpus_name=corpus_info.get("name", "Corpus"),
+            corpus_source=corpus_info.get("source"),
+            document_count=corpus_info.get("document_count", 0),
+            engine_reports=engine_reports,
+            run_date=data.get("run_date", ""),
+            picarones_version=data.get("picarones_version", ""),
+            metadata=data.get("metadata", {}),
+        )
+
+        images_b64 = kwargs.pop("images_b64", {})
+        return cls(bm, images_b64=images_b64, **kwargs)
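The ``lazy_images`` switch decides what each value of ``images_b64`` holds: a base64 data-URI inlined in the HTML, or a relative URL to a file written under ``<output_dir>/report-assets/``. The helper below is a hypothetical sketch of that choice, not the actual ``externalize_images_to_dir`` / ``encode_images_b64_from_result`` helpers; it assumes the PNG bytes are already in memory and that ``doc_id`` is safe to use as a file name.

```python
import base64
from pathlib import Path


def image_src(doc_id: str, png_bytes: bytes, output_dir: Path, lazy: bool) -> str:
    """Value stored in ``images_b64[doc_id]`` (hypothetical helper).

    Assumes ``doc_id`` is already safe to use as a file name.
    """
    if lazy:
        # Lazy mode: write the image next to the HTML and return a relative URL.
        assets_dir = output_dir / "report-assets"
        assets_dir.mkdir(parents=True, exist_ok=True)
        (assets_dir / f"{doc_id}.png").write_bytes(png_bytes)
        return f"report-assets/{doc_id}.png"
    # Inline mode: embed the bytes as a base64 data-URI in the HTML itself.
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
```

The trade-off is the one described in the ``lazy_images`` docstring: lazy mode keeps the HTML small for large corpora, at the cost of having to ship the ``report-assets/`` folder alongside it.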
@@ -0,0 +1,281 @@
"""Reproducibility snapshots for the HTML report (Sprint 27).

Phase 5.E: module relocated from ``picarones.report.snapshot``
to ``picarones.reports_v2.html.snapshot``. The legacy path
remains available through a shim that emits a ``DeprecationWarning``;
removal is planned for 2.0.

The self-contained HTML report must be *replayable* without access
to the source code as it stood when the report was generated: a
reader in 2026 must be able to tell exactly which pricing table,
which metric definitions, which normalization profile, and which
version of Picarones produced the figures shown.

Before Sprint 27, the report embedded only
``pareto.pricing_meta.last_updated``, a bare update date that said
nothing about the table's contents. If someone edited
``picarones/data/pricing.yaml`` after generation, there was no way
to reconstruct what the report's reader had seen.

This module produces four snapshots, embedded in
``report_data.snapshots``:

- ``pricing``: the full raw YAML of the pricing table.
- ``glossary``: glossary entries for the report's language.
- ``normalization``: the normalization profile actually applied.
- ``environment``: Picarones version, Python, platform, git commit
  when available, and a frozen list of installed dependencies.

Guarantees
----------
- **Determinism**: given identical inputs (and the same runtime
  environment), ``snapshot_all()`` returns a bit-for-bit identical
  dict. Lists are sorted and no timestamps are included.
- **No side effects**: the module modifies no global state; YAML
  files are only read, never written.
- **Non-blocking degradation**: if ``pricing.yaml`` is missing or
  unreadable, or a snapshot cannot be built, the snapshot returns
  ``{"available": False, "reason": "..."}`` instead of raising.
  If pyyaml is missing or parsing fails, the raw YAML is still kept
  and ``data`` is empty; if git is not installed, ``git_commit`` is
  ``None``.
"""

from __future__ import annotations

import logging
import platform
import subprocess
import sys
from importlib.metadata import distributions
from pathlib import Path
from typing import Any, Optional

def _resolve_picarones_version() -> str:
    """Return the current Picarones version without importing the
    root package (forbidden from ``reports_v2/`` by layer-deps)."""
    try:
        from importlib.metadata import version as _get_version
        return _get_version("picarones")
    except Exception:  # noqa: BLE001
        # Distribution metadata unavailable (e.g. package not installed):
        # fall back to a hard-coded version string.
        return "1.0.0"


__version__ = _resolve_picarones_version()

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Pricing snapshot
# ---------------------------------------------------------------------------

def pricing_snapshot(pricing_path: Optional[Path] = None) -> dict[str, Any]:
    """Return the raw YAML and the parsed dict of the pricing table in use.

    If ``pricing_path`` is not given, the default path
    ``picarones.evaluation.metrics.pricing._DEFAULT_PRICING_PATH`` is used.
    """
    if pricing_path is None:
        try:
            from picarones.evaluation.metrics.pricing import _DEFAULT_PRICING_PATH
            pricing_path = _DEFAULT_PRICING_PATH
        except ImportError:
            return {"available": False, "reason": "module pricing introuvable"}

    pricing_path = Path(pricing_path)
    if not pricing_path.exists():
        return {
            "available": False,
            "reason": f"pricing.yaml introuvable : {pricing_path}",
            "expected_path": str(pricing_path),
        }

    try:
        raw = pricing_path.read_text(encoding="utf-8")
    except OSError as exc:
        return {
            "available": False,
            "reason": f"lecture impossible : {exc}",
            "expected_path": str(pricing_path),
        }

    try:
        import yaml
        data = yaml.safe_load(raw) or {}
    except Exception as exc:  # noqa: BLE001 (covers ImportError and YAML errors)
        # No pyyaml, or parsing failed: keep the raw text anyway.
        logger.warning("[snapshot] parsing pricing.yaml échoué : %s", exc)
        data = {}

    return {
        "available": True,
        "source_path": str(pricing_path),
        "filename": pricing_path.name,
        "size_bytes": len(raw.encode("utf-8")),
        "raw_yaml": raw,
        "data": data,
    }


# ---------------------------------------------------------------------------
# Glossary snapshot
# ---------------------------------------------------------------------------

def glossary_snapshot(
    lang: str = "fr",
    used_keys: Optional[list[str] | set[str]] = None,
) -> dict[str, Any]:
    """Return the glossary entries that appear in the report.

    ``used_keys`` restricts the snapshot to the terms actually
    referenced (smaller output). ``None`` → every entry for the
    language (conservative mode).
    """
    try:
        from picarones.reports_v2.glossary import load_glossary, SUPPORTED_LANGS
    except ImportError:
        return {"available": False, "reason": "module glossary introuvable"}

    full = load_glossary(lang) or {}
    if not full:
        return {
            "available": False,
            "reason": f"aucune entrée pour lang={lang!r}",
            "supported_langs": SUPPORTED_LANGS,
        }

    if used_keys is not None:
        keys = set(used_keys)
        entries = {k: v for k, v in full.items() if k in keys}
    else:
        entries = dict(full)

    # Sort for bit-for-bit reproducibility.
    entries_sorted = {k: entries[k] for k in sorted(entries)}

    return {
        "available": True,
        "lang": lang,
        "entry_count": len(entries_sorted),
        "entries": entries_sorted,
    }


# ---------------------------------------------------------------------------
# Normalization profile snapshot
# ---------------------------------------------------------------------------

def normalization_snapshot(profile: Any) -> dict[str, Any]:
    """Serialize a ``NormalizationProfile``.

    Covers the built-in profiles (``medieval_french``, ``nfc``, …) as
    well as custom YAML profiles loaded at runtime. The goal is that a
    reader of the report can regenerate exactly the same normalization
    from this snapshot.
    """
    if profile is None:
        return {"available": False, "reason": "aucun profil fourni"}

    # NormalizationProfile is a dataclass: fields are read by name
    # rather than through ``asdict`` to keep full control over the format.
    try:
        return {
            "available": True,
            "name": getattr(profile, "name", "unknown"),
            "nfc": bool(getattr(profile, "nfc", True)),
            "caseless": bool(getattr(profile, "caseless", False)),
            "diplomatic_table": dict(getattr(profile, "diplomatic_table", {}) or {}),
            "exclude_chars": sorted(getattr(profile, "exclude_chars", set()) or set()),
            "description": getattr(profile, "description", ""),
        }
    except Exception as exc:
        return {"available": False, "reason": f"sérialisation échouée : {exc}"}


# ---------------------------------------------------------------------------
# Environment snapshot
# ---------------------------------------------------------------------------

def _git_commit(repo_path: Optional[Path] = None) -> Optional[str]:
    """Return the short git commit (12 chars) when inside a repository, else None."""
    # Default to the repository root: this file now lives at
    # <root>/picarones/reports_v2/html/snapshot.py, hence ``parents[3]``.
    cwd = repo_path or Path(__file__).resolve().parents[3]
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            cwd=str(cwd),
            stderr=subprocess.DEVNULL,
            text=True,
            timeout=2,
        ).strip()
        return out[:12] if out else None
    except (subprocess.CalledProcessError, OSError, subprocess.TimeoutExpired):
        # OSError covers a missing git binary as well as an unusable cwd.
        return None


def _installed_packages(limit: int = 200) -> list[str]:
    """Frozen list of installed packages in ``name==version`` format.

    Sorted by name (case-insensitive) for reproducibility. Capped at
    ``limit`` packages so the report does not grow too large.
    """
    try:
        pkgs: list[str] = []
        seen: set[str] = set()
        for d in distributions():
            try:
                name = (d.metadata.get("Name") or "").strip()
                version = (d.version or "").strip()
            except Exception:
                continue
            # Skip unnamed distributions and duplicate names (first occurrence wins).
            if not name or name.lower() in seen:
                continue
            seen.add(name.lower())
            pkgs.append(f"{name}=={version}")
        pkgs.sort(key=str.lower)
        return pkgs[:limit]
    except Exception as exc:  # pragma: no cover (defense in depth)
        logger.warning("[snapshot] enum dépendances échoué : %s", exc)
        return []


def environment_snapshot(repo_path: Optional[Path] = None) -> dict[str, Any]:
    """Return the Picarones version, Python, platform, git commit and frozen deps."""
    return {
        "available": True,
        "picarones_version": __version__,
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "git_commit": _git_commit(repo_path),
        "installed_packages": _installed_packages(),
    }


# ---------------------------------------------------------------------------
# Aggregated API
# ---------------------------------------------------------------------------

def snapshot_all(
    *,
    lang: str = "fr",
    glossary_used_keys: Optional[list[str] | set[str]] = None,
    pricing_path: Optional[Path] = None,
    normalization_profile: Any = None,
    repo_path: Optional[Path] = None,
) -> dict[str, Any]:
    """Build the ``snapshots`` block to embed in ``report_data``."""
    return {
        "pricing": pricing_snapshot(pricing_path=pricing_path),
        "glossary": glossary_snapshot(lang=lang, used_keys=glossary_used_keys),
        "normalization": normalization_snapshot(normalization_profile),
        "environment": environment_snapshot(repo_path=repo_path),
        "schema_version": 1,
    }


__all__ = [
    "pricing_snapshot",
    "glossary_snapshot",
    "normalization_snapshot",
    "environment_snapshot",
    "snapshot_all",
]
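`pricing_snapshot` combines two of the module's guarantees:

- It degrades to `{"available": False, ...}` instead of raising.
- It keeps the raw text even when parsing fails.

`snapshot_all` is also meant to be deterministic. The stdlib-only sketch below reproduces that pattern under two assumptions:

- The file is JSON rather than YAML, so the sketch runs without pyyaml.
- Determinism is checked by serializing with sorted keys.

`file_snapshot` and `canonical_bytes` are hypothetical helpers, not part of Picarones.

```python
import json
from pathlib import Path
from typing import Any


def file_snapshot(path: Path) -> dict[str, Any]:
    """Snapshot a config file, degrading to {"available": False} instead of raising."""
    path = Path(path)
    if not path.exists():
        return {"available": False, "reason": f"not found: {path}", "expected_path": str(path)}
    try:
        raw = path.read_text(encoding="utf-8")
    except OSError as exc:
        return {"available": False, "reason": f"unreadable: {exc}", "expected_path": str(path)}
    try:
        data = json.loads(raw) or {}
    except json.JSONDecodeError:
        data = {}  # parsing failed: keep the raw text anyway
    return {
        "available": True,
        "filename": path.name,
        "size_bytes": len(raw.encode("utf-8")),  # bytes, not characters
        "raw": raw,
        "data": data,
    }


def canonical_bytes(snapshot: dict[str, Any]) -> bytes:
    """Serialize with sorted keys so that identical inputs give identical bytes."""
    return json.dumps(snapshot, sort_keys=True, ensure_ascii=False).encode("utf-8")
```

Comparing `canonical_bytes` of two runs is a cheap way to test the bit-for-bit claim. Note that dict insertion order does not matter once keys are sorted.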
File without changes
File without changes
File without changes
@@ -0,0 +1,132 @@
"""i18n labels for the HTML report and the Picarones interface.

Phase 5.E: module relocated from ``picarones.i18n`` to
``picarones.reports_v2.i18n``. The legacy path remains available
through a shim that emits a ``DeprecationWarning``; removal is planned for 2.0.

Supported languages
-------------------
- ``"fr"``: French (default)
- ``"en"``: English, heritage-sector wording ("heritage English")

Since Sprint 17, translations are stored in JSON files and read into
a cache by ``_get_translations()``. Because the module-level
``TRANSLATIONS`` (kept as a dict for backward compatibility) is built
at import time, the first read happens when the module is imported.

Sprint 30 hardening
-------------------
- The cache is filled under an explicit lock (double-checked), so web
  servers under concurrent load can no longer initialize it twice.
- ``reload_translations()`` is exposed for tests that modify the JSON
  files on the fly.
- ``get_labels()`` is memoized with ``functools.lru_cache``, so the
  ``lang → fr`` fallback is resolved once per language instead of
  re-reading the dict on every call.
"""

from __future__ import annotations

import json
import logging
import threading
from functools import lru_cache
from pathlib import Path

logger = logging.getLogger(__name__)


_I18N_DIR = Path(__file__).parent
_LOAD_LOCK = threading.Lock()
_TRANSLATIONS_CACHE: dict[str, dict[str, str]] | None = None


def _load_translations() -> dict[str, dict[str, str]]:
    """Load every JSON file in the i18n directory.

    A ``{lang}.json`` file defines the labels for language ``lang``.
    Always returns a dict and never raises. If the directory is missing,
    the dict is empty and ``get_labels`` falls back to an empty result.
    """
    translations: dict[str, dict[str, str]] = {}
    if not _I18N_DIR.is_dir():
        return translations
    for path in sorted(_I18N_DIR.glob("*.json")):
        lang = path.stem
        try:
            with path.open(encoding="utf-8") as fh:
                translations[lang] = json.load(fh)
        except (OSError, json.JSONDecodeError) as e:
            logger.warning("[i18n] fichier '%s' ignoré : %s", path, e)
    return translations


def _get_translations() -> dict[str, dict[str, str]]:
    """Return the translations cache, initialized exactly once.

    Thread-safe (double-checked locking): two threads calling this
    concurrently at startup trigger only one disk read.
    """
    global _TRANSLATIONS_CACHE
    if _TRANSLATIONS_CACHE is not None:
        return _TRANSLATIONS_CACHE
    with _LOAD_LOCK:
        if _TRANSLATIONS_CACHE is None:
            _TRANSLATIONS_CACHE = _load_translations()
        return _TRANSLATIONS_CACHE


def reload_translations() -> None:
    """Force the JSON files to be re-read on the next ``get_labels`` call.

    Useful for tests that modify ``reports_v2/i18n/*.json`` on the fly.
    The module-level ``TRANSLATIONS`` and ``SUPPORTED_LANGS`` are not rebound.
    """
    global _TRANSLATIONS_CACHE
    with _LOAD_LOCK:
        _TRANSLATIONS_CACHE = None
        _get_labels_cached.cache_clear()


+
@lru_cache(maxsize=None)
|
| 89 |
+
def _get_labels_cached(lang: str) -> tuple[tuple[str, str], ...]:
|
| 90 |
+
"""Cache mémoïsé : ``lang -> tuple ordonné des paires``.
|
| 91 |
+
|
| 92 |
+
Le retour en tuple permet à ``lru_cache`` de mémoriser sans
|
| 93 |
+
contrainte de hashabilité, et est trivialement converti en dict
|
| 94 |
+
par ``get_labels`` à chaque appel (coût O(n)).
|
| 95 |
+
"""
|
| 96 |
+
translations = _get_translations()
|
| 97 |
+
labels = translations.get(lang) or translations.get("fr") or {}
|
| 98 |
+
return tuple(labels.items())
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
def get_labels(lang: str = "fr") -> dict[str, str]:
|
| 102 |
+
"""Retourne le dictionnaire de labels pour la langue donnée.
|
| 103 |
+
|
| 104 |
+
Parameters
|
| 105 |
+
----------
|
| 106 |
+
lang:
|
| 107 |
+
Code langue : ``"fr"`` (défaut) ou ``"en"``.
|
| 108 |
+
|
| 109 |
+
Returns
|
| 110 |
+
-------
|
| 111 |
+
dict
|
| 112 |
+
Labels traduits. Toujours valide : bascule sur ``"fr"`` si lang inconnu.
|
| 113 |
+
Si ``"fr"`` lui-même manque, retourne un dict vide (comportement dégradé
|
| 114 |
+
mais non bloquant).
|
| 115 |
+
"""
|
| 116 |
+
return dict(_get_labels_cached(lang))
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
# ``TRANSLATIONS`` reste accessible comme attribut module pour les
|
| 120 |
+
# consommateurs externes qui le lisaient directement. Initialisé
|
| 121 |
+
# paresseusement à l'import — n'engendre **pas** de lecture si le
|
| 122 |
+
# module n'est jamais utilisé.
|
| 123 |
+
TRANSLATIONS: dict[str, dict[str, str]] = _get_translations()
|
| 124 |
+
SUPPORTED_LANGS: list[str] = list(TRANSLATIONS.keys())
|
| 125 |
+
|
| 126 |
+
|
| 127 |
+
__all__ = [
|
| 128 |
+
"TRANSLATIONS",
|
| 129 |
+
"SUPPORTED_LANGS",
|
| 130 |
+
"get_labels",
|
| 131 |
+
"reload_translations",
|
| 132 |
+
]
|
|
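The i18n loader combines two mechanisms:

- Double-checked locking in `_get_translations`.
- An `lru_cache` that stores one immutable tuple per language.

The sketch below isolates that pattern. An in-memory loader stands in for the `{lang}.json` files. `_load`, `LOAD_STATS`, `reload_labels` and the sample labels are hypothetical, chosen only so the behavior can be observed.

```python
from __future__ import annotations

import threading
from functools import lru_cache

_LOCK = threading.Lock()
_CACHE: dict[str, dict[str, str]] | None = None
LOAD_STATS = {"loads": 0}  # demo instrumentation: how many times _load() ran


def _load() -> dict[str, dict[str, str]]:
    # Hypothetical stand-in for reading the {lang}.json files from disk.
    LOAD_STATS["loads"] += 1
    return {"fr": {"tab_ranking": "Classement"}, "en": {"tab_ranking": "Ranking"}}


def _get() -> dict[str, dict[str, str]]:
    global _CACHE
    if _CACHE is not None:  # fast path: no locking once loaded
        return _CACHE
    with _LOCK:
        if _CACHE is None:  # re-check: another thread may have loaded meanwhile
            _CACHE = _load()
        return _CACHE


@lru_cache(maxsize=None)
def _labels_cached(lang: str) -> tuple[tuple[str, str], ...]:
    translations = _get()
    labels = translations.get(lang) or translations.get("fr") or {}
    return tuple(labels.items())  # immutable: callers cannot corrupt the cache


def get_labels(lang: str = "fr") -> dict[str, str]:
    return dict(_labels_cached(lang))  # fresh dict on every call


def reload_labels() -> None:
    global _CACHE
    with _LOCK:
        _CACHE = None
        _labels_cached.cache_clear()
```

`lru_cache` may run `_labels_cached` more than once for the same key under contention. That is harmless here, because the lock in `_get` still guarantees a single load.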
@@ -226,7 +226,7 @@ def run_benchmark_thread_v2(job: BenchmarkJob, req: BenchmarkRunRequest) -> None
         return
 
     job.add_event("log", {"message": "Génération du rapport HTML…"})
-    from picarones.
+    from picarones.reports_v2.html.generator import ReportGenerator
     gen = ReportGenerator(result, lang=req.report_lang)
     gen.generate(output_html)
 
@@ -334,7 +334,7 @@ def run_benchmark_thread(job: BenchmarkJob, req: BenchmarkRequest) -> None:
         return
 
     job.add_event("log", {"message": "Génération du rapport HTML…"})
-    from picarones.
+    from picarones.reports_v2.html.generator import ReportGenerator
     report_lang = getattr(req, "report_lang", "fr")
     gen = ReportGenerator(result, lang=report_lang)
     gen.generate(output_html)
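The server hunks above switch from the legacy import path to the canonical `picarones.reports_v2.html.generator`. This avoids going through the legacy shim that emits a `DeprecationWarning`.

How the Picarones shims are actually written is not shown in this diff. A common alternative is a module that calls `warnings.warn(...)` at import time and then re-exports the canonical names. The sketch below shows one other possible shape: a PEP 562 module-level `__getattr__` that warns on access and forwards to the canonical module. `install_shim` is a hypothetical helper.

```python
import importlib
import sys
import types
import warnings


def install_shim(legacy_name: str, canonical_name: str) -> types.ModuleType:
    """Register ``legacy_name`` in sys.modules as a deprecated alias of ``canonical_name``."""
    canonical = importlib.import_module(canonical_name)
    shim = types.ModuleType(legacy_name)

    def _forward(attr: str):
        # Let dunder lookups (__path__, __file__, ...) fail normally so that
        # the import machinery and introspection do not trigger the warning.
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"{legacy_name} is deprecated; import {attr} from {canonical_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(canonical, attr)

    shim.__getattr__ = _forward  # PEP 562: module-level __getattr__
    sys.modules[legacy_name] = shim
    return shim
```

Because the shim only forwards on attribute access, `from legacy import name` keeps working and returns the very same object as the canonical module.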
@@ -49,7 +49,9 @@ FILE_BUDGETS: dict[str, int] = {
     # and the picarones/report/report_data/ subpackage. Budget kept tight
     # at 500 to lock in the gain; any growth beyond 500 will be
     # a signal to split the file again.
-
+    # Phase 5.E: ``report/generator.py`` is now a shim;
+    # the canonical code lives in ``reports_v2/html/generator.py``.
+    "picarones/reports_v2/html/generator.py": 550,  # currently 471
     # --- Large domain-logic files.
     "picarones/measurements/robustness.py": 850,  # currently 731
     # Phase 5.C.batch7: ``report/pipeline_render.py`` is now
@@ -144,7 +146,9 @@ FILE_BUDGETS: dict[str, int] = {
     # The old location is now a re-export; the canonical
     # content lives here.
     "picarones/formats/text/normalization.py": 500,  # currently 420
-
+    # Phase 5.E: ``report/comparison.py`` is now a shim;
+    # the canonical code lives in ``reports_v2/html/comparison.py``.
+    "picarones/reports_v2/html/comparison.py": 500,  # currently 414
     # --- Shared module created by the render-helpers sprint
     # ("renderer consolidation" sprint, 2026-05-02). Budget
     # calibrated on the size after the conventions were documented.
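`FILE_BUDGETS` maps a repository-relative path to a maximum line count. For example, the budget for `reports_v2/html/generator.py` is 550, against 471 lines today. How `test_file_budgets.py` enforces it is not shown here.

The sketch below assumes a plain physical line count. `over_budget` is a hypothetical helper. Reporting missing files as `-1` is a design choice of this sketch, so that a stale entry (for example a file moved behind a shim) fails loudly rather than passing silently.

```python
from pathlib import Path


def over_budget(root: Path, budgets: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {path: (actual_lines, budget)} for every file that breaks its budget.

    Missing files are reported with actual_lines == -1.
    """
    failures: dict[str, tuple[int, int]] = {}
    for rel, budget in sorted(budgets.items()):
        path = root / rel
        if not path.is_file():
            failures[rel] = (-1, budget)
            continue
        with path.open(encoding="utf-8") as fh:
            n_lines = sum(1 for _ in fh)  # physical lines, comments included
        if n_lines > budget:
            failures[rel] = (n_lines, budget)
    return failures
```

A test would then simply `assert not over_budget(repo_root, FILE_BUDGETS)` and print the offending entries on failure.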
@@ -94,6 +94,13 @@ TEST_ONLY_BASELINE: frozenset[str] = frozenset({
     "image_predictive",
     "worst_lines",
     "throughput",
+    # Phase 5.E: 3 additional modules consumed only
+    # by the renderers/views/data migrated to
+    # ``reports_v2/html/``, which now import the canonical
+    # module directly.
+    "statistics",
+    "pricing",
+    "difficulty",
 })
 
 
@@ -62,7 +62,7 @@ def _make_fake_benchmark():
 
 def _generate_html(bm=None) -> str:
     """Generate the full report HTML for a minimal BenchmarkResult."""
-    from picarones.
+    from picarones.reports_v2.html.generator import ReportGenerator
     import tempfile
     import os
     if bm is None:
@@ -386,7 +386,7 @@ class TestReportWithPipeline:
     @pytest.fixture(scope="class")
     def report_data(self):
         from picarones.fixtures import generate_sample_benchmark
-        from picarones.
+        from picarones.reports_v2.html.generator import _build_report_data
         bm = generate_sample_benchmark(n_docs=3, seed=42)
         images_b64 = bm.metadata.get("_images_b64", {})
         return _build_report_data(bm, images_b64)
@@ -433,7 +433,7 @@ class TestReportWithPipeline:
 
     def test_html_contains_pipeline_tag(self, tmp_path):
         from picarones.fixtures import generate_sample_benchmark
-        from picarones.
+        from picarones.reports_v2.html.generator import ReportGenerator
         bm = generate_sample_benchmark(n_docs=3, seed=42)
         out = tmp_path / "report.html"
         ReportGenerator(bm).generate(out)
@@ -790,7 +790,7 @@ class TestReportDiplomaticCER:
     def test_report_data_has_cer_diplomatic(self):
         """_build_report_data must include cer_diplomatic in engines_summary."""
         from picarones.fixtures import generate_sample_benchmark
-        from picarones.
+        from picarones.reports_v2.html.generator import _build_report_data
 
         bm = generate_sample_benchmark()
         data = _build_report_data(bm, images_b64={})
@@ -805,7 +805,7 @@ class TestReportDiplomaticCER:
     def test_html_contains_cer_diplo_column(self, tmp_path):
         """The generated HTML must contain the "CER diplo" column."""
         from picarones.fixtures import generate_sample_benchmark
-        from picarones.
+        from picarones.reports_v2.html.generator import ReportGenerator
 
         bm = generate_sample_benchmark()
         out = tmp_path / "report_test.html"
@@ -818,7 +818,7 @@ class TestReportDiplomaticCER:
     def test_html_contains_medieval_graphie_indicator(self, tmp_path):
         """The report must mention medieval spellings (ſ=s or u=v)."""
         from picarones.fixtures import generate_sample_benchmark
-        from picarones.
+        from picarones.reports_v2.html.generator import ReportGenerator
 
         bm = generate_sample_benchmark()
         out = tmp_path / "report_test.html"
@@ -230,39 +230,39 @@ class TestI18nModule:
|
|
| 230 |
"""Vérifie le module picarones.i18n."""
|
| 231 |
|
| 232 |
def test_get_labels_fr(self):
|
| 233 |
-
from picarones.i18n import get_labels
|
| 234 |
labels = get_labels("fr")
|
| 235 |
assert labels["tab_ranking"] == "Classement"
|
| 236 |
assert labels["html_lang"] == "fr"
|
| 237 |
assert labels["date_locale"] == "fr-FR"
|
| 238 |
|
| 239 |
def test_get_labels_en(self):
|
| 240 |
-
from picarones.i18n import get_labels
|
| 241 |
labels = get_labels("en")
|
| 242 |
assert labels["tab_ranking"] == "Ranking"
|
| 243 |
assert labels["html_lang"] == "en"
|
| 244 |
assert labels["date_locale"] == "en-GB"
|
| 245 |
|
| 246 |
def test_get_labels_fallback(self):
|
| 247 |
-
from picarones.i18n import get_labels
|
| 248 |
# Langue inconnue → bascule sur fr
|
| 249 |
labels = get_labels("de")
|
| 250 |
assert labels["tab_ranking"] == "Classement"
|
| 251 |
|
| 252 |
def test_all_fr_keys_present_in_en(self):
|
| 253 |
-
from picarones.i18n import TRANSLATIONS
|
| 254 |
fr_keys = set(TRANSLATIONS["fr"].keys())
|
| 255 |
en_keys = set(TRANSLATIONS["en"].keys())
|
| 256 |
missing = fr_keys - en_keys
|
| 257 |
assert not missing, f"Clés présentes en FR mais absentes en EN : {missing}"
|
| 258 |
|
| 259 |
def test_supported_langs(self):
|
| 260 |
-
from picarones.i18n import SUPPORTED_LANGS
|
| 261 |
assert "fr" in SUPPORTED_LANGS
|
| 262 |
assert "en" in SUPPORTED_LANGS
|
| 263 |
|
| 264 |
def test_footer_labels(self):
|
| 265 |
-
from picarones.i18n import get_labels
|
| 266 |
fr = get_labels("fr")
|
| 267 |
en = get_labels("en")
|
| 268 |
assert "footer_generated" in fr
|
|
@@ -270,7 +270,7 @@ class TestI18nModule:
|
|
| 270 |
assert fr["footer_generated"] != en["footer_generated"]
|
| 271 |
|
| 272 |
def test_hallucination_labels_translated(self):
|
| 273 |
-
from picarones.i18n import get_labels
|
| 274 |
en = get_labels("en")
|
| 275 |
assert "detected" in en["hall_detected"].lower()
|
| 276 |
assert "⚠" in en["hall_detected"]
|
|
@@ -286,7 +286,7 @@ class TestEnglishReport:
|
|
| 286 |
@pytest.fixture(scope="class")
|
| 287 |
def english_html(self, tmp_path_factory):
|
| 288 |
from picarones.fixtures import generate_sample_benchmark
|
| 289 |
-
from picarones.
|
| 290 |
|
| 291 |
bm = generate_sample_benchmark(n_docs=3, seed=42)
|
| 292 |
tmp = tmp_path_factory.mktemp("report_en")
|
|
@@ -298,7 +298,7 @@ class TestEnglishReport:
|
|
| 298 |
@pytest.fixture(scope="class")
|
| 299 |
def french_html(self, tmp_path_factory):
|
| 300 |
from picarones.fixtures import generate_sample_benchmark
|
| 301 |
-
from picarones.
|
| 302 |
|
| 303 |
bm = generate_sample_benchmark(n_docs=3, seed=42)
|
| 304 |
tmp = tmp_path_factory.mktemp("report_fr")
|
|
@@ -357,14 +357,14 @@ class TestEnglishReport:
|
|
| 357 |
|
| 358 |
def test_report_generator_default_lang_is_fr(self):
|
| 359 |
from picarones.fixtures import generate_sample_benchmark
|
| 360 |
-
from picarones.
|
| 361 |
bm = generate_sample_benchmark(n_docs=2, seed=1)
|
| 362 |
gen = ReportGenerator(bm)
|
| 363 |
assert gen.lang == "fr"
|
| 364 |
|
| 365 |
def test_report_generator_lang_en(self):
|
| 366 |
from picarones.fixtures import generate_sample_benchmark
|
| 367 |
-
from picarones.
|
| 368 |
bm = generate_sample_benchmark(n_docs=2, seed=1)
|
| 369 |
gen = ReportGenerator(bm, lang="en")
|
| 370 |
assert gen.lang == "en"

230       """Checks the picarones.i18n module."""
231
232       def test_get_labels_fr(self):
233 +         from picarones.reports_v2.i18n import get_labels
234           labels = get_labels("fr")
235           assert labels["tab_ranking"] == "Classement"
236           assert labels["html_lang"] == "fr"
237           assert labels["date_locale"] == "fr-FR"
238
239       def test_get_labels_en(self):
240 +         from picarones.reports_v2.i18n import get_labels
241           labels = get_labels("en")
242           assert labels["tab_ranking"] == "Ranking"
243           assert labels["html_lang"] == "en"
244           assert labels["date_locale"] == "en-GB"
245
246       def test_get_labels_fallback(self):
247 +         from picarones.reports_v2.i18n import get_labels
248           # Unknown language → falls back to fr
249           labels = get_labels("de")
250           assert labels["tab_ranking"] == "Classement"
251
252       def test_all_fr_keys_present_in_en(self):
253 +         from picarones.reports_v2.i18n import TRANSLATIONS
254           fr_keys = set(TRANSLATIONS["fr"].keys())
255           en_keys = set(TRANSLATIONS["en"].keys())
256           missing = fr_keys - en_keys
257           assert not missing, f"Keys present in FR but missing in EN: {missing}"
258
259       def test_supported_langs(self):
260 +         from picarones.reports_v2.i18n import SUPPORTED_LANGS
261           assert "fr" in SUPPORTED_LANGS
262           assert "en" in SUPPORTED_LANGS
263
264       def test_footer_labels(self):
265 +         from picarones.reports_v2.i18n import get_labels
266           fr = get_labels("fr")
267           en = get_labels("en")
268           assert "footer_generated" in fr
270           assert fr["footer_generated"] != en["footer_generated"]
271
272       def test_hallucination_labels_translated(self):
273 +         from picarones.reports_v2.i18n import get_labels
274           en = get_labels("en")
275           assert "detected" in en["hall_detected"].lower()
276           assert "⚠" in en["hall_detected"]
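The tests above pin down the public surface of `picarones.reports_v2.i18n`: a `TRANSLATIONS` table keyed by language code, a `SUPPORTED_LANGS` collection, and `get_labels(lang)`, which falls back to French for unknown codes such as `"de"`. The module body is not part of this diff. What follows is only a minimal sketch of code that would satisfy those assertions, assuming plain in-memory dictionaries. The real tables hold many more keys (`footer_generated`, `hall_detected`, ...) and may be loaded differently. `DEFAULT_LANG` and `missing_keys` are hypothetical helpers.

```python
from typing import Dict, Set

# Hypothetical, trimmed tables; the real module defines many more keys.
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "fr": {"tab_ranking": "Classement", "html_lang": "fr", "date_locale": "fr-FR"},
    "en": {"tab_ranking": "Ranking", "html_lang": "en", "date_locale": "en-GB"},
}
SUPPORTED_LANGS = tuple(TRANSLATIONS)  # ("fr", "en")
DEFAULT_LANG = "fr"  # assumption: French is the fallback, as test_get_labels_fallback expects


def get_labels(lang: str) -> Dict[str, str]:
    """Return the labels for *lang*, falling back to DEFAULT_LANG for unknown codes."""
    table = TRANSLATIONS.get(lang, TRANSLATIONS[DEFAULT_LANG])
    # Return a copy so callers cannot mutate the shared table.
    return dict(table)


def missing_keys(reference: str = "fr", other: str = "en") -> Set[str]:
    """Keys defined for *reference* but absent for *other* (the FR/EN parity check)."""
    return set(TRANSLATIONS[reference]) - set(TRANSLATIONS[other])
```

Keeping French as the single reference table means the parity test only needs to check one direction (FR keys present in EN).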

286       @pytest.fixture(scope="class")
287       def english_html(self, tmp_path_factory):
288           from picarones.fixtures import generate_sample_benchmark
289 +         from picarones.reports_v2.html.generator import ReportGenerator
290
291           bm = generate_sample_benchmark(n_docs=3, seed=42)
292           tmp = tmp_path_factory.mktemp("report_en")

298       @pytest.fixture(scope="class")
299       def french_html(self, tmp_path_factory):
300           from picarones.fixtures import generate_sample_benchmark
301 +         from picarones.reports_v2.html.generator import ReportGenerator
302
303           bm = generate_sample_benchmark(n_docs=3, seed=42)
304           tmp = tmp_path_factory.mktemp("report_fr")

357
358       def test_report_generator_default_lang_is_fr(self):
359           from picarones.fixtures import generate_sample_benchmark
360 +         from picarones.reports_v2.html.generator import ReportGenerator
361           bm = generate_sample_benchmark(n_docs=2, seed=1)
362           gen = ReportGenerator(bm)
363           assert gen.lang == "fr"
364
365       def test_report_generator_lang_en(self):
366           from picarones.fixtures import generate_sample_benchmark
367 +         from picarones.reports_v2.html.generator import ReportGenerator
368           bm = generate_sample_benchmark(n_docs=2, seed=1)
369           gen = ReportGenerator(bm, lang="en")
370           assert gen.lang == "en"
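These tests now import `ReportGenerator` from its canonical home, `picarones.reports_v2.html.generator`. The commit message says the legacy `report/` modules survive as minimal shims that emit a `DeprecationWarning`, but those shims are not shown in this diff. One common way to write such a forwarding module is a module-level `__getattr__` (PEP 562). The sketch below demonstrates it with stand-in modules. `make_shim`, `new_home` and `old_home` are hypothetical names, not the project's actual shim code.

```python
import types
import warnings


def make_shim(old_name: str, target: types.ModuleType) -> types.ModuleType:
    """Build a module named *old_name* that forwards attribute lookups to *target*."""
    shim = types.ModuleType(old_name)

    def __getattr__(name: str):
        # PEP 562: called only for names missing from the shim's own namespace.
        warnings.warn(
            f"{old_name} is deprecated; import from {target.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(target, name)

    shim.__getattr__ = __getattr__
    return shim


# Stand-in modules (illustrative names only).
new_home = types.ModuleType("new_home")
new_home.ReportGenerator = type("ReportGenerator", (), {})
old_home = make_shim("old_home", new_home)
```

In a real shim file, the same `__getattr__` would simply be defined at module level in the old module. `make_shim` exists only so the sketch runs standalone. Importing from the canonical path, as these tests now do, keeps the test run free of these warnings.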

@@ -29,20 +29,20 @@ ROOT = Path(__file__).parent.parent.parent
 29
 30   class TestI18nCache:
 31       def test_get_labels_returns_dict(self):
 32 -         from picarones.i18n import get_labels
 33           labels = get_labels("fr")
 34           assert isinstance(labels, dict)
 35           assert len(labels) > 5
 36
 37       def test_get_labels_unknown_falls_back_to_fr(self):
 38 -         from picarones.i18n import get_labels
 39           fr = get_labels("fr")
 40           unknown = get_labels("xx-pas-existante")
 41           # The fallback must be the fr content
 42           assert unknown == fr
 43
 44       def test_get_labels_cached(self):
 45 -         from picarones import i18n
 46           i18n.reload_translations()
 47           # First call: populates the cache
 48           i18n.get_labels("fr")

@@ -54,7 +54,7 @@ class TestI18nCache:
 54           assert info_after.hits > info_before.hits
 55
 56       def test_reload_translations_clears_cache(self):
 57 -         from picarones import i18n
 58           i18n.get_labels("fr")
 59           info_before = i18n._get_labels_cached.cache_info()
 60           assert info_before.currsize >= 1

@@ -117,7 +117,7 @@ class TestSafeVersionLogsDebug:
117
118   class TestBadgesAccessibility:
119       def test_app_js_exposes_tier_helpers(self):
120 -         path = ROOT / "picarones" / "
121           src = path.read_text(encoding="utf-8")
122           for fn in ("cerTier", "cerTierIcon", "cerTierLabel"):
123               assert f"function {fn}" in src, (

@@ -125,7 +125,7 @@ class TestBadgesAccessibility:
125               )
126
127       def test_styles_define_tier_patterns(self):
128 -         path = ROOT / "picarones" / "
129           src = path.read_text(encoding="utf-8")
130           for tier in ("excellent", "acceptable", "mediocre", "critical"):
131               assert f'data-cer-tier="{tier}"' in src, (

@@ -138,7 +138,7 @@ class TestBadgesAccessibility:
138           assert "border: 1.5px double" in src
139
140       def test_main_badge_carries_data_attr_and_aria(self):
141 -         path = ROOT / "picarones" / "
142           src = path.read_text(encoding="utf-8")
143           assert "setAttribute('data-cer-tier'" in src
144           assert "setAttribute('aria-label'" in src

@@ -206,7 +206,7 @@ class TestChangelogAndSpecsUpdated:
206   class TestGeneratedReportCarriesA11y:
207       def test_generated_html_embeds_tier_helpers(self, tmp_path):
208           from picarones import fixtures
209 -         from picarones.
210
211           b = fixtures.generate_sample_benchmark(n_docs=4)
212           out = tmp_path / "rapport.html"

 29
 30   class TestI18nCache:
 31       def test_get_labels_returns_dict(self):
 32 +         from picarones.reports_v2.i18n import get_labels
 33           labels = get_labels("fr")
 34           assert isinstance(labels, dict)
 35           assert len(labels) > 5
 36
 37       def test_get_labels_unknown_falls_back_to_fr(self):
 38 +         from picarones.reports_v2.i18n import get_labels
 39           fr = get_labels("fr")
 40           unknown = get_labels("xx-pas-existante")
 41           # The fallback must be the fr content
 42           assert unknown == fr
 43
 44       def test_get_labels_cached(self):
 45 +         from picarones.reports_v2 import i18n
 46           i18n.reload_translations()
 47           # First call: populates the cache
 48           i18n.get_labels("fr")

 54           assert info_after.hits > info_before.hits
 55
 56       def test_reload_translations_clears_cache(self):
 57 +         from picarones.reports_v2 import i18n
 58           i18n.get_labels("fr")
 59           info_before = i18n._get_labels_cached.cache_info()
 60           assert info_before.currsize >= 1

117
118   class TestBadgesAccessibility:
119       def test_app_js_exposes_tier_helpers(self):
120 +         path = ROOT / "picarones" / "reports_v2" / "html" / "templates" / "_app.js"
121           src = path.read_text(encoding="utf-8")
122           for fn in ("cerTier", "cerTierIcon", "cerTierLabel"):
123               assert f"function {fn}" in src, (

125               )
126
127       def test_styles_define_tier_patterns(self):
128 +         path = ROOT / "picarones" / "reports_v2" / "html" / "templates" / "_styles.css"
129           src = path.read_text(encoding="utf-8")
130           for tier in ("excellent", "acceptable", "mediocre", "critical"):
131               assert f'data-cer-tier="{tier}"' in src, (

138           assert "border: 1.5px double" in src
139
140       def test_main_badge_carries_data_attr_and_aria(self):
141 +         path = ROOT / "picarones" / "reports_v2" / "html" / "templates" / "_app.js"
142           src = path.read_text(encoding="utf-8")
143           assert "setAttribute('data-cer-tier'" in src
144           assert "setAttribute('aria-label'" in src
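All three `TestBadgesAccessibility` tests spell out the same `ROOT / "picarones" / "reports_v2" / "html" / "templates"` prefix. This phase also had to redirect 28+ template paths across the tests. A shared helper would confine the next relocation to a single line. The sketch below assumes `ROOT` is the repository root, as in the hunk header (`Path(__file__).parent.parent.parent`). `TEMPLATE_DIR_PARTS`, `template_path` and `read_template` are hypothetical names, not existing project code.

```python
from pathlib import Path

# Hypothetical constant: the canonical template directory after Phase 5.E,
# relative to the repository root (the tests' ROOT).
TEMPLATE_DIR_PARTS = ("picarones", "reports_v2", "html", "templates")


def template_path(root: Path, name: str) -> Path:
    """Return the path of template *name* (e.g. "_app.js") under repo *root*."""
    return root.joinpath(*TEMPLATE_DIR_PARTS, name)


def read_template(root: Path, name: str) -> str:
    # Same decoding as the tests' path.read_text(encoding="utf-8") calls.
    return template_path(root, name).read_text(encoding="utf-8")
```

A test would then call `read_template(ROOT, "_styles.css")` instead of assembling the path inline.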

206   class TestGeneratedReportCarriesA11y:
207       def test_generated_html_embeds_tier_helpers(self, tmp_path):
208           from picarones import fixtures
209 +         from picarones.reports_v2.html.generator import ReportGenerator
210
211           b = fixtures.generate_sample_benchmark(n_docs=4)
212           out = tmp_path / "rapport.html"
|