feat(evaluation): Sprint A14-S27: split into ProjectionEngine + EvaluationEngine
S13 merged two distinct responsibilities into DefaultEvaluationViewExecutor:
transforming an artifact from one type to another ("projecting")
**and** computing metrics on the payloads ("evaluating"). The target
architecture separates them into two specialised single-responsibility engines.
New engines
-----------
- ProjectionEngine (picarones/evaluation/projection_engine.py)
  · Delegates to the ProjectorRegistry; handles identity (spec=None or
    source==target) and errors (projector not found, or projector
    raises → ProjectionError).
  · Returns a frozen ProjectionResult(artifact, payload, report).
- EvaluationEngine (picarones/evaluation/evaluation_engine.py)
  · Delegates to the MetricRegistry; metric errors are routed into
    failed_metrics, and an unknown metric yields an explicit message.
  · Returns a frozen EvaluationResult(metric_values, failed_metrics)
    with helpers n_succeeded/n_failed/all_succeeded/with_global_failure.
  · evaluate_one() convenience wrapper for single-metric callers
    (e.g. the pipeline executor on a single junction, S28+).
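The error-collection behaviour described for EvaluationEngine can be sketched as follows. This is a minimal, self-contained illustration and not the code from this commit: `ToyRegistry`, `ToyMetricNotFound`, `collect_metrics` and the two toy metrics are hypothetical stand-ins for `MetricRegistry`, `MetricNotFoundError`, `EvaluationEngine.evaluate` and real metrics.

```python
from typing import Any, Callable


class ToyMetricNotFound(KeyError):
    """Hypothetical stand-in for MetricNotFoundError."""


class ToyRegistry:
    """Hypothetical stand-in for MetricRegistry: maps a name to a callable."""

    def __init__(self, metrics: dict[str, Callable[[Any, Any], Any]]) -> None:
        self._metrics = dict(metrics)

    def compute(self, name: str, reference: Any, hypothesis: Any) -> Any:
        if name not in self._metrics:
            raise ToyMetricNotFound(name)
        return self._metrics[name](reference, hypothesis)


def collect_metrics(registry, metric_names, reference, hypothesis):
    """Evaluate each metric in order; failures are recorded, not propagated."""
    values: dict[str, Any] = {}
    failed: dict[str, str] = {}
    for name in metric_names:
        try:
            values[name] = registry.compute(name, reference, hypothesis)
        except ToyMetricNotFound as exc:
            failed[name] = f"metric not registered: {exc}"
        except Exception as exc:  # any metric-internal error is captured
            failed[name] = f"{type(exc).__name__}: {exc}"
    return values, failed


def exact_match(reference: str, hypothesis: str) -> float:
    return 1.0 if reference == hypothesis else 0.0


def broken(reference: str, hypothesis: str) -> float:
    raise ZeroDivisionError("empty reference")


registry = ToyRegistry({"exact_match": exact_match, "broken": broken})
values, failed = collect_metrics(
    registry, ["exact_match", "broken", "unknown"], "abc", "abc",
)
```

In this sketch one broken or unknown metric does not prevent the others from being reported, and both dicts keep the order of the requested names.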
Executor refactor
-----------------
- The canonical DefaultEvaluationViewExecutor.__init__ now expects
  (projection_engine, evaluation_engine, payload_loader). The
  orchestration sequence (type-check → project → load → normalize →
  evaluate → ViewResult) is unchanged but delegated to the engines.
- The classmethod from_registries(metric_registry, projector_registry,
  payload_loader) remains available as a convenience for callers that
  do not want to build the two engines themselves.
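The two construction paths can be illustrated with a small sketch. All class names here (`ToyProjectionEngine`, `ToyEvaluationEngine`, `ToyViewExecutor`) are hypothetical stand-ins that use plain dicts as registries; only the argument order of `from_registries` is taken from the commit.

```python
class ToyProjectionEngine:
    """Hypothetical stand-in for ProjectionEngine."""

    def __init__(self, projector_registry: dict) -> None:
        self.projectors = projector_registry


class ToyEvaluationEngine:
    """Hypothetical stand-in for EvaluationEngine."""

    def __init__(self, metric_registry: dict) -> None:
        self.metrics = metric_registry


class ToyViewExecutor:
    """Canonical constructor takes engines; the classmethod builds them."""

    def __init__(self, projection_engine, evaluation_engine, payload_loader):
        if not isinstance(projection_engine, ToyProjectionEngine):
            raise TypeError("projection_engine must be a ToyProjectionEngine.")
        if not isinstance(evaluation_engine, ToyEvaluationEngine):
            raise TypeError("evaluation_engine must be a ToyEvaluationEngine.")
        if not callable(payload_loader):
            raise TypeError("payload_loader must be callable.")
        self._projection = projection_engine
        self._evaluation = evaluation_engine
        self._loader = payload_loader

    @classmethod
    def from_registries(cls, metric_registry, projector_registry, payload_loader):
        # Same argument order as the pre-S27 constructor call, so existing
        # call sites only need to add ``.from_registries``.
        return cls(
            ToyProjectionEngine(projector_registry),
            ToyEvaluationEngine(metric_registry),
            payload_loader,
        )


executor = ToyViewExecutor.from_registries({}, {}, lambda artifact: "payload")
```

Keeping the classmethod free of logic means the injected-engines constructor stays the single place where arguments are validated.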
Migration of the 14 call sites
------------------------------
- run_orchestrator.py: .from_registries(...)
- 13 test files: .from_registries(...)
- 3 TestConstructor tests renamed to the new contract (engines)
  + 3 new TestConstructor tests for from_registries.
Dedicated S27 tests (21 new)
----------------------------
- ProjectionEngine: constructor, identity (None / source==target),
  nominal triplet, projector not found, raising projector wrapped in
  ProjectionError, native ProjectionError not re-wrapped.
- EvaluationEngine: constructor, all_succeed, non-zero metric,
  evaluate_one wrapper, order preserved, unknown metric, raising
  metric, empty list.
- Frozen dataclasses + with_global_failure + has_projection.
Legacy S13 + integration tests: 100% preserved (indirect coverage
of the delegation).
Why this separation
-------------------
- Reuse: the PipelineExecutor (S28+) will call
  ProjectionEngine.project directly when it transforms an artifact
  between DAG steps, without depending on the view executor.
- Testability: projection can be tested without building a view;
  metric error collection can be tested without a projector or a view.
- Decoupling: the executor no longer holds business logic, only the
  sequence and the final aggregation into ViewResult.
Tests: 4527 passed, 11 skipped, 0 failed (vs 4504 before: +21 S27
+ 2 new TestConstructor; 0 regressions).
Lint: ruff check picarones/ tests/ → All checks passed.
https://claude.ai/code/session_011XQZNitg1rCgia8ZD1a2hP
- README.md +1 -1
- picarones/app/services/run_orchestrator.py +1 -1
- picarones/evaluation/__init__.py +15 -1
- picarones/evaluation/evaluation_engine.py +177 -0
- picarones/evaluation/projection_engine.py +174 -0
- picarones/evaluation/views/executor.py +117 -135
- tests/cli/test_sprint_a14_s22_app_cli.py +1 -1
- tests/evaluation/test_sprint_a14_s13_view_executor.py +41 -9
- tests/evaluation/test_sprint_a14_s16_views_consistency.py +1 -1
- tests/evaluation/test_sprint_a14_s25_projector_payload.py +4 -4
- tests/evaluation/test_sprint_a14_s27_engines.py +352 -0
- tests/evaluation/views/test_sprint_a14_s14_text_view.py +4 -2
- tests/evaluation/views/test_sprint_a14_s15_alto_view.py +1 -1
- tests/evaluation/views/test_sprint_a14_s16_search_view.py +1 -1
- tests/integration/test_sprint_a14_s17_full_run.py +3 -1
- tests/integration/test_sprint_a14_s18_bnf_e2e.py +3 -1
- tests/integration/test_sprint_a14_s21_report_service.py +2 -2
- tests/integration/test_sprint_a14_s23_registry_service.py +1 -1
@@ -396,7 +396,7 @@ ruff check picarones/ tests/
 python -m mypy picarones/core/
 ```

-**Test suite**: ~
+**Test suite**: ~4540 tests, ~3 min on a modern laptop. Coverage
 floor at 85% (currently ~87%). The `network` marker excludes tests
 requiring live HTTP. A handful of tests depend on optional engines
 (`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
@@ -346,7 +346,7 @@ class RunOrchestrator:
             timeout_seconds_per_doc=300.0,
             poll_interval_seconds=0.05,
         )
-        view_executor = DefaultEvaluationViewExecutor(
+        view_executor = DefaultEvaluationViewExecutor.from_registries(
             registries.metrics,
             registries.projectors,
             _filesystem_payload_loader,
@@ -31,4 +31,18 @@ targeted rewrite (Sprints S13-S18).

 from __future__ import annotations

-
+from picarones.evaluation.evaluation_engine import (
+    EvaluationEngine,
+    EvaluationResult,
+)
+from picarones.evaluation.projection_engine import (
+    ProjectionEngine,
+    ProjectionResult,
+)
+
+__all__ = [
+    "EvaluationEngine",
+    "EvaluationResult",
+    "ProjectionEngine",
+    "ProjectionResult",
+]
@@ -0,0 +1,177 @@
+"""``EvaluationEngine``: Sprint A14-S27.
+
+Counterpart of ``ProjectionEngine`` (see ``projection_engine.py``).
+S13 merged projection **and** evaluation into
+``DefaultEvaluationViewExecutor``; the target architecture separates
+them into two specialised single-responsibility engines.
+
+``EvaluationEngine`` computes a named set of metrics on a
+``(reference, hypothesis)`` pair of payloads. A metric that raises
+internally goes into ``failed_metrics`` instead of crashing the whole
+evaluation: the error is captured and associated with the metric
+name.
+
+Why this separation
+-------------------
+- **Reuse**: the ``PipelineExecutor`` (S28+) can call
+  ``EvaluationEngine.evaluate`` for intra-pipeline junction metrics
+  (e.g. a "stability score between two steps") without going
+  through an ``EvaluationView``.
+- **Testability**: error collection (broken metric, unknown metric)
+  can be tested without instantiating a view or a projector.
+- **Decoupling**: ``EvaluationEngine`` knows nothing about artifacts,
+  projections or views; it takes raw payloads.
+
+Avoiding over-engineering
+-------------------------
+No batching (evaluating N pairs in one pass), no cache of normalised
+payloads, no pre-sorting of metrics. The engine is deliberately
+minimal; the complexity lives in the metrics themselves
+(see ``picarones/evaluation/metrics/``).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any
+
+from picarones.evaluation.registry import (
+    MetricNotFoundError,
+    MetricRegistry,
+)
+
+
+@dataclass(frozen=True)
+class EvaluationResult:
+    """Result of a call to ``EvaluationEngine.evaluate``.
+
+    Attributes
+    ----------
+    metric_values:
+        Metrics computed successfully, ``{name: value}``.
+    failed_metrics:
+        Metrics that failed, ``{name: error_message}``. The two
+        dicts are disjoint: a metric appears in one or the
+        other, never both.
+
+    Notes
+    -----
+    Frozen dataclass: immutable container; the inner dicts are
+    treated as immutable too (built with ``field(default_factory=dict)``
+    and never mutated after construction). Callers must treat the
+    dicts as read-only.
+    """
+
+    metric_values: dict[str, Any] = field(default_factory=dict)
+    failed_metrics: dict[str, str] = field(default_factory=dict)
+
+    @property
+    def n_succeeded(self) -> int:
+        return len(self.metric_values)
+
+    @property
+    def n_failed(self) -> int:
+        return len(self.failed_metrics)
+
+    @property
+    def all_succeeded(self) -> bool:
+        return self.n_failed == 0
+
+    def with_global_failure(self, error: str) -> "EvaluationResult":
+        """Return a new ``EvaluationResult`` in which **all**
+        metrics carry the same global error message. Useful for
+        a caller that finds a payload could not be loaded
+        and wants to mark the whole evaluation as failed."""
+        return EvaluationResult(
+            metric_values={},
+            failed_metrics={
+                name: error
+                for name in (
+                    list(self.metric_values) + list(self.failed_metrics)
+                )
+            },
+        )
+
+
+class EvaluationEngine:
+    """Metric computation engine for a pair of payloads.
+
+    Single responsibility: take a ``MetricRegistry``, a list of
+    metric names and a ``(reference, hypothesis)`` pair, and
+    return an ``EvaluationResult``. No knowledge of
+    artifacts, projections or views.
+
+    Parameters
+    ----------
+    metric_registry:
+        Metric registry, instantiated explicitly at startup
+        (no global singleton, no import side effect).
+    """
+
+    def __init__(self, metric_registry: MetricRegistry) -> None:
+        if not isinstance(metric_registry, MetricRegistry):
+            raise TypeError(
+                "metric_registry doit être un MetricRegistry."
+            )
+        self._metrics = metric_registry
+
+    @property
+    def metrics(self) -> MetricRegistry:
+        """Read-only access to the underlying registry (useful in tests)."""
+        return self._metrics
+
+    def evaluate(
+        self,
+        metric_names: tuple[str, ...] | list[str],
+        reference: Any,
+        hypothesis: Any,
+    ) -> EvaluationResult:
+        """Compute each named metric on the (reference, hypothesis) pair.
+
+        Behaviour:
+
+        - A registered metric that returns a value → entry
+          in ``metric_values``.
+        - A registered metric that raises an exception → entry
+          in ``failed_metrics`` with the message ``f"{type}: {message}"``.
+        - An unregistered metric name → entry in
+          ``failed_metrics`` with an explicit message.
+
+        Evaluation follows the order of ``metric_names``; both
+        result dicts preserve that order (Python 3.7+
+        guarantees insertion order for ``dict``).
+        """
+        metric_values: dict[str, Any] = {}
+        failed_metrics: dict[str, str] = {}
+
+        for name in metric_names:
+            try:
+                value = self._metrics.compute(name, reference, hypothesis)
+                metric_values[name] = value
+            except MetricNotFoundError as exc:
+                failed_metrics[name] = (
+                    f"métrique non enregistrée dans le MetricRegistry : "
+                    f"{exc}"
+                )
+            except Exception as exc:  # noqa: BLE001
+                failed_metrics[name] = f"{type(exc).__name__}: {exc}"
+
+        return EvaluationResult(
+            metric_values=metric_values,
+            failed_metrics=failed_metrics,
+        )
+
+    def evaluate_one(
+        self,
+        metric_name: str,
+        reference: Any,
+        hypothesis: Any,
+    ) -> EvaluationResult:
+        """Special case: a single metric. Syntactic sugar over
+        ``evaluate``. Useful for callers driving a single
+        junction (typically the pipeline executor on a junction
+        metric)."""
+        return self.evaluate((metric_name,), reference, hypothesis)
+
+
+__all__ = ["EvaluationEngine", "EvaluationResult"]
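To make the behaviour of `with_global_failure` concrete, here is a short usage sketch. The dataclass is a trimmed copy of the `EvaluationResult` shown in the diff above, with docstrings and some helpers omitted; the metric names `cer` and `wer` and the error messages are hypothetical values chosen for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass(frozen=True)
class EvaluationResult:
    metric_values: dict[str, Any] = field(default_factory=dict)
    failed_metrics: dict[str, str] = field(default_factory=dict)

    @property
    def n_failed(self) -> int:
        return len(self.failed_metrics)

    def with_global_failure(self, error: str) -> "EvaluationResult":
        # Every known metric name, succeeded or failed, gets the same error.
        names = list(self.metric_values) + list(self.failed_metrics)
        return EvaluationResult(
            metric_values={},
            failed_metrics={name: error for name in names},
        )


partial = EvaluationResult(
    metric_values={"cer": 0.12},
    failed_metrics={"wer": "ValueError: empty reference"},
)
failed = partial.with_global_failure("payload could not be loaded")

# frozen=True blocks attribute rebinding on the instance...
rebinding_blocked = False
try:
    partial.metric_values = {}
except FrozenInstanceError:
    rebinding_blocked = True
```

Note that `frozen=True` only prevents rebinding the attributes. The dicts themselves remain ordinary mutable dicts, which is why the docstring asks callers to treat them as read-only.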
@@ -0,0 +1,174 @@
+"""``ProjectionEngine``: Sprint A14-S27.
+
+S13 merged two distinct responsibilities into
+``DefaultEvaluationViewExecutor``: transforming an artifact from one
+type to another ("projecting") **and** computing metrics on the
+payloads ("evaluating"). The target architecture separates them
+into two specialised single-responsibility engines:
+
+- ``ProjectionEngine`` (this module): transforms a candidate
+  ``Artifact`` according to a ``ProjectionSpec`` and returns the new
+  artifact, its computed ``payload``, and a ``ProjectionReport``
+  documenting the losses.
+- ``EvaluationEngine`` (see ``evaluation_engine.py``): computes
+  metrics on payloads.
+
+The view executor (``DefaultEvaluationViewExecutor``) orchestrates
+both: projection first, then loading, normalisation and
+evaluation. It no longer contains projection or metric-computation
+logic, only the sequence and the error collection.
+
+Why this separation
+-------------------
+- **Reuse**: the ``PipelineExecutor`` (S28+) calls
+  ``ProjectionEngine.project`` directly when it transforms an
+  artifact between two DAG steps, without depending on the view
+  executor.
+- **Testability**: projection can be tested on arbitrary
+  artifacts without building an ``EvaluationView`` or a
+  ``MetricRegistry``.
+- **Readability**: each engine exposes a minimal,
+  type-checkable API.
+
+Avoiding over-engineering
+-------------------------
+No payload cache between projections, no batching, no
+pre-validation of params (the projector itself validates what it
+expects). The engine is deliberately minimal; the complexity lives
+in the projectors (see ``picarones/evaluation/projectors/``).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+from picarones.domain.artifacts import Artifact
+from picarones.domain.errors import ProjectionError
+from picarones.domain.projection_spec import ProjectionSpec
+from picarones.evaluation.projectors.base import ProjectionReport
+from picarones.evaluation.projectors.registry import (
+    ProjectorNotFoundError,
+    ProjectorRegistry,
+)
+
+
+@dataclass(frozen=True)
+class ProjectionResult:
+    """Result of a call to ``ProjectionEngine.project``.
+
+    Attributes
+    ----------
+    artifact:
+        Effective artifact after projection. If the spec was
+        ``None`` or identity, this is the input artifact as is.
+    payload:
+        Payload computed by the projector, or ``None`` if no
+        projection was performed (the caller will load it from
+        its ``payload_loader``).
+    report:
+        Projection report if a projection took place, or
+        ``None`` for a view without projection (identity).
+
+    Notes
+    -----
+    Frozen dataclass: no mutation after construction.
+    Serialisation goes through ``ProjectionReport`` (pydantic), which
+    already knows how to serialise itself; ``ProjectionResult`` remains
+    an internal container between engine and executor.
+    """
+
+    artifact: Artifact
+    payload: Any | None
+    report: ProjectionReport | None
+
+    @property
+    def has_projection(self) -> bool:
+        """True if an actual projection took place (report present)."""
+        return self.report is not None
+
+
+class ProjectionEngine:
+    """Engine projecting artifacts according to a ``ProjectionSpec``.
+
+    Single responsibility: take an ``Artifact`` and an optional
+    ``ProjectionSpec``, and return a ``ProjectionResult``. No
+    payload loading from an external loader (the projector
+    provides the computed payload directly, since Sprint S25). No
+    knowledge of metrics or views.
+
+    Parameters
+    ----------
+    projector_registry:
+        Registry of available projectors, instantiated explicitly
+        at application startup. No global singleton, no
+        import side effect.
+    """
+
+    def __init__(self, projector_registry: ProjectorRegistry) -> None:
+        if not isinstance(projector_registry, ProjectorRegistry):
+            raise TypeError(
+                "projector_registry doit être un ProjectorRegistry."
+            )
+        self._projectors = projector_registry
+
+    @property
+    def projectors(self) -> ProjectorRegistry:
+        """Read-only access to the underlying registry (useful in tests)."""
+        return self._projectors
+
+    def project(
+        self,
+        artifact: Artifact,
+        spec: ProjectionSpec | None,
+    ) -> ProjectionResult:
+        """Apply the projection if relevant.
+
+        Behaviour:
+
+        - ``spec is None`` or ``spec.is_identity`` →
+          ``ProjectionResult`` with the input artifact as is,
+          ``payload=None``, ``report=None``. The caller will use
+          its payload_loader to load the original artifact.
+        - Otherwise: resolves the projector in the registry, runs
+          ``project()``, and returns the complete ``ProjectionResult``
+          with the computed payload.
+
+        Raises
+        ------
+        ProjectionError
+            If the referenced projector is not registered, or if
+            the projector raises an internal exception (wrapped in
+            a ``ProjectionError`` that preserves the ``__cause__`` chain).
+        """
+        if spec is None or spec.is_identity:
+            return ProjectionResult(
+                artifact=artifact, payload=None, report=None,
+            )
+
+        try:
+            projector = self._projectors.get(spec.projector_name)
+        except ProjectorNotFoundError as exc:
+            raise ProjectionError(
+                f"Projecteur {spec.projector_name!r} introuvable "
+                "dans le ProjectorRegistry."
+            ) from exc
+
+        try:
+            target, payload, report = projector.project(
+                artifact, dict(spec.params),
+            )
+        except ProjectionError:
+            raise
+        except Exception as exc:  # noqa: BLE001
+            raise ProjectionError(
+                f"Projecteur {spec.projector_name!r} a levé sur "
+                f"l'artefact {artifact.id!r} : {exc}"
+            ) from exc
+
+        return ProjectionResult(
+            artifact=target, payload=payload, report=report,
+        )
+
+
+__all__ = ["ProjectionEngine", "ProjectionResult"]
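The exception handling in `ProjectionEngine.project` (re-raise a native `ProjectionError`, wrap anything else with `raise ... from exc`) can be tried in isolation. The sketch below is only an illustration of that pattern: `ProjectionError` is redefined locally as a stand-in for `picarones.domain.errors.ProjectionError`, and `run_projector` plus the two projector functions are hypothetical.

```python
class ProjectionError(Exception):
    """Local stand-in for picarones.domain.errors.ProjectionError."""


def run_projector(projector, artifact_id: str):
    """Call a projector, wrapping unexpected errors in ProjectionError."""
    try:
        return projector(artifact_id)
    except ProjectionError:
        # Already the domain error: re-raise untouched, no double wrapping.
        raise
    except Exception as exc:
        # ``from exc`` stores the original exception in ``__cause__``.
        raise ProjectionError(
            f"projector raised on artifact {artifact_id!r}: {exc}"
        ) from exc


def failing_projector(artifact_id: str):
    raise KeyError("missing layout block")


def native_failure(artifact_id: str):
    raise ProjectionError("unsupported source type")


try:
    run_projector(failing_projector, "doc-1")
except ProjectionError as err:
    wrapped = err

try:
    run_projector(native_failure, "doc-2")
except ProjectionError as err:
    native = err
```

Here `wrapped.__cause__` is the original `KeyError`, while `native` passes through with no cause attached, matching the "native ProjectionError not re-wrapped" test case listed in the commit message.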
@@ -1,36 +1,47 @@
-"""``DefaultEvaluationViewExecutor``: Sprint A14-S13.

 Concrete implementation of the ``EvaluationViewExecutor`` protocol (S5).
-

 1. Checks that ``candidate.type`` is in ``view.candidate_types``.
-2.
-``
-
 3. Loads the payloads (text, parsed ALTO, etc.) via the
-injected ``payload_loader``
 4. Optionally applies a text normalisation profile
-(``view.normalization_profile``)
-5.
-``
-
-
-
-
-
-
-
-
-

 Avoiding over-engineering
 -------------------------
-No cache of loaded payloads between metrics (each
-
-
-
-No batch handling (evaluating N pairs in a single pass). To be
-added when a caller concretely needs it.
 """

 from __future__ import annotations
@@ -39,67 +50,86 @@ import logging
 from typing import Any, Callable

 from picarones.domain.artifacts import Artifact
-from picarones.domain.errors import ProjectionError
 from picarones.domain.evaluation_spec import EvaluationView
-from picarones.evaluation.
-
-
-
-from picarones.evaluation.registry import MetricRegistry, MetricNotFoundError
 from picarones.evaluation.views.base import ViewResult

 logger = logging.getLogger(__name__)


-#: Internal sentinel to distinguish "no projection" from "projection
-#: returned None as payload" (pathological but theoretically
-#: possible). Never compare with ``==``; always use ``is``.
-_UNSET = object()
-
-
 #: Type alias: a payload loader takes an Artifact and returns the
 #: loaded content (str for RAW_TEXT, dict for ENTITIES, etc.).
 PayloadLoader = Callable[[Artifact], Any]


 class DefaultEvaluationViewExecutor:
-    """

     Parameters
     ----------
-
-    ``
-
-
-
-
     payload_loader:
         Callable ``(Artifact) -> Any`` that loads the content of an
-        artifact
-
-
     """

     def __init__(
         self,
-
-
         payload_loader: PayloadLoader,
     ) -> None:
-        if not isinstance(
             raise TypeError(
-                "
             )
-        if not isinstance(
             raise TypeError(
-                "
             )
         if not callable(payload_loader):
             raise TypeError("payload_loader doit être callable.")
-        self.
-        self.
         self._loader = payload_loader

     # ──────────────────────────────────────────────────────────────────
     # Public API
     # ──────────────────────────────────────────────────────────────────
@@ -115,21 +145,20 @@ class DefaultEvaluationViewExecutor:
         Returns
         -------
         ViewResult
-            Always returned
-
-
-

         Raises
         ------
         ProjectionError
-            If the view requires a projection that the projector is
-            unable to perform (
-            projector found).
         ValueError
             If ``candidate.type`` is not in
             ``view.candidate_types``. The caller (typically the
-
             do not produce the right type before calling ``evaluate``.
         """
         # 1. Input type check.
@@ -141,64 +170,32 @@ class DefaultEvaluationViewExecutor:
                 f"{sorted(t.value for t in view.candidate_types)}."
             )

-        # 2. Projection (
-        #
-        #
-        # a mapping per source type (``projections_by_source_type``).
-        # The projector returns ``(Artifact, payload, report)``:
-        # the payload is kept and passed to the metrics without
-        # going back through the loader (the projected artifact is
-        # intermediate and typically has no URI).
-        effective_candidate = candidate
-        projection_report = None
-        projected_payload: Any = _UNSET
         projection_spec = view.projection_for(candidate.type)
-
-
-
-                    projection_spec.projector_name,
-                )
-            except ProjectorNotFoundError as exc:
-                raise ProjectionError(
-                    f"View {view.name!r} référence le projecteur "
-                    f"{projection_spec.projector_name!r} introuvable "
-                    "dans le ProjectorRegistry."
-                ) from exc
-            try:
-                (
-                    effective_candidate,
-                    projected_payload,
-                    projection_report,
-                ) = projector.project(
-                    candidate, dict(projection_spec.params),
-                )
-            except ProjectionError:
-                raise
-            except Exception as exc:  # noqa: BLE001
-                raise ProjectionError(
-                    f"Projecteur {projection_spec.projector_name!r} a "
-                    f"levé sur l'artefact {candidate.id!r} : {exc}"
-                ) from exc

         # 3. Loading the payloads.
-        #
-        #
-
-
-
-            cand_payload = projected_payload
         else:
             try:
-                cand_payload = self._loader(
             except Exception as exc:  # noqa: BLE001
                 return self._failed_view_result(
                     view=view,
                     candidate=candidate,
                     ground_truth=ground_truth,
-                    projection_report=
                     global_error=(
                         f"payload_loader a échoué sur le candidat "
-                        f"{
                     ),
                 )
         try:
@@ -208,7 +205,7 @@ class DefaultEvaluationViewExecutor:
                 view=view,
                 candidate=candidate,
                 ground_truth=ground_truth,
-                projection_report=
                 global_error=(
                     f"payload_loader a échoué sur la GT "
                     f"{ground_truth.id!r} : {exc}"
@@ -221,34 +218,19 @@ class DefaultEvaluationViewExecutor:
                 view.normalization_profile, cand_payload, gt_payload,
             )

-        # 5.
-
-
-
-        failed_metrics: dict[str, str] = {}
-        for name in view.metric_names:
-            try:
-                value = self._metrics.compute(name, gt_payload, cand_payload)
-                metric_values[name] = value
-            except MetricNotFoundError as exc:
-                failed_metrics[name] = (
-                    f"métrique non enregistrée dans le MetricRegistry : "
-                    f"{exc}"
-                )
-            except Exception as exc:  # noqa: BLE001
-                failed_metrics[name] = (
-                    f"{type(exc).__name__}: {exc}"
-                )

-        # 6.
         warnings = tuple(view.warnings)
         ignored = tuple(view.ignored_dimensions)
-        if
-            warnings = warnings + tuple(
-            # Deduplicate ignored_dimensions while preserving order.
             seen: set[str] = set(ignored)
             extra = tuple(
-                d for d in
                 if d not in seen
             )
             ignored = ignored + extra
@@ -257,9 +239,9 @@ class DefaultEvaluationViewExecutor:
             view_name=view.name,
             candidate_artifact_id=candidate.id,
             ground_truth_artifact_id=ground_truth.id,
-            metric_values=metric_values,
-            failed_metrics=failed_metrics,
-            projection_report=
             warnings=warnings,
             ignored_dimensions=ignored,
         )
|
| 1 |
+
"""``DefaultEvaluationViewExecutor`` — Sprint A14-S13, refactoré au S27.
|
| 2 |
|
| 3 |
Implémentation concrète du protocole ``EvaluationViewExecutor`` (S5).
|
| 4 |
+
Orchestre une vue d'évaluation sur une paire (candidat, GT) en
|
| 5 |
+
**déléguant** la projection et l'évaluation à deux moteurs spécialisés
|
| 6 |
+
introduits au S27 :
|
| 7 |
|
| 8 |
+
- ``ProjectionEngine`` (cf. ``picarones/evaluation/projection_engine.py``)
|
| 9 |
+
transforme l'artefact candidat selon la ``ProjectionSpec``.
|
| 10 |
+
- ``EvaluationEngine`` (cf. ``picarones/evaluation/evaluation_engine.py``)
|
| 11 |
+
calcule les métriques sur les payloads.
|
| 12 |
+
|
| 13 |
+
Séquence d'orchestration
|
| 14 |
+
------------------------
|
| 15 |
1. Vérifie que ``candidate.type`` est dans ``view.candidate_types``.
|
| 16 |
+
2. ``ProjectionEngine.project(candidate, view.projection_for(candidate.type))``
|
| 17 |
+
→ returns a ``ProjectionResult`` that may carry a precomputed
|
| 18 |
+
payload.
|
| 19 |
3. Loads the payloads (text, parsed ALTO, etc.) via the
|
| 20 |
+
injected ``payload_loader``. If the projection produced a payload,
|
| 21 |
+
it is used directly without going back through the loader.
|
| 22 |
4. Optionally applies a text normalization profile
|
| 23 |
+
(``view.normalization_profile``).
|
| 24 |
+
5. ``EvaluationEngine.evaluate(view.metric_names, gt_payload, cand_payload)``
|
| 25 |
+
→ returns an ``EvaluationResult`` with metric_values + failed_metrics.
|
| 26 |
+
6. Builds the ``ViewResult`` aggregating everything (projection_report,
|
| 27 |
+
metric_values, failed_metrics, warnings, ignored_dimensions).
|
| 28 |
+
|
| 29 |
+
Construction
|
| 30 |
+
------------
|
| 31 |
+
- The canonical ``__init__`` takes ``(projection_engine, evaluation_engine,
|
| 32 |
+
payload_loader)``.
|
| 33 |
+
- ``from_registries(metric_registry, projector_registry, payload_loader)``
|
| 34 |
+
remains exposed as a convenience classmethod for callers that
|
| 35 |
+
would rather not build the two engines themselves (tests,
|
| 36 |
+
ad hoc scripts). No new logic, only a composed call;
|
| 37 |
+
the canonical API remains injection of the two engines.
|
| 38 |
|
| 39 |
Anti-over-engineering
|
| 40 |
---------------------
|
| 41 |
+
No cache of loaded payloads across metrics (each call to
|
| 42 |
+
``evaluate`` is independent). No batching (evaluating N pairs in
|
| 43 |
+
one pass). No cross-metric validation. The complexity lives
|
| 44 |
+
in the engines, not in the executor.
|
|
|
|
|
|
|
| 45 |
"""
|
| 46 |
|
| 47 |
from __future__ import annotations
|
|
|
|
| 50 |
from typing import Any, Callable
|
| 51 |
|
| 52 |
from picarones.domain.artifacts import Artifact
|
|
|
|
| 53 |
from picarones.domain.evaluation_spec import EvaluationView
|
| 54 |
+
from picarones.evaluation.evaluation_engine import EvaluationEngine
|
| 55 |
+
from picarones.evaluation.projection_engine import ProjectionEngine
|
| 56 |
+
from picarones.evaluation.projectors.registry import ProjectorRegistry
|
| 57 |
+
from picarones.evaluation.registry import MetricRegistry
|
|
|
|
| 58 |
from picarones.evaluation.views.base import ViewResult
|
| 59 |
|
| 60 |
logger = logging.getLogger(__name__)
|
| 61 |
|
| 62 |
|
| 63 |
#: Type alias: a payload loader takes an Artifact and returns the
|
| 64 |
#: loaded content (str for RAW_TEXT, dict for ENTITIES, etc.).
|
| 65 |
PayloadLoader = Callable[[Artifact], Any]
|
| 66 |
|
| 67 |
|
| 68 |
class DefaultEvaluationViewExecutor:
|
| 69 |
+
"""Evaluation view orchestrator.
|
| 70 |
|
| 71 |
Parameters
|
| 72 |
----------
|
| 73 |
+
projection_engine:
|
| 74 |
+
Injected ``ProjectionEngine``. Responsible for
|
| 75 |
+
transforming artifacts between types via the projector
|
| 76 |
+
registry.
|
| 77 |
+
evaluation_engine:
|
| 78 |
+
Injected ``EvaluationEngine``. Responsible for computing
|
| 79 |
+
named metrics on payloads.
|
| 80 |
payload_loader:
|
| 81 |
Callable ``(Artifact) -> Any`` that loads the content of an
|
| 82 |
+
artifact not yet resolved (typically the GT, and the candidate
|
| 83 |
+
when it is not projected). In tests, an in-memory dict;
|
| 84 |
+
in production, an application service that can handle
|
| 85 |
+
sandboxed workspaces.
|
| 86 |
"""
|
| 87 |
|
| 88 |
def __init__(
|
| 89 |
self,
|
| 90 |
+
projection_engine: ProjectionEngine,
|
| 91 |
+
evaluation_engine: EvaluationEngine,
|
| 92 |
payload_loader: PayloadLoader,
|
| 93 |
) -> None:
|
| 94 |
+
if not isinstance(projection_engine, ProjectionEngine):
|
| 95 |
raise TypeError(
|
| 96 |
+
"projection_engine doit être un ProjectionEngine."
|
| 97 |
)
|
| 98 |
+
if not isinstance(evaluation_engine, EvaluationEngine):
|
| 99 |
raise TypeError(
|
| 100 |
+
"evaluation_engine doit être un EvaluationEngine."
|
| 101 |
)
|
| 102 |
if not callable(payload_loader):
|
| 103 |
raise TypeError("payload_loader doit être callable.")
|
| 104 |
+
self._projection = projection_engine
|
| 105 |
+
self._evaluation = evaluation_engine
|
| 106 |
self._loader = payload_loader
|
| 107 |
|
| 108 |
+
# ──────────────────────────────────────────────────────────────────
|
| 109 |
+
# Constructeur ergonomique
|
| 110 |
+
# ──────────────────────────────────────────────────────────────────
|
| 111 |
+
|
| 112 |
+
@classmethod
|
| 113 |
+
def from_registries(
|
| 114 |
+
cls,
|
| 115 |
+
metric_registry: MetricRegistry,
|
| 116 |
+
projector_registry: ProjectorRegistry,
|
| 117 |
+
payload_loader: PayloadLoader,
|
| 118 |
+
) -> "DefaultEvaluationViewExecutor":
|
| 119 |
+
"""Builds the executor from the raw registries.
|
| 120 |
+
|
| 121 |
+
Syntactic sugar over the canonical API: a caller that already has
|
| 122 |
+
a ``MetricRegistry`` + ``ProjectorRegistry`` (typical case:
|
| 123 |
+
a test, or a service with a single executor) saves
|
| 124 |
+
two lines. No new logic: it instantiates
|
| 125 |
+
``ProjectionEngine`` and ``EvaluationEngine``, then delegates.
|
| 126 |
+
"""
|
| 127 |
+
return cls(
|
| 128 |
+
projection_engine=ProjectionEngine(projector_registry),
|
| 129 |
+
evaluation_engine=EvaluationEngine(metric_registry),
|
| 130 |
+
payload_loader=payload_loader,
|
| 131 |
+
)
|
| 132 |
+
|
| 133 |
# ──────────────────────────────────────────────────────────────────
|
| 134 |
# API publique
|
| 135 |
# ──────────────────────────────────────────────────────────────────
|
|
|
|
| 145 |
Returns
|
| 146 |
-------
|
| 147 |
ViewResult
|
| 148 |
+
Always returned on normal exit: errors in individual
|
| 149 |
+
metrics go to ``failed_metrics``, and
|
| 150 |
+
payload loading errors are recorded as a
|
| 151 |
+
global failure in ``failed_metrics``.
|
| 152 |
|
| 153 |
Raises
|
| 154 |
------
|
| 155 |
ProjectionError
|
| 156 |
+
If the view requires a projection that the projector
|
| 157 |
+
cannot perform (consistent with the S5 contract).
|
|
|
|
| 158 |
ValueError
|
| 159 |
If ``candidate.type`` is not in
|
| 160 |
``view.candidate_types``. The caller (typically the
|
| 161 |
+
``BenchmarkService``) must filter out pipelines that do not
|
| 162 |
produce the right type before calling ``evaluate``.
|
| 163 |
"""
|
| 164 |
# 1. Input type check.
|
|
|
|
| 170 |
f"{sorted(t.value for t in view.candidate_types)}."
|
| 171 |
)
|
| 172 |
|
| 173 |
+
# 2. Projection (delegated). Raises ``ProjectionError`` if the
|
| 174 |
+
# projection is invalid; we let it propagate (consistent
|
| 175 |
+
# with the S5 contract).
|
| 176 |
projection_spec = view.projection_for(candidate.type)
|
| 177 |
+
projection_result = self._projection.project(
|
| 178 |
+
candidate, projection_spec,
|
| 179 |
+
)
|
| 180 |
|
| 181 |
# 3. Payload loading.
|
| 182 |
+
# If the projection supplied a payload, use it without
|
| 183 |
+
# going back through the loader (typical of S25: the projected artifact
|
| 184 |
+
# has no URI). Otherwise, load the candidate via the loader.
|
| 185 |
+
if projection_result.payload is not None:
|
| 186 |
+
cand_payload = projection_result.payload
|
|
|
|
| 187 |
else:
|
| 188 |
try:
|
| 189 |
+
cand_payload = self._loader(projection_result.artifact)
|
| 190 |
except Exception as exc: # noqa: BLE001
|
| 191 |
return self._failed_view_result(
|
| 192 |
view=view,
|
| 193 |
candidate=candidate,
|
| 194 |
ground_truth=ground_truth,
|
| 195 |
+
projection_report=projection_result.report,
|
| 196 |
global_error=(
|
| 197 |
f"payload_loader a échoué sur le candidat "
|
| 198 |
+
f"{projection_result.artifact.id!r} : {exc}"
|
| 199 |
),
|
| 200 |
)
|
| 201 |
try:
|
|
|
|
| 205 |
view=view,
|
| 206 |
candidate=candidate,
|
| 207 |
ground_truth=ground_truth,
|
| 208 |
+
projection_report=projection_result.report,
|
| 209 |
global_error=(
|
| 210 |
f"payload_loader a échoué sur la GT "
|
| 211 |
f"{ground_truth.id!r} : {exc}"
|
|
|
|
| 218 |
view.normalization_profile, cand_payload, gt_payload,
|
| 219 |
)
|
| 220 |
|
| 221 |
+
# 5. Delegated evaluation. A broken metric goes to failed_metrics.
|
| 222 |
+
evaluation_result = self._evaluation.evaluate(
|
| 223 |
+
view.metric_names, gt_payload, cand_payload,
|
| 224 |
+
)
|
| 225 |
|
| 226 |
+
# 6. Final aggregation into the ViewResult.
|
| 227 |
warnings = tuple(view.warnings)
|
| 228 |
ignored = tuple(view.ignored_dimensions)
|
| 229 |
+
if projection_result.report is not None:
|
| 230 |
+
warnings = warnings + tuple(projection_result.report.warnings)
|
|
|
|
| 231 |
seen: set[str] = set(ignored)
|
| 232 |
extra = tuple(
|
| 233 |
+
d for d in projection_result.report.ignored_dimensions
|
| 234 |
if d not in seen
|
| 235 |
)
|
| 236 |
ignored = ignored + extra
|
|
|
|
| 239 |
view_name=view.name,
|
| 240 |
candidate_artifact_id=candidate.id,
|
| 241 |
ground_truth_artifact_id=ground_truth.id,
|
| 242 |
+
metric_values=evaluation_result.metric_values,
|
| 243 |
+
failed_metrics=evaluation_result.failed_metrics,
|
| 244 |
+
projection_report=projection_result.report,
|
| 245 |
warnings=warnings,
|
| 246 |
ignored_dimensions=ignored,
|
| 247 |
)
|
|
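The delegation introduced by this file can be summarized with a toy model. The sketch below is an assumption-laden simplification, not the picarones implementation: artifacts are reduced to string ids, registries to plain dicts, and `ViewExecutor`, `ProjectionEngine`, `EvaluationEngine` and `ProjectionResult` here are hypothetical miniature stand-ins that only mirror the shape of the refactor (two injected engines, a `from_registries` convenience constructor, and a projected payload taking precedence over the loader).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class ProjectionResult:
    artifact_id: str
    payload: Optional[Any] = None


class ProjectionEngine:
    def __init__(self, projectors: Dict[str, Callable[[str], Any]]) -> None:
        self._projectors = projectors

    def project(self, artifact_id: str, projector_name: Optional[str]) -> ProjectionResult:
        if projector_name is None:  # identity: artifact returned unchanged
            return ProjectionResult(artifact_id)
        payload = self._projectors[projector_name](artifact_id)
        return ProjectionResult(f"{artifact_id}:projected", payload)


class EvaluationEngine:
    def __init__(self, metrics: Dict[str, Callable[[Any, Any], float]]) -> None:
        self._metrics = metrics

    def evaluate(self, names: Sequence[str], gt: Any, cand: Any) -> Dict[str, float]:
        return {name: self._metrics[name](gt, cand) for name in names}


class ViewExecutor:
    def __init__(
        self,
        projection_engine: ProjectionEngine,
        evaluation_engine: EvaluationEngine,
        payload_loader: Callable[[str], Any],
    ) -> None:
        self._projection = projection_engine
        self._evaluation = evaluation_engine
        self._loader = payload_loader

    @classmethod
    def from_registries(cls, metrics, projectors, payload_loader) -> "ViewExecutor":
        # Convenience only: builds the two engines, then uses the canonical constructor.
        return cls(ProjectionEngine(projectors), EvaluationEngine(metrics), payload_loader)

    def evaluate(self, candidate_id, ground_truth_id, projector_name, metric_names):
        projection = self._projection.project(candidate_id, projector_name)
        if projection.payload is not None:
            cand_payload = projection.payload  # projected payload wins over the loader
        else:
            cand_payload = self._loader(projection.artifact_id)
        gt_payload = self._loader(ground_truth_id)
        return self._evaluation.evaluate(metric_names, gt_payload, cand_payload)
```

Keeping the engines injectable means a caller can share one `EvaluationEngine` across several executors, while tests can keep using the one-line `from_registries` form.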
@@ -83,7 +83,7 @@ def _build_minimal_run_dir(out_dir: Path, *, corpus_name: str = "test") -> None:
|
|
| 83 |
from picarones.evaluation.views import DefaultEvaluationViewExecutor
|
| 84 |
from picarones.pipeline import CorpusRunner, PipelineExecutor
|
| 85 |
loader = lambda art: "" # noqa: E731
|
| 86 |
-
view_executor = DefaultEvaluationViewExecutor(
|
| 87 |
MetricRegistry(), ProjectorRegistry(), loader,
|
| 88 |
)
|
| 89 |
runner_internal = CorpusRunner(
|
|
|
|
| 83 |
from picarones.evaluation.views import DefaultEvaluationViewExecutor
|
| 84 |
from picarones.pipeline import CorpusRunner, PipelineExecutor
|
| 85 |
loader = lambda art: "" # noqa: E731
|
| 86 |
+
view_executor = DefaultEvaluationViewExecutor.from_registries(
|
| 87 |
MetricRegistry(), ProjectorRegistry(), loader,
|
| 88 |
)
|
| 89 |
runner_internal = CorpusRunner(
|
|
@@ -105,7 +105,7 @@ def _build_executor(
|
|
| 105 |
raise KeyError(f"payload manquant : {artifact.id}")
|
| 106 |
return payloads[artifact.id]
|
| 107 |
|
| 108 |
-
return DefaultEvaluationViewExecutor(metrics, projectors, loader)
|
| 109 |
|
| 110 |
|
| 111 |
def _text_view(
|
|
@@ -226,7 +226,7 @@ class TestEvaluator:
|
|
| 226 |
metrics = MetricRegistry()
|
| 227 |
projectors = ProjectorRegistry()
|
| 228 |
projectors.register(_CrashingProjector())
|
| 229 |
-
executor = DefaultEvaluationViewExecutor(
|
| 230 |
metrics, projectors, lambda a: None,
|
| 231 |
)
|
| 232 |
view = _text_view(
|
|
@@ -304,7 +304,9 @@ class TestEvaluator:
|
|
| 304 |
def _bad_loader(artifact):
|
| 305 |
raise FileNotFoundError(f"missing file for {artifact.id}")
|
| 306 |
|
| 307 |
-
executor = DefaultEvaluationViewExecutor(
|
|
|
|
|
|
|
| 308 |
view = _text_view(metric_names=("cer",))
|
| 309 |
cand = Artifact(id="cand", document_id="d", type=ArtifactType.RAW_TEXT)
|
| 310 |
gt = Artifact(id="gt", document_id="d", type=ArtifactType.RAW_TEXT)
|
|
@@ -320,21 +322,51 @@ class TestEvaluator:
|
|
| 320 |
|
| 321 |
|
| 322 |
class TestConstructor:
|
| 323 |
-
|
| 324 |
-
|
|
|
|
|
|
|
|
|
|
| 325 |
DefaultEvaluationViewExecutor(
|
| 326 |
-
"not
|
|
|
|
|
|
|
| 327 |
)
|
| 328 |
|
| 329 |
-
def
|
| 330 |
-
|
|
|
|
| 331 |
DefaultEvaluationViewExecutor(
|
| 332 |
-
|
|
|
|
|
|
|
| 333 |
)
|
| 334 |
|
| 335 |
def test_rejects_non_callable_loader(self) -> None:
|
|
|
|
|
|
|
| 336 |
with pytest.raises(TypeError, match="callable"):
|
| 337 |
DefaultEvaluationViewExecutor(
|
| 338 |
MetricRegistry(), ProjectorRegistry(), "not_callable", # type: ignore[arg-type]
|
| 339 |
)
|
| 340 |
|
|
|
|
| 105 |
raise KeyError(f"payload manquant : {artifact.id}")
|
| 106 |
return payloads[artifact.id]
|
| 107 |
|
| 108 |
+
return DefaultEvaluationViewExecutor.from_registries(metrics, projectors, loader)
|
| 109 |
|
| 110 |
|
| 111 |
def _text_view(
|
|
|
|
| 226 |
metrics = MetricRegistry()
|
| 227 |
projectors = ProjectorRegistry()
|
| 228 |
projectors.register(_CrashingProjector())
|
| 229 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 230 |
metrics, projectors, lambda a: None,
|
| 231 |
)
|
| 232 |
view = _text_view(
|
|
|
|
| 304 |
def _bad_loader(artifact):
|
| 305 |
raise FileNotFoundError(f"missing file for {artifact.id}")
|
| 306 |
|
| 307 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 308 |
+
metrics, projectors, _bad_loader,
|
| 309 |
+
)
|
| 310 |
view = _text_view(metric_names=("cer",))
|
| 311 |
cand = Artifact(id="cand", document_id="d", type=ArtifactType.RAW_TEXT)
|
| 312 |
gt = Artifact(id="gt", document_id="d", type=ArtifactType.RAW_TEXT)
|
|
|
|
| 322 |
|
| 323 |
|
| 324 |
class TestConstructor:
|
| 325 |
+
"""The canonical constructor (S27) expects two engines + a loader."""
|
| 326 |
+
|
| 327 |
+
def test_rejects_non_projection_engine(self) -> None:
|
| 328 |
+
from picarones.evaluation.evaluation_engine import EvaluationEngine
|
| 329 |
+
with pytest.raises(TypeError, match="projection_engine"):
|
| 330 |
DefaultEvaluationViewExecutor(
|
| 331 |
+
"not an engine", # type: ignore[arg-type]
|
| 332 |
+
EvaluationEngine(MetricRegistry()),
|
| 333 |
+
lambda a: None,
|
| 334 |
)
|
| 335 |
|
| 336 |
+
def test_rejects_non_evaluation_engine(self) -> None:
|
| 337 |
+
from picarones.evaluation.projection_engine import ProjectionEngine
|
| 338 |
+
with pytest.raises(TypeError, match="evaluation_engine"):
|
| 339 |
DefaultEvaluationViewExecutor(
|
| 340 |
+
ProjectionEngine(ProjectorRegistry()),
|
| 341 |
+
"nope", # type: ignore[arg-type]
|
| 342 |
+
lambda a: None,
|
| 343 |
)
|
| 344 |
|
| 345 |
def test_rejects_non_callable_loader(self) -> None:
|
| 346 |
+
from picarones.evaluation.evaluation_engine import EvaluationEngine
|
| 347 |
+
from picarones.evaluation.projection_engine import ProjectionEngine
|
| 348 |
with pytest.raises(TypeError, match="callable"):
|
| 349 |
DefaultEvaluationViewExecutor(
|
| 350 |
+
ProjectionEngine(ProjectorRegistry()),
|
| 351 |
+
EvaluationEngine(MetricRegistry()),
|
| 352 |
+
"not_callable", # type: ignore[arg-type]
|
| 353 |
+
)
|
| 354 |
+
|
| 355 |
+
def test_from_registries_rejects_non_metric_registry(self) -> None:
|
| 356 |
+
with pytest.raises(TypeError, match="metric_registry"):
|
| 357 |
+
DefaultEvaluationViewExecutor.from_registries(
|
| 358 |
+
"not a registry", ProjectorRegistry(), lambda a: None, # type: ignore[arg-type]
|
| 359 |
+
)
|
| 360 |
+
|
| 361 |
+
def test_from_registries_rejects_non_projector_registry(self) -> None:
|
| 362 |
+
with pytest.raises(TypeError, match="projector_registry"):
|
| 363 |
+
DefaultEvaluationViewExecutor.from_registries(
|
| 364 |
+
MetricRegistry(), "nope", lambda a: None, # type: ignore[arg-type]
|
| 365 |
+
)
|
| 366 |
+
|
| 367 |
+
def test_from_registries_rejects_non_callable_loader(self) -> None:
|
| 368 |
+
with pytest.raises(TypeError, match="callable"):
|
| 369 |
+
DefaultEvaluationViewExecutor.from_registries(
|
| 370 |
MetricRegistry(), ProjectorRegistry(), "not_callable", # type: ignore[arg-type]
|
| 371 |
)
|
| 372 |
|
|
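The `TestConstructor` cases above expect `from_registries` to reject a bad registry with a `TypeError` naming the offending parameter, even though the classmethod itself adds no validation. One plausible reading, sketched below under that assumption, is that each engine validates its own registry in its constructor, so the error surfaces before the executor is built. All classes here are hypothetical stand-ins and the messages are illustrative.

```python
from typing import Any, Callable, Optional


class MetricRegistry:
    pass


class ProjectorRegistry:
    pass


class EvaluationEngine:
    def __init__(self, metric_registry: Any) -> None:
        if not isinstance(metric_registry, MetricRegistry):
            raise TypeError("metric_registry must be a MetricRegistry.")
        self.metrics = metric_registry


class ProjectionEngine:
    def __init__(self, projector_registry: Any) -> None:
        if not isinstance(projector_registry, ProjectorRegistry):
            raise TypeError("projector_registry must be a ProjectorRegistry.")
        self.projectors = projector_registry


class Executor:
    def __init__(self, projection_engine: Any, evaluation_engine: Any, payload_loader: Any) -> None:
        if not isinstance(projection_engine, ProjectionEngine):
            raise TypeError("projection_engine must be a ProjectionEngine.")
        if not isinstance(evaluation_engine, EvaluationEngine):
            raise TypeError("evaluation_engine must be an EvaluationEngine.")
        if not callable(payload_loader):
            raise TypeError("payload_loader must be callable.")

    @classmethod
    def from_registries(cls, metric_registry: Any, projector_registry: Any, payload_loader: Any) -> "Executor":
        # No checks here: the engine constructors raise on a bad registry.
        return cls(
            projection_engine=ProjectionEngine(projector_registry),
            evaluation_engine=EvaluationEngine(metric_registry),
            payload_loader=payload_loader,
        )


def rejection_message(factory: Callable[[], Any]) -> Optional[str]:
    """Return the TypeError message raised by ``factory``, or None if it succeeds."""
    try:
        factory()
    except TypeError as exc:
        return str(exc)
    return None
```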
@@ -127,7 +127,7 @@ def _build_unified_executor(payloads: dict) -> DefaultEvaluationViewExecutor:
|
|
| 127 |
raise KeyError(art.id)
|
| 128 |
return payloads[art.id]
|
| 129 |
|
| 130 |
-
return DefaultEvaluationViewExecutor(metrics, projectors, loader)
|
| 131 |
|
| 132 |
|
| 133 |
# ──────────────────────────────────────────────────────────────────
|
|
|
|
| 127 |
raise KeyError(art.id)
|
| 128 |
return payloads[art.id]
|
| 129 |
|
| 130 |
+
return DefaultEvaluationViewExecutor.from_registries(metrics, projectors, loader)
|
| 131 |
|
| 132 |
|
| 133 |
# ──────────────────────────────────────────────────────────────────
|
|
@@ -113,7 +113,7 @@ class TestProjectionWithoutLoaderHack:
|
|
| 113 |
|
| 114 |
# Strict loader that ASSERTS it is not called on the projected
|
| 115 |
# artifact.
|
| 116 |
-
executor = DefaultEvaluationViewExecutor(
|
| 117 |
registries.metrics,
|
| 118 |
registries.projectors,
|
| 119 |
_strict_loader,
|
|
@@ -160,7 +160,7 @@ class TestProjectionWithoutLoaderHack:
|
|
| 160 |
gt_path.write_text("Titre Bonjour le monde", encoding="utf-8")
|
| 161 |
|
| 162 |
registries = RegistryService.bootstrap_defaults()
|
| 163 |
-
executor = DefaultEvaluationViewExecutor(
|
| 164 |
registries.metrics,
|
| 165 |
registries.projectors,
|
| 166 |
_strict_loader,
|
|
@@ -201,7 +201,7 @@ class TestProjectionWithoutLoaderHack:
|
|
| 201 |
gt_path.write_text(gt_text, encoding="utf-8")
|
| 202 |
|
| 203 |
registries = RegistryService.bootstrap_defaults()
|
| 204 |
-
executor = DefaultEvaluationViewExecutor(
|
| 205 |
registries.metrics,
|
| 206 |
registries.projectors,
|
| 207 |
_strict_loader,
|
|
@@ -287,7 +287,7 @@ class TestPayloadFromProjectorIsAuthoritative:
|
|
| 287 |
metric_names=("capture",),
|
| 288 |
)
|
| 289 |
|
| 290 |
-
executor = DefaultEvaluationViewExecutor(
|
| 291 |
metrics, projectors, _strict_loader,
|
| 292 |
)
|
| 293 |
cand = Artifact(
|
|
|
|
| 113 |
|
| 114 |
# Strict loader that ASSERTS it is not called on the projected
|
| 115 |
# artifact.
|
| 116 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 117 |
registries.metrics,
|
| 118 |
registries.projectors,
|
| 119 |
_strict_loader,
|
|
|
|
| 160 |
gt_path.write_text("Titre Bonjour le monde", encoding="utf-8")
|
| 161 |
|
| 162 |
registries = RegistryService.bootstrap_defaults()
|
| 163 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 164 |
registries.metrics,
|
| 165 |
registries.projectors,
|
| 166 |
_strict_loader,
|
|
|
|
| 201 |
gt_path.write_text(gt_text, encoding="utf-8")
|
| 202 |
|
| 203 |
registries = RegistryService.bootstrap_defaults()
|
| 204 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 205 |
registries.metrics,
|
| 206 |
registries.projectors,
|
| 207 |
_strict_loader,
|
|
|
|
| 287 |
metric_names=("capture",),
|
| 288 |
)
|
| 289 |
|
| 290 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 291 |
metrics, projectors, _strict_loader,
|
| 292 |
)
|
| 293 |
cand = Artifact(
|
|
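The "strict loader" used by the tests above is a guard: it fails loudly if the executor ever tries to load an artifact whose payload the projector already supplied. A minimal sketch of that testing pattern follows, assuming artifacts are identified by string ids; `make_strict_loader` and `resolve_candidate_payload` are hypothetical names, not helpers from the test suite.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


def make_strict_loader(
    payloads: Dict[str, Any],
    forbidden_ids: Set[str],
) -> Tuple[Callable[[str], Any], List[str]]:
    """Build a loader that records its calls and refuses forbidden artifact ids."""
    calls: List[str] = []

    def loader(artifact_id: str) -> Any:
        if artifact_id in forbidden_ids:
            raise AssertionError(f"loader must not be called on {artifact_id!r}")
        calls.append(artifact_id)
        return payloads[artifact_id]

    return loader, calls


def resolve_candidate_payload(
    projected_id: str,
    projected_payload: Optional[Any],
    loader: Callable[[str], Any],
) -> Any:
    """Prefer the payload produced by the projection; fall back to the loader."""
    if projected_payload is not None:
        return projected_payload
    return loader(projected_id)
```

Recording the calls as well as raising makes both directions checkable: the test can assert that the ground truth was loaded and that the projected artifact never was.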
@@ -0,0 +1,352 @@
|
|
| 1 |
+
"""Sprint A14-S27: ``ProjectionEngine`` + ``EvaluationEngine`` split apart.
|
| 2 |
+
|
| 3 |
+
Tests for the two engines introduced by S27 to split up S13.
|
| 4 |
+
Covers:
|
| 5 |
+
|
| 6 |
+
1. ``ProjectionEngine.project``:
|
| 7 |
+
- identity case (spec None) → artifact as is, payload None,
|
| 8 |
+
report None;
|
| 9 |
+
- identity spec (source == target) → same;
|
| 10 |
+
- nominal projection → full triple (target artifact, payload,
|
| 11 |
+
report);
|
| 12 |
+
- projector not found → ProjectionError;
|
| 13 |
+
- projector that raises → wrapped in ProjectionError;
|
| 14 |
+
- constructor validation (rejects a non-registry).
|
| 15 |
+
|
| 16 |
+
2. ``EvaluationEngine.evaluate``:
|
| 17 |
+
- computes each metric, routes errors into failed_metrics;
|
| 18 |
+
- unknown metric → explicit message;
|
| 19 |
+
- metric that raises → message ``{type}: {msg}``;
|
| 20 |
+
- result order preserved;
|
| 21 |
+
- constructor validation;
|
| 22 |
+
- ``evaluate_one`` sugar;
|
| 23 |
+
- ``EvaluationResult`` dataclass (n_succeeded, n_failed,
|
| 24 |
+
all_succeeded, with_global_failure).
|
| 25 |
+
|
| 26 |
+
3. Integration: the refactored executor (S27) delegates to the two engines;
|
| 27 |
+
the existing S13 behaviors are preserved (covered
|
| 28 |
+
indirectly by ``test_sprint_a14_s13_view_executor.py``).
|
| 29 |
+
"""
|
| 30 |
+
|
| 31 |
+
from __future__ import annotations
|
| 32 |
+
|
| 33 |
+
import pytest
|
| 34 |
+
|
| 35 |
+
from picarones.domain.artifacts import Artifact, ArtifactType
|
| 36 |
+
from picarones.domain.errors import ProjectionError
|
| 37 |
+
from picarones.domain.projection_spec import ProjectionSpec
|
| 38 |
+
from picarones.evaluation.evaluation_engine import (
|
| 39 |
+
EvaluationEngine,
|
| 40 |
+
EvaluationResult,
|
| 41 |
+
)
|
| 42 |
+
from picarones.evaluation.projection_engine import (
|
| 43 |
+
ProjectionEngine,
|
| 44 |
+
ProjectionResult,
|
| 45 |
+
)
|
| 46 |
+
from picarones.evaluation.projectors.base import ProjectionReport
|
| 47 |
+
from picarones.evaluation.projectors.registry import (
|
| 48 |
+
ProjectorRegistry,
|
| 49 |
+
)
|
| 50 |
+
from picarones.evaluation.registry import MetricRegistry
|
| 51 |
+
from picarones.domain.evaluation_spec import MetricSpec
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 55 |
+
# Reusable stubs
|
| 56 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
class _StubProjector:
|
| 60 |
+
name = "stub"
|
| 61 |
+
source_type = ArtifactType.ALTO_XML
|
| 62 |
+
target_type = ArtifactType.RAW_TEXT
|
| 63 |
+
|
| 64 |
+
def __init__(self, payload: str = "projected") -> None:
|
| 65 |
+
self._payload = payload
|
| 66 |
+
|
| 67 |
+
def project(self, artifact, params):
|
| 68 |
+
target = Artifact(
|
| 69 |
+
id=f"{artifact.id}:projected",
|
| 70 |
+
document_id=artifact.document_id,
|
| 71 |
+
type=self.target_type,
|
| 72 |
+
)
|
| 73 |
+
report = ProjectionReport(
|
| 74 |
+
source_artifact_id=artifact.id,
|
| 75 |
+
source_type=self.source_type,
|
| 76 |
+
target_type=self.target_type,
|
| 77 |
+
projector_name=self.name,
|
| 78 |
+
lossy=True,
|
| 79 |
+
ignored_dimensions=("geometry",),
|
| 80 |
+
warnings=("dim perdue",),
|
| 81 |
+
)
|
| 82 |
+
return target, self._payload, report
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
class _CrashingProjector:
|
| 86 |
+
name = "crash"
|
| 87 |
+
source_type = ArtifactType.ALTO_XML
|
| 88 |
+
target_type = ArtifactType.RAW_TEXT
|
| 89 |
+
|
| 90 |
+
def project(self, artifact, params):
|
| 91 |
+
raise RuntimeError("boom interne")
|
| 92 |
+
|
| 93 |
+
|
| 94 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 95 |
+
# ProjectionEngine
|
| 96 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 97 |
+
|
| 98 |
+
|
| 99 |
+
class TestProjectionEngineConstructor:
|
| 100 |
+
def test_rejects_non_registry(self) -> None:
|
| 101 |
+
with pytest.raises(TypeError, match="projector_registry"):
|
| 102 |
+
ProjectionEngine("nope") # type: ignore[arg-type]
|
| 103 |
+
|
| 104 |
+
def test_accepts_empty_registry(self) -> None:
|
| 105 |
+
engine = ProjectionEngine(ProjectorRegistry())
|
| 106 |
+
assert engine.projectors is not None
|
| 107 |
+
|
| 108 |
+
|
| 109 |
+
class TestProjectionEngineIdentity:
|
| 110 |
+
def test_none_spec_returns_unchanged(self) -> None:
|
| 111 |
+
engine = ProjectionEngine(ProjectorRegistry())
|
| 112 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.RAW_TEXT)
|
| 113 |
+
result = engine.project(artifact, None)
|
| 114 |
+
assert result.artifact is artifact
|
| 115 |
+
assert result.payload is None
|
| 116 |
+
assert result.report is None
|
| 117 |
+
assert result.has_projection is False
|
| 118 |
+
|
| 119 |
+
def test_identity_spec_returns_unchanged(self) -> None:
|
| 120 |
+
engine = ProjectionEngine(ProjectorRegistry())
|
| 121 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.RAW_TEXT)
|
| 122 |
+
spec = ProjectionSpec(
|
| 123 |
+
source_type=ArtifactType.RAW_TEXT,
|
| 124 |
+
target_type=ArtifactType.RAW_TEXT,
|
| 125 |
+
projector_name="ignored_when_identity",
|
| 126 |
+
)
|
| 127 |
+
result = engine.project(artifact, spec)
|
| 128 |
+
assert result.artifact is artifact
|
| 129 |
+
assert result.payload is None
|
| 130 |
+
assert result.report is None
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
class TestProjectionEngineNominal:
|
| 134 |
+
def test_nominal_returns_triple(self) -> None:
|
| 135 |
+
registry = ProjectorRegistry()
|
| 136 |
+
registry.register(_StubProjector(payload="hello"))
|
| 137 |
+
engine = ProjectionEngine(registry)
|
| 138 |
+
artifact = Artifact(
|
| 139 |
+
id="alto",
|
| 140 |
+
document_id="d",
|
| 141 |
+
type=ArtifactType.ALTO_XML,
|
| 142 |
+
)
|
| 143 |
+
spec = ProjectionSpec(
|
| 144 |
+
source_type=ArtifactType.ALTO_XML,
|
| 145 |
+
target_type=ArtifactType.RAW_TEXT,
|
| 146 |
+
projector_name="stub",
|
| 147 |
+
)
|
| 148 |
+
result = engine.project(artifact, spec)
|
| 149 |
+
assert result.artifact.type == ArtifactType.RAW_TEXT
|
| 150 |
+
assert result.artifact.id == "alto:projected"
|
| 151 |
+
assert result.payload == "hello"
|
| 152 |
+
assert result.report is not None
|
| 153 |
+
assert result.report.projector_name == "stub"
|
| 154 |
+
assert result.has_projection is True
|
| 155 |
+
|
| 156 |
+
|
| 157 |
+
class TestProjectionEngineErrors:
|
| 158 |
+
def test_unknown_projector_raises_projection_error(self) -> None:
|
| 159 |
+
engine = ProjectionEngine(ProjectorRegistry())
|
| 160 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.ALTO_XML)
|
| 161 |
+
spec = ProjectionSpec(
|
| 162 |
+
source_type=ArtifactType.ALTO_XML,
|
| 163 |
+
target_type=ArtifactType.RAW_TEXT,
|
| 164 |
+
projector_name="missing",
|
| 165 |
+
)
|
| 166 |
+
with pytest.raises(ProjectionError, match="introuvable"):
|
| 167 |
+
engine.project(artifact, spec)
|
| 168 |
+
|
| 169 |
+
def test_crashing_projector_wraps_in_projection_error(self) -> None:
|
| 170 |
+
registry = ProjectorRegistry()
|
| 171 |
+
registry.register(_CrashingProjector())
|
| 172 |
+
engine = ProjectionEngine(registry)
|
| 173 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.ALTO_XML)
|
| 174 |
+
spec = ProjectionSpec(
|
| 175 |
+
source_type=ArtifactType.ALTO_XML,
|
| 176 |
+
target_type=ArtifactType.RAW_TEXT,
|
| 177 |
+
projector_name="crash",
|
| 178 |
+
)
|
| 179 |
+
with pytest.raises(ProjectionError, match="boom interne"):
|
| 180 |
+
engine.project(artifact, spec)
|
| 181 |
+
|
| 182 |
+
def test_native_projection_error_propagated_unwrapped(self) -> None:
|
| 183 |
+
"""If the projector already raises a ``ProjectionError``, it is not
|
| 184 |
+
wrapped in a new one (the semantics are preserved)."""
|
| 185 |
+
class _NativeProjErrProjector:
|
| 186 |
+
name = "native_err"
|
| 187 |
+
source_type = ArtifactType.ALTO_XML
|
| 188 |
+
target_type = ArtifactType.RAW_TEXT
|
| 189 |
+
|
| 190 |
+
def project(self, artifact, params):
|
| 191 |
+
raise ProjectionError("erreur native")
|
| 192 |
+
|
| 193 |
+
registry = ProjectorRegistry()
|
| 194 |
+
registry.register(_NativeProjErrProjector())
|
| 195 |
+
engine = ProjectionEngine(registry)
|
| 196 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.ALTO_XML)
|
| 197 |
+
spec = ProjectionSpec(
|
| 198 |
+
source_type=ArtifactType.ALTO_XML,
|
| 199 |
+
target_type=ArtifactType.RAW_TEXT,
|
| 200 |
+
projector_name="native_err",
|
| 201 |
+
)
|
| 202 |
+
with pytest.raises(ProjectionError, match="erreur native"):
|
| 203 |
+
engine.project(artifact, spec)
|
| 204 |
+
|
| 205 |
+
|
| 206 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 207 |
+
# EvaluationEngine
|
| 208 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 209 |
+
|
| 210 |
+
|
| 211 |
+
def _build_metric_registry(extra: dict | None = None) -> MetricRegistry:
|
| 212 |
+
reg = MetricRegistry()
|
| 213 |
+
reg.register(
|
| 214 |
+
MetricSpec(
|
| 215 |
+
name="cer",
|
| 216 |
+
input_types=(ArtifactType.RAW_TEXT, ArtifactType.RAW_TEXT),
|
| 217 |
+
),
|
| 218 |
+
lambda r, h: 0.0 if r == h else 1.0,
|
| 219 |
+
)
|
| 220 |
+
reg.register(
|
| 221 |
+
MetricSpec(
|
| 222 |
+
name="wer",
|
| 223 |
+
input_types=(ArtifactType.RAW_TEXT, ArtifactType.RAW_TEXT),
|
| 224 |
+
),
|
| 225 |
+
lambda r, h: 0.0 if r == h else 0.5,
|
| 226 |
+
)
|
| 227 |
+
if extra:
|
| 228 |
+
for name, fn in extra.items():
|
| 229 |
+
reg.register(
|
| 230 |
+
MetricSpec(
|
| 231 |
+
name=name,
|
| 232 |
+
input_types=(ArtifactType.RAW_TEXT, ArtifactType.RAW_TEXT),
|
| 233 |
+
),
|
| 234 |
+
fn,
|
| 235 |
+
)
|
| 236 |
+
return reg
|
| 237 |
+
|
| 238 |
+
|
| 239 |
+
class TestEvaluationEngineConstructor:
|
| 240 |
+
def test_rejects_non_registry(self) -> None:
|
| 241 |
+
with pytest.raises(TypeError, match="metric_registry"):
|
| 242 |
+
EvaluationEngine("nope") # type: ignore[arg-type]
|
| 243 |
+
|
| 244 |
+
def test_accepts_empty_registry(self) -> None:
|
| 245 |
+
engine = EvaluationEngine(MetricRegistry())
|
| 246 |
+
assert engine.metrics is not None
|
| 247 |
+
|
| 248 |
+
|
| 249 |
+
class TestEvaluationEngineNominal:
|
| 250 |
+
def test_all_metrics_succeed(self) -> None:
|
| 251 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 252 |
+
result = engine.evaluate(("cer", "wer"), "x", "x")
|
| 253 |
+
assert result.metric_values == {"cer": 0.0, "wer": 0.0}
|
| 254 |
+
assert result.failed_metrics == {}
|
| 255 |
+
assert result.n_succeeded == 2
|
| 256 |
+
assert result.n_failed == 0
|
| 257 |
+
assert result.all_succeeded is True
|
| 258 |
+
|
| 259 |
+
def test_metric_returning_nonzero(self) -> None:
|
| 260 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 261 |
+
result = engine.evaluate(("cer", "wer"), "abc", "xyz")
|
| 262 |
+
assert result.metric_values["cer"] == 1.0
|
| 263 |
+
assert result.metric_values["wer"] == 0.5
|
| 264 |
+
|
| 265 |
+
def test_evaluate_one_sugar(self) -> None:
|
| 266 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 267 |
+
result = engine.evaluate_one("cer", "x", "x")
|
| 268 |
+
assert result.metric_values == {"cer": 0.0}
|
| 269 |
+
assert result.failed_metrics == {}
|
| 270 |
+
|
| 271 |
+
def test_order_preserved(self) -> None:
|
| 272 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 273 |
+
result = engine.evaluate(("wer", "cer"), "x", "x")
|
| 274 |
+
# dict preserves insertion order (Python 3.7+).
|
| 275 |
+
assert list(result.metric_values.keys()) == ["wer", "cer"]
|
| 276 |
+
|
| 277 |
+
|
| 278 |
+
class TestEvaluationEngineFailures:
|
| 279 |
+
def test_unknown_metric_goes_to_failed(self) -> None:
|
| 280 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 281 |
+
result = engine.evaluate(("cer", "missing"), "x", "x")
|
| 282 |
+
assert "cer" in result.metric_values
|
| 283 |
+
assert "missing" in result.failed_metrics
|
| 284 |
+
assert "non enregistrée" in result.failed_metrics["missing"]
|
| 285 |
+
|
| 286 |
+
def test_metric_that_raises_goes_to_failed(self) -> None:
|
| 287 |
+
def _broken(r, h):
|
| 288 |
+
raise ValueError("metric crashed")
|
| 289 |
+
|
| 290 |
+
engine = EvaluationEngine(_build_metric_registry({"broken": _broken}))
|
| 291 |
+
result = engine.evaluate(("cer", "broken", "wer"), "x", "x")
|
| 292 |
+
assert "cer" in result.metric_values
|
| 293 |
+
assert "wer" in result.metric_values
|
| 294 |
+
assert "broken" in result.failed_metrics
|
| 295 |
+
assert "ValueError" in result.failed_metrics["broken"]
|
| 296 |
+
assert "metric crashed" in result.failed_metrics["broken"]
|
| 297 |
+
assert result.n_succeeded == 2
|
| 298 |
+
assert result.n_failed == 1
|
| 299 |
+
assert result.all_succeeded is False
|
| 300 |
+
|
| 301 |
+
def test_empty_metric_list_returns_empty_result(self) -> None:
|
| 302 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 303 |
+
result = engine.evaluate((), "x", "x")
|
| 304 |
+
assert result.metric_values == {}
|
| 305 |
+
assert result.failed_metrics == {}
|
| 306 |
+
assert result.all_succeeded is True
|
| 307 |
+
|
| 308 |
+
|
| 309 |
+
class TestEvaluationResultDataclass:
|
| 310 |
+
def test_with_global_failure_marks_all(self) -> None:
|
| 311 |
+
engine = EvaluationEngine(_build_metric_registry())
|
| 312 |
+
result = engine.evaluate(("cer", "wer"), "x", "x")
|
| 313 |
+
failed_all = result.with_global_failure("loader crashed")
|
| 314 |
+
assert failed_all.metric_values == {}
|
| 315 |
+
assert failed_all.failed_metrics == {
|
| 316 |
+
"cer": "loader crashed",
|
| 317 |
+
"wer": "loader crashed",
|
| 318 |
+
}
|
| 319 |
+
|
| 320 |
+
def test_dataclass_is_frozen(self) -> None:
|
| 321 |
+
result = EvaluationResult(metric_values={"cer": 0.0})
|
| 322 |
+
with pytest.raises(Exception): # FrozenInstanceError
|
| 323 |
+
result.metric_values = {} # type: ignore[misc]
|
| 324 |
+
|
| 325 |
+
|
| 326 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 327 |
+
# ProjectionResult dataclass
|
| 328 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 329 |
+
|
| 330 |
+
|
| 331 |
+
class TestProjectionResultDataclass:
|
| 332 |
+
def test_has_projection_property(self) -> None:
|
| 333 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.RAW_TEXT)
|
| 334 |
+
no_proj = ProjectionResult(artifact=artifact, payload=None, report=None)
|
| 335 |
+
assert no_proj.has_projection is False
|
| 336 |
+
|
| 337 |
+
report = ProjectionReport(
|
| 338 |
+
source_artifact_id="a",
|
| 339 |
+
source_type=ArtifactType.ALTO_XML,
|
| 340 |
+
target_type=ArtifactType.RAW_TEXT,
|
| 341 |
+
projector_name="x",
|
| 342 |
+
)
|
| 343 |
+
with_proj = ProjectionResult(
|
| 344 |
+
artifact=artifact, payload="text", report=report,
|
| 345 |
+
)
|
| 346 |
+
assert with_proj.has_projection is True
|
| 347 |
+
|
| 348 |
+
def test_dataclass_is_frozen(self) -> None:
|
| 349 |
+
artifact = Artifact(id="a", document_id="d", type=ArtifactType.RAW_TEXT)
|
| 350 |
+
result = ProjectionResult(artifact=artifact, payload=None, report=None)
|
| 351 |
+
with pytest.raises(Exception): # FrozenInstanceError
|
| 352 |
+
result.payload = "modified" # type: ignore[misc]
|
|
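The tests in the file above pin down a per-metric failure-isolation contract: one unknown or crashing metric lands in `failed_metrics` while the others still produce values. A minimal sketch of that contract follows, assuming a plain dict stands in for `MetricRegistry`. The names `SketchEvaluationResult` and `evaluate_metrics`, and the exact wording of the failure messages, are hypothetical and not taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Mapping, Sequence

MetricFn = Callable[[object, object], float]


@dataclass(frozen=True)
class SketchEvaluationResult:
    """Hypothetical stand-in for EvaluationResult (not the project's class)."""

    metric_values: Mapping[str, float] = field(default_factory=dict)
    failed_metrics: Mapping[str, str] = field(default_factory=dict)

    @property
    def n_succeeded(self) -> int:
        return len(self.metric_values)

    @property
    def n_failed(self) -> int:
        return len(self.failed_metrics)

    @property
    def all_succeeded(self) -> bool:
        # An empty result counts as success, matching the empty-list test.
        return not self.failed_metrics

    def with_global_failure(self, reason: str) -> "SketchEvaluationResult":
        # Assumption: every known metric is re-reported as failed with one reason.
        names = [*self.metric_values, *self.failed_metrics]
        return SketchEvaluationResult({}, {name: reason for name in names})


def evaluate_metrics(
    metrics: Mapping[str, MetricFn],
    names: Sequence[str],
    reference: object,
    hypothesis: object,
) -> SketchEvaluationResult:
    """Run each requested metric; isolate failures instead of propagating them."""
    values: dict[str, float] = {}
    failed: dict[str, str] = {}
    for name in names:  # iteration order is preserved in both dicts
        fn = metrics.get(name)
        if fn is None:
            failed[name] = f"metric '{name}' is not registered"
            continue
        try:
            values[name] = fn(reference, hypothesis)
        except Exception as exc:  # one broken metric must not stop the others
            failed[name] = f"{type(exc).__name__}: {exc}"
    return SketchEvaluationResult(values, failed)
```

Recording the exception type and message as a string keeps the result serializable, which is one plausible reason the tests assert on substrings such as `"ValueError"` rather than on exception objects.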
@@ -101,7 +101,7 @@ def _build_executor(payloads: dict[str, object]) -> DefaultEvaluationViewExecuto
|
|
| 101 |
raise KeyError(f"payload manquant : {artifact.id}")
|
| 102 |
return payloads[artifact.id]
|
| 103 |
|
| 104 |
-
return DefaultEvaluationViewExecutor(metrics, projectors, loader)
|
| 105 |
|
| 106 |
|
| 107 |
# ──────────────────────────────────────────────────────────────────────
|
|
@@ -285,7 +285,9 @@ class TestBnFCentralUseCase:
|
|
| 285 |
projectors.register(AltoToText())
|
| 286 |
projectors.register(PageToText())
|
| 287 |
projectors.register(CanonicalToText())
|
| 288 |
-
executor = DefaultEvaluationViewExecutor(
|
| 289 |
view = build_text_view()
|
| 290 |
|
| 291 |
gt = Artifact(id="gt_text", document_id="bnf_doc",
|
|
|
|
| 101 |
raise KeyError(f"payload manquant : {artifact.id}")
|
| 102 |
return payloads[artifact.id]
|
| 103 |
|
| 104 |
+
return DefaultEvaluationViewExecutor.from_registries(metrics, projectors, loader)
|
| 105 |
|
| 106 |
|
| 107 |
# ──────────────────────────────────────────────────────────────────────
|
|
|
|
| 285 |
projectors.register(AltoToText())
|
| 286 |
projectors.register(PageToText())
|
| 287 |
projectors.register(CanonicalToText())
|
| 288 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 289 |
+
metrics, projectors, loader,
|
| 290 |
+
)
|
| 291 |
view = build_text_view()
|
| 292 |
|
| 293 |
gt = Artifact(id="gt_text", document_id="bnf_doc",
|
|
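The call-site change in this hunk swaps the three-argument constructor for `from_registries`, which the commit message describes as ergonomic sugar over the canonical `(projection_engine, evaluation_engine, payload_loader)` constructor. The pattern can be sketched as below. All three `Sketch*` classes are hypothetical stand-ins, and the attribute names are assumptions rather than the project's actual fields.

```python
from __future__ import annotations

from typing import Callable


class SketchProjectionEngine:
    """Hypothetical stand-in for ProjectionEngine."""

    def __init__(self, projector_registry: object) -> None:
        self.projectors = projector_registry


class SketchEvaluationEngine:
    """Hypothetical stand-in for EvaluationEngine."""

    def __init__(self, metric_registry: object) -> None:
        self.metrics = metric_registry


class SketchViewExecutor:
    """Hypothetical stand-in for DefaultEvaluationViewExecutor."""

    def __init__(
        self,
        projection_engine: SketchProjectionEngine,
        evaluation_engine: SketchEvaluationEngine,
        payload_loader: Callable[[object], object],
    ) -> None:
        # Canonical constructor: callers inject fully built engines.
        self.projection_engine = projection_engine
        self.evaluation_engine = evaluation_engine
        self.payload_loader = payload_loader

    @classmethod
    def from_registries(
        cls,
        metric_registry: object,
        projector_registry: object,
        payload_loader: Callable[[object], object],
    ) -> "SketchViewExecutor":
        # Convenience path: build both engines from their registries.
        return cls(
            SketchProjectionEngine(projector_registry),
            SketchEvaluationEngine(metric_registry),
            payload_loader,
        )
```

Keeping the old argument order `(metrics, projectors, loader)` in the classmethod is what lets each call site migrate by appending `.from_registries` without touching its arguments.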
@@ -187,7 +187,7 @@ def _build_alto_executor(payloads: dict[str, AltoDocument]) -> DefaultEvaluation
|
|
| 187 |
raise KeyError(f"missing payload {art.id}")
|
| 188 |
return payloads[art.id]
|
| 189 |
|
| 190 |
-
return DefaultEvaluationViewExecutor(metrics, projectors, loader)
|
| 191 |
|
| 192 |
|
| 193 |
class TestAltoViewWithExecutor:
|
|
|
|
| 187 |
raise KeyError(f"missing payload {art.id}")
|
| 188 |
return payloads[art.id]
|
| 189 |
|
| 190 |
+
return DefaultEvaluationViewExecutor.from_registries(metrics, projectors, loader)
|
| 191 |
|
| 192 |
|
| 193 |
class TestAltoViewWithExecutor:
|
|
@@ -188,7 +188,7 @@ def _build_search_executor(payloads: dict[str, str]) -> DefaultEvaluationViewExe
|
|
| 188 |
raise KeyError(art.id)
|
| 189 |
return payloads[art.id]
|
| 190 |
|
| 191 |
-
return DefaultEvaluationViewExecutor(metrics, projectors, loader)
|
| 192 |
|
| 193 |
|
| 194 |
class TestSearchViewWithExecutor:
|
|
|
|
| 188 |
raise KeyError(art.id)
|
| 189 |
return payloads[art.id]
|
| 190 |
|
| 191 |
+
return DefaultEvaluationViewExecutor.from_registries(metrics, projectors, loader)
|
| 192 |
|
| 193 |
|
| 194 |
class TestSearchViewWithExecutor:
|
|
@@ -275,7 +275,9 @@ def _build_service(tmp_path: Path) -> tuple[BenchmarkService, dict[str, Path]]:
|
|
| 275 |
return parse_alto(Path(art.uri).read_bytes())
|
| 276 |
raise KeyError(f"loader ne sait pas charger {art.id} (type {art.type})")
|
| 277 |
|
| 278 |
-
view_executor = DefaultEvaluationViewExecutor(
|
| 279 |
|
| 280 |
# Pipeline executor + corpus runner.
|
| 281 |
registry_adapters = {
|
|
|
|
| 275 |
return parse_alto(Path(art.uri).read_bytes())
|
| 276 |
raise KeyError(f"loader ne sait pas charger {art.id} (type {art.type})")
|
| 277 |
|
| 278 |
+
view_executor = DefaultEvaluationViewExecutor.from_registries(
|
| 279 |
+
metrics, projectors, loader,
|
| 280 |
+
)
|
| 281 |
|
| 282 |
# Pipeline executor + corpus runner.
|
| 283 |
registry_adapters = {
|
|
@@ -368,7 +368,9 @@ def _build_service(tmp_path: Path) -> tuple[BenchmarkService, dict[str, Path]]:
|
|
| 368 |
return _CORRECTED_TEXTS[art.document_id]
|
| 369 |
raise KeyError(f"loader: type non géré pour {art.id} ({art.type})")
|
| 370 |
|
| 371 |
-
view_executor = DefaultEvaluationViewExecutor(
|
| 372 |
|
| 373 |
registry_adapters = {
|
| 374 |
"simple_ocr": _SimpleOCRStub(),
|
|
|
|
| 368 |
return _CORRECTED_TEXTS[art.document_id]
|
| 369 |
raise KeyError(f"loader: type non géré pour {art.id} ({art.type})")
|
| 370 |
|
| 371 |
+
view_executor = DefaultEvaluationViewExecutor.from_registries(
|
| 372 |
+
metrics, projectors, loader,
|
| 373 |
+
)
|
| 374 |
|
| 375 |
registry_adapters = {
|
| 376 |
"simple_ocr": _SimpleOCRStub(),
|
|
@@ -266,7 +266,7 @@ class TestPersistenceRoundTrip:
|
|
| 266 |
from picarones.evaluation.views import DefaultEvaluationViewExecutor
|
| 267 |
from picarones.pipeline import CorpusRunner, PipelineExecutor
|
| 268 |
loader = lambda art: "" # noqa: E731 (not called by persist)
|
| 269 |
-
view_executor = DefaultEvaluationViewExecutor(
|
| 270 |
MetricRegistry(), ProjectorRegistry(), loader,
|
| 271 |
)
|
| 272 |
runner = CorpusRunner(
|
|
@@ -296,7 +296,7 @@ class TestPersistenceRoundTrip:
|
|
| 296 |
from picarones.evaluation.views import DefaultEvaluationViewExecutor
|
| 297 |
from picarones.pipeline import CorpusRunner, PipelineExecutor
|
| 298 |
loader = lambda art: "" # noqa: E731
|
| 299 |
-
view_executor = DefaultEvaluationViewExecutor(
|
| 300 |
MetricRegistry(), ProjectorRegistry(), loader,
|
| 301 |
)
|
| 302 |
runner = CorpusRunner(
|
|
|
|
| 266 |
from picarones.evaluation.views import DefaultEvaluationViewExecutor
|
| 267 |
from picarones.pipeline import CorpusRunner, PipelineExecutor
|
| 268 |
loader = lambda art: "" # noqa: E731 (not called by persist)
|
| 269 |
+
view_executor = DefaultEvaluationViewExecutor.from_registries(
|
| 270 |
MetricRegistry(), ProjectorRegistry(), loader,
|
| 271 |
)
|
| 272 |
runner = CorpusRunner(
|
|
|
|
| 296 |
from picarones.evaluation.views import DefaultEvaluationViewExecutor
|
| 297 |
from picarones.pipeline import CorpusRunner, PipelineExecutor
|
| 298 |
loader = lambda art: "" # noqa: E731
|
| 299 |
+
view_executor = DefaultEvaluationViewExecutor.from_registries(
|
| 300 |
MetricRegistry(), ProjectorRegistry(), loader,
|
| 301 |
)
|
| 302 |
runner = CorpusRunner(
|
|
@@ -280,7 +280,7 @@ class TestSmokeIntegration:
|
|
| 280 |
svc = RegistryService.bootstrap_defaults()
|
| 281 |
|
| 282 |
loader = lambda art: "" # noqa: E731 (not called here)
|
| 283 |
-
executor = DefaultEvaluationViewExecutor(
|
| 284 |
svc.metrics, svc.projectors, loader,
|
| 285 |
)
|
| 286 |
assert executor is not None # if the constructor succeeds, we are fine
|
|
|
|
| 280 |
svc = RegistryService.bootstrap_defaults()
|
| 281 |
|
| 282 |
loader = lambda art: "" # noqa: E731 (not called here)
|
| 283 |
+
executor = DefaultEvaluationViewExecutor.from_registries(
|
| 284 |
svc.metrics, svc.projectors, loader,
|
| 285 |
)
|
| 286 |
assert executor is not None # if the constructor succeeds, we are fine
|