phaseE: split core/ (Circle 1) from measurements/ (Circle 2)
Fifth phase of the three-circle refactoring. Goal: physically isolate
the **invariant core** (domain abstractions, orchestration) from all
the **official measurements** that had historically piled up in
``picarones/core/``.
Conceptual distinction (DDD alignment)
--------------------------------------
- **Circle 1 (``core/``)**: domain abstractions + orchestration.
Independent of the user interface. Stable public API that does not
break between minor versions.
- **Circle 2 (``measurements/``)**: measurements and analyses beyond the
core. Maintained, but may evolve.
The criterion in ``architecture-cercles.md`` has been corrected: the previous
criterion, "if this module is removed, the product remains viable", conflated
two distinct questions. It is replaced by the proper DDD criterion
(domain vs adapters/presentation).
Physical migration
------------------
**41 metric modules** moved from ``picarones/core/`` to
``picarones/measurements/``:
- baseline_comparison, builtin_hooks, calibration, char_scores,
confusion, cost_projection, difficulty, equivalence_profile,
error_absorption, hallucination, history, image_quality,
incremental_comparison, inter_engine, layout, levers, line_metrics,
longitudinal, marginal_cost, ner, ner_backends, normalization,
numerical_sequences, numerical_sequences_runner, pricing,
rare_tokens, readability, readability_runner, reading_order,
reliability, robustness, robustness_projection, searchability,
searchability_runner, specialization, statistics, structure,
taxonomy, taxonomy_comparison, throughput, worst_lines.
**``narrative/`` sub-package** (4 modules + 6 detector families +
helper + templates) moved to ``picarones/measurements/narrative/``:
- facts, registry, arbiter, renderer (4 modules)
- detectors/{_helpers,ranking,pareto,stratum,quality,history,ensemble}.py
(7 files)
- templates/{fr,en}.yaml (2 data files, copied as-is)
Internal imports of the narrative sub-package updated to point
to ``picarones.measurements.narrative.X`` instead of
``picarones.core.narrative.X`` (self-contained, no longer depends on the
shims).
53 backward-compatibility shims in ``picarones/core/``
------------------------------------------------------
- 41 shims for the metric modules (1 per module).
- 4 shims for the top-level narrative modules.
- 7 shims for the detector files of the 6 families.
- 1 shim for ``narrative/__init__.py``.
- 1 shim for ``narrative/detectors/__init__.py``.
Total: 53 shim files of 16 lines each. Behaviour is
strictly identical (re-export via wildcard ``import *``).
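The shim mechanism can be sketched as follows. This is a minimal, self-contained illustration and not the project's actual files: the module names ``demo_measurements_confusion`` and ``demo_core_confusion`` and the function ``build_matrix`` are hypothetical stand-ins built in memory so that the snippet runs on its own. The shim body mirrors the wildcard re-export plus ``__all__`` fallback used in this commit.

```python
import importlib
import sys
import types

# Hypothetical stand-in for the new home of a moved module.
impl = types.ModuleType("demo_measurements_confusion")
exec(
    "def build_matrix(gt, hyp):\n"
    "    return list(zip(gt, hyp))\n",
    impl.__dict__,
)
sys.modules[impl.__name__] = impl

# Hypothetical shim left behind at the historical location: it only
# re-exports the public names of the real module.
shim = types.ModuleType("demo_core_confusion")
exec(
    "from demo_measurements_confusion import *  # noqa: F401, F403\n"
    "import demo_measurements_confusion as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n",
    shim.__dict__,
)
sys.modules[shim.__name__] = shim

old = importlib.import_module("demo_core_confusion")
new = importlib.import_module("demo_measurements_confusion")

# The historical import path yields the very same function object.
same_object = old.build_matrix is new.build_matrix
```

Because the shim re-exports rather than copies, code that compares objects by identity (registries, ``isinstance`` checks on re-exported classes) behaves the same through either import path.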
Bonus: fix for a pre-existing bug
---------------------------------
``_mean_duration_per_engine`` was used by ``detect_speed_winner``
but had been missing from ``_helpers.py`` since the split done in workstream 5
(narrative/detectors.py, 1229 lines → 6 families). It is reintroduced in
``measurements/narrative/detectors/_helpers.py``. ``detect_speed_winner``
works again without the ``fonctionnalité dégradée`` warning.
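The commit does not show the body or signature of ``_mean_duration_per_engine``. As a rough idea of what such a helper computes, here is a sketch under stated assumptions: the input is taken to be an iterable of ``(engine_name, seconds)`` pairs, and both function names below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def mean_duration_per_engine(records):
    """Average duration per engine from (engine_name, seconds) pairs.

    Assumed input shape; missing or negative durations are skipped.
    """
    durations = defaultdict(list)
    for engine_name, seconds in records:
        if seconds is None or seconds < 0:
            continue
        durations[engine_name].append(float(seconds))
    return {name: fmean(values) for name, values in durations.items()}


def fastest_engine(records):
    """Name of the engine with the lowest mean duration, or None."""
    means = mean_duration_per_engine(records)
    if not means:
        return None
    return min(means, key=means.get)


records = [
    ("tesseract", 2.0),
    ("tesseract", 4.0),
    ("engine_b", 5.0),
    ("engine_b", None),  # ignored: no usable duration
]
means = mean_duration_per_engine(records)
winner = fastest_engine(records)
```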
Final state of ``picarones/core/``
----------------------------------
14 strict Circle 1 modules:
alto_metrics.py, builtin_metrics.py, corpus.py, jobs.py,
metric_hooks.py, metric_registry.py, metrics.py, modules.py,
pipeline_benchmark.py, pipeline_comparison.py, pipeline_runner.py,
pipeline_spec_loader.py, results.py, runner.py.
+ 55 backward-compatibility shims (including the shims from phases A and B).
Final state of ``picarones/measurements/``
------------------------------------------
42 modules + the complete narrative sub-package (15 Python files
+ 2 YAML data files).
Validation (sandbox without pytest)
-----------------------------------
- ``import picarones.core.X`` works for all moved
modules (20 parametrized cases tested).
- Identity preserved: ``shim.f is measurements.f`` (3 pairs tested,
including ``Fact`` from the narrative engine).
- 18 detectors reachable via ``picarones.core.narrative.detectors``
(backward compatibility for the tests of Sprints 20, 23, 29, 36, 44, 46, 73).
- 12 document-level hooks + 12 corpus-level aggregators registered.
- Metrics (ALTO, ALTO) discoverable through the typed registry.
- ``build_synthesis()`` produces factual sentences.
- Views from workstream 3 (``advanced_taxonomy``) produce HTML.
Tests
-----
+275 lines in tests/test_phaseE_migration.py, organized into 8 classes:
TestMeasurementsRetrocompat (parametrized over 20 modules),
TestNarrativePackageMigration, TestIdentityThroughShim,
TestCoreIsLean (parametrized over the 13 strict Circle 1 modules),
TestHooksStillRegistered, TestNarrativeIntegration,
TestChantier3ViewsAfterPhaseE, TestArchitectureCerclesDocUpdated.
Cumulative summary of phases A + B + C + E
------------------------------------------
- ``picarones/core/``: 66 modules → 14 real modules + 55 shims.
- ``picarones/extras/``: 24 modules + 6 renderers (Circle 3).
- ``picarones/measurements/``: 42 modules + the complete narrative
sub-package (Circle 2).
- ``picarones/importers/``: 6 shims (pointing to extras/importers/).
- No feature removed. No historical import broken.
Next phase (D)
--------------
- ``docs/api-stable.md`` listing the public API of Circle 1.
- ``test_public_api.py``, which fails if a Circle 1 name disappears.
- Decide on the version: ``2.0.0`` (architectural overhaul)
or ``1.3.0`` (non-breaking extension), depending on the semver policy.
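A guard test of the kind planned for phase D could look roughly like the sketch below. It is only an illustration: the module name ``demo_core_corpus``, the names ``Corpus`` and ``load_corpus``, the ``STABLE_API`` mapping and the helper ``missing_public_names`` are all hypothetical, and the stand-in module is created in memory so that the example is self-contained.

```python
import importlib
import sys
import types

# Hypothetical inventory of names promised as stable (would mirror
# what docs/api-stable.md lists).
STABLE_API = {"demo_core_corpus": ("Corpus", "load_corpus")}


def missing_public_names(stable_api):
    """Return 'module.name' strings for every promised name that is gone."""
    missing = []
    for module_name, names in stable_api.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The whole module disappeared: every promised name is missing.
            missing.extend(f"{module_name}.{name}" for name in names)
            continue
        missing.extend(
            f"{module_name}.{name}" for name in names if not hasattr(module, name)
        )
    return sorted(missing)


# In-memory stand-in for a Circle 1 module (hypothetical content).
demo = types.ModuleType("demo_core_corpus")
demo.Corpus = type("Corpus", (), {})
demo.load_corpus = lambda path: demo.Corpus()
sys.modules[demo.__name__] = demo


def test_public_api_is_complete():
    assert missing_public_names(STABLE_API) == []
```

Keeping the inventory as plain data makes the failure message list exactly which names vanished, which is what a reviewer needs when deciding between a minor and a major version bump.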
- docs/architecture-cercles.md +32 -9
- picarones/core/baseline_comparison.py +14 -224
- picarones/core/builtin_hooks.py +14 -577
- picarones/core/calibration.py +14 -318
- picarones/core/char_scores.py +14 -365
- picarones/core/confusion.py +14 -263
- picarones/core/cost_projection.py +14 -164
- picarones/core/difficulty.py +14 -197
- picarones/core/equivalence_profile.py +14 -194
- picarones/core/error_absorption.py +14 -271
- picarones/core/hallucination.py +15 -327
- picarones/core/history.py +14 -610
- picarones/core/image_quality.py +14 -386
- picarones/core/incremental_comparison.py +14 -248
- picarones/core/inter_engine.py +14 -479
- picarones/core/layout.py +14 -275
- picarones/core/levers.py +14 -556
- picarones/core/line_metrics.py +15 -282
- picarones/core/longitudinal.py +14 -368
- picarones/core/marginal_cost.py +14 -137
- picarones/core/narrative/__init__.py +11 -74
- picarones/core/narrative/arbiter.py +9 -223
- picarones/core/narrative/detectors/__init__.py +9 -125
- picarones/core/narrative/detectors/_helpers.py +9 -27
- picarones/core/narrative/detectors/ensemble.py +9 -92
- picarones/core/narrative/detectors/history.py +9 -276
- picarones/core/narrative/detectors/pareto.py +9 -132
- picarones/core/narrative/detectors/quality.py +9 -247
- picarones/core/narrative/detectors/ranking.py +9 -275
- picarones/core/narrative/detectors/stratum.py +9 -199
- picarones/core/narrative/facts.py +9 -208
- picarones/core/narrative/registry.py +9 -213
- picarones/core/narrative/renderer.py +9 -101
- picarones/core/ner.py +14 -304
- picarones/core/ner_backends.py +14 -222
- picarones/core/normalization.py +14 -415
- picarones/core/numerical_sequences.py +14 -417
- picarones/core/numerical_sequences_runner.py +14 -97
- picarones/core/pricing.py +14 -304
- picarones/core/rare_tokens.py +14 -249
- picarones/core/readability.py +14 -247
- picarones/core/readability_runner.py +14 -109
- picarones/core/reading_order.py +14 -191
- picarones/core/reliability.py +14 -355
- picarones/core/robustness.py +14 -726
- picarones/core/robustness_projection.py +14 -282
- picarones/core/searchability.py +14 -220
- picarones/core/searchability_runner.py +14 -76
- picarones/core/specialization.py +14 -182
- picarones/core/statistics.py +15 -1123
@@ -156,17 +156,40 @@ themselves in `report/views/`, hence Circle 2).
 
 ## Distinguishing a Circle 1 module from a Circle 2 module
 
-
-
-
-
-
-
+Criterion **corrected** (aligned with hexagonal architecture / DDD):
+
+> **Circle 1 = domain abstractions and business logic,
+> independent of the user interface. Stable between minor
+> versions.**
+>
+> **Circle 2 = concrete adapters (engines, LLM, reference modules),
+> interface layers (report, cli, web), and measurements beyond the core
+> (measurements). Maintained, but may evolve.**
+
+The criterion "if this module is removed, the product remains viable"
+mixes two distinct questions ("is it indispensable?" and
+"is it a stable abstraction?"). The DDD criterion is preferred:
+
+- **Circle 1**: abstractions and orchestration that define what
+  Picarones *is* logically (corpus, BaseModule, registries,
+  runner). Independent of the user interface.
+- **Circle 2**: what makes the domain usable in practice
+  (adapters, measurements, HTML presentation, CLI).
 
 Example:
--
--
-
+- `corpus.py` → Circle 1 (domain abstraction).
+- `runner.py` → Circle 1 (domain orchestration).
+- `confusion.py` → Circle 2 (measurement beyond the core, in
+  ``measurements/``).
+- `report/generator.py` → Circle 2 (presentation layer, even though
+  essential in practical use).
+- `engines/tesseract.py` → Circle 2 (concrete adapter).
+
+> Note: the "`base.py` in the concept's folder" convention
+> (`engines/base.py`, `llm/base.py`) stays in its original folder.
+> These contracts are logically Circle 1 (stable public API) but
+> physically co-located with their implementations, as in
+> Django, SQLAlchemy, FastAPI. A universal Python convention.
 - Without `taxonomy_intra_doc.py`: we still have a complete and
   useful bench → Circle 3.
 
@@ -1,229 +1,19 @@
-"""
 
-
 
-
-
-
-exists but no narrative detector reads it. This module provides
-the computation layer that answers *"how does this engine
-behave on this corpus, **compared with its previous runs
-at my institution**?"*.
-
-Typical output
---------------
-One dict per engine:
-
-.. code-block:: python
-
-    {
-        "engine_name": "tesseract",
-        "cer_current": 0.052,
-        "cer_historical_mean": 0.041,
-        "cer_historical_median": 0.040,
-        "n_runs": 12,
-        "absolute_delta": 0.011,
-        "relative_delta": 0.268,   # +26.8% vs mean
-        "off_baseline": True,
-    }
-
-The ``engine_off_baseline`` narrative detector (Sprint 73)
-consumes this structure to emit Facts.
-
-Safeguards
-----------
-- ``min_runs`` (default 5): if the history for the engine×corpus pair
-  contains fewer runs, ``None`` is returned rather than
-  comparing against too small a sample.
-- ``corpus_name`` is used to compare only against runs **on the
-  same corpus** (otherwise one compares apples and oranges:
-  parish registers vs modern printed material).
-- The current run itself is not included in the baseline (the
-  ``current_run_id`` to exclude is passed in).
 """
 
-from
-
-import logging
-import statistics
-from typing import Optional
-
-logger = logging.getLogger(__name__)
-
-
-def compute_engine_baseline(
-    history,
-    engine_name: str,
-    corpus_name: str,
-    current_cer: float,
-    *,
-    current_run_id: Optional[str] = None,
-    min_runs: int = 5,
-    relative_delta_threshold: float = 0.20,
-) -> Optional[dict]:
-    """Compare an engine's current CER with its historical mean
-    on the **same corpus**.
-
-    Parameters
-    ----------
-    history:
-        ``BenchmarkHistory`` instance (or compatible: it must
-        expose a ``query(engine, corpus, limit)`` method
-        returning a list of ``HistoryEntry`` objects with
-        ``cer_mean`` and ``run_id`` attributes).
-    engine_name:
-        Name of the engine whose baseline is computed.
-    corpus_name:
-        Name of the corpus; restricts the comparison to earlier runs
-        on this same corpus.
-    current_cer:
-        Mean CER observed in the current run.
-    current_run_id:
-        If given, the run with this identifier is excluded from the
-        baseline (useful when the current run is already recorded
-        in the history before this computation is called).
-    min_runs:
-        Minimum number of historical runs for the
-        comparison to be considered reliable. Below this threshold,
-        ``None`` is returned.
-    relative_delta_threshold:
-        Threshold beyond which ``off_baseline`` is ``True``
-        (default: 0.20 = 20% relative deviation).
-
-    Returns
-    -------
-    Optional[dict]
-        ``None`` if:
-        - fewer than ``min_runs`` historical runs are available
-        - ``current_cer`` is ``None`` or negative
-        - all historical CERs are ``None``
-
-        Otherwise, a dict with the fields documented in the module.
-    """
-    if current_cer is None or current_cer < 0:
-        return None
-    try:
-        entries = history.query(
-            engine=engine_name, corpus=corpus_name, limit=1000,
-        )
-    except Exception as exc:  # pragma: no cover (defensive)
-        logger.warning(
-            "[baseline_comparison] query history a levé : %s", exc,
-        )
-        return None
-
-    historical_cers: list[float] = []
-    for entry in entries:
-        if current_run_id is not None and entry.run_id == current_run_id:
-            continue
-        cer = entry.cer_mean
-        if cer is None or cer < 0:
-            continue
-        historical_cers.append(float(cer))
-
-    if len(historical_cers) < min_runs:
-        return None
-
-    mean = statistics.fmean(historical_cers)
-    median = statistics.median(historical_cers)
-    absolute_delta = current_cer - mean
-    if mean > 0:
-        relative_delta = absolute_delta / mean
-    elif current_cer == 0:
-        relative_delta = 0.0
-    else:
-        # Baseline at 0 but current CER > 0: infinite deviation;
-        # convention: flagged as off_baseline with
-        # relative_delta = None.
-        relative_delta = None
-
-    off_baseline = (
-        relative_delta is not None
-        and abs(relative_delta) > relative_delta_threshold
-    )
-
-    return {
-        "engine_name": engine_name,
-        "corpus_name": corpus_name,
-        "cer_current": float(current_cer),
-        "cer_historical_mean": mean,
-        "cer_historical_median": median,
-        "n_runs": len(historical_cers),
-        "absolute_delta": absolute_delta,
-        "relative_delta": relative_delta,
-        "off_baseline": off_baseline,
-    }
-
-
-def compute_corpus_difficulty_percentile(
-    history,
-    current_difficulty: float,
-    *,
-    min_runs: int = 5,
-) -> Optional[dict]:
-    """Place the difficulty of the current corpus within the distribution
-    of historical difficulties.
-
-    Reads the difficulties stored in ``HistoryEntry.metadata``
-    under the ``difficulty`` key (convention of
-    ``picarones/core/difficulty.py``).
-
-    Returns
-    -------
-    Optional[dict]
-        ``{
-            "current_difficulty": float,
-            "percentile": float,         # 0..100
-            "n_runs": int,
-            "median_historical": float,
-            "harder_than_usual": bool,   # percentile > 75
-            "easier_than_usual": bool,   # percentile < 25
-        }``
-        or ``None`` if fewer than ``min_runs`` historical runs have
-        a recorded difficulty.
-    """
-    if current_difficulty is None:
-        return None
-    try:
-        entries = history.query(limit=1000)
-    except Exception as exc:  # pragma: no cover
-        logger.warning(
-            "[baseline_comparison] query history a levé : %s", exc,
-        )
-        return None
-
-    historical_difficulties: list[float] = []
-    for entry in entries:
-        diff = entry.metadata.get("difficulty") if entry.metadata else None
-        if diff is None:
-            continue
-        try:
-            historical_difficulties.append(float(diff))
-        except (TypeError, ValueError):
-            continue
-
-    if len(historical_difficulties) < min_runs:
-        return None
-
-    sorted_diff = sorted(historical_difficulties)
-    n = len(sorted_diff)
-    # Percentile = % of historical corpora with difficulty ≤
-    # current_difficulty. Common convention (P_i = i/n × 100).
-    n_below = sum(1 for d in sorted_diff if d <= current_difficulty)
-    percentile = (n_below / n) * 100.0
-    median = statistics.median(sorted_diff)
-
-    return {
-        "current_difficulty": float(current_difficulty),
-        "percentile": percentile,
-        "n_runs": n,
-        "median_historical": median,
-        "harder_than_usual": percentile > 75.0,
-        "easier_than_usual": percentile < 25.0,
-    }
-
 
-
-
-"
-]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.baseline_comparison`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets historical
+imports (``from picarones.core.baseline_comparison import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.baseline_comparison import *  # noqa: F401, F403
 
+import picarones.measurements.baseline_comparison as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
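The figures quoted in the docstring of the moved ``baseline_comparison`` module can be checked by hand. The sketch below reproduces only the arithmetic of the relative delta and of the percentile convention shown in that file; the helper names ``deltas`` and ``percentile_at_or_below`` and the sample difficulty values are hypothetical, and no history backend is involved.

```python
def deltas(current_cer, historical_mean):
    """Absolute and relative deviation of the current CER from the mean."""
    absolute = current_cer - historical_mean
    if historical_mean > 0:
        relative = absolute / historical_mean
    elif current_cer == 0:
        relative = 0.0
    else:
        # Mean of 0 with a positive current CER: no finite ratio exists.
        relative = None
    return absolute, relative


def percentile_at_or_below(history, current):
    """Share (0..100) of historical values that are <= the current one."""
    return 100.0 * sum(1 for value in history if value <= current) / len(history)


# Numbers from the module docstring: current CER 0.052, historical mean 0.041.
absolute, relative = deltas(0.052, 0.041)
off_baseline = relative is not None and abs(relative) > 0.20

# Made-up difficulty history: 3 of the 5 values are <= 0.35.
percentile = percentile_at_or_below([0.1, 0.2, 0.3, 0.4, 0.5], 0.35)
```

With these inputs the relative delta is about 0.268, above the default threshold of 0.20, so the run counts as off baseline; the percentile works out to 60, which falls between the 25 and 75 cut-offs used for ``easier_than_usual`` and ``harder_than_usual``.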
@@ -1,582 +1,19 @@
-"""
 
-
 
-
-
-
-aggregation loop (lines 794-827 of the pre-workstream-2 runner).
-
-Additive approach: strict backward compatibility
-------------------------------------------------
-All hooks are registered on the ``standard``,
-``philological``, ``diagnostics`` and ``full`` profiles (i.e. enabled by
-default when the runner is called without a ``profile`` parameter). The
-``minimal`` profile enables no hook (for massive benchmarks where only
-CER/WER matter). The ``economics`` and ``pipeline`` profiles are
-reserved for future hooks.
-
-Importing this module is **enough** to populate the registries:
-:mod:`picarones.core.metric_hooks` merely exposes the
-decorators; the runner depends on a single function,
-``select_document_hooks(profile)``, to discover the active hooks.
-
-Full list of hooks (originating Sprint)
----------------------------------------
-**Document-level** (12):
-
-- ``confusion`` (Sprint 5): ``confusion_matrix``
-- ``char_scores`` (Sprint 5): ``char_scores``
-- ``taxonomy`` (Sprint 5): ``taxonomy``
-- ``structure`` (Sprint 5): ``structure``
-- ``image_quality`` (Sprint 5): ``image_quality``
-- ``line_metrics`` (Sprint 10): ``line_metrics``
-- ``hallucination`` (Sprint 10): ``hallucination_metrics``
-- ``calibration`` (Sprint 42): ``calibration_metrics``
-- ``philological`` (Sprint 61): ``philological_metrics``
-- ``searchability`` (Sprint 86): ``searchability_metrics``
-- ``numerical_sequences`` (Sprint 86): ``numerical_sequence_metrics``
-- ``readability`` (Sprint 87): ``readability_metrics``
-
-**Corpus-level** (12): one aggregator per document hook,
-filling the corresponding ``aggregated_*`` field of the
-``EngineReport``.
-
-The ``ner`` hook (Sprint 40) stays outside this mechanism: it depends
-on an ``EntityExtractor`` injected manually by the user, which
-does not fit the semantics of the profiles.
 """
 
-from
-
-import logging
-from collections import Counter
-from typing import Any, Optional
-
-from picarones.core.metric_hooks import (
-    PROFILE_DIAGNOSTICS,
-    PROFILE_FULL,
-    PROFILE_PHILOLOGICAL,
-    PROFILE_STANDARD,
-    register_corpus_aggregator,
-    register_document_metric,
-)
-
-logger = logging.getLogger(__name__)
-
-
-# Profiles in which the 12 "standard" hooks are enabled. By
-# construction they match the pre-workstream-2 runner behaviour; the
-# ``minimal`` profile is deliberately absent.
-_STANDARD_PROFILES = (
-    PROFILE_STANDARD,
-    PROFILE_PHILOLOGICAL,
-    PROFILE_DIAGNOSTICS,
-    PROFILE_FULL,
-)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Calibration helper (moved from runner.py, workstream 2)
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def calibration_from_engine_result(
-    ground_truth: str,
-    token_confidences: list,
-) -> Optional[dict]:
-    """Align the engine's ``token_confidences`` with the GT (bag-of-words)
-    to produce the parallel lists ``confidences`` / ``is_correct``,
-    then call ``compute_calibration_metrics`` (Sprint 39).
-
-    Alignment convention (bag-of-words proxy with multiplicity, like
-    ``oracle_token_recall`` from Sprint 35): a hypothesis token is
-    "correct" if the GT still contains an occurrence of that token.
-
-    Confidences ``> 1.0`` are assumed to be percentages and are
-    normalized to ``[0, 1]``. Negative confidences (Tesseract uses
-    -1 for non-words) are ignored.
-    """
-    from picarones.core.calibration import compute_calibration_metrics
-
-    if not token_confidences:
-        return None
-
-    gt_counter = Counter((ground_truth or "").split())
-    confidences: list[float] = []
-    is_correct: list[int] = []
-
-    for tc in token_confidences:
-        if not isinstance(tc, dict):
-            continue
-        token = str(tc.get("token", ""))
-        if not token:
-            continue
-        try:
-            conf = float(tc.get("confidence"))
-        except (TypeError, ValueError):
-            continue
-        if conf < 0:
-            continue
-        if conf > 1.0:
-            conf = conf / 100.0
-        if not 0.0 <= conf <= 1.0:
-            continue
-        if gt_counter[token] > 0:
-            is_correct.append(1)
-            gt_counter[token] -= 1
-        else:
-            is_correct.append(0)
-        confidences.append(conf)
-
-    if not confidences:
-        return None
-    return compute_calibration_metrics(confidences, is_correct)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Document-level hooks (12)
-# ──────────────────────────────────────────────────────────────────────────
-
-
-@register_document_metric(
-    name="confusion",
-    attribute="confusion_matrix",
-    profiles=_STANDARD_PROFILES,
-    requires_success=True,
-)
-def _confusion_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.confusion import build_confusion_matrix
-    return build_confusion_matrix(ground_truth, hypothesis).as_dict()
-
-
-@register_document_metric(
-    name="char_scores",
-    attribute="char_scores",
-    profiles=_STANDARD_PROFILES,
-    requires_success=True,
-)
-def _char_scores_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.char_scores import (
-        compute_diacritic_score,
-        compute_ligature_score,
-    )
-    lig = compute_ligature_score(ground_truth, hypothesis)
-    diac = compute_diacritic_score(ground_truth, hypothesis)
-    return {"ligature": lig.as_dict(), "diacritic": diac.as_dict()}
-
-
-@register_document_metric(
-    name="taxonomy",
-    attribute="taxonomy",
-    profiles=_STANDARD_PROFILES,
-    requires_success=True,
-)
-def _taxonomy_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.taxonomy import classify_errors
-    return classify_errors(ground_truth, hypothesis).as_dict()
-
-
-@register_document_metric(
-    name="structure",
-    attribute="structure",
-    profiles=_STANDARD_PROFILES,
-    requires_success=True,
-)
-def _structure_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.structure import analyze_structure
-    return analyze_structure(ground_truth, hypothesis).as_dict()
-
-
-@register_document_metric(
-    name="line_metrics",
-    attribute="line_metrics",
-    profiles=_STANDARD_PROFILES,
-    requires_success=True,
-)
-def _line_metrics_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.line_metrics import compute_line_metrics
-    return compute_line_metrics(ground_truth, hypothesis).as_dict()
-
-
-@register_document_metric(
-    name="hallucination",
-    attribute="hallucination_metrics",
-    profiles=_STANDARD_PROFILES,
-    requires_success=True,
-)
-def _hallucination_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.hallucination import compute_hallucination_metrics
-    return compute_hallucination_metrics(ground_truth, hypothesis).as_dict()
-
-
-@register_document_metric(
-    name="calibration",
-    attribute="calibration_metrics",
-    profiles=_STANDARD_PROFILES,
-    requires_token_confidences=True,
-)
-def _calibration_hook(*, ground_truth, ocr_result, **_):
-    return calibration_from_engine_result(
-        ground_truth, ocr_result.token_confidences,
-    )
-
-
-@register_document_metric(
-    name="image_quality",
-    attribute="image_quality",
-    profiles=_STANDARD_PROFILES,
-    # No requires_success: the image is analysed whatever the
-    # OCR result (to compare an OCR failure with image quality).
-)
-def _image_quality_hook(*, image_path, **_):
-    from picarones.core.image_quality import analyze_image_quality
-    iq = analyze_image_quality(image_path)
-    if iq.error is not None:
-        return None
-    return iq.as_dict()
-
-
-@register_document_metric(
-    name="philological",
-    attribute="philological_metrics",
-    profiles=_STANDARD_PROFILES,
-    # No requires_success: the pre-workstream-2 runner computed this
-    # even on OCR failure (with hyp=""). The philological modules
-    # return ``None`` when the GT has no usable signal,
-    # so the adaptive behaviour is intact.
-)
-def _philological_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.philological_runner import compute_philological_metrics
-    return compute_philological_metrics(ground_truth, hypothesis)
-
-
-@register_document_metric(
-    name="searchability",
-    attribute="searchability_metrics",
-    profiles=_STANDARD_PROFILES,
-)
-def _searchability_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.searchability_runner import compute_searchability_metrics
-    return compute_searchability_metrics(ground_truth, hypothesis)
-
-
-@register_document_metric(
-    name="numerical_sequences",
-    attribute="numerical_sequence_metrics",
-    profiles=_STANDARD_PROFILES,
-)
-def _numerical_sequences_hook(*, ground_truth, hypothesis, **_):
-    from picarones.core.numerical_sequences_runner import (
-        compute_numerical_sequence_metrics_adaptive,
-    )
-    return compute_numerical_sequence_metrics_adaptive(ground_truth, hypothesis)
-
-
-@register_document_metric(
-    name="readability",
-    attribute="readability_metrics",
-    profiles=_STANDARD_PROFILES,
-)
-def _readability_hook(*, ground_truth, hypothesis, corpus_lang, **_):
-    from picarones.core.readability_runner import compute_readability_metrics
-    return compute_readability_metrics(ground_truth, hypothesis, lang=corpus_lang)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Corpus-level aggregators (12)
-# ──────────────────────────────────────────────────────────────────────────
-
-
-@register_corpus_aggregator(
-    name="confusion",
-    attribute="aggregated_confusion",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_confusion(doc_results: list) -> Optional[dict]:
-    from picarones.core.confusion import (
-        ConfusionMatrix, aggregate_confusion_matrices,
-    )
-    matrices = [
-        ConfusionMatrix(**dr.confusion_matrix)
-        for dr in doc_results
-        if dr.confusion_matrix is not None
-    ]
-    if not matrices:
-        return None
-    return aggregate_confusion_matrices(matrices).as_compact_dict(min_count=2)
-
-
-@register_corpus_aggregator(
-    name="char_scores",
-    attribute="aggregated_char_scores",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_char_scores(doc_results: list) -> Optional[dict]:
-    from picarones.core.char_scores import (
-        DiacriticScore,
-        LigatureScore,
-        aggregate_diacritic_scores,
-        aggregate_ligature_scores,
-    )
-    lig_scores = [
-        LigatureScore(**dr.char_scores["ligature"])
-        for dr in doc_results
-        if dr.char_scores is not None
-    ]
-    diac_scores = [
-        DiacriticScore(**dr.char_scores["diacritic"])
-        for dr in doc_results
-        if dr.char_scores is not None
-    ]
-    if not lig_scores:
-        return None
-    return {
-        "ligature": aggregate_ligature_scores(lig_scores),
-        "diacritic": aggregate_diacritic_scores(diac_scores),
-    }
-
-
-@register_corpus_aggregator(
-    name="taxonomy",
-    attribute="aggregated_taxonomy",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_taxonomy(doc_results: list) -> Optional[dict]:
-    from picarones.core.taxonomy import TaxonomyResult, aggregate_taxonomy
-    results = [
-        TaxonomyResult.from_dict(dr.taxonomy)
-        for dr in doc_results
-        if dr.taxonomy is not None
-    ]
-    if not results:
-        return None
-    return aggregate_taxonomy(results)
-
-
-@register_corpus_aggregator(
-    name="structure",
-    attribute="aggregated_structure",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_structure(doc_results: list) -> Optional[dict]:
-    from picarones.core.structure import StructureResult, aggregate_structure
-    results = [
-        StructureResult.from_dict(dr.structure)
-        for dr in doc_results
-        if dr.structure is not None
-    ]
-    if not results:
-        return None
-    return aggregate_structure(results)
-
-
-@register_corpus_aggregator(
-    name="image_quality",
-    attribute="aggregated_image_quality",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_image_quality(doc_results: list) -> Optional[dict]:
-    from picarones.core.image_quality import (
-        ImageQualityResult, aggregate_image_quality,
-    )
-    results = [
-        ImageQualityResult.from_dict(dr.image_quality)
-        for dr in doc_results
-        if dr.image_quality is not None
-    ]
-    if not results:
-        return None
-    return aggregate_image_quality(results)
-
-
-@register_corpus_aggregator(
-    name="line_metrics",
-    attribute="aggregated_line_metrics",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_line_metrics(doc_results: list) -> Optional[dict]:
-    from picarones.core.line_metrics import (
-        LineMetrics, aggregate_line_metrics,
-    )
-    results = [
-        LineMetrics.from_dict(dr.line_metrics)
-        for dr in doc_results
-        if dr.line_metrics is not None
-    ]
-    if not results:
-        return None
-    return aggregate_line_metrics(results)
-
-
-@register_corpus_aggregator(
-    name="hallucination",
-    attribute="aggregated_hallucination",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_hallucination(doc_results: list) -> Optional[dict]:
-    from picarones.core.hallucination import (
-        HallucinationMetrics, aggregate_hallucination_metrics,
-    )
-    results = [
-        HallucinationMetrics.from_dict(dr.hallucination_metrics)
-        for dr in doc_results
-        if dr.hallucination_metrics is not None
-    ]
-    if not results:
-        return None
-    return aggregate_hallucination_metrics(results)
-
-
-@register_corpus_aggregator(
-    name="calibration",
-    attribute="aggregated_calibration",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_calibration(doc_results: list) -> Optional[dict]:
-    """Micro-aggregate calibration over all documents.
-
-    Recomputes ECE/MCE from the **sum of the bins** of each
-    document: for each bin, ``count`` is summed, the mean confidence
-    is aggregated weighted by count, and the accuracy is aggregated
-    weighted by count. The micro ECE is then the per-bin weighted
-    mean of ``|conf - acc|``.
-
-    Behavior moved verbatim from ``runner._aggregate_calibration``
-    (workstream 2: byte-for-byte backward compatibility of the serialized output).
-    """
-    relevant = [
-        dr for dr in doc_results
-        if dr.calibration_metrics is not None
-        and (dr.calibration_metrics.get("bins") or [])
-    ]
-    if not relevant:
-        return None
-
-    n_bins = relevant[0].calibration_metrics.get("n_bins", 10)
-    sum_conf: list[float] = [0.0] * n_bins
-    sum_acc: list[float] = [0.0] * n_bins
-    counts: list[int] = [0] * n_bins
-    bin_lows: list[float] = [
-        b["bin_low"] for b in relevant[0].calibration_metrics["bins"]
-    ]
-    bin_highs: list[float] = [
-        b["bin_high"] for b in relevant[0].calibration_metrics["bins"]
-    ]
-
-    for dr in relevant:
-        m = dr.calibration_metrics
-        if m.get("n_bins") != n_bins:
-            logger.warning(
-                "[aggregate_calibration] %s : n_bins=%s ≠ %s — ignoré",
-                dr.doc_id, m.get("n_bins"), n_bins,
-            )
-            continue
-        for k, b in enumerate(m["bins"]):
-            n = int(b.get("count") or 0)
-            if n == 0:
-                continue
-            counts[k] += n
-            sum_conf[k] += float(b.get("avg_confidence") or 0.0) * n
-            sum_acc[k] += float(b.get("accuracy") or 0.0) * n
-
-    total = sum(counts)
-    if total == 0:
-        return None
-
-    bins: list[dict] = []
-    ece = 0.0
-    mce = 0.0
-    for k in range(n_bins):
-        n = counts[k]
-        if n == 0:
-            bins.append({
-                "bin_low": bin_lows[k] if k < len(bin_lows) else k / n_bins,
-                "bin_high": bin_highs[k] if k < len(bin_highs) else (k + 1) / n_bins,
-                "avg_confidence": None,
-                "accuracy": None,
-                "count": 0,
-                "gap": None,
-            })
-            continue
-        avg_conf = sum_conf[k] / n
-        accuracy = sum_acc[k] / n
-        gap = abs(avg_conf - accuracy)
-        bins.append({
-            "bin_low": bin_lows[k] if k < len(bin_lows) else k / n_bins,
-            "bin_high": bin_highs[k] if k < len(bin_highs) else (k + 1) / n_bins,
-            "avg_confidence": avg_conf,
-            "accuracy": accuracy,
-            "count": n,
-            "gap": gap,
-        })
-        ece += (n / total) * gap
-        if gap > mce:
-            mce = gap
-
-    overall_acc = sum(sum_acc) / total
-    overall_conf = sum(sum_conf) / total
-
-    return {
-        "ece": ece,
-        "mce": mce,
-        "n_bins": n_bins,
-        "n_predictions": total,
-        "overall_accuracy": overall_acc,
-        "overall_confidence": overall_conf,
-        "bins": bins,
-        "doc_count": len(relevant),
-    }
-
-
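The micro-aggregation described in the `_aggregate_calibration` docstring above (sum the bins first, then compute the gap) gives a different answer from averaging per-document ECE values. The sketch below is a condensed restatement of that idea, not the project's code: `micro_ece` and `mean_of_doc_eces` are hypothetical helper names, and the bin dictionaries only carry the `count`, `avg_confidence` and `accuracy` keys used in the removed function.

```python
def micro_ece(per_doc_bins):
    """ECE recomputed from the count-weighted sum of every document's bins."""
    n_bins = len(per_doc_bins[0])
    counts = [0] * n_bins
    sum_conf = [0.0] * n_bins
    sum_acc = [0.0] * n_bins
    for bins in per_doc_bins:
        for k, b in enumerate(bins):
            n = int(b.get("count") or 0)
            if n == 0:
                continue  # empty bins carry no confidence or accuracy
            counts[k] += n
            sum_conf[k] += float(b["avg_confidence"]) * n
            sum_acc[k] += float(b["accuracy"]) * n
    total = sum(counts)
    if total == 0:
        return None
    return sum(
        (counts[k] / total) * abs(sum_conf[k] / counts[k] - sum_acc[k] / counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    )


def mean_of_doc_eces(per_doc_bins):
    """Naive alternative: compute one ECE per document, then average them."""
    return sum(micro_ece([bins]) for bins in per_doc_bins) / len(per_doc_bins)


empty = {"count": 0, "avg_confidence": None, "accuracy": None}
# Document A is under-confident, document B over-confident, in the same bin.
doc_a = [empty, {"count": 10, "avg_confidence": 0.95, "accuracy": 1.0}]
doc_b = [empty, {"count": 10, "avg_confidence": 0.95, "accuracy": 0.9}]

micro = micro_ece([doc_a, doc_b])
macro = mean_of_doc_eces([doc_a, doc_b])
print(micro, macro)
```

In this example the two documents err in opposite directions, so the pooled bin is well calibrated and the micro ECE is close to zero, while the average of per-document ECEs stays near 0.05.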
-@register_corpus_aggregator(
-    name="philological",
-    attribute="aggregated_philological",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_philological(doc_results: list) -> Optional[dict]:
-    from picarones.core.philological_runner import aggregate_philological_metrics
-    return aggregate_philological_metrics(
-        [dr.philological_metrics for dr in doc_results],
-    )
-
-
-@register_corpus_aggregator(
-    name="searchability",
-    attribute="aggregated_searchability",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_searchability(doc_results: list) -> Optional[dict]:
-    from picarones.core.searchability_runner import aggregate_searchability_metrics
-    return aggregate_searchability_metrics(
-        [dr.searchability_metrics for dr in doc_results],
-    )
-
-
-@register_corpus_aggregator(
-    name="numerical_sequences",
-    attribute="aggregated_numerical_sequences",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_numerical_sequences(doc_results: list) -> Optional[dict]:
-    from picarones.core.numerical_sequences_runner import (
-        aggregate_numerical_sequence_metrics,
-    )
-    return aggregate_numerical_sequence_metrics(
-        [dr.numerical_sequence_metrics for dr in doc_results],
-    )
-
-
-@register_corpus_aggregator(
-    name="readability",
-    attribute="aggregated_readability",
-    profiles=_STANDARD_PROFILES,
-)
-def _aggregate_readability(doc_results: list) -> Optional[dict]:
-    from picarones.core.readability_runner import aggregate_readability_metrics
-    return aggregate_readability_metrics(
-        [dr.readability_metrics for dr in doc_results],
-    )
-

-
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.builtin_hooks`.

+Phase E of the three-circle refactoring. This measurement (Circle 2)
+no longer lives in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.builtin_hooks import ...``) keep
+working unchanged.

+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now only contains the domain
+abstractions and the orchestration (Circle 1).
 """

+from picarones.measurements.builtin_hooks import *  # noqa: F401, F403

+import picarones.measurements.builtin_hooks as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
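The alias file above relies on two standard Python behaviors: `from module import *` copies the public names of the target module, and the `__all__` fallback lists every name that does not start with an underscore when the target does not define `__all__` itself. The following sketch reproduces the pattern with two throwaway in-memory modules so it can run anywhere; `demo_measurements_calibration` and `demo_core_calibration` are hypothetical stand-ins, not real picarones modules.

```python
import sys
import types

# Hypothetical stand-in for the relocated module (the new location).
new_mod = types.ModuleType("demo_measurements_calibration")
exec(
    "def expected_calibration_error():\n"
    "    return 0.0\n"
    "_private_helper = object()\n",
    new_mod.__dict__,
)
sys.modules[new_mod.__name__] = new_mod

# Hypothetical stand-in for the legacy module, written like the alias files.
shim_source = (
    "from demo_measurements_calibration import *  # noqa: F401, F403\n"
    "import demo_measurements_calibration as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n"
)
old_mod = types.ModuleType("demo_core_calibration")
exec(shim_source, old_mod.__dict__)
sys.modules[old_mod.__name__] = old_mod

# A legacy import path now resolves to the very same function object.
from demo_core_calibration import expected_calibration_error

print(expected_calibration_error is new_mod.expected_calibration_error)
print(old_mod.__all__)
print(hasattr(old_mod, "_private_helper"))
```

One consequence worth keeping in mind: underscore-prefixed names are not re-exported by a star import, so code that imported private helpers from the old path would need to import them from the new location explicitly.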
@@ -1,323 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
-line (Tesseract via the ``tsv``, Pero OCR via the ``PageLayout``,
-Mistral OCR via ``confidence``, Google Vision via ``Word.confidence``).
-The natural question for a heritage-digitization workflow is: *"when the
-engine says it is sure, is it really sure?"* For a team
-that must manually check a 50,000-page corpus, the difference
-between checking 100% vs 15% of the volume is the effect of calibration.
-
-This module provides the three classic measures:
-
-- **Expected Calibration Error (ECE)**: per-bin weighted mean of
-  the absolute gap between mean confidence and mean accuracy.
-  ``ECE = 0`` ↔ perfectly calibrated engine; high ``ECE`` ↔ systematic
-  gap between displayed confidence and actual reliability.
-- **Maximum Calibration Error (MCE)**: maximum of that gap over the bins.
-  Useful for spotting the engine's worst lie (e.g. it always reports
-  95% confidence and is wrong half the time).
-- **Reliability diagram**: table ``[(bin_low, bin_high, avg_conf,
-  accuracy, count)]`` that can be rendered as SVG on the server or with
-  Chart.js in the browser in a later sprint.
-
-Split strategy
-----------------------
-As with NER (Sprint 38) and divergence (Sprints 35-37),
-the work is split:
-
-- **Sprint 39** (here): pure computation layer. Input = two parallel
-  lists, ``confidences`` (∈ [0, 1]) and ``is_correct`` (bool/0-1).
-  No external dependency.
-- **Upcoming sprint**: expose ``token_confidences`` on
-  ``EngineResult``, align characters/tokens with the GT to produce
-  ``is_correct``, integrate into the runner and the HTML reliability view.
-
-Explicitly out of scope
------------------------------------
-This sprint touches **no OCR adapter**. No confidence is
-extracted; everything is computed from prediction sequences
-supplied as input. This is what makes it possible to test the
-mathematical invariants rigorously (ECE = 0 ↔ calibrated, ECE = |bias| for a
-constant bias, etc.) without depending on a backend.
 """

-from
-
-import logging
-from dataclasses import dataclass
-from typing import Iterable
-
-logger = logging.getLogger(__name__)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Data model
-# ──────────────────────────────────────────────────────────────────────────
-
-
-@dataclass(frozen=True)
-class CalibrationBin:
-    """One bin of the reliability diagram.
-
-    Attributes
-    ----------
-    bin_low, bin_high:
-        Bounds of the bin on the confidence axis (``[bin_low, bin_high)``,
-        except the last bin, which includes ``1.0``).
-    avg_confidence:
-        Mean confidence of the predictions that fell into the bin.
-        ``None`` if the bin is empty.
-    accuracy:
-        Fraction of correct predictions in the bin (``∈ [0, 1]``).
-        ``None`` if the bin is empty.
-    count:
-        Number of predictions in the bin.
-    """
-
-    bin_low: float
-    bin_high: float
-    avg_confidence: float | None
-    accuracy: float | None
-    count: int
-
-    @property
-    def gap(self) -> float | None:
-        """Absolute gap ``|confidence - accuracy|``, or ``None`` if empty."""
-        if self.avg_confidence is None or self.accuracy is None:
-            return None
-        return abs(self.avg_confidence - self.accuracy)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Validation
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def _validate_inputs(
-    confidences: list[float],
-    is_correct: list[bool | int],
-) -> None:
-    if len(confidences) != len(is_correct):
-        raise ValueError(
-            f"Longueurs incompatibles : confidences={len(confidences)} "
-            f"vs is_correct={len(is_correct)}"
-        )
-    for i, c in enumerate(confidences):
-        if not (0.0 <= float(c) <= 1.0):
-            raise ValueError(
-                f"Confiance hors [0, 1] à l'index {i} : {c!r}"
-            )
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Reliability diagram (binning)
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def reliability_diagram(
-    confidences: Iterable[float],
-    is_correct: Iterable[bool | int],
-    n_bins: int = 10,
-) -> list[CalibrationBin]:
-    """Split the predictions into ``n_bins`` equal-width confidence bins
-    and compute the mean confidence, accuracy and count for each.
-
-    Parameters
-    ----------
-    confidences:
-        Prediction confidences, ``∈ [0, 1]``.
-    is_correct:
-        Boolean indicator (1 = correct prediction, 0 = incorrect).
-    n_bins:
-        Number of bins (default: 10). Bounds: ``[k/n_bins, (k+1)/n_bins)``
-        except the last bin, which includes ``1.0``.
-
-    Returns
-    -------
-    list[CalibrationBin]
-        List of ``n_bins`` bins, in increasing order of confidence.
-    """
-    if n_bins < 1:
-        raise ValueError(f"n_bins doit être ≥ 1 — reçu {n_bins}")
-
-    confs = [float(c) for c in confidences]
-    correct = [int(bool(x)) for x in is_correct]
-    _validate_inputs(confs, correct)
-
-    bin_width = 1.0 / n_bins
-    sums: list[float] = [0.0] * n_bins
-    correct_counts: list[int] = [0] * n_bins
-    counts: list[int] = [0] * n_bins
-
-    for c, ok in zip(confs, correct):
-        # Compute the bin index by multiplying ``c * n_bins`` rather than
-        # dividing ``c / bin_width`` to avoid floating-point
-        # representation pitfalls (e.g. ``0.6 / 0.1 = 5.999…`` in IEEE 754,
-        # which would put 0.6 in bin [0.5, 0.6) instead of [0.6, 0.7)).
-        if c >= 1.0:
-            idx = n_bins - 1
-        else:
-            idx = int(c * n_bins)
-        # Guard against floating-point rounding
-        if idx >= n_bins:
-            idx = n_bins - 1
-        elif idx < 0:
-            idx = 0
-        sums[idx] += c
-        correct_counts[idx] += ok
-        counts[idx] += 1
-
-    bins: list[CalibrationBin] = []
-    for k in range(n_bins):
-        low = k * bin_width
-        high = (k + 1) * bin_width
-        n = counts[k]
-        if n == 0:
-            bins.append(CalibrationBin(low, high, None, None, 0))
-        else:
-            bins.append(CalibrationBin(
-                bin_low=low,
-                bin_high=high,
-                avg_confidence=sums[k] / n,
-                accuracy=correct_counts[k] / n,
-                count=n,
-            ))
-    return bins
-
-
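The comment inside `reliability_diagram` about computing the bin index by multiplication rather than division can be checked directly. The two helpers below are hypothetical names written only for this comparison; they isolate the one line that differs between the two approaches.

```python
def bin_index_by_division(c: float, n_bins: int) -> int:
    """Index computed as c / bin_width (the variant the comment warns against)."""
    bin_width = 1.0 / n_bins
    return min(int(c / bin_width), n_bins - 1)


def bin_index_by_multiplication(c: float, n_bins: int) -> int:
    """Index computed as c * n_bins (the variant used in reliability_diagram)."""
    return min(int(c * n_bins), n_bins - 1)


c = 0.6
div_idx = bin_index_by_division(c, 10)
mul_idx = bin_index_by_multiplication(c, 10)
print(0.6 / 0.1, 0.6 * 10)
print(div_idx, mul_idx)
```

With ten bins, the division variant places a confidence of 0.6 in bin 5, that is `[0.5, 0.6)`, while the multiplication variant places it in bin 6, that is `[0.6, 0.7)`, which matches the documented half-open bounds.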
-# ──────────────────────────────────────────────────────────────────────────
-# ECE and MCE
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def expected_calibration_error(
-    confidences: Iterable[float],
-    is_correct: Iterable[bool | int],
-    n_bins: int = 10,
-) -> float:
-    """Expected Calibration Error: per-bin weighted mean of the absolute
-    confidence ↔ accuracy gap.
-
-    ``ECE = sum_k (n_k / N) * |avg_conf_k - accuracy_k|``
-
-    where the sum runs over the non-empty bins.
-
-    Returns
-    -------
-    float
-        ``∈ [0, 1]``. ``0`` ↔ perfect calibration.
-    """
-    bins = reliability_diagram(confidences, is_correct, n_bins=n_bins)
-    total = sum(b.count for b in bins)
-    if total == 0:
-        return 0.0
-    ece = 0.0
-    for b in bins:
-        if b.count == 0 or b.gap is None:
-            continue
-        ece += (b.count / total) * b.gap
-    return ece
-
-
-def maximum_calibration_error(
-    confidences: Iterable[float],
-    is_correct: Iterable[bool | int],
-    n_bins: int = 10,
-) -> float:
-    """Maximum Calibration Error: worst confidence ↔ accuracy gap over
-    all non-empty bins.
-
-    Useful for spotting a localized lie by the engine (e.g. it reports 95%
-    confidence and is wrong half the time in that bin).
-
-    Returns
-    -------
-    float
-        ``∈ [0, 1]``. ``0`` ↔ perfect calibration.
-    """
-    bins = reliability_diagram(confidences, is_correct, n_bins=n_bins)
-    gaps = [b.gap for b in bins if b.gap is not None]
-    return max(gaps) if gaps else 0.0
-
-
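The module docstring mentions two invariants that the pure computation layer makes testable: ECE is 0 for a calibrated engine, and ECE equals the absolute bias when the bias is constant. The sketch below is a condensed, standalone restatement of the formula `ECE = sum_k (n_k / N) * |avg_conf_k - accuracy_k|`, not the module's actual code; `reliability_bins` and `ece_mce` are hypothetical names.

```python
def reliability_bins(confidences, is_correct, n_bins=10):
    """Return (count, avg_confidence, accuracy) for each non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for c, ok in zip(confidences, is_correct):
        idx = min(int(c * n_bins), n_bins - 1)  # last bin includes 1.0
        sums[idx] += c
        hits[idx] += int(bool(ok))
        counts[idx] += 1
    return [
        (counts[k], sums[k] / counts[k], hits[k] / counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


def ece_mce(confidences, is_correct, n_bins=10):
    """Return (ECE, MCE) computed over the non-empty bins."""
    bins = reliability_bins(list(confidences), list(is_correct), n_bins)
    total = sum(n for n, _, _ in bins)
    if total == 0:
        return 0.0, 0.0
    gaps = [(n, abs(conf - acc)) for n, conf, acc in bins]
    ece = sum((n / total) * gap for n, gap in gaps)
    mce = max(gap for _, gap in gaps)
    return ece, mce


# Constant bias: the engine always says 0.9 but is right 7 times out of 10.
biased = ece_mce([0.9] * 10, [1] * 7 + [0] * 3)
# Calibrated: confidence 1.0 when right, 0.0 when wrong.
calibrated = ece_mce([1.0] * 5 + [0.0] * 5, [1] * 5 + [0] * 5)
print(biased, calibrated)
```

In the constant-bias case every prediction lands in a single bin, so ECE and MCE coincide and are both close to 0.2, the gap between the stated confidence and the observed accuracy.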
-# ──────────────────────────────────────────────────────────────────────────
-# Aggregated view
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def compute_calibration_metrics(
-    confidences: Iterable[float],
-    is_correct: Iterable[bool | int],
-    n_bins: int = 10,
-) -> dict:
-    """Compute all calibration metrics in a single call.
-
-    Returns
-    -------
-    dict
-        ``{
-            "ece": float,
-            "mce": float,
-            "n_bins": int,
-            "n_predictions": int,
-            "overall_accuracy": float,
-            "overall_confidence": float,
-            "bins": [
-                {"bin_low", "bin_high", "avg_confidence",
-                 "accuracy", "count", "gap"},
-                ...
-            ],
-        }``
-    """
-    confs = list(confidences)
-    correct = list(is_correct)
-    bins = reliability_diagram(confs, correct, n_bins=n_bins)
-    total = sum(b.count for b in bins)
-    overall_acc = (
-        sum(int(bool(x)) for x in correct) / total if total > 0 else 0.0
-    )
-    overall_conf = (
-        sum(float(c) for c in confs) / total if total > 0 else 0.0
-    )
-
-    ece = 0.0
-    if total > 0:
-        for b in bins:
-            if b.gap is None:
-                continue
-            ece += (b.count / total) * b.gap
-    mce = max((b.gap for b in bins if b.gap is not None), default=0.0)
-
-    return {
-        "ece": ece,
-        "mce": mce,
-        "n_bins": n_bins,
-        "n_predictions": total,
-        "overall_accuracy": overall_acc,
-        "overall_confidence": overall_conf,
-        "bins": [
-            {
-                "bin_low": b.bin_low,
-                "bin_high": b.bin_high,
-                "avg_confidence": b.avg_confidence,
-                "accuracy": b.accuracy,
-                "count": b.count,
-                "gap": b.gap,
-            }
-            for b in bins
-        ],
-    }
-
|
| 316 |
|
| 317 |
-
|
| 318 |
-
|
| 319 |
-
"
|
| 320 |
-
|
| 321 |
-
"maximum_calibration_error",
|
| 322 |
-
"compute_calibration_metrics",
|
| 323 |
-
]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.calibration`.

+Phase E of the three-circle refactoring. This measurement (Circle 2)
+no longer lives in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.calibration import ...``) keep
+working unchanged.

+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now only contains the domain
+abstractions and the orchestration (Circle 1).
 """

+from picarones.measurements.calibration import *  # noqa: F401, F403

+import picarones.measurements.calibration as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
@@ -1,370 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
|
|
|
|
|
|
|
|
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
deux ou plusieurs glyphes fusionnés : fi (fi), fl (fl), œ, æ, etc.
|
| 10 |
-
|
| 11 |
-
Pour chaque ligature présente dans le GT, on vérifie si l'OCR a produit
|
| 12 |
-
soit le caractère Unicode équivalent, soit la séquence décomposée équivalente.
|
| 13 |
-
|
| 14 |
-
Diacritiques
|
| 15 |
-
-----------
|
| 16 |
-
Accents, cédilles, trémas et autres signes diacritiques. Pour chaque caractère
|
| 17 |
-
accentué dans le GT, on vérifie si l'OCR a conservé le diacritique ou l'a
|
| 18 |
-
remplacé par la lettre de base.
|
| 19 |
"""
|
| 20 |
|
| 21 |
-
from
|
| 22 |
-
|
| 23 |
-
from dataclasses import dataclass, field
|
| 24 |
-
from typing import Optional
|
| 25 |
-
|
| 26 |
-
import unicodedata
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
# ---------------------------------------------------------------------------
|
| 30 |
-
# Tables de ligatures (char ligature → séquences équivalentes acceptées)
|
| 31 |
-
# ---------------------------------------------------------------------------
|
| 32 |
-
|
| 33 |
-
#: Table principale des ligatures et leurs équivalents acceptés.
|
| 34 |
-
#: Clé = caractère ligature Unicode ; valeur = liste de séquences équivalentes.
|
| 35 |
-
LIGATURE_TABLE: dict[str, list[str]] = {
|
| 36 |
-
# Latin typographic ligatures (Unicode block Alphabetic Presentation Forms, U+FB00 to U+FB06)
|
| 37 |
-
"\uFB00": ["ff"], # ﬀ → ff
|
| 38 |
-
"\uFB01": ["fi"], # ﬁ → fi
|
| 39 |
-
"\uFB02": ["fl"], # ﬂ → fl
|
| 40 |
-
"\uFB03": ["ffi"], # ﬃ → ffi
|
| 41 |
-
"\uFB04": ["ffl"], # ﬄ → ffl
|
| 42 |
-
"\uFB05": ["st", "\u017Ft"], # ﬅ → st / ſt
|
| 43 |
-
"\uFB06": ["st"], # ﬆ → st (variant)
|
| 44 |
-
# Heritage Latin ligatures (Unicode Latin-1 Supplement and Latin Extended-A)
|
| 45 |
-
"\u0153": ["oe"], # œ oe
|
| 46 |
-
"\u00E6": ["ae"], # æ ae
|
| 47 |
-
"\u0152": ["OE"], # Œ OE
|
| 48 |
-
"\u00C6": ["AE"], # Æ AE
|
| 49 |
-
# Abréviations latines / médiévales
|
| 50 |
-
"\uA751": ["per", "p\u0332"], # ꝑ per / p̲
|
| 51 |
-
"\uA753": ["pro"], # ꝓ pro
|
| 52 |
-
"\uA757": ["que"], # ꝗ que
|
| 53 |
-
# Ligatures germaniques
|
| 54 |
-
"\u00DF": ["ss"], # ß ss
|
| 55 |
-
"\u1E9E": ["SS"], # ẞ SS
|
| 56 |
-
}
|
| 57 |
-
|
| 58 |
-
# Ensemble de toutes les ligatures pour recherche rapide
|
| 59 |
-
_ALL_LIGATURES: frozenset[str] = frozenset(LIGATURE_TABLE)
|
| 60 |
-
|
| 61 |
-
# Mapping inverse : séquence → ligature
|
| 62 |
-
_SEQ_TO_LIGATURE: dict[str, str] = {}
|
| 63 |
-
for _lig, _seqs in LIGATURE_TABLE.items():
|
| 64 |
-
for _seq in _seqs:
|
| 65 |
-
_SEQ_TO_LIGATURE[_seq] = _lig
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
# ---------------------------------------------------------------------------
|
| 69 |
-
# Table des caractères diacritiques
|
| 70 |
-
# ---------------------------------------------------------------------------
|
| 71 |
-
|
| 72 |
-
def _build_diacritic_map() -> dict[str, str]:
|
| 73 |
-
"""Construit automatiquement la table diacritique depuis l'Unicode."""
|
| 74 |
-
table: dict[str, str] = {}
|
| 75 |
-
for codepoint in range(0x00C0, 0x0250): # Latin-1 Supplement (from U+00C0), Latin Extended-A and B
|
| 76 |
-
ch = chr(codepoint)
|
| 77 |
-
nfd = unicodedata.normalize("NFD", ch)
|
| 78 |
-
if len(nfd) > 1: # le caractère est décomposable
|
| 79 |
-
base = nfd[0] # lettre de base
|
| 80 |
-
if base.isalpha() and base != ch:
|
| 81 |
-
table[ch] = base
|
| 82 |
-
# Compléments manuels
|
| 83 |
-
table.update({
|
| 84 |
-
"\u0107": "c", # ć
|
| 85 |
-
"\u0119": "e", # ę
|
| 86 |
-
"\u0142": "l", # ł
|
| 87 |
-
"\u0144": "n", # ń
|
| 88 |
-
"\u015B": "s", # ś
|
| 89 |
-
"\u017A": "z", # ź
|
| 90 |
-
"\u017C": "z", # ż
|
| 91 |
-
})
|
| 92 |
-
return table
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
DIACRITIC_MAP: dict[str, str] = _build_diacritic_map()
|
| 96 |
-
_ALL_DIACRITICS: frozenset[str] = frozenset(DIACRITIC_MAP)
|
| 97 |
-
|
| 98 |
-
# Ligatures qui NE sont PAS des diacritiques (pour éviter les doublons)
|
| 99 |
-
_LIGATURE_SET: frozenset[str] = frozenset(LIGATURE_TABLE)
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
# ---------------------------------------------------------------------------
|
| 103 |
-
# Résultats structurés
|
| 104 |
-
# ---------------------------------------------------------------------------
|
| 105 |
-
|
| 106 |
-
@dataclass
|
| 107 |
-
class LigatureScore:
|
| 108 |
-
"""Score de reconnaissance des ligatures pour une paire (GT, OCR)."""
|
| 109 |
-
|
| 110 |
-
total_in_gt: int = 0
|
| 111 |
-
"""Nombre de ligatures présentes dans le GT."""
|
| 112 |
-
correctly_recognized: int = 0
|
| 113 |
-
"""Nombre de ligatures correctement transcrites (unicode ou équivalent)."""
|
| 114 |
-
score: float = 0.0
|
| 115 |
-
"""Taux de reconnaissance = correctly_recognized / total_in_gt. 1.0 si total=0."""
|
| 116 |
-
per_ligature: dict[str, dict] = field(default_factory=dict)
|
| 117 |
-
"""Détail par ligature : {'fi': {'gt_count': 5, 'ocr_correct': 3, 'score': 0.6}}"""
|
| 118 |
-
|
| 119 |
-
def as_dict(self) -> dict:
|
| 120 |
-
return {
|
| 121 |
-
"total_in_gt": self.total_in_gt,
|
| 122 |
-
"correctly_recognized": self.correctly_recognized,
|
| 123 |
-
"score": round(self.score, 4),
|
| 124 |
-
"per_ligature": {
|
| 125 |
-
k: {kk: round(vv, 4) if isinstance(vv, float) else vv for kk, vv in v.items()}
|
| 126 |
-
for k, v in self.per_ligature.items()
|
| 127 |
-
},
|
| 128 |
-
}
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
@dataclass
|
| 132 |
-
class DiacriticScore:
|
| 133 |
-
"""Score de conservation des diacritiques pour une paire (GT, OCR)."""
|
| 134 |
-
|
| 135 |
-
total_in_gt: int = 0
|
| 136 |
-
"""Nombre de caractères accentués dans le GT."""
|
| 137 |
-
correctly_recognized: int = 0
|
| 138 |
-
"""Nombre de diacritiques correctement conservés."""
|
| 139 |
-
score: float = 0.0
|
| 140 |
-
"""Taux de conservation = correctly_recognized / total_in_gt. 1.0 si total=0."""
|
| 141 |
-
per_diacritic: dict[str, dict] = field(default_factory=dict)
|
| 142 |
-
"""Détail par caractère diacritique."""
|
| 143 |
-
|
| 144 |
-
def as_dict(self) -> dict:
|
| 145 |
-
return {
|
| 146 |
-
"total_in_gt": self.total_in_gt,
|
| 147 |
-
"correctly_recognized": self.correctly_recognized,
|
| 148 |
-
"score": round(self.score, 4),
|
| 149 |
-
"per_diacritic": {
|
| 150 |
-
k: {kk: round(vv, 4) if isinstance(vv, float) else vv for kk, vv in v.items()}
|
| 151 |
-
for k, v in self.per_diacritic.items()
|
| 152 |
-
},
|
| 153 |
-
}
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
# ---------------------------------------------------------------------------
|
| 157 |
-
# Calcul des scores
|
| 158 |
-
# ---------------------------------------------------------------------------
|
| 159 |
-
|
| 160 |
-
def compute_ligature_score(ground_truth: str, hypothesis: str) -> LigatureScore:
|
| 161 |
-
"""Compute the ligature recognition score.
|
| 162 |
-
|
| 163 |
-
For each ligature in the GT, we check whether the OCR produced:
|
| 164 |
-
- Exactly the same Unicode ligature character (e.g. ﬁ → ﬁ)
|
| 165 |
-
- Or the equivalent letter sequence (e.g. ﬁ → fi)
|
| 166 |
-
|
| 167 |
-
Both are counted as correct, which matches heritage editorial
|
| 168 |
-
practice (some editors expand ligatures).
|
| 169 |
-
|
| 170 |
-
Parameters
|
| 171 |
-
----------
|
| 172 |
-
ground_truth:
|
| 173 |
-
Texte de référence.
|
| 174 |
-
hypothesis:
|
| 175 |
-
Texte produit par l'OCR.
|
| 176 |
-
|
| 177 |
-
Returns
|
| 178 |
-
-------
|
| 179 |
-
LigatureScore
|
| 180 |
-
"""
|
| 181 |
-
if not ground_truth:
|
| 182 |
-
return LigatureScore(score=1.0)
|
| 183 |
-
|
| 184 |
-
# Construire un index de position dans l'hypothèse pour recherche rapide
|
| 185 |
-
hyp_norm = unicodedata.normalize("NFC", hypothesis)
|
| 186 |
-
gt_norm = unicodedata.normalize("NFC", ground_truth)
|
| 187 |
-
|
| 188 |
-
per_lig: dict[str, dict] = {}
|
| 189 |
-
total = 0
|
| 190 |
-
correct = 0
|
| 191 |
-
|
| 192 |
-
# Trouver toutes les ligatures dans le GT
|
| 193 |
-
i = 0
|
| 194 |
-
while i < len(gt_norm):
|
| 195 |
-
ch = gt_norm[i]
|
| 196 |
-
if ch in _ALL_LIGATURES:
|
| 197 |
-
total += 1
|
| 198 |
-
equivalents = [ch] + LIGATURE_TABLE[ch] # unicode direct ou séquences équivalentes
|
| 199 |
-
|
| 200 |
-
# Vérifier si la position correspondante dans l'OCR contient l'équivalent
|
| 201 |
-
is_correct = _check_char_at_context(gt_norm, hyp_norm, i, ch, equivalents)
|
| 202 |
-
if is_correct:
|
| 203 |
-
correct += 1
|
| 204 |
-
|
| 205 |
-
if ch not in per_lig:
|
| 206 |
-
per_lig[ch] = {"gt_count": 0, "ocr_correct": 0, "score": 0.0}
|
| 207 |
-
per_lig[ch]["gt_count"] += 1
|
| 208 |
-
if is_correct:
|
| 209 |
-
per_lig[ch]["ocr_correct"] += 1
|
| 210 |
-
i += 1
|
| 211 |
-
|
| 212 |
-
# Calculer les scores individuels
|
| 213 |
-
for lig_data in per_lig.values():
|
| 214 |
-
lig_data["score"] = (
|
| 215 |
-
lig_data["ocr_correct"] / lig_data["gt_count"]
|
| 216 |
-
if lig_data["gt_count"] > 0
|
| 217 |
-
else 1.0
|
| 218 |
-
)
|
| 219 |
-
|
| 220 |
-
score = correct / total if total > 0 else 1.0
|
| 221 |
-
return LigatureScore(
|
| 222 |
-
total_in_gt=total,
|
| 223 |
-
correctly_recognized=correct,
|
| 224 |
-
score=score,
|
| 225 |
-
per_ligature=per_lig,
|
| 226 |
-
)
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
def compute_diacritic_score(ground_truth: str, hypothesis: str) -> DiacriticScore:
|
| 230 |
-
"""Calcule le score de conservation des diacritiques.
|
| 231 |
-
|
| 232 |
-
Pour chaque caractère accentué dans le GT, on vérifie si l'OCR a produit
|
| 233 |
-
le même caractère (conservation) ou a substitué la lettre de base (perte).
|
| 234 |
-
On accepte aussi les formes NFD équivalentes.
|
| 235 |
-
|
| 236 |
-
Parameters
|
| 237 |
-
----------
|
| 238 |
-
ground_truth:
|
| 239 |
-
Texte de référence.
|
| 240 |
-
hypothesis:
|
| 241 |
-
Texte produit par l'OCR.
|
| 242 |
-
|
| 243 |
-
Returns
|
| 244 |
-
-------
|
| 245 |
-
DiacriticScore
|
| 246 |
-
"""
|
| 247 |
-
if not ground_truth:
|
| 248 |
-
return DiacriticScore(score=1.0)
|
| 249 |
-
|
| 250 |
-
gt_norm = unicodedata.normalize("NFC", ground_truth)
|
| 251 |
-
hyp_norm = unicodedata.normalize("NFC", hypothesis)
|
| 252 |
-
|
| 253 |
-
per_diac: dict[str, dict] = {}
|
| 254 |
-
total = 0
|
| 255 |
-
correct = 0
|
| 256 |
-
|
| 257 |
-
# Utiliser difflib pour l'alignement
|
| 258 |
-
import difflib
|
| 259 |
-
matcher = difflib.SequenceMatcher(None, gt_norm, hyp_norm, autojunk=False)
|
| 260 |
-
gt_to_hyp: dict[int, Optional[int]] = {}
|
| 261 |
-
|
| 262 |
-
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
|
| 263 |
-
if tag == "equal":
|
| 264 |
-
for k in range(i2 - i1):
|
| 265 |
-
gt_to_hyp[i1 + k] = j1 + k
|
| 266 |
-
elif tag == "replace" and (i2 - i1) == (j2 - j1):
|
| 267 |
-
for k in range(i2 - i1):
|
| 268 |
-
gt_to_hyp[i1 + k] = j1 + k
|
| 269 |
-
else:
|
| 270 |
-
# delete ou replace de longueurs différentes
|
| 271 |
-
for k in range(i1, i2):
|
| 272 |
-
gt_to_hyp[k] = None
|
| 273 |
-
|
| 274 |
-
for i, ch in enumerate(gt_norm):
|
| 275 |
-
if ch in _ALL_DIACRITICS and ch not in _LIGATURE_SET:
|
| 276 |
-
total += 1
|
| 277 |
-
hyp_pos = gt_to_hyp.get(i)
|
| 278 |
-
is_correct = False
|
| 279 |
-
if hyp_pos is not None and hyp_pos < len(hyp_norm):
|
| 280 |
-
hyp_ch = hyp_norm[hyp_pos]
|
| 281 |
-
is_correct = (hyp_ch == ch)
|
| 282 |
-
if is_correct:
|
| 283 |
-
correct += 1
|
| 284 |
-
|
| 285 |
-
if ch not in per_diac:
|
| 286 |
-
per_diac[ch] = {"gt_count": 0, "ocr_correct": 0, "score": 0.0}
|
| 287 |
-
per_diac[ch]["gt_count"] += 1
|
| 288 |
-
if is_correct:
|
| 289 |
-
per_diac[ch]["ocr_correct"] += 1
|
| 290 |
-
|
| 291 |
-
for diac_data in per_diac.values():
|
| 292 |
-
diac_data["score"] = (
|
| 293 |
-
diac_data["ocr_correct"] / diac_data["gt_count"]
|
| 294 |
-
if diac_data["gt_count"] > 0
|
| 295 |
-
else 1.0
|
| 296 |
-
)
|
| 297 |
-
|
| 298 |
-
score = correct / total if total > 0 else 1.0
|
| 299 |
-
return DiacriticScore(
|
| 300 |
-
total_in_gt=total,
|
| 301 |
-
correctly_recognized=correct,
|
| 302 |
-
score=score,
|
| 303 |
-
per_diacritic=per_diac,
|
| 304 |
-
)
|
| 305 |
-
|
| 306 |
-
|
| 307 |
-
def _check_char_at_context(
|
| 308 |
-
gt: str,
|
| 309 |
-
hyp: str,
|
| 310 |
-
gt_pos: int,
|
| 311 |
-
gt_char: str,
|
| 312 |
-
equivalents: list[str],
|
| 313 |
-
) -> bool:
|
| 314 |
-
"""Vérifie si la position correspondante dans l'hypothèse contient un équivalent.
|
| 315 |
-
|
| 316 |
-
Cherche dans une fenêtre de ±5 caractères autour de la position estimée
|
| 317 |
-
pour tolérer les décalages d'alignement OCR.
|
| 318 |
-
"""
|
| 319 |
-
# Position estimée dans l'hypothèse (ratio proportionnel)
|
| 320 |
-
if len(gt) == 0:
|
| 321 |
-
return False
|
| 322 |
-
est_pos = int(gt_pos * len(hyp) / len(gt)) if len(gt) > 0 else 0
|
| 323 |
-
window = 5
|
| 324 |
-
start = max(0, est_pos - window)
|
| 325 |
-
end = min(len(hyp), est_pos + window + len(gt_char))
|
| 326 |
-
context = hyp[start:end]
|
| 327 |
-
for equiv in equivalents:
|
| 328 |
-
if equiv in context:
|
| 329 |
-
return True
|
| 330 |
-
return False
|
| 331 |
-
|
| 332 |
-
|
| 333 |
-
def aggregate_ligature_scores(scores: list[LigatureScore]) -> dict:
|
| 334 |
-
"""Agrège les scores de ligatures sur un corpus."""
|
| 335 |
-
total_gt = sum(s.total_in_gt for s in scores)
|
| 336 |
-
total_correct = sum(s.correctly_recognized for s in scores)
|
| 337 |
-
score = total_correct / total_gt if total_gt > 0 else 1.0
|
| 338 |
-
|
| 339 |
-
# Agrégation par ligature
|
| 340 |
-
per_lig: dict[str, dict] = {}
|
| 341 |
-
for s in scores:
|
| 342 |
-
for lig, data in s.per_ligature.items():
|
| 343 |
-
if lig not in per_lig:
|
| 344 |
-
per_lig[lig] = {"gt_count": 0, "ocr_correct": 0}
|
| 345 |
-
per_lig[lig]["gt_count"] += data["gt_count"]
|
| 346 |
-
per_lig[lig]["ocr_correct"] += data["ocr_correct"]
|
| 347 |
-
for lig_data in per_lig.values():
|
| 348 |
-
lig_data["score"] = (
|
| 349 |
-
lig_data["ocr_correct"] / lig_data["gt_count"]
|
| 350 |
-
if lig_data["gt_count"] > 0 else 1.0
|
| 351 |
-
)
|
| 352 |
-
|
| 353 |
-
return {
|
| 354 |
-
"score": round(score, 4),
|
| 355 |
-
"total_in_gt": total_gt,
|
| 356 |
-
"correctly_recognized": total_correct,
|
| 357 |
-
"per_ligature": per_lig,
|
| 358 |
-
}
|
| 359 |
-
|
| 360 |
|
| 361 |
-
|
| 362 |
-
|
| 363 |
-
|
| 364 |
-
|
| 365 |
-
score = total_correct / total_gt if total_gt > 0 else 1.0
|
| 366 |
-
return {
|
| 367 |
-
"score": round(score, 4),
|
| 368 |
-
"total_in_gt": total_gt,
|
| 369 |
-
"correctly_recognized": total_correct,
|
| 370 |
-
}
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.char_scores`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy imports
|
| 6 |
+
(``from picarones.core.char_scores import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now holds only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.char_scores import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.char_scores as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
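The diacritic score removed from `core/char_scores.py` above aligns the GT and OCR strings with `difflib.SequenceMatcher` and then checks, position by position, whether each accented GT character survived. A minimal sketch of that idea follows. The function names `is_diacritic` and `diacritic_retention` are hypothetical, and diacritics are detected here directly through NFD decomposition rather than through the precomputed `DIACRITIC_MAP`, so this is an illustration under those assumptions, not the module's code.

```python
import difflib
import unicodedata


def is_diacritic(ch: str) -> bool:
    """True if ch decomposes (NFD) into a base letter plus combining marks."""
    nfd = unicodedata.normalize("NFD", ch)
    return len(nfd) > 1 and nfd[0].isalpha() and nfd[0] != ch


def diacritic_retention(ground_truth: str, hypothesis: str) -> tuple[int, int]:
    """Return (diacritics in GT, diacritics found at the aligned OCR position)."""
    gt = unicodedata.normalize("NFC", ground_truth)
    hyp = unicodedata.normalize("NFC", hypothesis)
    matcher = difflib.SequenceMatcher(None, gt, hyp, autojunk=False)

    # Map GT positions to OCR positions for equal blocks and same-length replaces.
    aligned: dict[int, int] = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            for k in range(i2 - i1):
                aligned[i1 + k] = j1 + k

    total = correct = 0
    for i, ch in enumerate(gt):
        if not is_diacritic(ch):
            continue
        total += 1
        j = aligned.get(i)  # None when the GT character has no aligned OCR position
        if j is not None and hyp[j] == ch:
            correct += 1
    return total, correct


total, correct = diacritic_retention("été à Noël", "ete à Noël")
print(total, correct)
```

GT characters that fall in a deleted block or in a replace block of unequal length have no aligned position and therefore count as lost, which mirrors the `None` mapping in the removed code.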
@@ -1,268 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
|
|
|
|
|
|
| 6 |
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
of the Levenshtein distance (via difflib.SequenceMatcher), which makes it possible
|
| 11 |
-
to identify substitutions, insertions and deletions.
|
| 12 |
-
|
| 13 |
-
The matrix is stored as a dict of dicts:
|
| 14 |
-
``{gt_char: {ocr_char: count}}``
|
| 15 |
-
|
| 16 |
-
The special value ``"∅"`` (U+2205) stands for an empty character:
|
| 17 |
-
- ``{"a": {"∅": 3}}`` → 'a' deleted 3 times in the OCR
|
| 18 |
-
- ``{"∅": {"x": 2}}`` → 'x' inserted 2 times in the OCR (absent from the GT)
|
| 19 |
"""
|
| 20 |
|
| 21 |
-
from
|
| 22 |
-
|
| 23 |
-
import difflib
|
| 24 |
-
from collections import defaultdict
|
| 25 |
-
from dataclasses import dataclass, field
|
| 26 |
-
|
| 27 |
-
# Symbole représentant un caractère absent (insertion / suppression)
|
| 28 |
-
EMPTY_CHAR = "∅"
|
| 29 |
-
|
| 30 |
-
# Caractères non pertinents à ignorer dans la matrice (espaces, sauts de ligne)
|
| 31 |
-
_WHITESPACE = set(" \t\n\r")
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
@dataclass
|
| 35 |
-
class ConfusionMatrix:
|
| 36 |
-
"""Matrice de confusion unicode pour une paire (GT, OCR)."""
|
| 37 |
-
|
| 38 |
-
matrix: dict[str, dict[str, int]] = field(default_factory=dict)
|
| 39 |
-
"""Clé externe = char GT ; clé interne = char OCR ; valeur = count."""
|
| 40 |
-
|
| 41 |
-
total_substitutions: int = 0
|
| 42 |
-
total_insertions: int = 0
|
| 43 |
-
total_deletions: int = 0
|
| 44 |
-
|
| 45 |
-
@property
|
| 46 |
-
def total_errors(self) -> int:
|
| 47 |
-
return self.total_substitutions + self.total_insertions + self.total_deletions
|
| 48 |
-
|
| 49 |
-
def top_confusions(self, n: int = 20) -> list[dict]:
|
| 50 |
-
"""Retourne les n confusions les plus fréquentes (substitutions uniquement)."""
|
| 51 |
-
pairs: list[tuple[str, str, int]] = []
|
| 52 |
-
for gt_char, ocr_counts in self.matrix.items():
|
| 53 |
-
if gt_char == EMPTY_CHAR:
|
| 54 |
-
continue # insertions
|
| 55 |
-
for ocr_char, count in ocr_counts.items():
|
| 56 |
-
if ocr_char == EMPTY_CHAR:
|
| 57 |
-
continue # suppressions
|
| 58 |
-
if gt_char != ocr_char:
|
| 59 |
-
pairs.append((gt_char, ocr_char, count))
|
| 60 |
-
pairs.sort(key=lambda x: -x[2])
|
| 61 |
-
return [
|
| 62 |
-
{"gt": gt, "ocr": ocr, "count": cnt}
|
| 63 |
-
for gt, ocr, cnt in pairs[:n]
|
| 64 |
-
]
|
| 65 |
-
|
| 66 |
-
def as_compact_dict(self, min_count: int = 1) -> dict:
|
| 67 |
-
"""Sérialise la matrice en éliminant les entrées rares."""
|
| 68 |
-
compact: dict[str, dict[str, int]] = {}
|
| 69 |
-
for gt_char, ocr_counts in self.matrix.items():
|
| 70 |
-
filtered = {
|
| 71 |
-
oc: cnt for oc, cnt in ocr_counts.items()
|
| 72 |
-
if cnt >= min_count
|
| 73 |
-
}
|
| 74 |
-
if filtered:
|
| 75 |
-
compact[gt_char] = filtered
|
| 76 |
-
return {
|
| 77 |
-
"matrix": compact,
|
| 78 |
-
"total_substitutions": self.total_substitutions,
|
| 79 |
-
"total_insertions": self.total_insertions,
|
| 80 |
-
"total_deletions": self.total_deletions,
|
| 81 |
-
}
|
| 82 |
-
|
| 83 |
-
def as_dict(self) -> dict:
|
| 84 |
-
return self.as_compact_dict(min_count=1)
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
def build_confusion_matrix(
|
| 88 |
-
ground_truth: str,
|
| 89 |
-
hypothesis: str,
|
| 90 |
-
ignore_whitespace: bool = True,
|
| 91 |
-
ignore_correct: bool = True,
|
| 92 |
-
) -> ConfusionMatrix:
|
| 93 |
-
"""Construit la matrice de confusion unicode pour une paire GT/OCR.
|
| 94 |
-
|
| 95 |
-
Parameters
|
| 96 |
-
----------
|
| 97 |
-
ground_truth:
|
| 98 |
-
Texte de référence (vérité terrain).
|
| 99 |
-
hypothesis:
|
| 100 |
-
Texte produit par l'OCR.
|
| 101 |
-
ignore_whitespace:
|
| 102 |
-
Si True, ignore les espaces, tabulations et sauts de ligne.
|
| 103 |
-
ignore_correct:
|
| 104 |
-
Si True, n'enregistre pas les paires identiques (gt_char == ocr_char).
|
| 105 |
-
Par défaut True pour réduire la taille de la matrice.
|
| 106 |
-
|
| 107 |
-
Returns
|
| 108 |
-
-------
|
| 109 |
-
ConfusionMatrix
|
| 110 |
-
"""
|
| 111 |
-
matrix: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
|
| 112 |
-
n_subs = n_ins = n_dels = 0
|
| 113 |
-
|
| 114 |
-
if not ground_truth and not hypothesis:
|
| 115 |
-
return ConfusionMatrix(dict(matrix), 0, 0, 0)
|
| 116 |
-
|
| 117 |
-
# SequenceMatcher sur listes de chars pour un alignement précis
|
| 118 |
-
matcher = difflib.SequenceMatcher(None, ground_truth, hypothesis, autojunk=False)
|
| 119 |
-
|
| 120 |
-
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
|
| 121 |
-
if tag == "equal":
|
| 122 |
-
if not ignore_correct:
|
| 123 |
-
for ch in ground_truth[i1:i2]:
|
| 124 |
-
if ignore_whitespace and ch in _WHITESPACE:
|
| 125 |
-
continue
|
| 126 |
-
matrix[ch][ch] += 1
|
| 127 |
-
elif tag == "replace":
|
| 128 |
-
# Aligner char par char les séquences de longueurs différentes
|
| 129 |
-
gt_seg = ground_truth[i1:i2]
|
| 130 |
-
oc_seg = hypothesis[j1:j2]
|
| 131 |
-
_align_segments(gt_seg, oc_seg, matrix, ignore_whitespace)
|
| 132 |
-
# Substitutions = longueur commune, surplus = insertions ou suppressions
|
| 133 |
-
n_subs += min(len(gt_seg), len(oc_seg))
|
| 134 |
-
surplus = abs(len(gt_seg) - len(oc_seg))
|
| 135 |
-
if len(gt_seg) > len(oc_seg):
|
| 136 |
-
n_dels += surplus
|
| 137 |
-
else:
|
| 138 |
-
n_ins += surplus
|
| 139 |
-
elif tag == "delete":
|
| 140 |
-
for ch in ground_truth[i1:i2]:
|
| 141 |
-
if ignore_whitespace and ch in _WHITESPACE:
|
| 142 |
-
continue
|
| 143 |
-
matrix[ch][EMPTY_CHAR] += 1
|
| 144 |
-
n_dels += 1
|
| 145 |
-
elif tag == "insert":
|
| 146 |
-
for ch in hypothesis[j1:j2]:
|
| 147 |
-
if ignore_whitespace and ch in _WHITESPACE:
|
| 148 |
-
continue
|
| 149 |
-
matrix[EMPTY_CHAR][ch] += 1
|
| 150 |
-
n_ins += 1
|
| 151 |
-
|
| 152 |
-
# Convertir defaultdict en dict normal
|
| 153 |
-
result_matrix: dict[str, dict[str, int]] = {
|
| 154 |
-
k: dict(v) for k, v in matrix.items()
|
| 155 |
-
}
|
| 156 |
-
|
| 157 |
-
return ConfusionMatrix(
|
| 158 |
-
matrix=result_matrix,
|
| 159 |
-
total_substitutions=n_subs,
|
| 160 |
-
total_insertions=n_ins,
|
| 161 |
-
total_deletions=n_dels,
|
| 162 |
-
)
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
def _align_segments(
|
| 166 |
-
gt_seg: str,
|
| 167 |
-
oc_seg: str,
|
| 168 |
-
matrix: dict,
|
| 169 |
-
ignore_whitespace: bool,
|
| 170 |
-
) -> None:
|
| 171 |
-
"""Aligne deux segments de longueurs potentiellement différentes."""
|
| 172 |
-
if not gt_seg:
|
| 173 |
-
for ch in oc_seg:
|
| 174 |
-
if ignore_whitespace and ch in _WHITESPACE:
|
| 175 |
-
continue
|
| 176 |
-
matrix[EMPTY_CHAR][ch] += 1
|
| 177 |
-
return
|
| 178 |
-
if not oc_seg:
|
| 179 |
-
for ch in gt_seg:
|
| 180 |
-
if ignore_whitespace and ch in _WHITESPACE:
|
| 181 |
-
continue
|
| 182 |
-
matrix[ch][EMPTY_CHAR] += 1
|
| 183 |
-
return
|
| 184 |
-
|
| 185 |
-
if len(gt_seg) == len(oc_seg):
|
| 186 |
-
# Substitutions 1-pour-1
|
| 187 |
-
for g, o in zip(gt_seg, oc_seg):
|
| 188 |
-
if ignore_whitespace and (g in _WHITESPACE or o in _WHITESPACE):
|
| 189 |
-
continue
|
| 190 |
-
matrix[g][o] += 1
|
| 191 |
-
else:
|
| 192 |
-
# Longueurs différentes : utiliser SequenceMatcher récursif sur segments courts
|
| 193 |
-
sub = difflib.SequenceMatcher(None, gt_seg, oc_seg, autojunk=False)
|
| 194 |
-
for tag2, i1, i2, j1, j2 in sub.get_opcodes():
|
| 195 |
-
if tag2 == "equal":
|
| 196 |
-
pass
|
| 197 |
-
elif tag2 == "replace":
|
| 198 |
-
# Régression simple : aligner par troncature
|
| 199 |
-
for g, o in zip(gt_seg[i1:i2], oc_seg[j1:j2]):
|
| 200 |
-
if ignore_whitespace and (g in _WHITESPACE or o in _WHITESPACE):
|
| 201 |
-
continue
|
| 202 |
-
matrix[g][o] += 1
|
| 203 |
-
elif tag2 == "delete":
|
| 204 |
-
for g in gt_seg[i1:i2]:
|
| 205 |
-
if ignore_whitespace and g in _WHITESPACE:
|
| 206 |
-
continue
|
| 207 |
-
matrix[g][EMPTY_CHAR] += 1
|
| 208 |
-
elif tag2 == "insert":
|
| 209 |
-
for o in oc_seg[j1:j2]:
|
| 210 |
-
if ignore_whitespace and o in _WHITESPACE:
|
| 211 |
-
continue
|
| 212 |
-
matrix[EMPTY_CHAR][o] += 1
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
def aggregate_confusion_matrices(matrices: list[ConfusionMatrix]) -> ConfusionMatrix:
|
| 216 |
-
"""Agrège plusieurs matrices de confusion en une seule.
|
| 217 |
-
|
| 218 |
-
Utile pour obtenir la matrice agrégée sur l'ensemble du corpus.
|
| 219 |
-
"""
|
| 220 |
-
combined: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
|
| 221 |
-
total_subs = total_ins = total_dels = 0
|
| 222 |
-
|
| 223 |
-
for cm in matrices:
|
| 224 |
-
for gt_char, ocr_counts in cm.matrix.items():
|
| 225 |
-
for ocr_char, count in ocr_counts.items():
|
| 226 |
-
combined[gt_char][ocr_char] += count
|
| 227 |
-
total_subs += cm.total_substitutions
|
| 228 |
-
total_ins += cm.total_insertions
|
| 229 |
-
total_dels += cm.total_deletions
|
| 230 |
-
|
| 231 |
-
return ConfusionMatrix(
|
| 232 |
-
matrix={k: dict(v) for k, v in combined.items()},
|
| 233 |
-
total_substitutions=total_subs,
|
| 234 |
-
total_insertions=total_ins,
|
| 235 |
-
total_deletions=total_dels,
|
| 236 |
-
)
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
def top_confused_chars(
|
| 240 |
-
matrix: ConfusionMatrix,
|
| 241 |
-
n: int = 15,
|
| 242 |
-
exclude_empty: bool = True,
|
| 243 |
-
) -> list[dict]:
|
| 244 |
-
"""Retourne les caractères GT les plus souvent confondus.
|
| 245 |
-
|
| 246 |
-
Retourne une liste triée par nombre total d'erreurs décroissant :
|
| 247 |
-
``[{"char": "ſ", "total_errors": 47, "top_substitutes": [...]}, ...]``
|
| 248 |
-
"""
|
| 249 |
-
char_stats: dict[str, dict] = {}
|
| 250 |
-
for gt_char, ocr_counts in matrix.matrix.items():
|
| 251 |
-
if exclude_empty and gt_char == EMPTY_CHAR:
|
| 252 |
-
continue
|
| 253 |
-
error_count = sum(
|
| 254 |
-
cnt for oc, cnt in ocr_counts.items()
|
| 255 |
-
if (oc != gt_char) and (not exclude_empty or oc != EMPTY_CHAR)
|
| 256 |
-
)
|
| 257 |
-
if error_count > 0:
|
| 258 |
-
top_subs = sorted(
|
| 259 |
-
[{"ocr": oc, "count": cnt} for oc, cnt in ocr_counts.items() if oc != gt_char],
|
| 260 |
-
key=lambda x: -x["count"],
|
| 261 |
-
)[:5]
|
| 262 |
-
char_stats[gt_char] = {
|
| 263 |
-
"char": gt_char,
|
| 264 |
-
"total_errors": error_count,
|
| 265 |
-
"top_substitutes": top_subs,
|
| 266 |
-
}
|
| 267 |
|
| 268 |
-
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.confusion`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy imports
|
| 6 |
+
(``from picarones.core.confusion import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now holds only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.confusion import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.confusion as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
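The confusion matrix removed from `core/confusion.py` above is a nested dict `{gt_char: {ocr_char: count}}` filled from `difflib.SequenceMatcher` opcodes, with `"∅"` marking an absent character. The sketch below shows that data structure in a reduced form. The name `confusion_counts` is hypothetical, whitespace filtering is omitted, and unequal-length replace blocks are paired position by position with the surplus counted as deletions or insertions, which is a simplification of the recursive alignment in the removed module.

```python
import difflib
from collections import defaultdict

EMPTY = "∅"  # marks an absent character (insertion or deletion)


def confusion_counts(ground_truth: str, hypothesis: str) -> dict[str, dict[str, int]]:
    """Count character confusions as {gt_char: {ocr_char: count}}."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    matcher = difflib.SequenceMatcher(None, ground_truth, hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # correct characters are not recorded
        gt_seg, ocr_seg = ground_truth[i1:i2], hypothesis[j1:j2]
        common = min(len(gt_seg), len(ocr_seg))
        for g, o in zip(gt_seg, ocr_seg):  # paired characters: substitutions
            counts[g][o] += 1
        for g in gt_seg[common:]:          # GT surplus: deletions
            counts[g][EMPTY] += 1
        for o in ocr_seg[common:]:         # OCR surplus: insertions
            counts[EMPTY][o] += 1
    # Convert the nested defaultdicts to plain dicts before returning.
    return {g: dict(row) for g, row in counts.items()}


print(confusion_counts("chat", "chot"))
print(confusion_counts("abc", "ac"))
print(confusion_counts("ac", "abc"))
```

Keeping deletions and insertions under the same `"∅"` key lets a single nested dict hold all three error types, at the cost of having to skip that key when listing substitutions only.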
@@ -1,169 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
unit (1,000 pages). To make the business decision, this cost must be projected
|
| 9 |
-
onto the **target volume** the user plans to
|
| 10 |
-
process: paying €50 more on 50 pages is trivial, but on
|
| 11 |
-
5 million pages it changes everything.
|
| 12 |
-
|
| 13 |
-
Typical output
|
| 14 |
-
--------------
|
| 15 |
-
*"For your 80,000 BMS pages: Tesseract = €3, Pero = €0 (amortized
|
| 16 |
-
local), Mistral OCR = €280, GPT-4o post-correction = €600."*
|
| 17 |
-
|
| 18 |
-
No arbitrary threshold is imposed: the module provides the figures,
|
| 19 |
-
and the researcher decides according to their budget.
|
| 20 |
-
|
| 21 |
-
Dependency
|
| 22 |
-
----------
|
| 23 |
-
Relies on ``picarones.core.pricing`` (Sprint 20), which exposes
|
| 24 |
-
``EngineCost.cost_per_1k_pages_eur`` and
|
| 25 |
-
``co2_per_1k_pages_g``.
|
| 26 |
"""
|
| 27 |
|
| 28 |
-
from
|
| 29 |
-
|
| 30 |
-
import logging
|
| 31 |
-
from dataclasses import dataclass
|
| 32 |
-
from typing import Optional
|
| 33 |
-
|
| 34 |
-
from picarones.core.pricing import EngineCost
|
| 35 |
-
|
| 36 |
-
logger = logging.getLogger(__name__)
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
@dataclass(frozen=True)
|
| 40 |
-
class ProjectedCost:
|
| 41 |
-
"""Coût total projeté d'un moteur pour un volume cible."""
|
| 42 |
-
engine_key: str
|
| 43 |
-
target_pages: int
|
| 44 |
-
cost_total_eur: Optional[float]
|
| 45 |
-
co2_total_g: Optional[float]
|
| 46 |
-
cost_per_1k_pages_eur: Optional[float]
|
| 47 |
-
co2_per_1k_pages_g: Optional[float]
|
| 48 |
-
type: str # "local" / "cloud_api" / "unknown"
|
| 49 |
-
|
| 50 |
-
def as_dict(self) -> dict:
|
| 51 |
-
return {
|
| 52 |
-
"engine_key": self.engine_key,
|
| 53 |
-
"target_pages": self.target_pages,
|
| 54 |
-
"cost_total_eur": self.cost_total_eur,
|
| 55 |
-
"co2_total_g": self.co2_total_g,
|
| 56 |
-
"cost_per_1k_pages_eur": self.cost_per_1k_pages_eur,
|
| 57 |
-
"co2_per_1k_pages_g": self.co2_per_1k_pages_g,
|
| 58 |
-
"type": self.type,
|
| 59 |
-
}
|
| 60 |
-
|
| 61 |
-
|
-def project_cost_total(
-    engine_cost: EngineCost, target_pages: int,
-) -> Optional[float]:
-    """Total projected cost in euros for ``target_pages`` pages.
-
-    Returns ``None`` if ``cost_per_1k_pages_eur`` is ``None``
-    (insufficient data) or if ``target_pages`` is negative.
-    """
-    if target_pages < 0:
-        return None
-    if engine_cost.cost_per_1k_pages_eur is None:
-        return None
-    return engine_cost.cost_per_1k_pages_eur * target_pages / 1000.0
-
-
-def project_co2_total(
-    engine_cost: EngineCost, target_pages: int,
-) -> Optional[float]:
-    """Total CO₂ footprint in grams for ``target_pages`` pages."""
-    if target_pages < 0:
-        return None
-    if engine_cost.co2_per_1k_pages_g is None:
-        return None
-    return engine_cost.co2_per_1k_pages_g * target_pages / 1000.0
-
-
-def project_engine(
-    engine_cost: EngineCost, target_pages: int,
-) -> ProjectedCost:
-    """Return the full ``ProjectedCost`` for one engine."""
-    return ProjectedCost(
-        engine_key=engine_cost.engine_key,
-        target_pages=int(target_pages),
-        cost_total_eur=project_cost_total(engine_cost, target_pages),
-        co2_total_g=project_co2_total(engine_cost, target_pages),
-        cost_per_1k_pages_eur=engine_cost.cost_per_1k_pages_eur,
-        co2_per_1k_pages_g=engine_cost.co2_per_1k_pages_g,
-        type=engine_cost.type,
-    )
-
-
-def project_all_engines(
-    engine_costs: dict[str, EngineCost],
-    target_pages: int,
-) -> dict[str, ProjectedCost]:
-    """Project the costs of several engines onto the target volume.
-
-    Returns a dict ``{engine_name: ProjectedCost}`` with one entry
-    per engine, including those without cost data (for which
-    ``cost_total_eur`` will be ``None``).
-    """
-    if target_pages < 0:
-        raise ValueError("target_pages doit être ≥ 0")
-    return {
-        name: project_engine(cost, target_pages)
-        for name, cost in engine_costs.items()
-    }
-
-
-def cost_gap_table(
-    projections: dict[str, ProjectedCost],
-    baseline_engine: str,
-) -> dict[str, dict[str, Optional[float]]]:
-    """For each engine, total cost gap versus the baseline.
-
-    Returns ``{engine: {"total": float, "delta_abs": float,
-    "delta_rel": float}}`` where:
-
-    - ``delta_abs`` = ``cost - cost_baseline`` (None if either
-      of the two is None)
-    - ``delta_rel`` = ``delta_abs / cost_baseline`` (None if
-      baseline = 0 or None)
-
-    Raises ``KeyError`` if the baseline is unknown.
-    """
-    if baseline_engine not in projections:
-        raise KeyError(
-            f"baseline {baseline_engine!r} absente des projections",
-        )
-    baseline_total = projections[baseline_engine].cost_total_eur
-    out: dict[str, dict[str, Optional[float]]] = {}
-    for name, proj in projections.items():
-        total = proj.cost_total_eur
-        if total is None or baseline_total is None:
-            delta_abs: Optional[float] = None
-            delta_rel: Optional[float] = None
-        else:
-            delta_abs = total - baseline_total
-            if baseline_total != 0:
-                delta_rel = delta_abs / baseline_total
-            else:
-                delta_rel = None
-        out[name] = {
-            "total": total,
-            "delta_abs": delta_abs,
-            "delta_rel": delta_rel,
-        }
-    return out
-
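The projection and gap arithmetic in the removed module is a linear scaling followed by a comparison against a baseline. The sketch below reproduces it in a self-contained form; `EngineCostSketch`, `project_total`, `gap_vs_baseline` and the engine names and prices are hypothetical stand-ins, since the real `EngineCost` class from `picarones.core.pricing` is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineCostSketch:
    """Hypothetical stand-in for EngineCost (only the fields used here)."""
    engine_key: str
    cost_per_1k_pages_eur: Optional[float]


def project_total(cost: EngineCostSketch, target_pages: int) -> Optional[float]:
    """Scale a per-1,000-pages price to the target volume."""
    if target_pages < 0 or cost.cost_per_1k_pages_eur is None:
        return None
    return cost.cost_per_1k_pages_eur * target_pages / 1000.0


def gap_vs_baseline(totals: dict, baseline: str) -> dict:
    """Absolute and relative gap of each total against the baseline total."""
    base = totals[baseline]  # raises KeyError if the baseline is unknown
    table = {}
    for name, total in totals.items():
        if total is None or base is None:
            delta_abs = delta_rel = None
        else:
            delta_abs = total - base
            delta_rel = delta_abs / base if base != 0 else None
        table[name] = {"total": total, "delta_abs": delta_abs, "delta_rel": delta_rel}
    return table


# Illustrative prices only (assumptions, not measured values).
engines = [
    EngineCostSketch("engine_a", 0.5),
    EngineCostSketch("engine_b", 3.5),
    EngineCostSketch("engine_c", None),  # no pricing data available
]
totals = {e.engine_key: project_total(e, 80_000) for e in engines}
gaps = gap_vs_baseline(totals, "engine_a")
```

An engine without pricing data stays in the table with `None` values rather than being dropped, which mirrors the behaviour documented for `project_all_engines`.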
| 161 |
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
"
|
| 165 |
-
|
| 166 |
-
"project_engine",
|
| 167 |
-
"project_all_engines",
|
| 168 |
-
"cost_gap_table",
|
| 169 |
-
]
|
|
|
|
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.cost_projection`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.cost_projection import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.cost_projection import *  # noqa: F401, F403
 
+import picarones.measurements.cost_projection as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
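The `__all__` fallback used by the alias module can be checked in isolation. The sketch below is only an illustration of that expression: `exported_names` and the two in-memory modules are hypothetical and are not part of Picarones.

```python
import types


def exported_names(module: types.ModuleType) -> list:
    """Same expression as in the alias module: prefer __all__, else public names."""
    return getattr(module, "__all__", [
        nm for nm in dir(module) if not nm.startswith("_")
    ])


# Hypothetical target module that declares __all__ explicitly.
with_all = types.ModuleType("with_all")
with_all.__all__ = ["project_engine"]
with_all.project_engine = lambda: None
with_all.helper = lambda: None  # public, but not listed in __all__

# Hypothetical target module without __all__.
without_all = types.ModuleType("without_all")
without_all.compute = lambda: None
without_all._private = lambda: None

names_with = exported_names(with_all)
names_without = exported_names(without_all)
```

When the target module defines no `__all__`, the fallback also lists names that the target itself imported (for example `logging` or `dataclass`), because `dir()` does not distinguish them from names defined in the module.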
@@ -1,202 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
|
|
|
|
|
|
|
|
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
-    + w_quality * (1 - image_quality_score)
-    + w_density * special_char_density
-
-where:
-- variance_norm : inter-engine CER variance, normalized to [0, 1]
-- image_quality : image quality score in [0, 1] (sharpness, contrast…)
-- special_chars : density of special characters in the GT, in [0, 1]
-
-The weights are configurable (default: 0.4 / 0.35 / 0.25).
-
-Final score: [0, 1], where 0 = easy document and 1 = very difficult.
 """
| 21 |
|
| 22 |
-
from
|
| 23 |
-
|
-import re
-from dataclasses import dataclass
-from typing import Optional
-
-
-# Default weights
-_W_VARIANCE = 0.40
-_W_QUALITY = 0.35
-_W_DENSITY = 0.25
-
-# Heritage special characters (ligatures, abbreviations, rare diacritics)
-_SPECIAL_CHARS_RE = re.compile(
-    r"[ſœæꝑꝓ&]"  # medieval ligatures / abbreviations
-    r"|[ḁ-ỿ]"  # Latin Extended Additional (rare diacritics)
-    r"|[\u0300-\u036f]"  # Combining diacritics
-    r"|[\ufb00-\ufb06]"  # Latin presentation forms (fi, fl…)
-    r"|[IVXLCDM]{3,}"  # Roman numerals (3+ characters)
-)
-
-
-@dataclass
-class DifficultyScore:
-    """Intrinsic difficulty score of a document."""
-    doc_id: str
-    score: float
-    """Overall score in [0, 1]; higher means more difficult."""
-    variance_component: float
-    """Inter-engine variance component, in [0, 1]."""
-    quality_component: float
-    """Inverted image-quality component, in [0, 1]."""
-    density_component: float
-    """Special-character density component, in [0, 1]."""
-    cer_variance: float
-    """Raw variance of the CER across engines."""
-    image_quality_score: float
-    """Image quality score (if available, otherwise 0.5)."""
-    special_char_ratio: float
-    """Ratio of special characters to GT length."""
-
-    def as_dict(self) -> dict:
-        return {
-            "doc_id": self.doc_id,
-            "score": round(self.score, 4),
-            "variance_component": round(self.variance_component, 4),
-            "quality_component": round(self.quality_component, 4),
-            "density_component": round(self.density_component, 4),
-            "cer_variance": round(self.cer_variance, 6),
-            "image_quality_score": round(self.image_quality_score, 4),
-            "special_char_ratio": round(self.special_char_ratio, 4),
-        }
-
-
-def _special_char_density(text: str) -> float:
-    """Ratio of heritage special characters in the text."""
-    if not text:
-        return 0.0
-    matches = len(_SPECIAL_CHARS_RE.findall(text))
-    return min(1.0, matches / len(text))
-
-
-def _variance(values: list[float]) -> float:
-    """Population variance of a list of values."""
-    if len(values) < 2:
-        return 0.0
-    mu = sum(values) / len(values)
-    return sum((v - mu) ** 2 for v in values) / len(values)
-
-
-def compute_difficulty_score(
-    doc_id: str,
-    ground_truth: str,
-    cer_per_engine: list[float],
-    image_quality_score: Optional[float] = None,
-    weights: tuple[float, float, float] = (_W_VARIANCE, _W_QUALITY, _W_DENSITY),
-) -> DifficultyScore:
-    """Compute the intrinsic difficulty score for a document.
-
-    Parameters
-    ----------
-    doc_id : document identifier
-    ground_truth : reference text
-    cer_per_engine : list of CERs (one per competing engine)
-    image_quality_score: image quality score in [0, 1] (None → neutral 0.5)
-    weights : (w_variance, w_quality, w_density)
-
-    Returns
-    -------
-    DifficultyScore
-    """
-    w_var, w_qual, w_den = weights
-
-    # 1. Inter-engine variance (normalized to [0, 1]; max variance ≈ 0.25)
-    cer_var = _variance(cer_per_engine)
-    variance_norm = min(1.0, cer_var / 0.25)
-
-    # 2. Inverted image quality
-    iq = image_quality_score if image_quality_score is not None else 0.5
-    iq = max(0.0, min(1.0, iq))
-    quality_component = 1.0 - iq
-
-    # 3. Special-character density
-    density = _special_char_density(ground_truth)
-    # Amplify slightly (the raw density is often low)
-    density_component = min(1.0, density * 3.0)
-
-    # Combined score
-    score = (
-        w_var * variance_norm
-        + w_qual * quality_component
-        + w_den * density_component
-    )
-    score = max(0.0, min(1.0, score))
-
-    return DifficultyScore(
-        doc_id=doc_id,
-        score=score,
-        variance_component=variance_norm,
-        quality_component=quality_component,
-        density_component=density_component,
-        cer_variance=cer_var,
-        image_quality_score=iq,
-        special_char_ratio=density,
-    )
-
-
-def compute_all_difficulties(
-    doc_ids: list[str],
-    ground_truths: dict[str, str],
-    cer_map: dict[str, dict[str, float]],
-    image_quality_map: Optional[dict[str, float]] = None,
-) -> dict[str, DifficultyScore]:
-    """Compute difficulty scores for every document in a corpus.
-
-    Parameters
-    ----------
-    doc_ids : list of document identifiers
-    ground_truths : {doc_id → gt_text}
-    cer_map : {doc_id → {engine_name → cer}}
-    image_quality_map : {doc_id → quality_score} (optional)
-
-    Returns
-    -------
-    {doc_id → DifficultyScore}
-    """
-    result = {}
-    for doc_id in doc_ids:
-        gt = ground_truths.get(doc_id, "")
-        engine_cers = list(cer_map.get(doc_id, {}).values())
-        iq = (image_quality_map or {}).get(doc_id)
-        result[doc_id] = compute_difficulty_score(
-            doc_id=doc_id,
-            ground_truth=gt,
-            cer_per_engine=engine_cers,
-            image_quality_score=iq,
-        )
-    return result
-
-
-def difficulty_label(score: float) -> str:
-    """Return a human-readable label for a difficulty score."""
-    if score < 0.25:
-        return "Facile"
-    if score < 0.50:
-        return "Modéré"
-    if score < 0.75:
-        return "Difficile"
-    return "Très difficile"
-
| 192 |
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
return COLOR_GREEN
|
| 198 |
-
if score < 0.50:
|
| 199 |
-
return COLOR_YELLOW
|
| 200 |
-
if score < 0.75:
|
| 201 |
-
return COLOR_ORANGE
|
| 202 |
-
return COLOR_RED
|
|
|
|
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.difficulty`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.difficulty import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.difficulty import *  # noqa: F401, F403
 
+import picarones.measurements.difficulty as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
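The weighted difficulty score from the removed `difficulty` module can be worked through with plain numbers. The following is a condensed sketch under stated assumptions: `population_variance` and `difficulty_sketch` are hypothetical names, and the special-character density is passed in directly instead of being computed with the module's regular expression.

```python
from typing import Optional, Sequence


def population_variance(values: Sequence[float]) -> float:
    """Variance with division by n; 0.0 when fewer than two values."""
    if len(values) < 2:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def difficulty_sketch(
    cer_per_engine: Sequence[float],
    image_quality: Optional[float] = None,
    special_char_density: float = 0.0,
    weights: tuple = (0.40, 0.35, 0.25),
) -> float:
    w_var, w_qual, w_den = weights
    # Values confined to [0, 1] have a variance of at most 0.25.
    variance_norm = min(1.0, population_variance(cer_per_engine) / 0.25)
    # A missing image-quality score falls back to the neutral value 0.5.
    iq = 0.5 if image_quality is None else max(0.0, min(1.0, image_quality))
    # The raw density is amplified by 3, then capped at 1.
    density_component = min(1.0, special_char_density * 3.0)
    score = (
        w_var * variance_norm
        + w_qual * (1.0 - iq)
        + w_den * density_component
    )
    return max(0.0, min(1.0, score))


# Engines agree, no image score, no special characters.
easy = difficulty_sketch([0.10, 0.10])
# Engines disagree completely, worst image quality, dense special characters.
hard = difficulty_sketch([0.0, 1.0], image_quality=0.0, special_char_density=0.5)
```

With the default weights, a document on which all engines agree still scores 0.35 × 0.5 = 0.175 when no image-quality score is supplied, because the neutral fallback contributes half of the quality weight.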
@@ -1,199 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
-(``medieval_french``, ``early_modern_french``, etc.) apply an
-**entire block** of transformations. But an editor may want a
-finer distinction: *"I tolerate ``ſ → s`` but not ``u → v``"*, for
-example because they are editing a 16th-century print in which u/v are
-distinct but the long s must be normalized.
-
-This module **splits** each profile into **named, independent**
-equivalence rules that the user can enable or
-disable one at a time. The computation layer returns the CER
-recomputed with a custom subset.
-
-Format
-------
-Each rule has:
-
-- ``name`` : stable identifier used in URLs and in the UX
-  (e.g. ``"longs_s"``, ``"u_eq_v"``)
-- ``source`` : character or sequence to replace
-- ``target`` : target character or sequence
-- ``description`` : short French sentence intended for the user
-- ``profile_tag`` : name of the profile it comes from (useful for
-  grouping in the UX)
-
-Splitting strategy
-------------------
-Computation layer first (Sprint 71/75/76 pattern). The advanced
-UX panel (checkboxes + client-side JS recomputation + URL state) will
-follow in a dedicated sprint; the computation layer delivered here is a
-sufficient foundation for a frontend developer to wire up the view.
 """
| 38 |
|
| 39 |
-
from
|
| 40 |
-
|
-import logging
-from dataclasses import dataclass
-from typing import Iterable, Optional
-
-from picarones.core.normalization import (
-    DIPLOMATIC_EN_EARLY_MODERN,
-    DIPLOMATIC_FR_EARLY_MODERN,
-    DIPLOMATIC_LATIN_MEDIEVAL,
-    DIPLOMATIC_MINIMAL,
-)
-
-logger = logging.getLogger(__name__)
-
-
-@dataclass(frozen=True)
-class EquivalenceRule:
-    """A named, independent diplomatic equivalence."""
-    name: str
-    source: str
-    target: str
-    description: str
-    profile_tag: str
-
-
-# Catalog: derived from the existing profiles by assigning a stable
-# name to each transformation. Duplicates (e.g. ``ſ → s``, which is
-# present in several profiles) are merged under a single name
-# (the first one encountered).
-def _build_catalog() -> dict[str, EquivalenceRule]:
-    catalog: dict[str, EquivalenceRule] = {}
-
-    # Canonical names for common transformations
-    canonical_names: dict[tuple[str, str], tuple[str, str]] = {
-        ("ſ", "s"): ("longs_s", "s long ſ → s"),
-        ("u", "v"): ("u_eq_v", "u/v interchangeables (vpon → upon)"),
-        ("i", "j"): ("i_eq_j", "i/j interchangeables (ioy → joy)"),
-        ("y", "i"): ("y_eq_i", "y → i (Latin médiéval)"),
-        ("vv", "w"): ("vv_eq_w", "vv → w (anglais moderne)"),
-        ("æ", "ae"): ("ae_ligature", "æ → ae"),
-        ("œ", "oe"): ("oe_ligature", "œ → oe"),
-        ("þ", "th"): ("thorn_th", "þ (thorn) → th"),
-        ("ð", "th"): ("eth_th", "ð (eth) → th"),
-        ("ȝ", "y"): ("yogh_y", "ȝ (yogh) → y"),
-        ("&", "et"): ("ampersand_et", "& → et (esperluette)"),
-        ("ỹ", "yn"): ("y_tilde_yn", "ỹ → yn"),
-        ("ꝑ", "per"): ("p_per", "ꝑ → per (abréviation Capelli)"),
-        ("ꝓ", "pro"): ("p_pro", "ꝓ → pro (abréviation Capelli)"),
-        ("ꝗ", "que"): ("q_que", "ꝗ → que (q barré)"),
-    }
-
-    sources = [
-        ("medieval_french", DIPLOMATIC_LATIN_MEDIEVAL),
-        ("early_modern_french", DIPLOMATIC_FR_EARLY_MODERN),
-        ("early_modern_english", DIPLOMATIC_EN_EARLY_MODERN),
-        ("minimal", DIPLOMATIC_MINIMAL),
-    ]
-
-    for profile_tag, profile_dict in sources:
-        for source, target in profile_dict.items():
-            key = (source, target)
-            if key in canonical_names:
-                name, desc = canonical_names[key]
-            else:
-                # Fallback: build a name from the source and target strings
-                name = f"{source}_to_{target}".replace(" ", "_")
-                desc = f"{source} → {target}"
-            if name in catalog:
-                # Keep the profile_tag of the first occurrence; the
-                # rule is shared between profiles.
-                continue
-            catalog[name] = EquivalenceRule(
-                name=name,
-                source=source,
-                target=target,
-                description=desc,
-                profile_tag=profile_tag,
-            )
-    return catalog
-
-
-BUILTIN_EQUIVALENCES: dict[str, EquivalenceRule] = _build_catalog()
-
-
-def list_equivalences_by_profile(
-    profile_name: Optional[str] = None,
-) -> list[EquivalenceRule]:
-    """List the available equivalence rules.
-
-    If ``profile_name`` is given, return only the rules whose
-    ``profile_tag == profile_name`` (or the rules derived from
-    several profiles, at least one of which is ``profile_name``).
-    """
-    if profile_name is None:
-        return list(BUILTIN_EQUIVALENCES.values())
-    return [
-        rule for rule in BUILTIN_EQUIVALENCES.values()
-        if rule.profile_tag == profile_name
-    ]
-
-
-def apply_selected_equivalences(
-    text: Optional[str],
-    selected_names: Iterable[str],
-) -> str:
-    """Apply only the rules whose name is in
-    ``selected_names``.
-
-    The application order is that of the internal catalog; the
-    transformations are applied sequentially to the text.
-    Unknown rules are ignored, and a warning is
-    logged.
-    """
-    if not text:
-        return text or ""
-    selected_set = set(selected_names)
-    if not selected_set:
-        return text
-    out = text
-    for name, rule in BUILTIN_EQUIVALENCES.items():
-        if name not in selected_set:
-            continue
-        out = out.replace(rule.source, rule.target)
-    # Detect unknown rules (for explicit logging)
-    unknown = selected_set - set(BUILTIN_EQUIVALENCES.keys())
-    if unknown:
-        logger.warning(
-            "[equivalence_profile] règles inconnues ignorées : %s",
-            sorted(unknown),
-        )
-    return out
-
-
-def compute_cer_with_equivalences(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-    selected_names: Iterable[str],
-) -> float:
-    """Compute the CER after applying the selected equivalences
-    to **both** sides (GT and hypothesis).
-
-    Uses ``picarones.core.metrics.compute_metrics`` and extracts
-    the ``cer`` field from the result.
-    """
-    from picarones.core.metrics import compute_metrics
-
-    selected_list = list(selected_names)
-    ref = apply_selected_equivalences(reference or "", selected_list)
-    hyp = apply_selected_equivalences(hypothesis or "", selected_list)
-    result = compute_metrics(ref, hyp)
-    return result.cer
-
| 192 |
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
"
|
| 196 |
-
|
| 197 |
-
"apply_selected_equivalences",
|
| 198 |
-
"compute_cer_with_equivalences",
|
| 199 |
-
]
|
|
|
|
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.equivalence_profile`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.equivalence_profile import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.equivalence_profile import *  # noqa: F401, F403
 
+import picarones.measurements.equivalence_profile as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
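The effect of enabling equivalence rules one at a time can be illustrated with a small self-contained sketch. Everything below is an assumption made for the illustration: `RULES` holds only two of the rules named in the catalog, and `levenshtein` and `cer_sketch` are plain stand-ins for `picarones.core.metrics.compute_metrics`, whose implementation is not shown in this diff.

```python
RULES = {
    "longs_s": ("ſ", "s"),  # long s folded to s
    "u_eq_v": ("u", "v"),   # u and v treated as interchangeable
}


def apply_selected(text: str, selected) -> str:
    """Apply only the selected rules, in catalog order."""
    chosen = set(selected)
    out = text
    for name, (source, target) in RULES.items():
        if name in chosen:
            out = out.replace(source, target)
    return out


def levenshtein(a: str, b: str) -> int:
    """Character edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cer_sketch(reference: str, hypothesis: str, selected) -> float:
    """Edit distance over reference length, after normalizing both sides."""
    ref = apply_selected(reference, selected)
    hyp = apply_selected(hypothesis, selected)
    return levenshtein(ref, hyp) / len(ref)


gt, ocr = "vpon the ſea", "upon the sea"
cer_none = cer_sketch(gt, ocr, [])
cer_long_s = cer_sketch(gt, ocr, ["longs_s"])
cer_both = cer_sketch(gt, ocr, ["longs_s", "u_eq_v"])
```

Each enabled rule removes exactly the disagreements it covers: two character errors out of twelve with no rule, one with `longs_s` alone, none with both rules.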
@@ -1,276 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
-across upstream OCR engines, it is not that it "improves" all
-engines; it is that it introduces its own biases, which dominate
-those of the OCR. Measuring degradation per stage is not
-enough: the two flows must be **separated**.
-
-At each junction where a module transforms an artifact, we
-measure:
-
-- **Correction rate**: among the errors present at the module's
-  input, how many are corrected at the output?
-- **Introduction rate**: among the errors present at the
-  output, how many are **new** (absent from the input)?
-
-This generalizes the over-normalization score
-(workstream A.I.7) to any junction. The formula applies
-uniformly to OCR→LLM, OCR→reconstructor, VLM→ALTO_mapper:
-any junction that transforms one artifact into another of the same
-type.
-
-Method (token-level)
----------------------
-``reference``, ``before`` and ``after`` are split into whitespace
-tokens. They are compared as **multisets** (a GT token is consumed
-at most once):
-
-- ``errors_before`` = GT tokens not found in ``before``
-- ``errors_after`` = GT tokens not found in ``after``
-- ``corrected`` = ``errors_before \\ errors_after``
-  (present before, absent after → corrected)
-- ``introduced`` = ``errors_after \\ errors_before``
-  (absent before, present after → introduced)
-
-Safeguard: the module does not classify errors (visual,
-abbreviations, etc.); it is a **volume absorption** metric,
-not an editorial-quality metric. The semantic overlap
-with ``taxonomy`` (Sprint 5) is documented in the glossary.
-
-Output
-------
-``compute_error_absorption(reference, before, after)`` returns:
-
-.. code-block:: text
-
-    {
-        "n_gt_tokens": int,
-        "n_errors_before": int,
-        "n_errors_after": int,
-        "n_corrected": int,
-        "n_introduced": int,
-        "n_kept_wrong": int,
-        "correction_rate": float | None,    # n_corrected / n_errors_before
-        "introduction_rate": float | None,  # n_introduced / n_errors_after
-        "net_improvement": int,             # n_corrected - n_introduced
-        "corrected_tokens": list[str],
-        "introduced_tokens": list[str],
-    }
-
-``aggregate_error_absorption(per_doc_results)`` sums the
-counters corpus-wide and recomputes the *micro* rates.
 """
| 68 |
|
| 69 |
-
from
|
| 70 |
-
|
| 71 |
-
import logging
|
| 72 |
-
from collections import Counter
|
| 73 |
-
from typing import Iterable, Optional
|
| 74 |
-
|
| 75 |
-
logger = logging.getLogger(__name__)
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
def _split_words(text: Optional[str]) -> list[str]:
|
| 79 |
-
if not text:
|
| 80 |
-
return []
|
| 81 |
-
return text.split()
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
def _missing_tokens(
|
| 85 |
-
reference: list[str], hypothesis: list[str],
|
| 86 |
-
) -> Counter:
|
| 87 |
-
"""Tokens GT manquants en hypothèse au sens multiset.
|
| 88 |
-
|
| 89 |
-
Un token GT compte plusieurs fois s'il apparaît plusieurs
|
| 90 |
-
fois ; chaque occurrence en hypothèse en absorbe au plus
|
| 91 |
-
une. Retourne un Counter ``{token: nb_occurrences_manquees}``.
|
| 92 |
-
"""
|
| 93 |
-
ref_count = Counter(reference)
|
| 94 |
-
hyp_count = Counter(hypothesis)
|
| 95 |
-
missing: Counter = Counter()
|
| 96 |
-
for token, n_ref in ref_count.items():
|
| 97 |
-
n_hyp = hyp_count.get(token, 0)
|
| 98 |
-
if n_hyp < n_ref:
|
| 99 |
-
missing[token] = n_ref - n_hyp
|
| 100 |
-
return missing
|
| 101 |
-
|
| 102 |
-
|
-def compute_error_absorption(
-    reference: Optional[str],
-    before: Optional[str],
-    after: Optional[str],
-    *,
-    case_sensitive: bool = False,
-) -> Optional[dict]:
-    """Measure error absorption between ``before`` and ``after``.
-
-    Parameters
-    ----------
-    reference:
-        GT (ground truth).
-    before:
-        Output of the previous stage (typically the upstream OCR).
-    after:
-        Output of the current stage (typically LLM post-correction).
-    case_sensitive:
-        If False (default), matching is case-insensitive; the
-        ``corrected_tokens``/``introduced_tokens`` output keeps the
-        original GT casing.
-
-    Returns
-    -------
-    dict | None
-        ``None`` if the GT is empty or contains no token.
-    """
-    ref_tokens = _split_words(reference)
-    if not ref_tokens:
-        return None
-    before_tokens = _split_words(before)
-    after_tokens = _split_words(after)
-
-    if case_sensitive:
-        ref_match = list(ref_tokens)
-        before_match = list(before_tokens)
-        after_match = list(after_tokens)
-    else:
-        ref_match = [t.lower() for t in ref_tokens]
-        before_match = [t.lower() for t in before_tokens]
-        after_match = [t.lower() for t in after_tokens]
-
-    # Map each matching token → list of original GT casings
-    ref_orig_by_match: dict[str, list[str]] = {}
-    for orig, m in zip(ref_tokens, ref_match):
-        ref_orig_by_match.setdefault(m, []).append(orig)
-
-    missing_before = _missing_tokens(ref_match, before_match)
-    missing_after = _missing_tokens(ref_match, after_match)
-
-    n_errors_before = sum(missing_before.values())
-    n_errors_after = sum(missing_after.values())
-
-    # Corrected / introduced counts, as multisets
-    corrected_counter: Counter = Counter()
-    introduced_counter: Counter = Counter()
-    kept_wrong_counter: Counter = Counter()
-    all_tokens = set(missing_before) | set(missing_after)
-    for tok in all_tokens:
-        nb = missing_before.get(tok, 0)
-        na = missing_after.get(tok, 0)
-        if nb > na:
-            corrected_counter[tok] = nb - na
-            kept_wrong_counter[tok] = na
-        elif na > nb:
-            introduced_counter[tok] = na - nb
-            kept_wrong_counter[tok] = nb
-        else:
-            kept_wrong_counter[tok] = nb
-
-    n_corrected = sum(corrected_counter.values())
-    n_introduced = sum(introduced_counter.values())
-    n_kept_wrong = sum(kept_wrong_counter.values())
-
-    correction_rate = (
-        n_corrected / n_errors_before
-        if n_errors_before > 0 else None
-    )
-    introduction_rate = (
-        n_introduced / n_errors_after
-        if n_errors_after > 0 else None
-    )
-
-    def _expand(counter: Counter) -> list[str]:
-        out: list[str] = []
-        for tok, count in counter.items():
-            origs = ref_orig_by_match.get(tok, [tok])
-            # Return only the representative GT casing
-            display = origs[0] if origs else tok
-            out.extend([display] * count)
-        return out
-
-    return {
-        "n_gt_tokens": len(ref_tokens),
-        "n_errors_before": n_errors_before,
-        "n_errors_after": n_errors_after,
-        "n_corrected": n_corrected,
-        "n_introduced": n_introduced,
-        "n_kept_wrong": n_kept_wrong,
-        "correction_rate": correction_rate,
-        "introduction_rate": introduction_rate,
-        "net_improvement": n_corrected - n_introduced,
-        "corrected_tokens": _expand(corrected_counter),
-        "introduced_tokens": _expand(introduced_counter),
-    }
-
-
| 210 |
-
def aggregate_error_absorption(
|
| 211 |
-
per_doc: Iterable[Optional[dict]],
|
| 212 |
-
*,
|
| 213 |
-
sample_tokens: int = 50,
|
| 214 |
-
) -> Optional[dict]:
|
| 215 |
-
"""Agrège les compteurs corpus-wide et recalcule les taux
|
| 216 |
-
*micro*.
|
| 217 |
-
|
| 218 |
-
Parameters
|
| 219 |
-
----------
|
| 220 |
-
per_doc:
|
| 221 |
-
Itérable de sorties de ``compute_error_absorption`` (ou
|
| 222 |
-
``None`` pour les docs sans GT).
|
| 223 |
-
sample_tokens:
|
| 224 |
-
Nombre maximal de tokens corrigés/introduits gardés dans
|
| 225 |
-
l'échantillon (cap pour ne pas exploser le JSON).
|
| 226 |
-
|
| 227 |
-
Returns
|
| 228 |
-
-------
|
| 229 |
-
dict | None
|
| 230 |
-
``None`` si aucune entry valide.
|
| 231 |
-
"""
|
| 232 |
-
docs = [d for d in per_doc if d]
|
| 233 |
-
if not docs:
|
| 234 |
-
return None
|
| 235 |
-
n_gt = sum(int(d.get("n_gt_tokens") or 0) for d in docs)
|
| 236 |
-
n_errors_before = sum(int(d.get("n_errors_before") or 0) for d in docs)
|
| 237 |
-
n_errors_after = sum(int(d.get("n_errors_after") or 0) for d in docs)
|
| 238 |
-
n_corrected = sum(int(d.get("n_corrected") or 0) for d in docs)
|
| 239 |
-
n_introduced = sum(int(d.get("n_introduced") or 0) for d in docs)
|
| 240 |
-
n_kept_wrong = sum(int(d.get("n_kept_wrong") or 0) for d in docs)
|
| 241 |
-
correction_rate = (
|
| 242 |
-
n_corrected / n_errors_before if n_errors_before > 0 else None
|
| 243 |
-
)
|
| 244 |
-
introduction_rate = (
|
| 245 |
-
n_introduced / n_errors_after if n_errors_after > 0 else None
|
| 246 |
-
)
|
| 247 |
-
corrected_sample: list[str] = []
|
| 248 |
-
introduced_sample: list[str] = []
|
| 249 |
-
for d in docs:
|
| 250 |
-
corrected_sample.extend(d.get("corrected_tokens") or [])
|
| 251 |
-
introduced_sample.extend(d.get("introduced_tokens") or [])
|
| 252 |
-
if (
|
| 253 |
-
len(corrected_sample) >= sample_tokens
|
| 254 |
-
and len(introduced_sample) >= sample_tokens
|
| 255 |
-
):
|
| 256 |
-
break
|
| 257 |
-
return {
|
| 258 |
-
"n_docs": len(docs),
|
| 259 |
-
"n_gt_tokens": n_gt,
|
| 260 |
-
"n_errors_before": n_errors_before,
|
| 261 |
-
"n_errors_after": n_errors_after,
|
| 262 |
-
"n_corrected": n_corrected,
|
| 263 |
-
"n_introduced": n_introduced,
|
| 264 |
-
"n_kept_wrong": n_kept_wrong,
|
| 265 |
-
"correction_rate": correction_rate,
|
| 266 |
-
"introduction_rate": introduction_rate,
|
| 267 |
-
"net_improvement": n_corrected - n_introduced,
|
| 268 |
-
"corrected_tokens_sample": corrected_sample[:sample_tokens],
|
| 269 |
-
"introduced_tokens_sample": introduced_sample[:sample_tokens],
|
| 270 |
-
}
|
| 271 |
-
|
| 272 |
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
"
|
| 276 |
-
]
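Reviewer note: the multiset bookkeeping in the removed `compute_error_absorption` body above can be sketched in isolation. This is a minimal sketch, not the module's code. `missing_tokens` is a hypothetical stand-in for the private `_missing_tokens` helper, whose body is not shown in this diff; it is assumed to return the reference tokens (with multiplicity) that a candidate text fails to reproduce. Tokenisation is assumed to be plain whitespace splitting, and matching is case-insensitive as in the default path.

```python
from collections import Counter


def missing_tokens(ref: list[str], cand: list[str]) -> Counter:
    """Hypothetical stand-in for ``_missing_tokens``: reference tokens
    (with multiplicity) that the candidate does not contain."""
    return Counter(ref) - Counter(cand)


def absorption_counts(reference: str, before: str, after: str) -> dict:
    """Compare missing reference tokens before and after a correction step."""
    ref = reference.lower().split()
    miss_before = missing_tokens(ref, before.lower().split())
    miss_after = missing_tokens(ref, after.lower().split())

    # Counter subtraction keeps only positive counts, which mirrors the
    # ``nb > na`` and ``na > nb`` branches of the original loop.
    corrected = miss_before - miss_after
    introduced = miss_after - miss_before

    n_before = sum(miss_before.values())
    n_after = sum(miss_after.values())
    n_corrected = sum(corrected.values())
    n_introduced = sum(introduced.values())
    return {
        "n_errors_before": n_before,
        "n_errors_after": n_after,
        "n_corrected": n_corrected,
        "n_introduced": n_introduced,
        "correction_rate": n_corrected / n_before if n_before else None,
        "introduction_rate": n_introduced / n_after if n_after else None,
        "net_improvement": n_corrected - n_introduced,
    }


result = absorption_counts(
    reference="le roi de France",
    before="le roy de Frnce",   # two reference tokens missing
    after="le roi du France",   # both fixed, one new miss ("de")
)
print(result)
```

In this made-up example the correction step fixes both original misses but loses "de", so the net improvement is 1 rather than 2. Summing such counters over documents before dividing is what `aggregate_error_absorption` calls the micro rate.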
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.error_absorption`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy imports
|
| 6 |
+
(``from picarones.core.error_absorption import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now holds only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.error_absorption import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.error_absorption as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
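Reviewer note: the three-statement shim above can be exercised without the real package. The sketch below is an illustration under stated assumptions, not project code: `demo_measurements_metric` and `demo_core_metric` are hypothetical module names standing in for `picarones.measurements.<name>` and `picarones.core.<name>`, and both modules are built in memory with `types.ModuleType` so the example is self-contained.

```python
import sys
import types

# Hypothetical relocated module (stands in for picarones.measurements.<name>).
new_home = types.ModuleType("demo_measurements_metric")
exec(
    "def compute(x):\n"
    "    return x * 2\n"
    "_internal = 'hidden'\n",
    new_home.__dict__,
)
sys.modules[new_home.__name__] = new_home

# Hypothetical legacy module (stands in for picarones.core.<name>).
# Its source uses the same three statements as the shim in this diff.
SHIM_SOURCE = (
    "from demo_measurements_metric import *  # noqa: F401, F403\n"
    "import demo_measurements_metric as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n"
)
legacy = types.ModuleType("demo_core_metric")
exec(SHIM_SOURCE, legacy.__dict__)
sys.modules[legacy.__name__] = legacy

# The historical import path still resolves to the relocated function.
from demo_core_metric import compute

print(compute(21))
print(legacy.__all__)
print(hasattr(legacy, "_internal"))
```

One consequence worth checking during review: a star import skips names that start with an underscore unless the target module lists them in `__all__`. Callers that imported private helpers through the old `picarones.core` path would therefore need the new module to export them explicitly, or to switch to the new path.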
|
|
@@ -1,331 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
- Blocs hallucinés : segments continus de la sortie sans correspondance GT au-delà d'un seuil
|
| 9 |
-
- Badge hallucination : True si ancrage faible ou ratio de longueur anormal
|
| 10 |
-
"""
|
| 11 |
-
|
| 12 |
-
from __future__ import annotations
|
| 13 |
-
|
| 14 |
-
import re
|
| 15 |
-
from dataclasses import dataclass
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
# ---------------------------------------------------------------------------
|
| 19 |
-
# Helpers texte
|
| 20 |
-
# ---------------------------------------------------------------------------
|
| 21 |
-
|
| 22 |
-
def _tokenize(text: str) -> list[str]:
|
| 23 |
-
"""Split on whitespace into lowercase tokens (punctuation stays attached)."""
|
| 24 |
-
return re.findall(r"[^\s]+", text.lower())
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
def _ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
|
| 28 |
-
"""Génère les n-grammes d'une liste de tokens."""
|
| 29 |
-
if len(tokens) < n:
|
| 30 |
-
return [tuple(tokens)] if tokens else []
|
| 31 |
-
return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
# ---------------------------------------------------------------------------
|
| 35 |
-
# Blocs hallucinés (segments continus sans ancrage)
|
| 36 |
-
# ---------------------------------------------------------------------------
|
| 37 |
-
|
| 38 |
-
@dataclass
|
| 39 |
-
class HallucinatedBlock:
|
| 40 |
-
"""Segment continu de la sortie sans correspondance dans le GT."""
|
| 41 |
-
start_token: int
|
| 42 |
-
end_token: int
|
| 43 |
-
text: str
|
| 44 |
-
length: int # nombre de tokens
|
| 45 |
-
|
| 46 |
-
def as_dict(self) -> dict:
|
| 47 |
-
return {
|
| 48 |
-
"start_token": self.start_token,
|
| 49 |
-
"end_token": self.end_token,
|
| 50 |
-
"text": self.text,
|
| 51 |
-
"length": self.length,
|
| 52 |
-
}
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
def _detect_hallucinated_blocks(
|
| 56 |
-
hyp_tokens: list[str],
|
| 57 |
-
gt_token_set: set[str],
|
| 58 |
-
tolerance: int = 3,
|
| 59 |
-
min_block_length: int = 4,
|
| 60 |
-
) -> list[HallucinatedBlock]:
|
| 61 |
-
"""Détecte les blocs de tokens hypothèse sans correspondance dans le GT.
|
| 62 |
-
|
| 63 |
-
Un bloc est un segment contigu de tokens hypothèse dont aucun n'est présent
|
| 64 |
-
dans le vocabulaire GT. Une tolérance de ``tolerance`` tokens connus interrompus
|
| 65 |
-
est acceptée avant de clore un bloc.
|
| 66 |
-
|
| 67 |
-
Parameters
|
| 68 |
-
----------
|
| 69 |
-
hyp_tokens:
|
| 70 |
-
Tokens de la sortie OCR/VLM.
|
| 71 |
-
gt_token_set:
|
| 72 |
-
Ensemble des tokens du GT (pour recherche O(1)).
|
| 73 |
-
tolerance:
|
| 74 |
-
Nombre de tokens connus consécutifs interrompant un bloc avant de le clore.
|
| 75 |
-
min_block_length:
|
| 76 |
-
Longueur minimale (tokens) pour qu'un bloc soit signalé.
|
| 77 |
-
|
| 78 |
-
Returns
|
| 79 |
-
-------
|
| 80 |
-
list[HallucinatedBlock]
|
| 81 |
-
"""
|
| 82 |
-
blocks: list[HallucinatedBlock] = []
|
| 83 |
-
if not hyp_tokens:
|
| 84 |
-
return blocks
|
| 85 |
-
|
| 86 |
-
in_block = False
|
| 87 |
-
block_start = 0
|
| 88 |
-
consecutive_known = 0
|
| 89 |
-
|
| 90 |
-
for i, tok in enumerate(hyp_tokens):
|
| 91 |
-
is_unknown = tok not in gt_token_set
|
| 92 |
-
if is_unknown:
|
| 93 |
-
if not in_block:
|
| 94 |
-
in_block = True
|
| 95 |
-
block_start = i
|
| 96 |
-
consecutive_known = 0
|
| 97 |
-
else:
|
| 98 |
-
consecutive_known = 0
|
| 99 |
-
else:
|
| 100 |
-
if in_block:
|
| 101 |
-
consecutive_known += 1
|
| 102 |
-
if consecutive_known >= tolerance:
|
| 103 |
-
# Clore le bloc
|
| 104 |
-
end = i - consecutive_known
|
| 105 |
-
length = end - block_start + 1
|
| 106 |
-
if length >= min_block_length:
|
| 107 |
-
text = " ".join(hyp_tokens[block_start:end + 1])
|
| 108 |
-
blocks.append(HallucinatedBlock(
|
| 109 |
-
start_token=block_start,
|
| 110 |
-
end_token=end,
|
| 111 |
-
text=text,
|
| 112 |
-
length=length,
|
| 113 |
-
))
|
| 114 |
-
in_block = False
|
| 115 |
-
consecutive_known = 0
|
| 116 |
-
|
| 117 |
-
# Bloc non terminé
|
| 118 |
-
if in_block:
|
| 119 |
-
end = len(hyp_tokens) - 1
|
| 120 |
-
length = end - block_start + 1
|
| 121 |
-
if length >= min_block_length:
|
| 122 |
-
text = " ".join(hyp_tokens[block_start:end + 1])
|
| 123 |
-
blocks.append(HallucinatedBlock(
|
| 124 |
-
start_token=block_start,
|
| 125 |
-
end_token=end,
|
| 126 |
-
text=text,
|
| 127 |
-
length=length,
|
| 128 |
-
))
|
| 129 |
-
|
| 130 |
-
return blocks
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
# ---------------------------------------------------------------------------
|
| 134 |
-
# Résultat structuré
|
| 135 |
-
# ---------------------------------------------------------------------------
|
| 136 |
-
|
| 137 |
-
@dataclass
|
| 138 |
-
class HallucinationMetrics:
|
| 139 |
-
"""Métriques de détection des hallucinations pour une paire (GT, hypothèse)."""
|
| 140 |
-
|
| 141 |
-
net_insertion_rate: float
|
| 142 |
-
"""Taux d'insertion nette : tokens hypothèse absents du GT / total tokens hypothèse."""
|
| 143 |
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
"""Score d'ancrage : proportion des trigrammes hypothèse présents dans les trigrammes GT.
|
| 149 |
-
Score élevé → l'hypothèse s'ancre bien dans le GT. Score faible → hallucinations probables."""
|
| 150 |
-
|
| 151 |
-
hallucinated_blocks: list[HallucinatedBlock]
|
| 152 |
-
"""Segments continus de la sortie sans correspondance GT (au-dessus du seuil de tolérance)."""
|
| 153 |
-
|
| 154 |
-
is_hallucinating: bool
|
| 155 |
-
"""True si anchor_score < anchor_threshold OU length_ratio > length_ratio_threshold."""
|
| 156 |
-
|
| 157 |
-
# Détails supplémentaires
|
| 158 |
-
gt_word_count: int = 0
|
| 159 |
-
hyp_word_count: int = 0
|
| 160 |
-
net_inserted_words: int = 0
|
| 161 |
-
anchor_threshold_used: float = 0.5
|
| 162 |
-
length_ratio_threshold_used: float = 1.2
|
| 163 |
-
ngram_size_used: int = 3
|
| 164 |
-
|
| 165 |
-
def as_dict(self) -> dict:
|
| 166 |
-
return {
|
| 167 |
-
"net_insertion_rate": round(self.net_insertion_rate, 6),
|
| 168 |
-
"length_ratio": round(self.length_ratio, 6),
|
| 169 |
-
"anchor_score": round(self.anchor_score, 6),
|
| 170 |
-
"hallucinated_blocks": [b.as_dict() for b in self.hallucinated_blocks],
|
| 171 |
-
"is_hallucinating": self.is_hallucinating,
|
| 172 |
-
"gt_word_count": self.gt_word_count,
|
| 173 |
-
"hyp_word_count": self.hyp_word_count,
|
| 174 |
-
"net_inserted_words": self.net_inserted_words,
|
| 175 |
-
"anchor_threshold_used": self.anchor_threshold_used,
|
| 176 |
-
"length_ratio_threshold_used": self.length_ratio_threshold_used,
|
| 177 |
-
"ngram_size_used": self.ngram_size_used,
|
| 178 |
-
}
|
| 179 |
-
|
| 180 |
-
@classmethod
|
| 181 |
-
def from_dict(cls, d: dict) -> "HallucinationMetrics":
|
| 182 |
-
blocks = [
|
| 183 |
-
HallucinatedBlock(**b) for b in d.get("hallucinated_blocks", [])
|
| 184 |
-
]
|
| 185 |
-
return cls(
|
| 186 |
-
net_insertion_rate=d.get("net_insertion_rate", 0.0),
|
| 187 |
-
length_ratio=d.get("length_ratio", 1.0),
|
| 188 |
-
anchor_score=d.get("anchor_score", 1.0),
|
| 189 |
-
hallucinated_blocks=blocks,
|
| 190 |
-
is_hallucinating=d.get("is_hallucinating", False),
|
| 191 |
-
gt_word_count=d.get("gt_word_count", 0),
|
| 192 |
-
hyp_word_count=d.get("hyp_word_count", 0),
|
| 193 |
-
net_inserted_words=d.get("net_inserted_words", 0),
|
| 194 |
-
anchor_threshold_used=d.get("anchor_threshold_used", 0.5),
|
| 195 |
-
length_ratio_threshold_used=d.get("length_ratio_threshold_used", 1.2),
|
| 196 |
-
ngram_size_used=d.get("ngram_size_used", 3),
|
| 197 |
-
)
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
# ---------------------------------------------------------------------------
|
| 201 |
-
# Calcul principal
|
| 202 |
-
# ---------------------------------------------------------------------------
|
| 203 |
-
|
| 204 |
-
def compute_hallucination_metrics(
|
| 205 |
-
reference: str,
|
| 206 |
-
hypothesis: str,
|
| 207 |
-
n: int = 3,
|
| 208 |
-
length_ratio_threshold: float = 1.2,
|
| 209 |
-
anchor_threshold: float = 0.5,
|
| 210 |
-
block_tolerance: int = 3,
|
| 211 |
-
min_block_length: int = 4,
|
| 212 |
-
) -> HallucinationMetrics:
|
| 213 |
-
"""Calcule les métriques de détection des hallucinations VLM/LLM.
|
| 214 |
-
|
| 215 |
-
Parameters
|
| 216 |
-
----------
|
| 217 |
-
reference:
|
| 218 |
-
Texte de vérité terrain (GT).
|
| 219 |
-
hypothesis:
|
| 220 |
-
Texte produit par le modèle.
|
| 221 |
-
n:
|
| 222 |
-
Taille des n-grammes pour le score d'ancrage (défaut : trigrammes).
|
| 223 |
-
length_ratio_threshold:
|
| 224 |
-
Seuil de ratio de longueur au-dessus duquel on signale une hallucination potentielle.
|
| 225 |
-
anchor_threshold:
|
| 226 |
-
Seuil de score d'ancrage en dessous duquel on signale une hallucination potentielle.
|
| 227 |
-
block_tolerance:
|
| 228 |
-
Nombre de tokens connus consécutifs acceptés dans un bloc halluciné.
|
| 229 |
-
min_block_length:
|
| 230 |
-
Longueur minimale (tokens) pour signaler un bloc halluciné.
|
| 231 |
-
|
| 232 |
-
Returns
|
| 233 |
-
-------
|
| 234 |
-
HallucinationMetrics
|
| 235 |
-
"""
|
| 236 |
-
gt_tokens = _tokenize(reference)
|
| 237 |
-
hyp_tokens = _tokenize(hypothesis)
|
| 238 |
-
|
| 239 |
-
gt_len_chars = len(reference.strip())
|
| 240 |
-
hyp_len_chars = len(hypothesis.strip())
|
| 241 |
-
|
| 242 |
-
# ── Ratio de longueur ────────────────────────────────────────────────
|
| 243 |
-
if gt_len_chars == 0:
|
| 244 |
-
length_ratio = 1.0 if hyp_len_chars == 0 else float("inf")
|
| 245 |
-
else:
|
| 246 |
-
length_ratio = hyp_len_chars / gt_len_chars
|
| 247 |
-
|
| 248 |
-
# ── Taux d'insertion nette ───────────────────────────────────────────
|
| 249 |
-
gt_token_set = set(gt_tokens)
|
| 250 |
-
hyp_token_count = len(hyp_tokens)
|
| 251 |
-
|
| 252 |
-
if hyp_token_count == 0:
|
| 253 |
-
net_insertion_rate = 0.0
|
| 254 |
-
net_inserted_words = 0
|
| 255 |
-
else:
|
| 256 |
-
net_inserted = [t for t in hyp_tokens if t not in gt_token_set]
|
| 257 |
-
net_inserted_words = len(net_inserted)
|
| 258 |
-
net_insertion_rate = net_inserted_words / hyp_token_count
|
| 259 |
-
|
| 260 |
-
# ── Score d'ancrage (n-grammes) ──────────────────────────────────────
|
| 261 |
-
gt_ngrams = set(_ngrams(gt_tokens, n))
|
| 262 |
-
hyp_ngrams = _ngrams(hyp_tokens, n)
|
| 263 |
-
|
| 264 |
-
if not hyp_ngrams:
|
| 265 |
-
# Empty hypothesis: anchoring is perfect only if the GT is empty too, otherwise 0.0
|
| 266 |
-
anchor_score = 1.0 if not gt_ngrams else 0.0
|
| 267 |
-
elif not gt_ngrams:
|
| 268 |
-
anchor_score = 0.0
|
| 269 |
-
else:
|
| 270 |
-
anchored = sum(1 for ng in hyp_ngrams if ng in gt_ngrams)
|
| 271 |
-
anchor_score = anchored / len(hyp_ngrams)
|
| 272 |
-
|
| 273 |
-
# ── Blocs hallucinés ─────────────────────────────────────────────────
|
| 274 |
-
blocks = _detect_hallucinated_blocks(
|
| 275 |
-
hyp_tokens=hyp_tokens,
|
| 276 |
-
gt_token_set=gt_token_set,
|
| 277 |
-
tolerance=block_tolerance,
|
| 278 |
-
min_block_length=min_block_length,
|
| 279 |
-
)
|
| 280 |
-
|
| 281 |
-
# ── Badge hallucination ──────────────────────────────────────────────
|
| 282 |
-
is_hallucinating = (
|
| 283 |
-
anchor_score < anchor_threshold
|
| 284 |
-
or length_ratio > length_ratio_threshold
|
| 285 |
-
)
|
| 286 |
-
|
| 287 |
-
return HallucinationMetrics(
|
| 288 |
-
net_insertion_rate=net_insertion_rate,
|
| 289 |
-
length_ratio=min(length_ratio, 9.99), # plafonner pour la sérialisation
|
| 290 |
-
anchor_score=anchor_score,
|
| 291 |
-
hallucinated_blocks=blocks,
|
| 292 |
-
is_hallucinating=is_hallucinating,
|
| 293 |
-
gt_word_count=len(gt_tokens),
|
| 294 |
-
hyp_word_count=hyp_token_count,
|
| 295 |
-
net_inserted_words=net_inserted_words,
|
| 296 |
-
anchor_threshold_used=anchor_threshold,
|
| 297 |
-
length_ratio_threshold_used=length_ratio_threshold,
|
| 298 |
-
ngram_size_used=n,
|
| 299 |
-
)
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
# ---------------------------------------------------------------------------
|
| 303 |
-
# Agrégation sur un corpus
|
| 304 |
-
# ---------------------------------------------------------------------------
|
| 305 |
-
|
| 306 |
-
def aggregate_hallucination_metrics(results: list[HallucinationMetrics]) -> dict:
|
| 307 |
-
"""Agrège les métriques d'hallucination sur un corpus.
|
| 308 |
-
|
| 309 |
-
Returns
|
| 310 |
-
-------
|
| 311 |
-
dict
|
| 312 |
-
Statistiques agrégées : anchor_score moyen, taux de documents hallucinés…
|
| 313 |
-
"""
|
| 314 |
-
if not results:
|
| 315 |
-
return {}
|
| 316 |
|
| 317 |
-
|
| 318 |
-
anchor_values = [r.anchor_score for r in results]
|
| 319 |
-
ratio_values = [r.length_ratio for r in results]
|
| 320 |
-
insertion_values = [r.net_insertion_rate for r in results]
|
| 321 |
-
hallucinating_count = sum(1 for r in results if r.is_hallucinating)
|
| 322 |
|
| 323 |
-
|
| 324 |
-
|
| 325 |
-
|
| 326 |
-
|
| 327 |
-
"net_insertion_rate_mean": round(sum(insertion_values) / n, 6),
|
| 328 |
-
"hallucinating_doc_count": hallucinating_count,
|
| 329 |
-
"hallucinating_doc_rate": round(hallucinating_count / n, 6),
|
| 330 |
-
"document_count": n,
|
| 331 |
-
}
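Reviewer note: the anchor score and the hallucination badge from the removed module above can be reproduced in a few lines. This is a minimal sketch following the logic visible in the diff, not a drop-in replacement. The function names `ngrams`, `anchor_score` and `is_hallucinating` are chosen for illustration, tokenisation is assumed to be lowercase whitespace splitting (as in `_tokenize`), and the sample sentences are made up.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Sliding n-grams; a sequence shorter than n yields one short tuple."""
    if len(tokens) < n:
        return [tuple(tokens)] if tokens else []
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def anchor_score(reference: str, hypothesis: str, n: int = 3) -> float:
    """Share of hypothesis n-grams that also occur in the reference."""
    gt = set(ngrams(reference.lower().split(), n))
    hyp = ngrams(hypothesis.lower().split(), n)
    if not hyp:
        # Empty hypothesis: perfect only when the reference is empty too.
        return 1.0 if not gt else 0.0
    if not gt:
        return 0.0
    return sum(1 for ng in hyp if ng in gt) / len(hyp)


def is_hallucinating(
    reference: str,
    hypothesis: str,
    anchor_threshold: float = 0.5,
    length_ratio_threshold: float = 1.2,
) -> bool:
    """Badge: weak anchoring OR an abnormally long output (character ratio)."""
    gt_len = len(reference.strip())
    hyp_len = len(hypothesis.strip())
    if gt_len == 0:
        ratio = 1.0 if hyp_len == 0 else float("inf")
    else:
        ratio = hyp_len / gt_len
    score = anchor_score(reference, hypothesis)
    return score < anchor_threshold or ratio > length_ratio_threshold


ref = "the quick brown fox jumps"
padded = "the quick brown fox jumps over the lazy dog and keeps running"

print(anchor_score(ref, ref))
print(anchor_score(ref, padded))
print(is_hallucinating(ref, ref), is_hallucinating(ref, padded))
```

The padded hypothesis keeps the three reference trigrams but adds seven unanchored ones, so the score drops below the 0.5 default and the length ratio also exceeds 1.2; either condition alone would set the badge.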
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.hallucination`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy imports
|
| 6 |
+
(``from picarones.core.hallucination import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now holds only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
+
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.hallucination import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.hallucination as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
@@ -1,615 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
- La détection de régression compare le dernier run à une baseline configurable.
|
| 9 |
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
run_id TEXT PRIMARY KEY — UUID ou hash du run
|
| 14 |
-
timestamp TEXT — ISO 8601
|
| 15 |
-
corpus_name TEXT
|
| 16 |
-
engine_name TEXT
|
| 17 |
-
cer_mean REAL
|
| 18 |
-
wer_mean REAL
|
| 19 |
-
doc_count INTEGER
|
| 20 |
-
metadata TEXT — JSON
|
| 21 |
-
|
| 22 |
-
Usage
|
| 23 |
-
-----
|
| 24 |
-
>>> from picarones.core.history import BenchmarkHistory
|
| 25 |
-
>>> history = BenchmarkHistory("~/.picarones/history.db")
|
| 26 |
-
>>> history.record(benchmark_result)
|
| 27 |
-
>>> df = history.query(engine="tesseract", corpus="chroniques")
|
| 28 |
-
>>> regression = history.detect_regression(engine="tesseract", threshold=0.02)
|
| 29 |
"""
|
| 30 |
|
| 31 |
-
from
|
| 32 |
-
|
| 33 |
-
import json
|
| 34 |
-
import logging
|
| 35 |
-
import sqlite3
|
| 36 |
-
import uuid
|
| 37 |
-
from dataclasses import dataclass, field
|
| 38 |
-
from datetime import datetime, timezone
|
| 39 |
-
from pathlib import Path
|
| 40 |
-
from typing import TYPE_CHECKING, Optional
|
| 41 |
-
|
| 42 |
-
if TYPE_CHECKING:
|
| 43 |
-
from picarones.core.results import BenchmarkResult
|
| 44 |
-
|
| 45 |
-
logger = logging.getLogger(__name__)
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
# ---------------------------------------------------------------------------
|
| 49 |
-
# Structures de données
|
| 50 |
-
# ---------------------------------------------------------------------------
|
| 51 |
-
|
| 52 |
-
@dataclass
|
| 53 |
-
class HistoryEntry:
|
| 54 |
-
"""Un enregistrement dans l'historique des benchmarks."""
|
| 55 |
-
run_id: str
|
| 56 |
-
timestamp: str
|
| 57 |
-
corpus_name: str
|
| 58 |
-
engine_name: str
|
| 59 |
-
cer_mean: Optional[float]
|
| 60 |
-
wer_mean: Optional[float]
|
| 61 |
-
doc_count: int
|
| 62 |
-
metadata: dict = field(default_factory=dict)
|
| 63 |
-
|
| 64 |
-
@property
|
| 65 |
-
def cer_percent(self) -> Optional[float]:
|
| 66 |
-
return self.cer_mean * 100 if self.cer_mean is not None else None
|
| 67 |
-
|
| 68 |
-
def as_dict(self) -> dict:
|
| 69 |
-
return {
|
| 70 |
-
"run_id": self.run_id,
|
| 71 |
-
"timestamp": self.timestamp,
|
| 72 |
-
"corpus_name": self.corpus_name,
|
| 73 |
-
"engine_name": self.engine_name,
|
| 74 |
-
"cer_mean": self.cer_mean,
|
| 75 |
-
"wer_mean": self.wer_mean,
|
| 76 |
-
"doc_count": self.doc_count,
|
| 77 |
-
"metadata": self.metadata,
|
| 78 |
-
}
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
@dataclass
|
| 82 |
-
class RegressionResult:
|
| 83 |
-
"""Résultat d'une détection de régression."""
|
| 84 |
-
engine_name: str
|
| 85 |
-
corpus_name: str
|
| 86 |
-
baseline_run_id: str
|
| 87 |
-
baseline_timestamp: str
|
| 88 |
-
baseline_cer: Optional[float]
|
| 89 |
-
current_run_id: str
|
| 90 |
-
current_timestamp: str
|
| 91 |
-
current_cer: Optional[float]
|
| 92 |
-
delta_cer: Optional[float]
|
| 93 |
-
"""Delta CER (current - baseline). Positif = régression."""
|
| 94 |
-
is_regression: bool
|
| 95 |
-
threshold: float
|
| 96 |
-
|
| 97 |
-
def as_dict(self) -> dict:
|
| 98 |
-
return {
|
| 99 |
-
"engine_name": self.engine_name,
|
| 100 |
-
"corpus_name": self.corpus_name,
|
| 101 |
-
"baseline_run_id": self.baseline_run_id,
|
| 102 |
-
"baseline_timestamp": self.baseline_timestamp,
|
| 103 |
-
"baseline_cer": self.baseline_cer,
|
| 104 |
-
"current_run_id": self.current_run_id,
|
| 105 |
-
"current_timestamp": self.current_timestamp,
|
| 106 |
-
"current_cer": self.current_cer,
|
| 107 |
-
"delta_cer": self.delta_cer,
|
| 108 |
-
"is_regression": self.is_regression,
|
| 109 |
-
"threshold": self.threshold,
|
| 110 |
-
}
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
# ---------------------------------------------------------------------------
|
| 114 |
-
# BenchmarkHistory
|
| 115 |
-
# ---------------------------------------------------------------------------
|
| 116 |
-
|
| 117 |
-
class BenchmarkHistory:
|
| 118 |
-
"""Gestionnaire de l'historique des benchmarks dans SQLite.
|
| 119 |
-
|
| 120 |
-
Parameters
|
| 121 |
-
----------
|
| 122 |
-
db_path:
|
| 123 |
-
Chemin vers le fichier SQLite. Utiliser ``":memory:"`` pour les tests.
|
| 124 |
-
|
| 125 |
-
Examples
|
| 126 |
-
--------
|
| 127 |
-
>>> history = BenchmarkHistory("~/.picarones/history.db")
|
| 128 |
-
>>> history.record(benchmark)
|
| 129 |
-
>>> entries = history.query(engine="tesseract")
|
| 130 |
-
>>> for e in entries:
|
| 131 |
-
... print(e.timestamp, f"CER={e.cer_percent:.2f}%")
|
| 132 |
-
"""
|
| 133 |
-
|
| 134 |
-
_CREATE_TABLE = """
|
| 135 |
-
CREATE TABLE IF NOT EXISTS runs (
|
| 136 |
-
run_id TEXT PRIMARY KEY,
|
| 137 |
-
timestamp TEXT NOT NULL,
|
| 138 |
-
corpus_name TEXT NOT NULL,
|
| 139 |
-
engine_name TEXT NOT NULL,
|
| 140 |
-
cer_mean REAL,
|
| 141 |
-
wer_mean REAL,
|
| 142 |
-
doc_count INTEGER,
|
| 143 |
-
metadata TEXT
|
| 144 |
-
);
|
| 145 |
-
CREATE INDEX IF NOT EXISTS idx_engine ON runs (engine_name);
|
| 146 |
-
CREATE INDEX IF NOT EXISTS idx_corpus ON runs (corpus_name);
|
| 147 |
-
CREATE INDEX IF NOT EXISTS idx_timestamp ON runs (timestamp);
|
| 148 |
-
"""
|
| 149 |
-
|
| 150 |
-
def __init__(self, db_path: str = "~/.picarones/history.db") -> None:
|
| 151 |
-
if db_path != ":memory:":
|
| 152 |
-
path = Path(db_path).expanduser()
|
| 153 |
-
path.parent.mkdir(parents=True, exist_ok=True)
|
| 154 |
-
self.db_path = str(path)
|
| 155 |
-
else:
|
| 156 |
-
self.db_path = ":memory:"
|
| 157 |
-
self._conn: Optional[sqlite3.Connection] = None
|
| 158 |
-
self._init_db()
|
| 159 |
-
|
| 160 |
-
def _connect(self) -> sqlite3.Connection:
|
| 161 |
-
if self._conn is None:
|
| 162 |
-
self._conn = sqlite3.connect(self.db_path)
|
| 163 |
-
self._conn.row_factory = sqlite3.Row
|
| 164 |
-
return self._conn
|
| 165 |
-
|
| 166 |
-
def _init_db(self) -> None:
|
| 167 |
-
conn = self._connect()
|
| 168 |
-
conn.executescript(self._CREATE_TABLE)
|
| 169 |
-
conn.commit()
|
| 170 |
-
|
| 171 |
-
def close(self) -> None:
|
| 172 |
-
"""Ferme la connexion SQLite."""
|
| 173 |
-
if self._conn:
|
| 174 |
-
self._conn.close()
|
| 175 |
-
self._conn = None
|
| 176 |
-
|
| 177 |
-
# ------------------------------------------------------------------
|
| 178 |
-
# Enregistrement
|
| 179 |
-
# ------------------------------------------------------------------
|
| 180 |
-
|
| 181 |
-
def record(
|
| 182 |
-
self,
|
| 183 |
-
benchmark_result: "BenchmarkResult",
|
| 184 |
-
run_id: Optional[str] = None,
|
| 185 |
-
extra_metadata: Optional[dict] = None,
|
| 186 |
-
) -> str:
|
| 187 |
-
"""Enregistre les résultats d'un benchmark dans l'historique.
|
| 188 |
-
|
| 189 |
-
Parameters
|
| 190 |
-
----------
|
| 191 |
-
benchmark_result:
|
| 192 |
-
Résultats à enregistrer (``BenchmarkResult``).
|
| 193 |
-
run_id:
|
| 194 |
-
Identifiant du run (auto-généré si None).
|
| 195 |
-
extra_metadata:
|
| 196 |
-
Métadonnées supplémentaires à stocker.
|
| 197 |
-
|
| 198 |
-
Returns
|
| 199 |
-
-------
|
| 200 |
-
str
|
| 201 |
-
L'identifiant du run enregistré.
|
| 202 |
-
"""
|
| 203 |
-
if run_id is None:
|
| 204 |
-
run_id = str(uuid.uuid4())
|
| 205 |
-
|
| 206 |
-
timestamp = datetime.now(timezone.utc).isoformat()
|
| 207 |
-
conn = self._connect()
|
| 208 |
-
|
| 209 |
-
for report in benchmark_result.engine_reports:
|
| 210 |
-
ranking = benchmark_result.ranking()
|
| 211 |
-
engine_entry = next(
|
| 212 |
-
(r for r in ranking if r["engine"] == report.engine_name),
|
| 213 |
-
None,
|
| 214 |
-
)
|
| 215 |
-
cer_mean = engine_entry["mean_cer"] if engine_entry else None
|
| 216 |
-
wer_mean = engine_entry["mean_wer"] if engine_entry else None
|
| 217 |
-
|
| 218 |
-
meta = {
|
| 219 |
-
"engine_version": report.engine_version,
|
| 220 |
-
"engine_config": report.engine_config,
|
| 221 |
-
"picarones_version": benchmark_result.metadata.get("picarones_version", ""),
|
| 222 |
-
**(extra_metadata or {}),
|
| 223 |
-
}
|
| 224 |
-
|
| 225 |
-
conn.execute(
|
| 226 |
-
"""
|
| 227 |
-
INSERT OR REPLACE INTO runs
|
| 228 |
-
(run_id, timestamp, corpus_name, engine_name,
|
| 229 |
-
cer_mean, wer_mean, doc_count, metadata)
|
| 230 |
-
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
| 231 |
-
""",
|
| 232 |
-
(
|
| 233 |
-
f"{run_id}_{report.engine_name}",
|
| 234 |
-
timestamp,
|
| 235 |
-
benchmark_result.corpus_name,
|
| 236 |
-
report.engine_name,
|
| 237 |
-
cer_mean,
|
| 238 |
-
wer_mean,
|
| 239 |
-
benchmark_result.document_count,
|
| 240 |
-
json.dumps(meta, ensure_ascii=False),
|
| 241 |
-
),
|
| 242 |
-
)
|
| 243 |
-
|
| 244 |
-
conn.commit()
|
| 245 |
-
logger.info("Benchmark enregistré dans l'historique : run_id=%s", run_id)
|
| 246 |
-
return run_id
|
| 247 |
-
|
| 248 |
-
def record_single(
|
| 249 |
-
self,
|
| 250 |
-
run_id: str,
|
| 251 |
-
corpus_name: str,
|
| 252 |
-
engine_name: str,
|
| 253 |
-
cer_mean: Optional[float],
|
| 254 |
-
wer_mean: Optional[float],
|
| 255 |
-
doc_count: int,
|
| 256 |
-
timestamp: Optional[str] = None,
|
| 257 |
-
metadata: Optional[dict] = None,
|
| 258 |
-
) -> str:
|
| 259 |
-
"""Enregistre manuellement une entrée dans l'historique.
|
| 260 |
-
|
| 261 |
-
Useful for tests, for importing external data, or for
|
| 262 |
-
recording results computed outside Picarones.
|
| 263 |
-
|
| 264 |
-
Returns
|
| 265 |
-
-------
|
| 266 |
-
str
|
| 267 |
-
The recorded run_id.
|
| 268 |
-
"""
|
| 269 |
-
if timestamp is None:
|
| 270 |
-
timestamp = datetime.now(timezone.utc).isoformat()
|
| 271 |
-
|
| 272 |
-
conn = self._connect()
|
| 273 |
-
conn.execute(
|
| 274 |
-
"""
|
| 275 |
-
INSERT OR REPLACE INTO runs
|
| 276 |
-
(run_id, timestamp, corpus_name, engine_name,
|
| 277 |
-
cer_mean, wer_mean, doc_count, metadata)
|
| 278 |
-
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
| 279 |
-
""",
|
| 280 |
-
(
|
| 281 |
-
run_id,
|
| 282 |
-
timestamp,
|
| 283 |
-
corpus_name,
|
| 284 |
-
engine_name,
|
| 285 |
-
cer_mean,
|
| 286 |
-
wer_mean,
|
| 287 |
-
doc_count,
|
| 288 |
-
json.dumps(metadata or {}, ensure_ascii=False),
|
| 289 |
-
),
|
| 290 |
-
)
|
| 291 |
-
conn.commit()
|
| 292 |
-
return run_id
|
| 293 |
-
|
| 294 |
-
# ------------------------------------------------------------------
|
| 295 |
-
# Queries
|
| 296 |
-
# ------------------------------------------------------------------
|
| 297 |
-
|
| 298 |
-
def query(
|
| 299 |
-
self,
|
| 300 |
-
engine: Optional[str] = None,
|
| 301 |
-
corpus: Optional[str] = None,
|
| 302 |
-
since: Optional[str] = None,
|
| 303 |
-
limit: int = 100,
|
| 304 |
-
) -> list[HistoryEntry]:
|
| 305 |
-
"""Return the run history, with optional filters.
|
| 306 |
-
|
| 307 |
-
Parameters
|
| 308 |
-
----------
|
| 309 |
-
engine:
|
| 310 |
-
Filter on the engine name.
|
| 311 |
-
corpus:
|
| 312 |
-
Filter on the corpus name.
|
| 313 |
-
since:
|
| 314 |
-
Earliest ISO 8601 date to include (``"2025-01-01"``).
|
| 315 |
-
limit:
|
| 316 |
-
Maximum number of entries returned.
|
| 317 |
-
|
| 318 |
-
Returns
|
| 319 |
-
-------
|
| 320 |
-
list[HistoryEntry]
|
| 321 |
-
Entries sorted by ascending timestamp.
|
| 322 |
-
"""
|
| 323 |
-
clauses: list[str] = []
|
| 324 |
-
params: list = []
|
| 325 |
-
|
| 326 |
-
if engine:
|
| 327 |
-
clauses.append("engine_name = ?")
|
| 328 |
-
params.append(engine)
|
| 329 |
-
if corpus:
|
| 330 |
-
clauses.append("corpus_name = ?")
|
| 331 |
-
params.append(corpus)
|
| 332 |
-
if since:
|
| 333 |
-
clauses.append("timestamp >= ?")
|
| 334 |
-
params.append(since)
|
| 335 |
-
|
| 336 |
-
where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
|
| 337 |
-
params.append(limit)
|
| 338 |
-
|
| 339 |
-
conn = self._connect()
|
| 340 |
-
rows = conn.execute(
|
| 341 |
-
f"SELECT * FROM runs {where} ORDER BY timestamp ASC LIMIT ?",
|
| 342 |
-
params,
|
| 343 |
-
).fetchall()
|
| 344 |
-
|
| 345 |
-
return [
|
| 346 |
-
HistoryEntry(
|
| 347 |
-
run_id=row["run_id"],
|
| 348 |
-
timestamp=row["timestamp"],
|
| 349 |
-
corpus_name=row["corpus_name"],
|
| 350 |
-
engine_name=row["engine_name"],
|
| 351 |
-
cer_mean=row["cer_mean"],
|
| 352 |
-
wer_mean=row["wer_mean"],
|
| 353 |
-
doc_count=row["doc_count"],
|
| 354 |
-
metadata=json.loads(row["metadata"] or "{}"),
|
| 355 |
-
)
|
| 356 |
-
for row in rows
|
| 357 |
-
]
|
| 358 |
-
|
| 359 |
-
def list_engines(self) -> list[str]:
|
| 360 |
-
"""Return the list of engines present in the history."""
|
| 361 |
-
conn = self._connect()
|
| 362 |
-
rows = conn.execute(
|
| 363 |
-
"SELECT DISTINCT engine_name FROM runs ORDER BY engine_name"
|
| 364 |
-
).fetchall()
|
| 365 |
-
return [row[0] for row in rows]
|
| 366 |
-
|
| 367 |
-
def list_corpora(self) -> list[str]:
|
| 368 |
-
"""Return the list of corpora present in the history."""
|
| 369 |
-
conn = self._connect()
|
| 370 |
-
rows = conn.execute(
|
| 371 |
-
"SELECT DISTINCT corpus_name FROM runs ORDER BY corpus_name"
|
| 372 |
-
).fetchall()
|
| 373 |
-
return [row[0] for row in rows]
|
| 374 |
-
|
| 375 |
-
def count(self) -> int:
|
| 376 |
-
"""Total number of entries in the history."""
|
| 377 |
-
conn = self._connect()
|
| 378 |
-
return conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
|
| 379 |
-
|
| 380 |
-
# ------------------------------------------------------------------
|
| 381 |
-
# Evolution curves
|
| 382 |
-
# ------------------------------------------------------------------
|
| 383 |
-
|
| 384 |
-
def get_cer_curve(
|
| 385 |
-
self,
|
| 386 |
-
engine: str,
|
| 387 |
-
corpus: Optional[str] = None,
|
| 388 |
-
) -> list[dict]:
|
| 389 |
-
"""Return the data needed to plot the CER evolution curve.
|
| 390 |
-
|
| 391 |
-
Parameters
|
| 392 |
-
----------
|
| 393 |
-
engine:
|
| 394 |
-
Engine name.
|
| 395 |
-
corpus:
|
| 396 |
-
Specific corpus (None = all corpora for this engine).
|
| 397 |
-
|
| 398 |
-
Returns
|
| 399 |
-
-------
|
| 400 |
-
list[dict]
|
| 401 |
-
Each dict has the keys ``timestamp``, ``cer``, ``cer_percent``, ``run_id`` and ``corpus_name``; runs whose ``cer_mean`` is None are skipped.
|
| 402 |
-
"""
|
| 403 |
-
entries = self.query(engine=engine, corpus=corpus, limit=1000)
|
| 404 |
-
return [
|
| 405 |
-
{
|
| 406 |
-
"timestamp": e.timestamp,
|
| 407 |
-
"cer": e.cer_mean,
|
| 408 |
-
"cer_percent": e.cer_percent,
|
| 409 |
-
"run_id": e.run_id,
|
| 410 |
-
"corpus_name": e.corpus_name,
|
| 411 |
-
}
|
| 412 |
-
for e in entries
|
| 413 |
-
if e.cer_mean is not None
|
| 414 |
-
]
|
| 415 |
-
|
| 416 |
-
# ------------------------------------------------------------------
|
| 417 |
-
# Regression detection
|
| 418 |
-
# ------------------------------------------------------------------
|
| 419 |
-
|
| 420 |
-
def detect_regression(
|
| 421 |
-
self,
|
| 422 |
-
engine: str,
|
| 423 |
-
corpus: Optional[str] = None,
|
| 424 |
-
threshold: float = 0.01,
|
| 425 |
-
baseline_run_id: Optional[str] = None,
|
| 426 |
-
) -> Optional[RegressionResult]:
|
| 427 |
-
"""Detect a CER regression between two runs.
|
| 428 |
-
|
| 429 |
-
Compares the most recent run to a baseline (the previous run or
|
| 430 |
-
a specific run).
|
| 431 |
-
|
| 432 |
-
Parameters
|
| 433 |
-
----------
|
| 434 |
-
engine:
|
| 435 |
-
Name of the engine to monitor.
|
| 436 |
-
corpus:
|
| 437 |
-
Specific corpus (None = all).
|
| 438 |
-
threshold:
|
| 439 |
-
Regression threshold in absolute CER points (e.g. 0.01 = 1 percentage point).
|
| 440 |
-
A regression is flagged when delta_cer > threshold.
|
| 441 |
-
baseline_run_id:
|
| 442 |
-
Reference run_id. If None, or if it is not found among the earlier runs, the second-to-last run is used.
|
| 443 |
-
|
| 444 |
-
Returns
|
| 445 |
-
-------
|
| 446 |
-
RegressionResult | None
|
| 447 |
-
None if fewer than 2 runs are available.
|
| 448 |
-
"""
|
| 449 |
-
entries = self.query(engine=engine, corpus=corpus, limit=1000)
|
| 450 |
-
if len(entries) < 2:
|
| 451 |
-
logger.info("Pas assez de runs pour détecter une régression (moteur=%s)", engine)
|
| 452 |
-
return None
|
| 453 |
-
|
| 454 |
-
current = entries[-1]
|
| 455 |
-
|
| 456 |
-
if baseline_run_id:
|
| 457 |
-
baseline_list = [e for e in entries[:-1] if e.run_id == baseline_run_id]
|
| 458 |
-
baseline = baseline_list[0] if baseline_list else entries[-2]
|
| 459 |
-
else:
|
| 460 |
-
baseline = entries[-2]
|
| 461 |
-
|
| 462 |
-
delta = None
|
| 463 |
-
is_regression = False
|
| 464 |
-
if current.cer_mean is not None and baseline.cer_mean is not None:
|
| 465 |
-
delta = current.cer_mean - baseline.cer_mean
|
| 466 |
-
is_regression = delta > threshold
|
| 467 |
-
|
| 468 |
-
return RegressionResult(
|
| 469 |
-
engine_name=engine,
|
| 470 |
-
corpus_name=corpus or "tous",
|
| 471 |
-
baseline_run_id=baseline.run_id,
|
| 472 |
-
baseline_timestamp=baseline.timestamp,
|
| 473 |
-
baseline_cer=baseline.cer_mean,
|
| 474 |
-
current_run_id=current.run_id,
|
| 475 |
-
current_timestamp=current.timestamp,
|
| 476 |
-
current_cer=current.cer_mean,
|
| 477 |
-
delta_cer=delta,
|
| 478 |
-
is_regression=is_regression,
|
| 479 |
-
threshold=threshold,
|
| 480 |
-
)
|
| 481 |
-
|
| 482 |
-
def detect_all_regressions(
|
| 483 |
-
self,
|
| 484 |
-
threshold: float = 0.01,
|
| 485 |
-
) -> list[RegressionResult]:
|
| 486 |
-
"""Detect regressions for every known engine and corpus.
|
| 487 |
-
|
| 488 |
-
Parameters
|
| 489 |
-
----------
|
| 490 |
-
threshold:
|
| 491 |
-
Regression threshold in absolute CER points, passed to ``detect_regression``.
|
| 492 |
-
|
| 493 |
-
Returns
|
| 494 |
-
-------
|
| 495 |
-
list[RegressionResult]
|
| 496 |
-
Only the engine/corpus pairs for which a regression is detected.
|
| 497 |
-
"""
|
| 498 |
-
results: list[RegressionResult] = []
|
| 499 |
-
engines = self.list_engines()
|
| 500 |
-
corpora = self.list_corpora()
|
| 501 |
-
|
| 502 |
-
for engine in engines:
|
| 503 |
-
for corpus in corpora:
|
| 504 |
-
result = self.detect_regression(engine, corpus, threshold)
|
| 505 |
-
if result and result.is_regression:
|
| 506 |
-
results.append(result)
|
| 507 |
-
|
| 508 |
-
return results
|
| 509 |
-
|
| 510 |
-
# ------------------------------------------------------------------
|
| 511 |
-
# Export
|
| 512 |
-
# ------------------------------------------------------------------
|
| 513 |
-
|
| 514 |
-
def export_json(self, output_path: str) -> Path:
|
| 515 |
-
"""Export the full history as JSON.
|
| 516 |
-
|
| 517 |
-
Parameters
|
| 518 |
-
----------
|
| 519 |
-
output_path:
|
| 520 |
-
Path of the output JSON file.
|
| 521 |
-
|
| 522 |
-
Returns
|
| 523 |
-
-------
|
| 524 |
-
Path
|
| 525 |
-
Path to the created file.
|
| 526 |
-
"""
|
| 527 |
-
entries = self.query(limit=100_000)
|
| 528 |
-
path = Path(output_path)
|
| 529 |
-
data = {
|
| 530 |
-
"picarones_history": True,
|
| 531 |
-
"exported_at": datetime.now(timezone.utc).isoformat(),
|
| 532 |
-
"total_runs": len(entries),
|
| 533 |
-
"engines": self.list_engines(),
|
| 534 |
-
"corpora": self.list_corpora(),
|
| 535 |
-
"runs": [e.as_dict() for e in entries],
|
| 536 |
-
}
|
| 537 |
-
path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
|
| 538 |
-
return path
|
| 539 |
-
|
| 540 |
-
def __repr__(self) -> str:
|
| 541 |
-
return f"BenchmarkHistory(db='{self.db_path}', runs={self.count()})"
|
| 542 |
-
|
| 543 |
-
|
| 544 |
-
# ---------------------------------------------------------------------------
|
| 545 |
-
# Longitudinal demonstration data
|
| 546 |
-
# ---------------------------------------------------------------------------
|
| 547 |
-
|
| 548 |
-
def generate_demo_history(
|
| 549 |
-
db: BenchmarkHistory,
|
| 550 |
-
n_runs: int = 8,
|
| 551 |
-
seed: int = 42,
|
| 552 |
-
) -> None:
|
| 553 |
-
"""Insert fictitious longitudinal tracking data for the demo.
|
| 554 |
-
|
| 555 |
-
Simulates the gradual improvement of three OCR engines over ``n_runs`` runs (8 by default),
|
| 556 |
-
with a slight simulated regression for tesseract at run 5.
|
| 557 |
-
|
| 558 |
-
Parameters
|
| 559 |
-
----------
|
| 560 |
-
db:
|
| 561 |
-
History database to populate.
|
| 562 |
-
n_runs:
|
| 563 |
-
Number of runs to generate.
|
| 564 |
-
seed:
|
| 565 |
-
Random seed.
|
| 566 |
-
"""
|
| 567 |
-
import random
|
| 568 |
-
rng = random.Random(seed)
|
| 569 |
-
|
| 570 |
-
engines = ["tesseract", "pero_ocr", "ancien_moteur"]
|
| 571 |
-
corpus = "Chroniques médiévales"
|
| 572 |
-
|
| 573 |
-
# Simulated CER trajectories (gradual improvement + noise)
|
| 574 |
-
base_cers = {
|
| 575 |
-
"tesseract": 0.15,
|
| 576 |
-
"pero_ocr": 0.09,
|
| 577 |
-
"ancien_moteur": 0.28,
|
| 578 |
-
}
|
| 579 |
-
improvements = {
|
| 580 |
-
"tesseract": -0.008, # improves by ~0.8 CER points per run
|
| 581 |
-
"pero_ocr": -0.005, # improves by ~0.5 CER points per run
|
| 582 |
-
"ancien_moteur": -0.003,
|
| 583 |
-
}
|
| 584 |
-
|
| 585 |
-
from datetime import timedelta
|
| 586 |
-
base_date = datetime(2024, 9, 1, tzinfo=timezone.utc)
|
| 587 |
-
|
| 588 |
-
for run_idx in range(n_runs):
|
| 589 |
-
run_date = base_date + timedelta(weeks=run_idx * 2)
|
| 590 |
-
run_id = f"demo_run_{run_idx + 1:02d}"
|
| 591 |
-
|
| 592 |
-
for engine in engines:
|
| 593 |
-
cer = base_cers[engine] + improvements[engine] * run_idx
|
| 594 |
-
# Add noise, plus a regression at run 5
|
| 595 |
-
noise = rng.gauss(0, 0.005)
|
| 596 |
-
if run_idx == 4 and engine == "tesseract":
|
| 597 |
-
noise += 0.02 # simulated regression
|
| 598 |
-
cer = max(0.01, min(0.5, cer + noise))
|
| 599 |
-
|
| 600 |
-
wer = cer * 1.8 + rng.gauss(0, 0.01)
|
| 601 |
-
wer = max(0.01, min(0.9, wer))
|
| 602 |
|
| 603 |
-
|
| 604 |
-
|
| 605 |
-
|
| 606 |
-
|
| 607 |
-
cer_mean=round(cer, 4),
|
| 608 |
-
wer_mean=round(wer, 4),
|
| 609 |
-
doc_count=12,
|
| 610 |
-
timestamp=run_date.isoformat(),
|
| 611 |
-
metadata={
|
| 612 |
-
"note": f"Run de démonstration #{run_idx + 1}",
|
| 613 |
-
"engine_version": f"5.{run_idx}.0" if engine == "tesseract" else "0.7.2",
|
| 614 |
-
},
|
| 615 |
-
)
|
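The regression rule used by `detect_regression` above reduces to comparing the newest mean CER with a baseline and flagging the run when the absolute increase exceeds `threshold`. A minimal standalone sketch of that rule follows; the `Run` record and the `check_regression` helper are hypothetical names introduced for illustration and are not part of Picarones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Hypothetical stand-in for a history entry (not the Picarones class)."""
    run_id: str
    cer_mean: Optional[float]


def check_regression(runs, threshold=0.01):
    """Compare the last run to the one before it.

    Returns None when fewer than two runs exist, otherwise a
    (delta, is_regression) tuple. delta is None when either CER is missing.
    """
    if len(runs) < 2:
        return None
    baseline, current = runs[-2], runs[-1]
    if baseline.cer_mean is None or current.cer_mean is None:
        return None, False  # delta unknown, so nothing is flagged
    delta = current.cer_mean - baseline.cer_mean
    # Only an increase larger than the threshold counts as a regression.
    return delta, delta > threshold


runs = [Run("r1", 0.120), Run("r2", 0.110), Run("r3", 0.135)]
delta, flagged = check_regression(runs)
print(f"delta={delta:+.3f} regression={flagged}")
```

With these sample values the CER rises by about 0.025 between the last two runs, which is above the default threshold of 0.01, so the run is flagged. An improvement (negative delta) is never flagged, whatever its size.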
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.history`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy
|
| 6 |
+
imports (``from picarones.core.history import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.history import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.history as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
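The alias module above relies on two things: `from ... import *` copies the public names of the relocated module, and `__all__` is mirrored so that a star-import from the legacy path exposes the same names. The sketch below emulates that behaviour with in-memory modules built through `types.ModuleType`; the package name `demo_pkg` and the placeholder class are assumptions made for the example, not Picarones code.

```python
import types

# Stand-in for the relocated module (hypothetical names throughout).
new_mod = types.ModuleType("demo_pkg.measurements.history")


class BenchmarkHistory:
    """Placeholder standing in for the real implementation."""


new_mod.BenchmarkHistory = BenchmarkHistory
new_mod._private_helper = object()  # underscore name: must not be re-exported

# Legacy module built as a thin alias. Like a star-import, it copies the
# names listed in __all__, or every non-underscore name when __all__ is absent.
legacy_mod = types.ModuleType("demo_pkg.core.history")
public = getattr(
    new_mod,
    "__all__",
    [name for name in dir(new_mod) if not name.startswith("_")],
)
for name in public:
    setattr(legacy_mod, name, getattr(new_mod, name))
legacy_mod.__all__ = list(public)

# Both paths now resolve to the very same object, so isinstance checks
# and monkeypatching behave identically through either import path.
print(legacy_mod.BenchmarkHistory is new_mod.BenchmarkHistory)
print(legacy_mod.__all__)
```

Because the legacy module only rebinds names, no code is duplicated; the trade-off is that private helpers (names starting with an underscore) are no longer reachable through the old path.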
@@ -1,391 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
- **Contrast score**: Michelson ratio between dark areas (ink) and light areas (background)
|
| 9 |
-
- **Overall quality score**: normalized combination of the metrics above
|
| 10 |
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
Note
|
| 16 |
-
----
|
| 17 |
-
For placeholder images (fixtures), consistent fictitious values
|
| 18 |
-
are generated via `generate_mock_quality_scores()`.
|
| 19 |
"""
|
| 20 |
|
| 21 |
-
from
|
| 22 |
-
|
| 23 |
-
import logging
|
| 24 |
-
import math
|
| 25 |
-
import statistics
|
| 26 |
-
from dataclasses import dataclass
|
| 27 |
-
from pathlib import Path
|
| 28 |
-
from typing import Optional
|
| 29 |
-
|
| 30 |
-
logger = logging.getLogger(__name__)
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
@dataclass
|
| 34 |
-
class ImageQualityResult:
|
| 35 |
-
"""Quality metrics for a document image."""
|
| 36 |
-
|
| 37 |
-
sharpness_score: float = 0.0
|
| 38 |
-
"""Sharpness score [0, 1]. Based on the normalized variance of the Laplacian."""
|
| 39 |
-
|
| 40 |
-
noise_level: float = 0.0
|
| 41 |
-
"""Noise level [0, 1]. 0 = no noise, 1 = very noisy."""
|
| 42 |
-
|
| 43 |
-
rotation_degrees: float = 0.0
|
| 44 |
-
"""Estimated residual rotation angle in degrees (positive = clockwise)."""
|
| 45 |
-
|
| 46 |
-
contrast_score: float = 0.0
|
| 47 |
-
"""Contrast score [0, 1]. Michelson ink/background ratio."""
|
| 48 |
-
|
| 49 |
-
quality_score: float = 0.0
|
| 50 |
-
"""Overall quality score [0, 1]. Weighted combination of the other metrics."""
|
| 51 |
-
|
| 52 |
-
analysis_method: str = "none"
|
| 53 |
-
"""Analysis method used: 'numpy', 'pillow', 'mock', or 'none' when no analysis ran."""
|
| 54 |
-
|
| 55 |
-
error: Optional[str] = None
|
| 56 |
-
"""Error message if the analysis failed."""
|
| 57 |
-
|
| 58 |
-
@property
|
| 59 |
-
def is_good_quality(self) -> bool:
|
| 60 |
-
"""True if the overall quality score is ≥ 0.7."""
|
| 61 |
-
return self.quality_score >= 0.7
|
| 62 |
-
|
| 63 |
-
@property
|
| 64 |
-
def quality_tier(self) -> str:
|
| 65 |
-
"""Quality category: 'good', 'medium', 'poor'."""
|
| 66 |
-
if self.quality_score >= 0.7:
|
| 67 |
-
return "good"
|
| 68 |
-
elif self.quality_score >= 0.4:
|
| 69 |
-
return "medium"
|
| 70 |
-
return "poor"
|
| 71 |
-
|
| 72 |
-
def as_dict(self) -> dict:
|
| 73 |
-
d = {
|
| 74 |
-
"sharpness_score": round(self.sharpness_score, 4),
|
| 75 |
-
"noise_level": round(self.noise_level, 4),
|
| 76 |
-
"rotation_degrees": round(self.rotation_degrees, 2),
|
| 77 |
-
"contrast_score": round(self.contrast_score, 4),
|
| 78 |
-
"quality_score": round(self.quality_score, 4),
|
| 79 |
-
"quality_tier": self.quality_tier,
|
| 80 |
-
"analysis_method": self.analysis_method,
|
| 81 |
-
}
|
| 82 |
-
if self.error:
|
| 83 |
-
d["error"] = self.error
|
| 84 |
-
return d
|
| 85 |
-
|
| 86 |
-
@classmethod
|
| 87 |
-
def from_dict(cls, data: dict) -> "ImageQualityResult":
|
| 88 |
-
return cls(
|
| 89 |
-
sharpness_score=data.get("sharpness_score", 0.0),
|
| 90 |
-
noise_level=data.get("noise_level", 0.0),
|
| 91 |
-
rotation_degrees=data.get("rotation_degrees", 0.0),
|
| 92 |
-
contrast_score=data.get("contrast_score", 0.0),
|
| 93 |
-
quality_score=data.get("quality_score", 0.0),
|
| 94 |
-
analysis_method=data.get("analysis_method", "none"),
|
| 95 |
-
error=data.get("error"),
|
| 96 |
-
)
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
def analyze_image_quality(image_path: str | Path) -> ImageQualityResult:
|
| 100 |
-
"""Analyze the quality of a scanned document image.
|
| 101 |
-
|
| 102 |
-
Tries, in order:
|
| 103 |
-
1. Pillow + NumPy (full method)
|
| 104 |
-
2. Pillow alone (simplified method)
|
| 105 |
-
3. Fallback: returns an empty result with an error
|
| 106 |
-
|
| 107 |
-
Parameters
|
| 108 |
-
----------
|
| 109 |
-
image_path:
|
| 110 |
-
Path to the image (JPG, PNG, TIFF…).
|
| 111 |
-
|
| 112 |
-
Returns
|
| 113 |
-
-------
|
| 114 |
-
ImageQualityResult
|
| 115 |
-
"""
|
| 116 |
-
path = Path(image_path)
|
| 117 |
-
if not path.exists():
|
| 118 |
-
return ImageQualityResult(
|
| 119 |
-
error=f"Fichier image introuvable : {image_path}",
|
| 120 |
-
analysis_method="none",
|
| 121 |
-
)
|
| 122 |
-
|
| 123 |
-
# Try Pillow + NumPy
|
| 124 |
-
try:
|
| 125 |
-
import numpy as np
|
| 126 |
-
from PIL import Image
|
| 127 |
-
return _analyze_with_numpy(path, np, Image)
|
| 128 |
-
except ImportError:
|
| 129 |
-
pass
|
| 130 |
-
|
| 131 |
-
# Try Pillow alone
|
| 132 |
-
try:
|
| 133 |
-
from PIL import Image
|
| 134 |
-
return _analyze_with_pillow(path, Image)
|
| 135 |
-
except ImportError:
|
| 136 |
-
pass
|
| 137 |
-
|
| 138 |
-
return ImageQualityResult(
|
| 139 |
-
error="Pillow non disponible (pip install Pillow)",
|
| 140 |
-
analysis_method="none",
|
| 141 |
-
quality_score=0.5, # neutral value
|
| 142 |
-
)
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
def _analyze_with_numpy(path: Path, np, Image) -> ImageQualityResult:
|
| 146 |
-
"""Full analysis with NumPy."""
|
| 147 |
-
img = Image.open(path).convert("L") # grayscale
|
| 148 |
-
arr = np.array(img, dtype=np.float32)
|
| 149 |
-
|
| 150 |
-
# 1. Sharpness: variance of the Laplacian
|
| 151 |
-
laplacian = _laplacian_variance_numpy(arr, np)
|
| 152 |
-
# Empirical normalization: variance > 500 = very sharp, < 50 = blurry
|
| 153 |
-
sharpness = min(1.0, laplacian / 500.0)
|
| 154 |
-
|
| 155 |
-
# 2. Noise: median absolute difference between neighbouring pixels
|
| 156 |
-
noise = _noise_level_numpy(arr, np)
|
| 157 |
-
|
| 158 |
-
# 3. Rotation: estimated skew angle
|
| 159 |
-
rotation = _estimate_rotation_numpy(arr, np)
|
| 160 |
-
|
| 161 |
-
# 4. Contrast: Michelson ratio
|
| 162 |
-
contrast = _contrast_score_numpy(arr, np)
|
| 163 |
-
|
| 164 |
-
# 5. Weighted overall score
|
| 165 |
-
quality = _global_quality_score(sharpness, noise, abs(rotation), contrast)
|
| 166 |
-
|
| 167 |
-
return ImageQualityResult(
|
| 168 |
-
sharpness_score=float(sharpness),
|
| 169 |
-
noise_level=float(noise),
|
| 170 |
-
rotation_degrees=float(rotation),
|
| 171 |
-
contrast_score=float(contrast),
|
| 172 |
-
quality_score=float(quality),
|
| 173 |
-
analysis_method="numpy",
|
| 174 |
-
)
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
def _analyze_with_pillow(path: Path, Image) -> ImageQualityResult:
|
| 178 |
-
"""Simplified analysis with Pillow alone (no NumPy)."""
|
| 179 |
-
img = Image.open(path).convert("L")
|
| 180 |
-
pixels = list(img.tobytes()) # mode "L" = 1 byte/pixel
|
| 181 |
-
w, h = img.size
|
| 182 |
-
|
| 183 |
-
if not pixels:
|
| 184 |
-
return ImageQualityResult(quality_score=0.5, analysis_method="pillow")
|
| 185 |
-
|
| 186 |
-
# Contrast: Michelson ratio on the min/max pixel values
|
| 187 |
-
min_val = min(pixels)
|
| 188 |
-
max_val = max(pixels)
|
| 189 |
-
if max_val + min_val > 0:
|
| 190 |
-
contrast = (max_val - min_val) / (max_val + min_val)
|
| 191 |
-
else:
|
| 192 |
-
contrast = 0.0
|
| 193 |
-
|
| 194 |
-
# Approximate sharpness: global pixel variance
|
| 195 |
-
try:
|
| 196 |
-
variance = statistics.variance(pixels)
|
| 197 |
-
except statistics.StatisticsError:
|
| 198 |
-
variance = 0.0
|
| 199 |
-
sharpness = min(1.0, math.sqrt(variance) / 128.0)
|
| 200 |
-
|
| 201 |
-
# Noise: rough approximation
|
| 202 |
-
noise = min(1.0, statistics.stdev(pixels[:min(1000, len(pixels))]) / 64.0) if len(pixels) > 1 else 0.0
|
| 203 |
-
|
| 204 |
-
quality = _global_quality_score(sharpness, noise, 0.0, contrast)
|
| 205 |
-
|
| 206 |
-
return ImageQualityResult(
|
| 207 |
-
sharpness_score=sharpness,
|
| 208 |
-
noise_level=noise,
|
| 209 |
-
rotation_degrees=0.0, # not computed without NumPy
|
| 210 |
-
contrast_score=contrast,
|
| 211 |
-
quality_score=quality,
|
| 212 |
-
analysis_method="pillow",
|
| 213 |
-
)
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
def _laplacian_variance_numpy(arr, np) -> float:
|
| 217 |
-
"""Compute the variance of the Laplacian (a sharpness measure)."""
|
| 218 |
-
# 3x3 Laplacian convolution via slicing (borders ignored)
|
| 219 |
-
h, w = arr.shape
|
| 220 |
-
if h < 3 or w < 3:
|
| 221 |
-
return float(np.var(arr))
|
| 222 |
-
|
| 223 |
-
# Fast convolution using slicing
|
| 224 |
-
center = arr[1:-1, 1:-1]
|
| 225 |
-
top = arr[:-2, 1:-1]
|
| 226 |
-
bottom = arr[2:, 1:-1]
|
| 227 |
-
left = arr[1:-1, :-2]
|
| 228 |
-
right = arr[1:-1, 2:]
|
| 229 |
-
lap = top + bottom + left + right - 4 * center
|
| 230 |
-
|
| 231 |
-
return float(np.var(lap))
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
def _noise_level_numpy(arr, np) -> float:
|
| 235 |
-
"""Estimate the noise level as the median absolute difference between neighbouring pixels."""
|
| 236 |
-
h, w = arr.shape
|
| 237 |
-
if h < 2 or w < 2:
|
| 238 |
-
return 0.0
|
| 239 |
-
# Horizontal and vertical differences
|
| 240 |
-
diff_h = np.abs(arr[:, 1:] - arr[:, :-1])
|
| 241 |
-
diff_v = np.abs(arr[1:, :] - arr[:-1, :])
|
| 242 |
-
noise_std = float(np.median(np.concatenate([diff_h.ravel(), diff_v.ravel()])))
|
| 243 |
-
# Normalize: 0 = no noise, 1 = very noisy (saturates at ~30)
|
| 244 |
-
return min(1.0, noise_std / 30.0)
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
def _estimate_rotation_numpy(arr, np) -> float:
|
| 248 |
-
"""Estimate the rotation angle via a simplified horizontal projection.
|
| 249 |
-
|
| 250 |
-
Returns the estimated angle in whole degrees, within [-5, 5] (the range scanned below).
|
| 251 |
-
"""
|
| 252 |
-
# Simplified method: variance of the projections at several angles
|
| 253 |
-
# Restricted to a few angles for performance
|
| 254 |
-
h, w = arr.shape
|
| 255 |
-
if h < 20 or w < 20:
|
| 256 |
-
return 0.0
|
| 257 |
-
|
| 258 |
-
# Subsampling for performance
|
| 259 |
-
step = max(1, h // 100)
|
| 260 |
-
sample = arr[::step, :]
|
| 261 |
-
|
| 262 |
-
best_angle = 0.0
|
| 263 |
-
best_var = -1.0
|
| 264 |
-
|
| 265 |
-
for angle_deg in range(-5, 6): # ±5 degrees, 1° step
|
| 266 |
-
angle_rad = math.radians(angle_deg)
|
| 267 |
-
# Horizontal projection after an approximate rotation
|
| 268 |
-
# (fast linear approximation)
|
| 269 |
-
offsets = np.round(
|
| 270 |
-
np.arange(sample.shape[0]) * math.tan(angle_rad)
|
| 271 |
-
).astype(int)
|
| 272 |
-
offsets = np.clip(offsets, 0, w - 1)
|
| 273 |
-
|
| 274 |
-
# Variance of the shifted row sums
|
| 275 |
-
try:
|
| 276 |
-
row_sums = np.array([
|
| 277 |
-
float(np.sum(sample[i, max(0, offsets[i]):min(w, offsets[i]+w)]))
|
| 278 |
-
for i in range(sample.shape[0])
|
| 279 |
-
])
|
| 280 |
-
var = float(np.var(row_sums))
|
| 281 |
-
if var > best_var:
|
| 282 |
-
best_var = var
|
| 283 |
-
best_angle = float(angle_deg)
|
| 284 |
-
except Exception as e:
|
| 285 |
-
logger.warning(
|
| 286 |
-
"[image_quality] projection à %d° indisponible : %s",
|
| 287 |
-
angle_deg, e,
|
| 288 |
-
)
|
| 289 |
-
|
| 290 |
-
return best_angle
|
| 291 |
-
|
| 292 |
-
|
| 293 |
-
def _contrast_score_numpy(arr, np) -> float:
|
| 294 |
-
"""Michelson contrast score [0, 1]."""
|
| 295 |
-
p5 = float(np.percentile(arr, 5)) # dark ink (5th percentile)
|
| 296 |
-
p95 = float(np.percentile(arr, 95)) # light background (95th percentile)
|
| 297 |
-
if p5 + p95 == 0:
|
| 298 |
-
return 0.0
|
| 299 |
-
# Michelson: (Imax - Imin) / (Imax + Imin)
|
| 300 |
-
return float((p95 - p5) / (p95 + p5))
|
| 301 |
-
|
| 302 |
-
|
| 303 |
-
def _global_quality_score(
|
| 304 |
-
sharpness: float,
|
| 305 |
-
noise: float,
|
| 306 |
-
rotation_abs: float,
|
| 307 |
-
contrast: float,
|
| 308 |
-
) -> float:
|
| 309 |
-
"""Compute the weighted overall quality score."""
|
| 310 |
-
# Weights: sharpness (40%), contrast (30%), noise (20%), rotation (10%)
|
| 311 |
-
score = (
|
| 312 |
-
0.40 * sharpness
|
| 313 |
-
+ 0.30 * contrast
|
| 314 |
-
+ 0.20 * (1.0 - noise) # less noise = better
|
| 315 |
-
+ 0.10 * max(0.0, 1.0 - rotation_abs / 10.0) # ±10° max
|
| 316 |
-
)
|
| 317 |
-
return round(min(1.0, max(0.0, score)), 4)
|
| 318 |
-
|
| 319 |
-
|
| 320 |
-
# ---------------------------------------------------------------------------
|
| 321 |
-
# Fictitious data for the demo fixtures
|
| 322 |
-
# ---------------------------------------------------------------------------
|
| 323 |
-
|
| 324 |
-
def generate_mock_quality_scores(
|
| 325 |
-
doc_id: str,
|
| 326 |
-
seed: Optional[int] = None,
|
| 327 |
-
) -> ImageQualityResult:
|
| 328 |
-
"""Generate fictitious but consistent quality metrics for a document.
|
| 329 |
-
|
| 330 |
-
Used by the demo fixtures to simulate a realistic variety
|
| 331 |
-
of image qualities (good, medium, degraded).
|
| 332 |
-
|
| 333 |
-
Parameters
|
| 334 |
-
----------
|
| 335 |
-
doc_id:
|
| 336 |
-
Document identifier (used to derive the seed when ``seed`` is not given).
|
| 337 |
-
seed:
|
| 338 |
-
Optional random seed.
|
| 339 |
-
"""
|
| 340 |
-
import random
|
| 341 |
-
rng = random.Random(seed or hash(doc_id) % 2**32)
|
| 342 |
-
|
| 343 |
-
# Generate a consistent quality: some documents are harder
|
| 344 |
-
base_quality = 0.3 + rng.random() * 0.6 # 0.3 to 0.9
|
| 345 |
-
|
| 346 |
-
sharpness = max(0.1, min(1.0, base_quality + rng.gauss(0, 0.1)))
|
| 347 |
-
noise = max(0.0, min(1.0, (1.0 - base_quality) * 0.8 + rng.gauss(0, 0.05)))
|
| 348 |
-
rotation = rng.gauss(0, 1.5) # typically ±1.5°
|
| 349 |
-
contrast = max(0.2, min(1.0, base_quality + rng.gauss(0, 0.15)))
|
| 350 |
-
|
| 351 |
-
quality = _global_quality_score(sharpness, noise, abs(rotation), contrast)
|
| 352 |
-
|
| 353 |
-
return ImageQualityResult(
|
| 354 |
-
sharpness_score=round(sharpness, 4),
|
| 355 |
-
noise_level=round(noise, 4),
|
| 356 |
-
rotation_degrees=round(rotation, 2),
|
| 357 |
-
contrast_score=round(contrast, 4),
|
| 358 |
-
quality_score=round(quality, 4),
|
| 359 |
-
analysis_method="mock",
|
| 360 |
-
)
|
| 361 |
-
|
| 362 |
-
|
| 363 |
-
def aggregate_image_quality(results: list[ImageQualityResult]) -> dict:
|
| 364 |
-
"""Aggregate the image quality metrics over a corpus."""
|
| 365 |
-
if not results:
|
| 366 |
-
return {}
|
| 367 |
-
|
| 368 |
-
valid = [r for r in results if r.error is None]
|
| 369 |
-
if not valid:
|
| 370 |
-
return {"error": "Aucune analyse réussie"}
|
| 371 |
-
|
| 372 |
-
def _mean(vals: list[float]) -> float:
|
| 373 |
-
return round(statistics.mean(vals), 4) if vals else 0.0
|
| 374 |
-
|
| 375 |
-
quality_scores = [r.quality_score for r in valid]
|
| 376 |
-
sharpness_scores = [r.sharpness_score for r in valid]
|
| 377 |
-
noise_levels = [r.noise_level for r in valid]
|
| 378 |
-
|
| 379 |
-
# Distribution by tier
|
| 380 |
-
tiers = {"good": 0, "medium": 0, "poor": 0}
|
| 381 |
-
for r in valid:
|
| 382 |
-
tiers[r.quality_tier] += 1
|
| 383 |
|
| 384 |
-
|
| 385 |
-
|
| 386 |
-
|
| 387 |
-
|
| 388 |
-
"quality_distribution": tiers,
|
| 389 |
-
"document_count": len(valid),
|
| 390 |
-
"scores": [r.quality_score for r in valid], # for the scatter plot
|
| 391 |
-
}
|
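The overall score computed by `_global_quality_score` above is a weighted sum (40% sharpness, 30% contrast, 20% inverted noise, 10% rotation penalty) clamped to [0, 1], and `quality_tier` maps it to a category with cut-offs at 0.7 and 0.4. The sketch below restates both as standalone functions with a worked example; the function names without the leading underscore and the sample inputs are illustrative assumptions.

```python
def global_quality_score(sharpness, noise, rotation_abs, contrast):
    """Weighted quality score in [0, 1], using the weights shown in the diff."""
    score = (
        0.40 * sharpness
        + 0.30 * contrast
        + 0.20 * (1.0 - noise)                        # less noise is better
        + 0.10 * max(0.0, 1.0 - rotation_abs / 10.0)  # term reaches 0 at 10 degrees
    )
    return round(min(1.0, max(0.0, score)), 4)


def quality_tier(score):
    """Map a score to 'good' (>= 0.7), 'medium' (>= 0.4) or 'poor'."""
    if score >= 0.7:
        return "good"
    if score >= 0.4:
        return "medium"
    return "poor"


# Worked example: 0.32 + 0.18 + 0.18 + 0.08 = 0.76
score = global_quality_score(sharpness=0.8, noise=0.1, rotation_abs=2.0, contrast=0.6)
print(score, quality_tier(score))
```

Since rotation carries only 10% of the weight, even a strongly skewed page loses at most 0.1 points; sharpness and contrast dominate the result.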
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.image_quality`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy imports
|
| 6 |
+
(``from picarones.core.image_quality import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now holds only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.image_quality import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.image_quality as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
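The alias file above combines a star import with a computed `__all__`. The behaviour can be checked in isolation with a minimal sketch. It builds two throwaway in-memory modules: `new_home` and `old_home` are hypothetical names chosen for the demonstration, not Picarones modules, and `aggregate` is an invented function. The sketch only illustrates how `__all__` falls back to the public names when the target module does not define one.

```python
import sys
import types

# Hypothetical "new location" module, built in memory for the demo.
new_home = types.ModuleType("new_home")
exec(
    "def aggregate(values):\n"
    "    return sum(values) / len(values)\n"
    "_private = 'hidden'\n",
    new_home.__dict__,
)
sys.modules["new_home"] = new_home

# Hypothetical "old location" shim, mirroring the alias file in the diff.
shim_source = (
    "from new_home import *  # noqa: F401, F403\n"
    "import new_home as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n"
)
old_home = types.ModuleType("old_home")
exec(shim_source, old_home.__dict__)
sys.modules["old_home"] = old_home

# The legacy import path still resolves to the relocated function.
from old_home import aggregate
```

Under these assumptions the shim re-exports only public names: underscore-prefixed names such as `_private` are skipped both by the star import and by the `__all__` fallback.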
|
|
@@ -1,253 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
mappers = 180 pipelines to compare, the report drowns
|
| 9 |
-
the information. What is needed is a **controlled
|
| 10 |
-
comparison** mechanism, in the spirit of experimental design.
|
| 11 |
-
|
| 12 |
-
Method
|
| 13 |
-
------
|
| 14 |
-
To measure the isolated effect of a ``varying`` slot:
|
| 15 |
-
|
| 16 |
-
1. Fix the values of the other slots (``fixed``).
|
| 17 |
-
2. For each combination of the fixed values, compare the pipelines
|
| 18 |
-
that differ only on the varying slot.
|
| 19 |
-
3. Aggregate: for each value of the varying slot, compute
|
| 20 |
-
its mean, its standard deviation, and its mean rank across the groups.
|
| 21 |
-
|
| 22 |
-
This is close to an automated Latin square. Without it, the
|
| 23 |
-
report on 180 pipelines is unusable.
|
| 24 |
-
|
| 25 |
-
No scipy statistical tests
|
| 26 |
-
--------------------------
|
| 27 |
-
Friedman/Nemenyi are not rebuilt here (already in Sprint 18);
|
| 28 |
-
this module aggregates the data needed so that an
|
| 29 |
-
external statistical test can consume it. The existing
|
| 30 |
-
report remains free to plug
|
| 31 |
-
``picarones.core.statistics.friedman_test`` into the output of
|
| 32 |
-
this module.
|
| 33 |
-
|
| 34 |
-
Output
|
| 35 |
-
------
|
| 36 |
-
``compare_isolated_effect(runs, varying_slot)`` returns:
|
| 37 |
-
|
| 38 |
-
.. code-block:: text
|
| 39 |
-
|
| 40 |
-
{
|
| 41 |
-
"varying_slot": str,
|
| 42 |
-
"n_runs": int,
|
| 43 |
-
"n_groups": int, # distinct combinations of fixed slots
|
| 44 |
-
"values": list[str], # distinct values of the varying slot
|
| 45 |
-
"per_value": {value: {
|
| 46 |
-
"n_observations": int,
|
| 47 |
-
"mean": float | None,
|
| 48 |
-
"stdev": float | None,
|
| 49 |
-
"min": float, "max": float,
|
| 50 |
-
"mean_rank": float | None,
|
| 51 |
-
}},
|
| 52 |
-
"best_value": str | None,
|
| 53 |
-
"worst_value": str | None,
|
| 54 |
-
"groups": list[dict], # per-group detail
|
| 55 |
-
}
|
| 56 |
"""
|
| 57 |
|
| 58 |
-
from
|
| 59 |
-
|
| 60 |
-
import logging
|
| 61 |
-
import statistics
|
| 62 |
-
from dataclasses import dataclass
|
| 63 |
-
from typing import Optional
|
| 64 |
-
|
| 65 |
-
logger = logging.getLogger(__name__)
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
@dataclass(frozen=True)
|
| 69 |
-
class PipelineRun:
|
| 70 |
-
"""One run of a composed pipeline, used for the controlled comparison.
|
| 71 |
-
|
| 72 |
-
Attributes
|
| 73 |
-
----------
|
| 74 |
-
name:
|
| 75 |
-
Name of the run (free-form, informational only).
|
| 76 |
-
slots:
|
| 77 |
-
Mapping ``{slot_name: module_name}`` describing the pipeline
|
| 78 |
-
(e.g. ``{"ocr": "tess", "llm": "gpt-4o"}``).
|
| 79 |
-
score:
|
| 80 |
-
Numeric metric to compare (typically the mean CER).
|
| 81 |
-
Lower is better by convention, unless
|
| 82 |
-
``higher_is_better=True`` is passed to
|
| 83 |
-
``compare_isolated_effect``.
|
| 84 |
-
"""
|
| 85 |
-
|
| 86 |
-
name: str
|
| 87 |
-
slots: dict[str, str]
|
| 88 |
-
score: float
|
| 89 |
-
|
| 90 |
-
def as_dict(self) -> dict:
|
| 91 |
-
return {
|
| 92 |
-
"name": self.name,
|
| 93 |
-
"slots": dict(self.slots),
|
| 94 |
-
"score": self.score,
|
| 95 |
-
}
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
def _normalise_runs(runs) -> list[PipelineRun]:
|
| 99 |
-
"""Accept a list of ``PipelineRun`` objects or compatible dicts."""
|
| 100 |
-
out: list[PipelineRun] = []
|
| 101 |
-
for r in runs:
|
| 102 |
-
if isinstance(r, PipelineRun):
|
| 103 |
-
out.append(r)
|
| 104 |
-
continue
|
| 105 |
-
if not isinstance(r, dict):
|
| 106 |
-
continue
|
| 107 |
-
slots = r.get("slots") or {}
|
| 108 |
-
if not isinstance(slots, dict):
|
| 109 |
-
continue
|
| 110 |
-
try:
|
| 111 |
-
score = float(r.get("score"))
|
| 112 |
-
except (TypeError, ValueError):
|
| 113 |
-
continue
|
| 114 |
-
out.append(PipelineRun(
|
| 115 |
-
name=str(r.get("name") or ""),
|
| 116 |
-
slots={str(k): str(v) for k, v in slots.items()},
|
| 117 |
-
score=score,
|
| 118 |
-
))
|
| 119 |
-
return out
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
def compare_isolated_effect(
|
| 123 |
-
runs,
|
| 124 |
-
varying_slot: str,
|
| 125 |
-
*,
|
| 126 |
-
higher_is_better: bool = False,
|
| 127 |
-
) -> Optional[dict]:
|
| 128 |
-
"""Measure the isolated effect of the ``varying_slot`` slot.
|
| 129 |
-
|
| 130 |
-
Parameters
|
| 131 |
-
----------
|
| 132 |
-
runs:
|
| 133 |
-
List of ``PipelineRun`` objects (or compatible dicts).
|
| 134 |
-
varying_slot:
|
| 135 |
-
Name of the slot whose effect is to be isolated. The other
|
| 136 |
-
slots define the control groups.
|
| 137 |
-
higher_is_better:
|
| 138 |
-
If ``True``, the ranking convention is inverted
|
| 139 |
-
(rank 1 = highest score). Default ``False`` =
|
| 140 |
-
rank 1 = lowest score (CER).
|
| 141 |
-
|
| 142 |
-
Returns
|
| 143 |
-
-------
|
| 144 |
-
dict | None
|
| 145 |
-
``None`` if there are fewer than 2 runs or if ``varying_slot``
|
| 146 |
-
is not present in any run.
|
| 147 |
-
"""
|
| 148 |
-
runs_list = _normalise_runs(runs)
|
| 149 |
-
if len(runs_list) < 2:
|
| 150 |
-
return None
|
| 151 |
-
runs_list = [r for r in runs_list if varying_slot in r.slots]
|
| 152 |
-
if not runs_list:
|
| 153 |
-
return None
|
| 154 |
-
|
| 155 |
-
# Build the groups, keyed by the values of the fixed slots
|
| 156 |
-
groups: dict[tuple, list[PipelineRun]] = {}
|
| 157 |
-
fixed_slot_names: list[str] = []
|
| 158 |
-
for r in runs_list:
|
| 159 |
-
other_slots = sorted(k for k in r.slots if k != varying_slot)
|
| 160 |
-
if not fixed_slot_names:
|
| 161 |
-
fixed_slot_names = other_slots
|
| 162 |
-
# Skip runs whose slot schema is incompatible
|
| 163 |
-
if other_slots != fixed_slot_names:
|
| 164 |
-
continue
|
| 165 |
-
key = tuple((k, r.slots[k]) for k in other_slots)
|
| 166 |
-
groups.setdefault(key, []).append(r)
|
| 167 |
-
|
| 168 |
-
if not groups:
|
| 169 |
-
return None
|
| 170 |
-
|
| 171 |
-
# For each group: rank the runs by score
|
| 172 |
-
per_value: dict[str, dict] = {}
|
| 173 |
-
group_details: list[dict] = []
|
| 174 |
-
for key, members in groups.items():
|
| 175 |
-
members_sorted = sorted(
|
| 176 |
-
members, key=lambda x: x.score, reverse=higher_is_better,
|
| 177 |
-
)
|
| 178 |
-
# Ranks: tied runs share the average of their ranks
|
| 179 |
-
ranks: dict[str, float] = {}
|
| 180 |
-
i = 0
|
| 181 |
-
while i < len(members_sorted):
|
| 182 |
-
j = i
|
| 183 |
-
while (
|
| 184 |
-
j + 1 < len(members_sorted)
|
| 185 |
-
and members_sorted[j + 1].score == members_sorted[i].score
|
| 186 |
-
):
|
| 187 |
-
j += 1
|
| 188 |
-
avg_rank = (i + 1 + j + 1) / 2
|
| 189 |
-
for k in range(i, j + 1):
|
| 190 |
-
value = members_sorted[k].slots[varying_slot]
|
| 191 |
-
ranks[value] = avg_rank
|
| 192 |
-
i = j + 1
|
| 193 |
-
|
| 194 |
-
for r in members:
|
| 195 |
-
value = r.slots[varying_slot]
|
| 196 |
-
slot = per_value.setdefault(value, {
|
| 197 |
-
"scores": [],
|
| 198 |
-
"ranks": [],
|
| 199 |
-
})
|
| 200 |
-
slot["scores"].append(r.score)
|
| 201 |
-
slot["ranks"].append(ranks[value])
|
| 202 |
-
group_details.append({
|
| 203 |
-
"fixed_slots": dict(key),
|
| 204 |
-
"n_members": len(members),
|
| 205 |
-
"values": [r.slots[varying_slot] for r in members_sorted],
|
| 206 |
-
"scores": [r.score for r in members_sorted],
|
| 207 |
-
})
|
| 208 |
-
|
| 209 |
-
# Compute mean/stdev/min/max and the mean rank for each value
|
| 210 |
-
summary: dict[str, dict] = {}
|
| 211 |
-
for value, slot in per_value.items():
|
| 212 |
-
scores = slot["scores"]
|
| 213 |
-
ranks = slot["ranks"]
|
| 214 |
-
summary[value] = {
|
| 215 |
-
"n_observations": len(scores),
|
| 216 |
-
"mean": statistics.fmean(scores) if scores else None,
|
| 217 |
-
"stdev": (
|
| 218 |
-
statistics.stdev(scores) if len(scores) >= 2 else None
|
| 219 |
-
),
|
| 220 |
-
"min": min(scores),
|
| 221 |
-
"max": max(scores),
|
| 222 |
-
"mean_rank": (
|
| 223 |
-
statistics.fmean(ranks) if ranks else None
|
| 224 |
-
),
|
| 225 |
-
}
|
| 226 |
-
|
| 227 |
-
# Best/worst: based on the mean (CER convention: lower is better)
|
| 228 |
-
by_mean = sorted(
|
| 229 |
-
((v, d["mean"]) for v, d in summary.items()
|
| 230 |
-
if d["mean"] is not None),
|
| 231 |
-
key=lambda kv: kv[1],
|
| 232 |
-
reverse=higher_is_better,
|
| 233 |
-
)
|
| 234 |
-
best_value = by_mean[0][0] if by_mean else None
|
| 235 |
-
worst_value = by_mean[-1][0] if by_mean else None
|
| 236 |
-
|
| 237 |
-
return {
|
| 238 |
-
"varying_slot": varying_slot,
|
| 239 |
-
"n_runs": len(runs_list),
|
| 240 |
-
"n_groups": len(groups),
|
| 241 |
-
"values": sorted(per_value.keys()),
|
| 242 |
-
"per_value": summary,
|
| 243 |
-
"best_value": best_value,
|
| 244 |
-
"worst_value": worst_value,
|
| 245 |
-
"groups": group_details,
|
| 246 |
-
"higher_is_better": higher_is_better,
|
| 247 |
-
}
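The grouping and ranking steps of the removed function can be sketched in a few lines. This is a simplified illustration, not the original implementation: `isolated_effect` is a hypothetical name, runs are plain `(slots, score)` tuples rather than `PipelineRun` objects, and the rank is attached to each run rather than stored per value.

```python
from collections import defaultdict
from statistics import fmean


def isolated_effect(runs, varying_slot, higher_is_better=False):
    """Mean score and mean within-group rank for each value of one slot."""
    # Group runs that share every slot value except the varying one.
    groups = defaultdict(list)
    for slots, score in runs:
        fixed = tuple(sorted((k, v) for k, v in slots.items() if k != varying_slot))
        groups[fixed].append((slots[varying_slot], score))

    scores, ranks = defaultdict(list), defaultdict(list)
    for members in groups.values():
        ordered = sorted(members, key=lambda m: m[1], reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            # Extend j over the block of runs tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            avg_rank = (i + 1 + j + 1) / 2  # tied runs share the average rank
            for value, score in ordered[i:j + 1]:
                scores[value].append(score)
                ranks[value].append(avg_rank)
            i = j + 1

    return {
        value: {"mean": fmean(scores[value]), "mean_rank": fmean(ranks[value])}
        for value in scores
    }


# Illustrative data only: two OCR engines crossed with two LLM settings.
runs = [
    ({"ocr": "tess", "llm": "gpt-4o"}, 0.10),
    ({"ocr": "tess", "llm": "none"}, 0.14),
    ({"ocr": "kraken", "llm": "gpt-4o"}, 0.06),
    ({"ocr": "kraken", "llm": "none"}, 0.12),
]
effect = isolated_effect(runs, "llm")
```

With these made-up scores, `gpt-4o` ranks first in both OCR groups, so its mean rank is 1.0 and that of `none` is 2.0. The ranks are computed inside each group, which is what isolates the `llm` slot from the choice of OCR engine.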
|
| 248 |
-
|
| 249 |
|
| 250 |
-
|
| 251 |
-
|
| 252 |
-
"
|
| 253 |
-
]
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.incremental_comparison`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy imports
|
| 6 |
+
(``from picarones.core.incremental_comparison import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now holds only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.incremental_comparison import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.incremental_comparison as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
@@ -1,484 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
engines specialised in distinct error classes (visual vs
|
| 10 |
-
abbreviation vs case) and therefore candidates for ensemble voting.
|
| 11 |
-
|
| 12 |
-
2. **Complementarity** (`oracle_token_recall`, `complementarity_gap`,
|
| 13 |
-
`pairwise_disagreement_rate`): *what CER would be reachable if the
|
| 14 |
-
engines were combined?* The lower bound on the CER reachable by
|
| 15 |
-
token-level majority voting is ``1 - oracle_token_recall``.
|
| 16 |
-
If it is far below the CER of the best single engine, the effort
|
| 17 |
-
of building an ensemble pipeline is justified; otherwise it is not.
|
| 18 |
-
|
| 19 |
-
Typing convention
|
| 20 |
-
-----------------
|
| 21 |
-
All these functions can be registered in the Sprint 34 registry if
|
| 22 |
-
they are wrapped in an adapter ``(input_types=(TEXT, TEXT))``. To
|
| 23 |
-
limit noise, they are **not** registered automatically: they are
|
| 24 |
-
aggregation metrics (multi-engine or multi-document) that do not
|
| 25 |
-
fit the runner's "one junction = one metric" model.
|
| 26 |
-
They are consumed by the narrative detectors and the HTML report.
|
| 27 |
-
|
| 28 |
-
Note on the oracle
|
| 29 |
-
------------------
|
| 30 |
-
The ``oracle_token_recall`` metric returned here uses a
|
| 31 |
-
multiplicity-weighted bag-of-words alignment. It is **not** a true
|
| 32 |
-
bound reachable by sequential majority voting; it is an upper
|
| 33 |
-
bound (optimistic proxy). The true bound would require a
|
| 34 |
-
sequential alignment of the hypotheses, which is more expensive. For
|
| 35 |
-
the "is an ensemble worth it?" diagnostic, the proxy is
|
| 36 |
-
more than sufficient; the limitation is clearly documented in the glossary and the
|
| 37 |
-
report.
|
| 38 |
"""
|
| 39 |
|
| 40 |
-
from
|
| 41 |
-
|
| 42 |
-
import logging
|
| 43 |
-
import math
|
| 44 |
-
from collections import Counter
|
| 45 |
-
|
| 46 |
-
logger = logging.getLogger(__name__)
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 50 |
-
# Taxonomic divergence (KL / Jensen-Shannon)
|
| 51 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
def _smoothed_distribution(
|
| 55 |
-
distribution: dict[str, float],
|
| 56 |
-
keys: list[str],
|
| 57 |
-
epsilon: float = 1e-12,
|
| 58 |
-
) -> list[float]:
|
| 59 |
-
"""Align a distribution on the order of ``keys`` and smooth the zeros.
|
| 60 |
-
|
| 61 |
-
The smoothing avoids ``log(0)`` in the KL. ``epsilon`` is deliberately
|
| 62 |
-
tiny so that it does not noticeably change the result.
|
| 63 |
-
"""
|
| 64 |
-
smoothed = [max(distribution.get(k, 0.0), epsilon) for k in keys]
|
| 65 |
-
total = sum(smoothed)
|
| 66 |
-
return [v / total for v in smoothed]
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
|
| 70 |
-
"""KL divergence ``D(P||Q)`` in bits, over the union of the keys.
|
| 71 |
-
|
| 72 |
-
The distributions do not need to share exactly the same
|
| 73 |
-
keys; missing keys are smoothed to ``epsilon`` and then
|
| 74 |
-
renormalised.
|
| 75 |
-
|
| 76 |
-
Returns
|
| 77 |
-
-------
|
| 78 |
-
float
|
| 79 |
-
``D(P||Q) ≥ 0``. Equals 0 if and only if P == Q. It is not
|
| 80 |
-
symmetric: ``kl(p, q) != kl(q, p)`` in general.
|
| 81 |
-
"""
|
| 82 |
-
keys = sorted(set(p.keys()) | set(q.keys()))
|
| 83 |
-
if not keys:
|
| 84 |
-
return 0.0
|
| 85 |
-
p_vec = _smoothed_distribution(p, keys)
|
| 86 |
-
q_vec = _smoothed_distribution(q, keys)
|
| 87 |
-
return sum(pi * math.log2(pi / qi) for pi, qi in zip(p_vec, q_vec))
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
def jensen_shannon_divergence(
|
| 91 |
-
p: dict[str, float],
|
| 92 |
-
q: dict[str, float],
|
| 93 |
-
) -> float:
|
| 94 |
-
"""Symmetric JS divergence in bits, bounded in ``[0, 1]``.
|
| 95 |
-
|
| 96 |
-
``JS(P, Q) = ½ D(P||M) + ½ D(Q||M)`` with ``M = (P + Q) / 2``.
|
| 97 |
-
Symmetric and bounded, hence preferable to the KL for building a
|
| 98 |
-
triangular matrix of divergences between engines.
|
| 99 |
-
"""
|
| 100 |
-
keys = sorted(set(p.keys()) | set(q.keys()))
|
| 101 |
-
if not keys:
|
| 102 |
-
return 0.0
|
| 103 |
-
p_vec = _smoothed_distribution(p, keys)
|
| 104 |
-
q_vec = _smoothed_distribution(q, keys)
|
| 105 |
-
m_vec = [(pi + qi) / 2.0 for pi, qi in zip(p_vec, q_vec)]
|
| 106 |
-
|
| 107 |
-
def _kl(a: list[float], b: list[float]) -> float:
|
| 108 |
-
return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
|
| 109 |
-
|
| 110 |
-
js = 0.5 * _kl(p_vec, m_vec) + 0.5 * _kl(q_vec, m_vec)
|
| 111 |
-
# Theoretical bound: JS ∈ [0, 1] in bits. Clamp to absorb
|
| 112 |
-
# floating-point rounding errors.
|
| 113 |
-
return max(0.0, min(1.0, js))
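The two extreme cases of the Jensen-Shannon divergence make the `[0, 1]` bound concrete. The sketch below is a simplified stand-in for the function above, under two assumptions: the inputs are already normalised, and zero-probability terms are skipped instead of being smoothed with `epsilon`. The name `js_divergence_bits` and the sample distributions are illustrative only.

```python
import math


def js_divergence_bits(p, q):
    """Jensen-Shannon divergence in bits between two dict distributions."""
    keys = sorted(set(p) | set(q))
    p_vec = [p.get(k, 0.0) for k in keys]
    q_vec = [q.get(k, 0.0) for k in keys]
    m_vec = [(pi + qi) / 2.0 for pi, qi in zip(p_vec, q_vec)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0; b_i > 0 whenever a_i > 0
        # because b is the mixture M.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p_vec, m_vec) + 0.5 * kl(q_vec, m_vec)


# Identical error profiles: no divergence.
identical = js_divergence_bits(
    {"visual": 0.5, "case": 0.5},
    {"visual": 0.5, "case": 0.5},
)
# Completely disjoint error profiles: the maximum, 1 bit.
disjoint = js_divergence_bits({"visual": 1.0}, {"case": 1.0})
```

Two engines that make exactly the same kinds of errors score 0, and two engines whose error classes never overlap score 1 bit, which is why values near 1 flag candidates for an ensemble.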
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
def taxonomy_divergence_matrix(
|
| 117 |
-
distributions: dict[str, dict[str, float]],
|
| 118 |
-
metric: str = "js",
|
| 119 |
-
) -> dict[str, dict[str, float]]:
|
| 120 |
-
"""Build the full pairwise divergence matrix between engines.
|
| 121 |
-
|
| 122 |
-
Parameters
|
| 123 |
-
----------
|
| 124 |
-
distributions:
|
| 125 |
-
``{engine_name: {error_class: probability}}``. Each
|
| 126 |
-
distribution should sum to approximately 1 (no strict validation,
|
| 127 |
-
because the Picarones taxonomy distributions are already
|
| 128 |
-
normalised by ``aggregate_taxonomy``).
|
| 129 |
-
metric:
|
| 130 |
-
``"js"`` (default, symmetric) or ``"kl"`` (asymmetric).
|
| 131 |
-
|
| 132 |
-
Returns
|
| 133 |
-
-------
|
| 134 |
-
dict[str, dict[str, float]]
|
| 135 |
-
Matrix ``{engine_a: {engine_b: divergence}}``, symmetric for
|
| 136 |
-
``js``, asymmetric for ``kl``. The diagonal is 0.
|
| 137 |
-
"""
|
| 138 |
-
if metric not in ("js", "kl"):
|
| 139 |
-
raise ValueError(f"metric doit être 'js' ou 'kl' — reçu {metric!r}")
|
| 140 |
-
fn = jensen_shannon_divergence if metric == "js" else kl_divergence
|
| 141 |
-
|
| 142 |
-
engines = sorted(distributions.keys())
|
| 143 |
-
matrix: dict[str, dict[str, float]] = {a: {} for a in engines}
|
| 144 |
-
for a in engines:
|
| 145 |
-
for b in engines:
|
| 146 |
-
if a == b:
|
| 147 |
-
matrix[a][b] = 0.0
|
| 148 |
-
elif metric == "js" and b in matrix and a in matrix[b]:
|
| 149 |
-
# Symmetric: reuse the mirrored entry instead of recomputing it
|
| 150 |
-
matrix[a][b] = matrix[b][a]
|
| 151 |
-
else:
|
| 152 |
-
matrix[a][b] = fn(distributions[a], distributions[b])
|
| 153 |
-
return matrix
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 157 |
-
# Complementarity (oracle token recall)
|
| 158 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
def _word_multiset(text: str) -> Counter[str]:
|
| 162 |
-
"""Split a text into a multiset of tokens (whitespace separator)."""
|
| 163 |
-
return Counter(tok for tok in text.split() if tok)
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
def oracle_token_recall(
|
| 167 |
-
reference: str,
|
| 168 |
-
hypotheses: dict[str, str],
|
| 169 |
-
) -> float:
|
| 170 |
-
"""Upper bound (bag-of-words proxy) on the token recall reachable
|
| 171 |
-
by majority voting across all the engines provided.
|
| 172 |
-
|
| 173 |
-
For each reference token (with its multiplicity), the token is
|
| 174 |
-
considered "preserved" by the ensemble if at least one engine
|
| 175 |
-
produces an occurrence of it that has not yet been counted. The score is the ratio
|
| 176 |
-
of preserved GT occurrences to the total.
|
| 177 |
-
|
| 178 |
-
Parameters
|
| 179 |
-
----------
|
| 180 |
-
reference:
|
| 181 |
-
Ground-truth (GT) text.
|
| 182 |
-
hypotheses:
|
| 183 |
-
``{engine_name: hypothesis_text}``.
|
| 184 |
-
|
| 185 |
-
Returns
|
| 186 |
-
-------
|
| 187 |
-
float
|
| 188 |
-
Ratio in ``[0, 1]``. ``1.0`` = every GT token is present
|
| 189 |
-
in at least one hypothesis, up to its multiplicity.
|
| 190 |
-
|
| 191 |
-
Note
|
| 192 |
-
----
|
| 193 |
-
This bound is **optimistic** (higher than the true bound under
|
| 194 |
-
sequential voting) because it ignores the order of appearance. For the
|
| 195 |
-
"is voting worth the effort?" diagnostic the proxy is sufficient; a
|
| 196 |
-
true bound would require a sequential alignment.
|
| 197 |
-
"""
|
| 198 |
-
ref_counter = _word_multiset(reference)
|
| 199 |
-
if not ref_counter or not hypotheses:
|
| 200 |
-
return 1.0 if not ref_counter else 0.0
|
| 201 |
-
|
| 202 |
-
hyp_counters = [_word_multiset(h) for h in hypotheses.values()]
|
| 203 |
-
total_ref = sum(ref_counter.values())
|
| 204 |
-
preserved = 0
|
| 205 |
-
for token, gt_count in ref_counter.items():
|
| 206 |
-
# For each engine, the number of available occurrences, capped
|
| 207 |
-
# at the GT multiplicity. The oracle takes the max over the engines.
|
| 208 |
-
best = max((min(gt_count, hc.get(token, 0)) for hc in hyp_counters), default=0)
|
| 209 |
-
preserved += best
|
| 210 |
-
return preserved / total_ref
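A small worked case shows how the oracle can exceed every single engine. The sketch re-implements the bag-of-words recall in a few lines, so `bag_recall` and `oracle_recall` are hypothetical helper names, and the reference and hypotheses are made-up strings. It assumes a non-empty reference and at least one hypothesis.

```python
from collections import Counter


def bag_recall(reference, hypothesis):
    """Share of reference token occurrences found in one hypothesis."""
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    return sum(min(n, hyp[tok]) for tok, n in ref.items()) / sum(ref.values())


def oracle_recall(reference, hypotheses):
    """Same ratio, taking for each token the best engine (bag-of-words oracle)."""
    ref = Counter(reference.split())
    hyps = [Counter(h.split()) for h in hypotheses.values()]
    kept = sum(max(min(n, h[tok]) for h in hyps) for tok, n in ref.items())
    return kept / sum(ref.values())


reference = "le roi de France"
hypotheses = {
    "engine_a": "le roy de France",  # misses "roi"
    "engine_b": "la roi de France",  # misses "le"
}
best_single = max(bag_recall(reference, h) for h in hypotheses.values())
oracle = oracle_recall(reference, hypotheses)
gap = oracle - best_single
```

Each engine recovers three of the four reference tokens, but they miss different ones, so the oracle recovers all four. The 0.25 gap is the headroom an ensemble could in principle exploit, keeping in mind that the bag-of-words proxy is optimistic.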
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
def complementarity_gap(
|
| 214 |
-
reference: str,
|
| 215 |
-
hypotheses: dict[str, str],
|
| 216 |
-
) -> dict[str, float]:
|
| 217 |
-
"""Compare the oracle with the best single engine.
|
| 218 |
-
|
| 219 |
-
Returns
|
| 220 |
-
-------
|
| 221 |
-
dict
|
| 222 |
-
``{
|
| 223 |
-
"oracle_recall": float, # bag-of-words recall of the oracle
|
| 224 |
-
"best_single_recall": float, # best token recall of a single engine
|
| 225 |
-
"best_engine": str, # name of the corresponding engine
|
| 226 |
-
"absolute_gap": float, # oracle - best_single (always ≥ 0)
|
| 227 |
-
"relative_gap": float, # absolute_gap / max(1 - best_single, ε), capped at 1
|
| 228 |
-
# = fraction of the errors still avoidable
|
| 229 |
-
# with an ensemble
|
| 230 |
-
}``
|
| 231 |
-
"""
|
| 232 |
-
ref_counter = _word_multiset(reference)
|
| 233 |
-
total = sum(ref_counter.values())
|
| 234 |
-
if not total:
|
| 235 |
-
return {
|
| 236 |
-
"oracle_recall": 1.0,
|
| 237 |
-
"best_single_recall": 1.0,
|
| 238 |
-
"best_engine": "",
|
| 239 |
-
"absolute_gap": 0.0,
|
| 240 |
-
"relative_gap": 0.0,
|
| 241 |
-
}
|
| 242 |
-
|
| 243 |
-
def _single_recall(hyp_text: str) -> float:
|
| 244 |
-
hc = _word_multiset(hyp_text)
|
| 245 |
-
preserved = sum(min(gt, hc.get(tok, 0)) for tok, gt in ref_counter.items())
|
| 246 |
-
return preserved / total
|
| 247 |
-
|
| 248 |
-
if not hypotheses:
|
| 249 |
-
return {
|
| 250 |
-
"oracle_recall": 0.0,
|
| 251 |
-
"best_single_recall": 0.0,
|
| 252 |
-
"best_engine": "",
|
| 253 |
-
"absolute_gap": 0.0,
|
| 254 |
-
"relative_gap": 0.0,
|
| 255 |
-
}
|
| 256 |
-
|
| 257 |
-
per_engine = {name: _single_recall(h) for name, h in hypotheses.items()}
|
| 258 |
-
best_engine, best_recall = max(per_engine.items(), key=lambda kv: kv[1])
|
| 259 |
-
oracle = oracle_token_recall(reference, hypotheses)
|
| 260 |
-
|
| 261 |
-
absolute_gap = max(0.0, oracle - best_recall)
|
| 262 |
-
# relative_gap: fraction of the best engine's errors that the ensemble
|
| 263 |
-
# could theoretically recover (∈ [0, 1])
|
| 264 |
-
headroom = max(1.0 - best_recall, 1e-12)
|
| 265 |
-
relative_gap = min(1.0, absolute_gap / headroom)
|
| 266 |
-
|
| 267 |
-
return {
|
| 268 |
-
"oracle_recall": oracle,
|
| 269 |
-
"best_single_recall": best_recall,
|
| 270 |
-
"best_engine": best_engine,
|
| 271 |
-
"absolute_gap": absolute_gap,
|
| 272 |
-
"relative_gap": relative_gap,
|
| 273 |
-
}
|
| 274 |
-
|
| 275 |
-
|
| 276 |
-
def pairwise_disagreement_rate(
|
| 277 |
-
reference: str,
|
| 278 |
-
hyp_a: str,
|
| 279 |
-
hyp_b: str,
|
| 280 |
-
) -> float:
|
| 281 |
-
"""Fraction of GT tokens on which A and B disagree.
|
| 282 |
-
|
| 283 |
-
A disagreement = (one preserves the token, the other does not) OR
|
| 284 |
-
(both miss it but with different substitutions; this case is not
|
| 285 |
-
captured here, only the simple presence/absence version is computed).
|
| 286 |
-
|
| 287 |
-
Returns
|
| 288 |
-
-------
|
| 289 |
-
float
|
| 290 |
-
Ratio in ``[0, 1]``. ``0`` = A and B make the same choices
|
| 291 |
-
(no ensemble gain). ``1`` = A and B always
|
| 292 |
-
disagree (maximal ensemble gain).
|
| 293 |
-
"""
|
| 294 |
-
ref_counter = _word_multiset(reference)
|
| 295 |
-
if not ref_counter:
|
| 296 |
-
return 0.0
|
| 297 |
-
a = _word_multiset(hyp_a)
|
| 298 |
-
b = _word_multiset(hyp_b)
|
| 299 |
-
total = sum(ref_counter.values())
|
| 300 |
-
disagree = 0
|
| 301 |
-
for tok, gt_count in ref_counter.items():
|
| 302 |
-
a_pres = min(gt_count, a.get(tok, 0))
|
| 303 |
-
b_pres = min(gt_count, b.get(tok, 0))
|
| 304 |
-
# Count the occurrences on which A and B differ (multiset counts, not positions)
|
| 305 |
-
disagree += abs(a_pres - b_pres)
|
| 306 |
-
return disagree / total
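The presence/absence disagreement can be traced by hand on a four-token reference. The sketch uses a hypothetical helper named `disagreement_rate` and made-up strings. It counts, per reference token, the difference between the occurrences preserved by A and by B.

```python
from collections import Counter


def disagreement_rate(reference, hyp_a, hyp_b):
    """Share of reference occurrences preserved by exactly one of A and B."""
    ref = Counter(reference.split())
    a, b = Counter(hyp_a.split()), Counter(hyp_b.split())
    differing = sum(abs(min(n, a[t]) - min(n, b[t])) for t, n in ref.items())
    return differing / sum(ref.values())


rate = disagreement_rate(
    "le roi de France",
    "le roy de France",  # A keeps "le", misses "roi"
    "la roi de France",  # B keeps "roi", misses "le"
)
```

Here A and B differ on "le" and "roi" and agree on "de" and "France", giving 2/4. A token that both engines miss counts as agreement, which is the limitation the docstring mentions.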
|
| 307 |
-
|
| 308 |
-
|
| 309 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 310 |
-
# Benchmark-level aggregation (Sprint 36)
|
| 311 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
def compute_inter_engine_analysis(
|
| 315 |
-
*,
|
| 316 |
-
per_engine_outputs: dict[str, dict[str, str]],
|
| 317 |
-
ground_truths: dict[str, str],
|
| 318 |
-
taxonomy_distributions: dict[str, dict[str, float]] | None = None,
|
| 319 |
-
divergence_metric: str = "js",
|
| 320 |
-
) -> dict:
|
| 321 |
-
"""Aggregate the inter-engine metrics over the whole corpus.
|
| 322 |
-
|
| 323 |
-
Parameters
|
| 324 |
-
----------
|
| 325 |
-
per_engine_outputs:
|
| 326 |
-
``{engine_name: {doc_id: hypothesis_text}}``. One entry per
|
| 327 |
-
engine, with one hypothesis per document. Documents missing
|
| 328 |
-
for an engine (failures, timeouts) are simply ignored for that
|
| 329 |
-
engine: the oracle is computed over the engines that produced
|
| 330 |
-
an output for the document.
|
| 331 |
-
ground_truths:
|
| 332 |
-
``{doc_id: ground_truth_text}``. The GT is the same for all
|
| 333 |
-
engines, so it is passed only once.
|
| 334 |
-
taxonomy_distributions:
|
| 335 |
-
``{engine_name: {error_class: probability}}``, typically
|
| 336 |
-
``EngineReport.aggregated_taxonomy["class_distribution"]``. If
|
| 337 |
-
``None`` or empty, the taxonomic divergence is not computed.
|
| 338 |
-
divergence_metric:
|
| 339 |
-
``"js"`` (default, symmetric) or ``"kl"``.
|
| 340 |
-
|
| 341 |
-
Returns
|
| 342 |
-
-------
|
| 343 |
-
dict
|
| 344 |
-
Stable structure that the narrative detectors and the
|
| 345 |
-
HTML report can consume:
|
| 346 |
-
``{
|
| 347 |
-
"complementarity": {
|
| 348 |
-
"oracle_recall": float,
|
| 349 |
-
"best_single_recall": float,
|
| 350 |
-
"best_engine": str,
|
| 351 |
-
"absolute_gap": float,
|
| 352 |
-
"relative_gap": float,
|
| 353 |
-
"doc_count": int,
|
| 354 |
-
"per_doc": [{doc_id, oracle_recall, best_single_recall, absolute_gap}, ...] # max 50 docs
|
| 355 |
-
},
|
| 356 |
-
"taxonomy_divergence": {
|
| 357 |
-
"metric": "js"|"kl",
|
| 358 |
-
"matrix": {engine_a: {engine_b: divergence}},
|
| 359 |
-
"max_pair": [engine_a, engine_b, value] # most divergent pair
|
| 360 |
-
} | None,
|
| 361 |
-
"engines": [...], # list of the engines analysed (stable order)
|
| 362 |
-
}``
|
| 363 |
-
"""
|
| 364 |
-
engines = sorted(per_engine_outputs.keys())
|
| 365 |
-
result: dict = {"engines": engines}
|
| 366 |
-
|
| 367 |
-
# ── Complementarity aggregated document by document ──────────────────
|
| 368 |
-
if not engines:
|
| 369 |
-
result["complementarity"] = None
|
| 370 |
-
else:
|
| 371 |
-
total_oracle_preserved = 0
|
| 372 |
-
total_ref_tokens = 0
|
| 373 |
-
per_engine_preserved: dict[str, int] = {name: 0 for name in engines}
|
| 374 |
-
per_doc_records: list[dict] = []
|
| 375 |
-
|
| 376 |
-
for doc_id, gt in ground_truths.items():
|
| 377 |
-
ref_counter = _word_multiset(gt)
|
| 378 |
-
ref_total = sum(ref_counter.values())
|
| 379 |
-
if not ref_total:
|
| 380 |
-
continue
|
| 381 |
-
total_ref_tokens += ref_total
|
| 382 |
-
|
| 383 |
-
doc_hyps: dict[str, str] = {}
|
| 384 |
-
for name in engines:
|
| 385 |
-
hyp = per_engine_outputs.get(name, {}).get(doc_id)
|
| 386 |
-
if hyp is not None:
|
| 387 |
-
doc_hyps[name] = hyp
|
| 388 |
-
|
| 389 |
-
if not doc_hyps:
|
| 390 |
-
continue
|
| 391 |
-
|
| 392 |
-
hyp_counters = {n: _word_multiset(h) for n, h in doc_hyps.items()}
|
| 393 |
-
|
| 394 |
-
doc_oracle = 0
|
| 395 |
-
doc_best_per_engine: dict[str, int] = {n: 0 for n in doc_hyps}
|
| 396 |
-
for tok, gt_count in ref_counter.items():
|
| 397 |
-
# Oracle: best engine on this token
|
| 398 |
-
best_for_token = 0
|
| 399 |
-
for name, hc in hyp_counters.items():
|
| 400 |
-
preserved = min(gt_count, hc.get(tok, 0))
|
| 401 |
-
doc_best_per_engine[name] += preserved
|
| 402 |
-
if preserved > best_for_token:
|
| 403 |
-
best_for_token = preserved
|
| 404 |
-
doc_oracle += best_for_token
|
| 405 |
-
|
| 406 |
-
total_oracle_preserved += doc_oracle
|
| 407 |
-
for name, count in doc_best_per_engine.items():
|
| 408 |
-
per_engine_preserved[name] += count
|
| 409 |
-
|
| 410 |
-
doc_best = max(doc_best_per_engine.values()) if doc_best_per_engine else 0
|
| 411 |
-
per_doc_records.append({
|
| 412 |
-
"doc_id": doc_id,
|
| 413 |
-
"oracle_recall": doc_oracle / ref_total,
|
| 414 |
-
"best_single_recall": doc_best / ref_total,
|
| 415 |
-
"absolute_gap": (doc_oracle - doc_best) / ref_total,
|
| 416 |
-
})
|
| 417 |
-
|
| 418 |
-
if total_ref_tokens == 0:
|
| 419 |
-
result["complementarity"] = None
|
| 420 |
-
else:
|
| 421 |
-
oracle_recall = total_oracle_preserved / total_ref_tokens
|
| 422 |
-
recalls = {
|
| 423 |
-
name: per_engine_preserved[name] / total_ref_tokens
|
| 424 |
-
for name in engines
|
| 425 |
-
}
|
| 426 |
-
best_engine, best_recall = max(recalls.items(), key=lambda kv: kv[1])
|
| 427 |
-
absolute_gap = max(0.0, oracle_recall - best_recall)
|
| 428 |
-
headroom = max(1.0 - best_recall, 1e-12)
|
| 429 |
-
relative_gap = min(1.0, absolute_gap / headroom)
|
| 430 |
-
|
| 431 |
-
# Keep the most instructive ``per_doc_records``: sort by
|
| 432 |
-
# decreasing absolute gap, top 50. The narrative detectors
|
| 433 |
-
# only consume a few of them.
|
| 434 |
-
per_doc_records.sort(key=lambda r: r["absolute_gap"], reverse=True)
|
| 435 |
-
per_doc_top = per_doc_records[:50]
|
| 436 |
-
|
| 437 |
-
result["complementarity"] = {
|
| 438 |
-
"oracle_recall": oracle_recall,
|
| 439 |
-
"best_single_recall": best_recall,
|
| 440 |
-
"best_engine": best_engine,
|
| 441 |
-
"absolute_gap": absolute_gap,
|
| 442 |
-
"relative_gap": relative_gap,
|
| 443 |
-
"doc_count": len(per_doc_records),
|
| 444 |
-
"per_engine_recall": recalls,
|
| 445 |
-
"per_doc": per_doc_top,
|
| 446 |
-
}
|
| 447 |
-
|
| 448 |
-
# ── Taxonomic divergence ───────────────────────────────────────────
|
| 449 |
-
if not taxonomy_distributions:
|
| 450 |
-
result["taxonomy_divergence"] = None
|
| 451 |
-
else:
|
| 452 |
-
matrix = taxonomy_divergence_matrix(
|
| 453 |
-
taxonomy_distributions,
|
| 454 |
-
metric=divergence_metric,
|
| 455 |
-
)
|
| 456 |
-
# Find the most divergent pair (useful for the narrative
|
| 457 |
-
# synthesis, which wants to name the two engines that are candidates
|
| 458 |
-
# for the ensemble).
|
| 459 |
-
max_pair: tuple[str, str, float] = ("", "", 0.0)
|
| 460 |
-
names = sorted(matrix.keys())
|
| 461 |
-
for i, a in enumerate(names):
|
| 462 |
-
for b in names[i + 1:]:
|
| 463 |
-
v = matrix[a][b]
|
| 464 |
-
if v > max_pair[2]:
|
| 465 |
-
max_pair = (a, b, v)
|
| 466 |
-
|
| 467 |
-
result["taxonomy_divergence"] = {
|
| 468 |
-
"metric": divergence_metric,
|
| 469 |
-
"matrix": matrix,
|
| 470 |
-
"max_pair": list(max_pair) if max_pair[2] > 0 else None,
|
| 471 |
-
}
|
| 472 |
-
|
| 473 |
-
return result
|
| 474 |
-
|
| 475 |
|
| 476 |
-
|
| 477 |
-
|
| 478 |
-
"
|
| 479 |
-
|
| 480 |
-
"oracle_token_recall",
|
| 481 |
-
"complementarity_gap",
|
| 482 |
-
"pairwise_disagreement_rate",
|
| 483 |
-
"compute_inter_engine_analysis",
|
| 484 |
-
]
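The removed block above computes an oracle token recall: for each ground-truth token it keeps the best count that any single engine preserved, then compares the total with the best single engine. The sketch below reproduces that arithmetic for one document only. `word_multiset` is a hypothetical stand-in for the module's `_word_multiset` helper (assumed here to be plain whitespace tokenization), and the name `complementarity` is illustrative, not part of Picarones.

```python
from collections import Counter
from typing import Optional


def word_multiset(text: str) -> Counter:
    # Hypothetical stand-in for `_word_multiset`: whitespace tokens, counted.
    return Counter(text.split())


def complementarity(ground_truth: str, hypotheses: dict) -> Optional[dict]:
    """Oracle recall vs. best single engine, for a single document."""
    ref = word_multiset(ground_truth)
    ref_total = sum(ref.values())
    if ref_total == 0 or not hypotheses:
        return None
    counters = {name: word_multiset(h) for name, h in hypotheses.items()}
    per_engine = {name: 0 for name in counters}
    oracle = 0
    for tok, gt_count in ref.items():
        best_for_token = 0
        for name, hc in counters.items():
            # An engine cannot preserve more occurrences than the GT holds.
            preserved = min(gt_count, hc.get(tok, 0))
            per_engine[name] += preserved
            best_for_token = max(best_for_token, preserved)
        oracle += best_for_token
    oracle_recall = oracle / ref_total
    best_engine, best_count = max(per_engine.items(), key=lambda kv: kv[1])
    best_recall = best_count / ref_total
    absolute_gap = max(0.0, oracle_recall - best_recall)
    # Headroom is what the best engine still misses; the floor avoids 0/0.
    headroom = max(1.0 - best_recall, 1e-12)
    return {
        "oracle_recall": oracle_recall,
        "best_engine": best_engine,
        "best_single_recall": best_recall,
        "absolute_gap": absolute_gap,
        "relative_gap": min(1.0, absolute_gap / headroom),
    }


example = complementarity(
    "le roi de france",
    {"a": "le roi xx yy", "b": "zz ww de france"},
)
```

In this toy case each engine recovers a different half of the reference, so the oracle recovers everything while the best single engine recovers half: the relative gap saturates at 1.0.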
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.inter_engine`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
is no longer in ``picarones.core/``; it lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy
|
| 6 |
+
imports (``from picarones.core.inter_engine import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.inter_engine import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.inter_engine as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
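The alias file above relies on two standard Python behaviours: `from module import *` copies the names listed in the target's `__all__` (or, without one, every name not starting with an underscore), and the shim then republishes the same `__all__`. The sketch below exercises the same three statements against two in-memory modules. The module names `demo_measurements_layout` and `demo_core_layout` are hypothetical and exist only for this illustration.

```python
import sys
import types

# Hypothetical "new location" module, built in memory for the demo.
new_mod = types.ModuleType("demo_measurements_layout")
exec(
    "__all__ = ['layout_f1']\n"
    "def layout_f1():\n"
    "    return 1.0\n"
    "def _private_helper():\n"
    "    return None\n",
    new_mod.__dict__,
)
sys.modules[new_mod.__name__] = new_mod

# Legacy alias module: same three statements as the shim in the diff,
# pointed at the demo module instead of picarones.measurements.
alias_src = (
    "from demo_measurements_layout import *  # noqa: F401, F403\n"
    "import demo_measurements_layout as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n"
)
alias_mod = types.ModuleType("demo_core_layout")
exec(alias_src, alias_mod.__dict__)

same_object = alias_mod.layout_f1 is new_mod.layout_f1
private_reexported = hasattr(alias_mod, "_private_helper")
```

One consequence worth keeping in mind: a star import never re-exports underscore-prefixed names, so any legacy code that imported a private helper through the old path would need the new path instead.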
|
@@ -1,280 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
does it properly separate the main text from the gloss?"*. Picarones'
|
| 9 |
-
global structure score (Sprint 5) aggregates line merging/fragmentation
|
| 10 |
-
into a single number, which is useful but untyped. This module
|
| 11 |
-
discriminates by ALTO/PAGE **region type** (``TextRegion``,
|
| 12 |
-
``MarginNote``, ``Header``, ``Footer``, ``Drop-Cap``...) by
|
| 13 |
-
applying the standard ICDAR layout pattern:
|
| 14 |
-
|
| 15 |
-
- **TP**: a GT region and a hypothesis region of the **same type** with
|
| 16 |
-
IoU overlap ≥ threshold (greedy alignment by decreasing IoU),
|
| 17 |
-
- **FN**: unmatched GT region,
|
| 18 |
-
- **FP**: unmatched hypothesis region,
|
| 19 |
-
- F1 computed globally and per type.
|
| 20 |
-
|
| 21 |
-
The alignment pattern is the same as for NER (Sprint 38): we
|
| 22 |
-
reuse a proven approach rather than inventing a new one.
|
| 23 |
-
|
| 24 |
-
Splitting strategy
|
| 25 |
-
------------------
|
| 26 |
-
Consistent with NER (Sprint 38), Flesch (Sprint 52), Reading order F1
|
| 27 |
-
(Sprint 53): pure computation layer first. The user supplies
|
| 28 |
-
two lists of ``Region`` (typically extracted from ALTO/PAGE by an
|
| 29 |
-
upstream parser; Picarones' standard ALTO/PAGE parser will follow
|
| 30 |
-
in a dedicated sprint). No runner wiring or HTML view here.
|
| 31 |
-
|
| 32 |
-
Coordinate convention
|
| 33 |
-
---------------------
|
| 34 |
-
A bbox is a tuple ``(x, y, width, height)`` in pixels (origin
|
| 35 |
-
at the top left, y axis pointing down, the standard ALTO and PAGE
|
| 36 |
-
convention). IoU is computed as intersection area / union area of the
|
| 37 |
-
rectangles.
|
| 38 |
"""
|
| 39 |
|
| 40 |
-
from
|
| 41 |
-
|
| 42 |
-
import logging
|
| 43 |
-
from dataclasses import dataclass
|
| 44 |
-
from typing import Iterable
|
| 45 |
-
|
| 46 |
-
logger = logging.getLogger(__name__)
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 50 |
-
# Modèle de données
|
| 51 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
@dataclass(frozen=True)
|
| 55 |
-
class Region:
|
| 56 |
-
"""Une région ALTO/PAGE alignable sur sa GT.
|
| 57 |
-
|
| 58 |
-
Attributs
|
| 59 |
-
---------
|
| 60 |
-
id:
|
| 61 |
-
Identifiant unique au sein de la séquence (ex. ``"r_1"``,
|
| 62 |
-
``"region_main"``). Informatif — l'alignement se fait par IoU,
|
| 63 |
-
pas par ID.
|
| 64 |
-
type:
|
| 65 |
-
Catégorie de la région (``"TextRegion"``, ``"MarginNote"``,
|
| 66 |
-
``"Header"``, etc.). Comparaison **case-insensitive**.
|
| 67 |
-
bbox:
|
| 68 |
-
Rectangle ``(x, y, width, height)`` en pixels, origine en haut
|
| 69 |
-
à gauche. Doit avoir width > 0 et height > 0.
|
| 70 |
-
"""
|
| 71 |
-
|
| 72 |
-
id: str
|
| 73 |
-
type: str
|
| 74 |
-
bbox: tuple[int, int, int, int]
|
| 75 |
-
|
| 76 |
-
def __post_init__(self) -> None:
|
| 77 |
-
x, y, w, h = self.bbox
|
| 78 |
-
if w <= 0 or h <= 0:
|
| 79 |
-
raise ValueError(
|
| 80 |
-
f"Region {self.id!r} : bbox invalide (w={w}, h={h}). "
|
| 81 |
-
"width et height doivent être strictement positifs."
|
| 82 |
-
)
|
| 83 |
-
|
| 84 |
-
@property
|
| 85 |
-
def area(self) -> int:
|
| 86 |
-
_, _, w, h = self.bbox
|
| 87 |
-
return w * h
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
def _to_region(obj: Region | dict) -> Region:
|
| 91 |
-
"""Coerce a dict into a ``Region`` (keys ``id``, ``type``, ``bbox``)."""
|
| 92 |
-
if isinstance(obj, Region):
|
| 93 |
-
return obj
|
| 94 |
-
return Region(
|
| 95 |
-
id=str(obj["id"]),
|
| 96 |
-
type=str(obj["type"]),
|
| 97 |
-
bbox=tuple(obj["bbox"]), # type: ignore[arg-type]
|
| 98 |
-
)
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 102 |
-
# IoU + alignement greedy
|
| 103 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
def _iou_bbox(a: Region, b: Region) -> float:
|
| 107 |
-
"""Intersection-over-Union of two bboxes ``(x, y, w, h)``."""
|
| 108 |
-
ax, ay, aw, ah = a.bbox
|
| 109 |
-
bx, by, bw, bh = b.bbox
|
| 110 |
-
inter_x = max(ax, bx)
|
| 111 |
-
inter_y = max(ay, by)
|
| 112 |
-
inter_x_end = min(ax + aw, bx + bw)
|
| 113 |
-
inter_y_end = min(ay + ah, by + bh)
|
| 114 |
-
inter_w = max(0, inter_x_end - inter_x)
|
| 115 |
-
inter_h = max(0, inter_y_end - inter_y)
|
| 116 |
-
inter = inter_w * inter_h
|
| 117 |
-
if inter == 0:
|
| 118 |
-
return 0.0
|
| 119 |
-
union = a.area + b.area - inter
|
| 120 |
-
if union <= 0:
|
| 121 |
-
return 0.0
|
| 122 |
-
return inter / union
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
def _align_regions(
|
| 126 |
-
references: list[Region],
|
| 127 |
-
hypotheses: list[Region],
|
| 128 |
-
iou_threshold: float,
|
| 129 |
-
) -> tuple[list[tuple[int, int, float]], set[int], set[int]]:
|
| 130 |
-
"""Greedy matching by decreasing IoU; same type required.
|
| 131 |
-
|
| 132 |
-
Returns ``(matches, unmatched_refs, unmatched_hyps)``;
|
| 133 |
-
``matches`` is a list of ``(idx_ref, idx_hyp, iou)``.
|
| 134 |
-
"""
|
| 135 |
-
candidates: list[tuple[float, int, int]] = []
|
| 136 |
-
for i, r in enumerate(references):
|
| 137 |
-
for j, h in enumerate(hypotheses):
|
| 138 |
-
if r.type.casefold() != h.type.casefold():
|
| 139 |
-
continue
|
| 140 |
-
iou = _iou_bbox(r, h)
|
| 141 |
-
if iou >= iou_threshold:
|
| 142 |
-
candidates.append((iou, i, j))
|
| 143 |
-
|
| 144 |
-
# Stable sort: decreasing IoU, then increasing indices for
|
| 145 |
-
# determinism on ties.
|
| 146 |
-
candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
|
| 147 |
-
|
| 148 |
-
matched_refs: set[int] = set()
|
| 149 |
-
matched_hyps: set[int] = set()
|
| 150 |
-
matches: list[tuple[int, int, float]] = []
|
| 151 |
-
for iou, i, j in candidates:
|
| 152 |
-
if i in matched_refs or j in matched_hyps:
|
| 153 |
-
continue
|
| 154 |
-
matched_refs.add(i)
|
| 155 |
-
matched_hyps.add(j)
|
| 156 |
-
matches.append((i, j, iou))
|
| 157 |
-
|
| 158 |
-
unmatched_refs = set(range(len(references))) - matched_refs
|
| 159 |
-
unmatched_hyps = set(range(len(hypotheses))) - matched_hyps
|
| 160 |
-
return matches, unmatched_refs, unmatched_hyps
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 164 |
-
# Métrique principale
|
| 165 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
def _prf(tp: int, fp: int, fn: int) -> dict[str, float]:
|
| 169 |
-
p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
|
| 170 |
-
r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
|
| 171 |
-
f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
|
| 172 |
-
return {"precision": p, "recall": r, "f1": f1, "support": tp + fn}
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
def compute_layout_metrics(
|
| 176 |
-
reference_regions: Iterable[Region | dict] | None,
|
| 177 |
-
hypothesis_regions: Iterable[Region | dict] | None,
|
| 178 |
-
iou_threshold: float = 0.5,
|
| 179 |
-
) -> dict:
|
| 180 |
-
"""Compute layout precision/recall/F1 per region type.
|
| 181 |
-
|
| 182 |
-
Parameters
|
| 183 |
-
----------
|
| 184 |
-
reference_regions:
|
| 185 |
-
List of GT regions (``Region`` or dict ``{id, type, bbox}``).
|
| 186 |
-
hypothesis_regions:
|
| 187 |
-
List of regions produced by the OCR/HTR engine or a
|
| 188 |
-
layout-detector.
|
| 189 |
-
iou_threshold:
|
| 190 |
-
Minimum overlap threshold to declare a match
|
| 191 |
-
(default: 0.5, the ICDAR convention).
|
| 192 |
-
|
| 193 |
-
Returns
|
| 194 |
-
-------
|
| 195 |
-
dict
|
| 196 |
-
``{
|
| 197 |
-
"global": {"precision", "recall", "f1", "support"},
|
| 198 |
-
"per_type": {type_name: {"precision", ...}},
|
| 199 |
-
"true_positives": int,
|
| 200 |
-
"false_positives": int,
|
| 201 |
-
"false_negatives": int,
|
| 202 |
-
"missed_regions": list[dict], # unmatched GT regions
|
| 203 |
-
"hallucinated_regions": list[dict], # unmatched hypothesis regions
|
| 204 |
-
"iou_threshold": float,
|
| 205 |
-
}``
|
| 206 |
-
|
| 207 |
-
Degenerate cases
|
| 208 |
-
----------------
|
| 209 |
-
- Two empty lists → F1 = 0 and all counters at 0.
|
| 210 |
-
- Empty GT + non-empty hyp → F1 = 0 (every hyp = FP).
|
| 211 |
-
- Empty hyp + non-empty GT → F1 = 0 (every GT = FN).
|
| 212 |
-
"""
|
| 213 |
-
refs = [_to_region(r) for r in (reference_regions or [])]
|
| 214 |
-
hyps = [_to_region(h) for h in (hypothesis_regions or [])]
|
| 215 |
-
|
| 216 |
-
matches, unmatched_refs, unmatched_hyps = _align_regions(
|
| 217 |
-
refs, hyps, iou_threshold,
|
| 218 |
-
)
|
| 219 |
-
|
| 220 |
-
tp = len(matches)
|
| 221 |
-
fn = len(unmatched_refs)
|
| 222 |
-
fp = len(unmatched_hyps)
|
| 223 |
-
|
| 224 |
-
cat_tp: dict[str, int] = {}
|
| 225 |
-
cat_fn: dict[str, int] = {}
|
| 226 |
-
cat_fp: dict[str, int] = {}
|
| 227 |
-
for i, _j, _iou in matches:
|
| 228 |
-
cat = refs[i].type
|
| 229 |
-
cat_tp[cat] = cat_tp.get(cat, 0) + 1
|
| 230 |
-
for i in unmatched_refs:
|
| 231 |
-
cat = refs[i].type
|
| 232 |
-
cat_fn[cat] = cat_fn.get(cat, 0) + 1
|
| 233 |
-
for j in unmatched_hyps:
|
| 234 |
-
cat = hyps[j].type
|
| 235 |
-
cat_fp[cat] = cat_fp.get(cat, 0) + 1
|
| 236 |
-
|
| 237 |
-
all_categories = sorted(set(cat_tp) | set(cat_fn) | set(cat_fp))
|
| 238 |
-
per_type = {
|
| 239 |
-
cat: _prf(
|
| 240 |
-
cat_tp.get(cat, 0),
|
| 241 |
-
cat_fp.get(cat, 0),
|
| 242 |
-
cat_fn.get(cat, 0),
|
| 243 |
-
)
|
| 244 |
-
for cat in all_categories
|
| 245 |
-
}
|
| 246 |
-
|
| 247 |
-
return {
|
| 248 |
-
"global": _prf(tp, fp, fn),
|
| 249 |
-
"per_type": per_type,
|
| 250 |
-
"true_positives": tp,
|
| 251 |
-
"false_positives": fp,
|
| 252 |
-
"false_negatives": fn,
|
| 253 |
-
"missed_regions": [
|
| 254 |
-
{"id": refs[i].id, "type": refs[i].type, "bbox": list(refs[i].bbox)}
|
| 255 |
-
for i in sorted(unmatched_refs)
|
| 256 |
-
],
|
| 257 |
-
"hallucinated_regions": [
|
| 258 |
-
{"id": hyps[j].id, "type": hyps[j].type, "bbox": list(hyps[j].bbox)}
|
| 259 |
-
for j in sorted(unmatched_hyps)
|
| 260 |
-
],
|
| 261 |
-
"iou_threshold": iou_threshold,
|
| 262 |
-
}
|
| 263 |
-
|
| 264 |
-
|
| 265 |
-
def layout_f1(
|
| 266 |
-
reference_regions: Iterable[Region | dict] | None,
|
| 267 |
-
hypothesis_regions: Iterable[Region | dict] | None,
|
| 268 |
-
iou_threshold: float = 0.5,
|
| 269 |
-
) -> float:
|
| 270 |
-
"""Shortcut: global layout F1."""
|
| 271 |
-
return compute_layout_metrics(
|
| 272 |
-
reference_regions, hypothesis_regions, iou_threshold,
|
| 273 |
-
)["global"]["f1"]
|
| 274 |
-
|
| 275 |
|
| 276 |
-
|
| 277 |
-
|
| 278 |
-
"
|
| 279 |
-
|
| 280 |
-
]
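The removed layout module scores regions in three steps: IoU on `(x, y, width, height)` boxes, greedy one-to-one matching by decreasing IoU restricted to same-type pairs, then precision/recall/F1 from the match counts. The sketch below condenses those steps on plain tuples instead of the `Region` dataclass. The function names `iou_xywh`, `greedy_match` and `prf` are illustrative and do not claim to be the module's API.

```python
def iou_xywh(a, b):
    """IoU of two (x, y, w, h) boxes, origin top-left, y pointing down."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    if inter == 0:
        return 0.0
    return inter / (aw * ah + bw * bh - inter)


def greedy_match(refs, hyps, threshold=0.5):
    """refs and hyps are lists of (type, bbox); returns (i, j, iou) matches."""
    candidates = []
    for i, (r_type, r_box) in enumerate(refs):
        for j, (h_type, h_box) in enumerate(hyps):
            if r_type.casefold() != h_type.casefold():
                continue  # only same-type pairs may match
            iou = iou_xywh(r_box, h_box)
            if iou >= threshold:
                candidates.append((iou, i, j))
    # Decreasing IoU, then increasing indices so ties are deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    used_refs, used_hyps, matches = set(), set(), []
    for iou, i, j in candidates:
        if i in used_refs or j in used_hyps:
            continue  # each region is matched at most once
        used_refs.add(i)
        used_hyps.add(j)
        matches.append((i, j, iou))
    return matches


def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}


refs = [("TextRegion", (0, 0, 100, 100)), ("MarginNote", (200, 0, 50, 100))]
hyps = [("textregion", (0, 0, 100, 50)), ("Header", (200, 0, 50, 100))]
matches = greedy_match(refs, hyps)
tp = len(matches)
scores = prf(tp, fp=len(hyps) - tp, fn=len(refs) - tp)
```

Here the half-height text region reaches an IoU of exactly 0.5 and matches despite the case difference, while the `MarginNote`/`Header` pair overlaps perfectly but is rejected on type, giving one TP, one FP and one FN.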
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.layout`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
is no longer in ``picarones.core/``; it lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy
|
| 6 |
+
imports (``from picarones.core.layout import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.layout import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.layout as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
|
|
@@ -1,561 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
happened** in the benchmark: who wins, who collapses,
|
| 9 |
-
who is fragile. This sprint answers a complementary
|
| 10 |
-
question: **on which dimension would the expected benefit of an
|
| 11 |
-
improvement be most visible?**
|
| 12 |
-
|
| 13 |
-
No prescription
|
| 14 |
-
---------------
|
| 15 |
-
Picarones is a **research tool**, not a production
|
| 16 |
-
workshop. The module never says *"do X"* or
|
| 17 |
-
*"use engine Y"*; it aggregates **factual
|
| 18 |
-
observations** already computed in other modules (Sprints 75-81)
|
| 19 |
-
and presents them as a compact summary at the bottom of the report.
|
| 20 |
-
The researcher reads, judges and decides.
|
| 21 |
-
|
| 22 |
-
Examples of emitted levers
|
| 23 |
-
--------------------------
|
| 24 |
-
- *"65% of Tesseract's errors belong to a recoverable class
|
| 25 |
-
(case_error, ligature_error, abbreviation_error): trivial
|
| 26 |
-
post-processing would absorb part of them."*
|
| 27 |
-
- *"12% of your documents account for 78% of the total CER
|
| 28 |
-
(Pareto-CER)."*
|
| 29 |
-
- *"The projected deficit of the most fragile engine on the real
|
| 30 |
-
corpus is 4.2 CER points (Sprint 81)."*
|
| 31 |
-
- *"The top 3 systematically modernized GT tokens are
|
| 32 |
-
maistre, nostre, veoir (Sprint 80)."*
|
| 33 |
-
|
| 34 |
-
Structure
|
| 35 |
-
---------
|
| 36 |
-
Module parallel to the Sprint 19 narrative registry: `Lever` is the
|
| 37 |
-
dataclass equivalent to `Fact`, `LeverImportance` mirrors the
|
| 38 |
-
semantics of `FactImportance`, `@register_lever` indexes the
|
| 39 |
-
detectors. Same anti-hallucination safeguard: every
|
| 40 |
-
rendered number must be present in the `payload` of the `Lever`.
|
| 41 |
-
|
| 42 |
-
The detectors read **only** structures already
|
| 43 |
-
built by the benchmark pipeline: they compute nothing
|
| 44 |
-
new, they synthesize. That is why the module is
|
| 45 |
-
strictly optional: if a benchmark does not expose
|
| 46 |
-
`taxonomy_aggregated`, `inter_engine_analysis`, `corpus_difficulty`,
|
| 47 |
-
`lexical_modernization` or `robustness_projection`, the corresponding
|
| 48 |
-
detector simply returns `[]`.
|
| 49 |
"""
|
| 50 |
|
| 51 |
-
from
|
| 52 |
-
|
| 53 |
-
import logging
|
| 54 |
-
import threading
|
| 55 |
-
from dataclasses import dataclass
|
| 56 |
-
from enum import Enum
|
| 57 |
-
from typing import Callable
|
| 58 |
-
|
| 59 |
-
logger = logging.getLogger(__name__)
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 63 |
-
# Modèle
|
| 64 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
class LeverType(str, Enum):
|
| 68 |
-
"""Types of detected levers."""
|
| 69 |
-
|
| 70 |
-
DOMINANT_RECOVERABLE_CLASS = "dominant_recoverable_class"
|
| 71 |
-
"""A large share of an engine's errors falls in classes
|
| 72 |
-
categorized as "recoverable" (Sprint 77)."""
|
| 73 |
-
|
| 74 |
-
PARETO_CONCENTRATION = "pareto_concentration"
|
| 75 |
-
"""A minority fraction of documents accounts for a majority
|
| 76 |
-
fraction of the total CER, so targeted inspection pays off."""
|
| 77 |
-
|
| 78 |
-
COMPLEMENTARITY_OBSERVATION = "complementarity_observation"
|
| 79 |
-
"""The `complementarity_gap` (Sprint 35) between the oracle and the
|
| 80 |
-
best single engine is non-negligible: a factual observation,
|
| 81 |
-
with no ensemble recommendation."""
|
| 82 |
-
|
| 83 |
-
LEXICAL_MODERNIZATION_OBSERVATION = "lexical_modernization_observation"
|
| 84 |
-
"""Top-N systematically modernized GT tokens (Sprint 80)."""
|
| 85 |
-
|
| 86 |
-
ROBUSTNESS_PROJECTION_OBSERVATION = "robustness_projection_observation"
|
| 87 |
-
"""Largest projected global deficit for an engine on
|
| 88 |
-
the real corpus (Sprint 81)."""
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
class LeverImportance(int, Enum):
|
| 92 |
-
"""Editorial importance of a lever."""
|
| 93 |
-
|
| 94 |
-
HIGH = 70
|
| 95 |
-
MEDIUM = 40
|
| 96 |
-
LOW = 10
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
@dataclass
|
| 100 |
-
class Lever:
|
| 101 |
-
"""Factual observation that can be summarized in a "Levers" box.
|
| 102 |
-
|
| 103 |
-
Attributes
|
| 104 |
-
----------
|
| 105 |
-
type:
|
| 106 |
-
The lever type (see `LeverType`).
|
| 107 |
-
importance:
|
| 108 |
-
Score that determines the display order.
|
| 109 |
-
payload:
|
| 110 |
-
Raw data: **every figure rendered in the HTML must
|
| 111 |
-
come from here**, never from a computation in the renderer.
|
| 112 |
-
engines_involved:
|
| 113 |
-
Names of the engines concerned (may be empty for a
|
| 114 |
-
corpus-wide lever).
|
| 115 |
-
"""
|
| 116 |
-
|
| 117 |
-
type: LeverType
|
| 118 |
-
importance: LeverImportance
|
| 119 |
-
payload: dict
|
| 120 |
-
engines_involved: tuple[str, ...] = ()
|
| 121 |
-
|
| 122 |
-
def as_dict(self) -> dict:
|
| 123 |
-
return {
|
| 124 |
-
"type": self.type.value,
|
| 125 |
-
"importance": int(self.importance),
|
| 126 |
-
"payload": self.payload,
|
| 127 |
-
"engines_involved": list(self.engines_involved),
|
| 128 |
-
}
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 132 |
-
# Registre
|
| 133 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
LeverDetectorFn = Callable[[dict], list[Lever]]
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
@dataclass(frozen=True)
|
| 140 |
-
class LeverDetectorEntry:
|
| 141 |
-
lever_type: LeverType
|
| 142 |
-
fn: LeverDetectorFn
|
| 143 |
-
priority: int
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
_LEVER_REGISTRY: dict[LeverType, LeverDetectorEntry] = {}
|
| 147 |
-
_LEVER_REGISTRY_LOCK = threading.Lock()
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
def register_lever(
|
| 151 |
-
lever_type: LeverType,
|
| 152 |
-
*,
|
| 153 |
-
priority: int,
|
| 154 |
-
) -> Callable[[LeverDetectorFn], LeverDetectorFn]:
|
| 155 |
-
"""Decorator: registers a lever detector.
|
| 156 |
-
|
| 157 |
-
Only one function per type: registering again raises `ValueError`.
|
| 158 |
-
"""
|
| 159 |
-
def _decorator(fn: LeverDetectorFn) -> LeverDetectorFn:
|
| 160 |
-
with _LEVER_REGISTRY_LOCK:
|
| 161 |
-
if lever_type in _LEVER_REGISTRY:
|
| 162 |
-
raise ValueError(
|
| 163 |
-
f"Détecteur déjà enregistré pour {lever_type.value!r} : "
|
| 164 |
-
f"{_LEVER_REGISTRY[lever_type].fn.__name__}."
|
| 165 |
-
)
|
| 166 |
-
_LEVER_REGISTRY[lever_type] = LeverDetectorEntry(
|
| 167 |
-
lever_type=lever_type, fn=fn, priority=int(priority),
|
| 168 |
-
)
|
| 169 |
-
return fn
|
| 170 |
-
return _decorator
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
def unregister_lever(lever_type: LeverType) -> None:
|
| 174 |
-
with _LEVER_REGISTRY_LOCK:
|
| 175 |
-
_LEVER_REGISTRY.pop(lever_type, None)
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
def iter_lever_detectors() -> list[LeverDetectorEntry]:
|
| 179 |
-
with _LEVER_REGISTRY_LOCK:
|
| 180 |
-
entries = list(_LEVER_REGISTRY.values())
|
| 181 |
-
entries.sort(key=lambda e: e.priority)
|
| 182 |
-
return entries
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
def detect_levers(benchmark_data: dict) -> list[Lever]:
|
| 186 |
-
"""Apply all registered detectors and sort by decreasing
|
| 187 |
-
importance, then by increasing registration priority."""
|
| 188 |
-
levers: list[Lever] = []
|
| 189 |
-
for entry in iter_lever_detectors():
|
| 190 |
-
try:
|
| 191 |
-
result = entry.fn(benchmark_data)
|
| 192 |
-
except Exception as e:
|
| 193 |
-
logger.warning(
|
| 194 |
-
"[levers.detector.%s] fonctionnalité dégradée : %s",
|
| 195 |
-
entry.lever_type.value, e,
|
| 196 |
-
)
|
| 197 |
-
continue
|
| 198 |
-
if result:
|
| 199 |
-
levers.extend(result)
|
| 200 |
-
# Stable sort: decreasing importance first; ties keep detector priority order
|
| 201 |
-
levers.sort(key=lambda lv: -int(lv.importance))
|
| 202 |
-
return levers
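The registry above combines three ideas: a decorator that refuses duplicate registrations under a lock, detectors run in priority order with failures skipped, and a final stable sort on importance so that ties keep priority order. The sketch below is a stripped-down illustration of that pattern using plain dicts in place of `Lever`. The names `register`, `detect_all` and the three demo detectors are hypothetical.

```python
import threading
from typing import Callable

Detector = Callable[[dict], list]

_REGISTRY = {}  # kind -> (priority, detector)
_LOCK = threading.Lock()


def register(kind: str, *, priority: int) -> Callable[[Detector], Detector]:
    """Decorator: one detector per kind, duplicates raise ValueError."""
    def decorator(fn: Detector) -> Detector:
        with _LOCK:
            if kind in _REGISTRY:
                raise ValueError(f"detector already registered for {kind!r}")
            _REGISTRY[kind] = (priority, fn)
        return fn
    return decorator


def detect_all(data: dict) -> list:
    # Snapshot under the lock, then run detectors by increasing priority.
    with _LOCK:
        entries = sorted(_REGISTRY.values(), key=lambda e: e[0])
    found = []
    for _priority, fn in entries:
        try:
            found.extend(fn(data) or [])
        except Exception:
            continue  # a failing detector degrades the output, nothing more
    # list.sort is stable: equal importances keep detector priority order.
    found.sort(key=lambda item: -item["importance"])
    return found


@register("broken", priority=5)
def detect_broken(data):
    raise RuntimeError("boom")


@register("a", priority=10)
def detect_a(data):
    return [{"src": "a", "importance": 40}, {"src": "a-high", "importance": 70}]


@register("b", priority=20)
def detect_b(data):
    return [{"src": "b", "importance": 40}]


ordered = [item["src"] for item in detect_all({})]
```

The broken detector is silently skipped here, whereas the original logs a warning. Python's stable sort is what makes the "importance first, then priority" ordering work without a compound key.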
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 206 |
-
# Détecteurs
|
| 207 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
# Categorization taken from Sprint 77 (taxonomy_comparison.py).
|
| 211 |
-
# Deliberately duplicated here to avoid introducing a circular
|
| 212 |
-
# import; the semantics are frozen.
|
| 213 |
-
_RECOVERABILITY: dict[str, str] = {
|
| 214 |
-
"case_error": "recoverable",
|
| 215 |
-
"ligature_error": "recoverable",
|
| 216 |
-
"abbreviation_error": "recoverable",
|
| 217 |
-
"diacritic_error": "difficult",
|
| 218 |
-
"visual_confusion": "difficult",
|
| 219 |
-
"hapax": "difficult",
|
| 220 |
-
"lacuna": "irrecoverable",
|
| 221 |
-
"oov_character": "irrecoverable",
|
| 222 |
-
"segmentation_error": "irrecoverable",
|
| 223 |
-
}
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
@register_lever(LeverType.DOMINANT_RECOVERABLE_CLASS, priority=10)
|
| 227 |
-
def detect_dominant_recoverable_class(
|
| 228 |
-
benchmark_data: dict,
|
| 229 |
-
*,
|
| 230 |
-
threshold: float = 0.30,
|
| 231 |
-
) -> list[Lever]:
|
| 232 |
-
"""Emit a lever if ≥ `threshold` of an engine's errors are
|
| 233 |
-
classified as recoverable (Sprint 77 categorization).
|
| 234 |
-
|
| 235 |
-
Reads `benchmark_data["engines"][i]["aggregated_taxonomy"]`,
|
| 236 |
-
a structure produced by the legacy runner. If absent, returns
|
| 237 |
-
[].
|
| 238 |
-
"""
|
| 239 |
-
engines = benchmark_data.get("engines") or []
|
| 240 |
-
out: list[Lever] = []
|
| 241 |
-
for engine in engines:
|
| 242 |
-
taxonomy = engine.get("aggregated_taxonomy")
|
| 243 |
-
if not taxonomy:
|
| 244 |
-
continue
|
| 245 |
-
# `taxonomy` may be {class_name: int} or a dict with a
|
| 246 |
-
# "counts" sub-key; both conventions are accepted.
|
| 247 |
-
counts = taxonomy.get("counts") if isinstance(taxonomy, dict) and "counts" in taxonomy else taxonomy
|
| 248 |
-
if not isinstance(counts, dict) or not counts:
|
| 249 |
-
continue
|
| 250 |
-
try:
|
| 251 |
-
int_counts = {k: int(v) for k, v in counts.items() if isinstance(v, (int, float))}
|
| 252 |
-
except (TypeError, ValueError):
|
| 253 |
-
continue
|
| 254 |
-
total = sum(int_counts.values())
|
| 255 |
-
if total <= 0:
|
| 256 |
-
continue
|
| 257 |
-
recoverable_total = sum(
|
| 258 |
-
v for k, v in int_counts.items()
|
| 259 |
-
if _RECOVERABILITY.get(k) == "recoverable"
|
| 260 |
-
)
|
| 261 |
-
share = recoverable_total / total
|
| 262 |
-
if share < threshold:
|
| 263 |
-
continue
|
| 264 |
-
# Non-empty recoverable classes sorted by decreasing count
|
| 265 |
-
breakdown = sorted(
|
| 266 |
-
(
|
| 267 |
-
(k, v) for k, v in int_counts.items()
|
| 268 |
-
if _RECOVERABILITY.get(k) == "recoverable" and v > 0
|
| 269 |
-
),
|
| 270 |
-
key=lambda kv: -kv[1],
|
| 271 |
-
)
|
| 272 |
-
importance = (
|
| 273 |
-
LeverImportance.HIGH if share >= 0.50 else LeverImportance.MEDIUM
|
| 274 |
-
)
|
| 275 |
-
out.append(Lever(
|
| 276 |
-
type=LeverType.DOMINANT_RECOVERABLE_CLASS,
|
| 277 |
-
importance=importance,
|
| 278 |
-
payload={
|
| 279 |
-
"engine": engine.get("name") or "?",
|
| 280 |
-
"share_recoverable": share,
|
| 281 |
-
"share_recoverable_pct": round(share * 100, 1),
|
| 282 |
-
"n_recoverable": recoverable_total,
|
| 283 |
-
"n_total_errors": total,
|
| 284 |
-
"top_classes": [
|
| 285 |
-
{"class": k, "count": v} for k, v in breakdown[:3]
|
| 286 |
-
],
|
| 287 |
-
},
|
| 288 |
-
engines_involved=(engine.get("name") or "?",),
|
| 289 |
-
))
|
| 290 |
-
return out
|
| 291 |
-
|
| 292 |
-
|
| 293 |
-
@register_lever(LeverType.PARETO_CONCENTRATION, priority=20)
|
| 294 |
-
def detect_pareto_concentration(
|
| 295 |
-
benchmark_data: dict,
|
| 296 |
-
*,
|
| 297 |
-
top_share: float = 0.20,
|
| 298 |
-
cer_share_threshold: float = 0.50,
|
| 299 |
-
) -> list[Lever]:
|
| 300 |
-
"""Emit a lever if a minority fraction of documents
|
| 301 |
-
(`top_share`) accounts for at least `cer_share_threshold` of the
|
| 302 |
-
cumulative total CER on the leading engine.
|
| 303 |
-
|
| 304 |
-
Lit `benchmark_data["per_doc_cer"][engine_name]` ou tente de
|
| 305 |
-
reconstruire depuis `benchmark_data["engines"][...]["per_doc"]`.
|
| 306 |
-
Si rien d'exploitable, retourne [].
|
| 307 |
-
"""
|
| 308 |
-
ranking = benchmark_data.get("ranking") or []
|
| 309 |
-
if not ranking:
|
| 310 |
-
return []
|
| 311 |
-
leader = ranking[0]
|
| 312 |
-
leader_name = leader.get("engine")
|
| 313 |
-
if not leader_name:
|
| 314 |
-
return []
|
| 315 |
-
|
| 316 |
-
per_doc_cer: list[float] = []
|
| 317 |
-
# Path 1: flat "per_doc_cer" structure
|
| 318 |
-
flat = benchmark_data.get("per_doc_cer") or {}
|
| 319 |
-
if isinstance(flat, dict) and leader_name in flat and isinstance(flat[leader_name], list):
|
| 320 |
-
per_doc_cer = [float(x) for x in flat[leader_name] if isinstance(x, (int, float))]
|
| 321 |
-
else:
|
| 322 |
-
# Path 2: engine.per_doc, a list of dicts {cer: float}
|
| 323 |
-
for engine in benchmark_data.get("engines") or []:
|
| 324 |
-
if engine.get("name") != leader_name:
|
| 325 |
-
continue
|
| 326 |
-
per_doc = engine.get("per_doc") or []
|
| 327 |
-
for entry in per_doc:
|
| 328 |
-
if isinstance(entry, dict) and isinstance(entry.get("cer"), (int, float)):
|
| 329 |
-
per_doc_cer.append(float(entry["cer"]))
|
| 330 |
-
break
|
| 331 |
-
|
| 332 |
-
if not per_doc_cer:
|
| 333 |
-
return []
|
| 334 |
-
total_cer = sum(per_doc_cer)
|
| 335 |
-
if total_cer <= 0:
|
| 336 |
-
return []
|
| 337 |
-
|
| 338 |
-
sorted_cer = sorted(per_doc_cer, reverse=True)
|
| 339 |
-
n = len(sorted_cer)
|
| 340 |
-
n_top = max(1, int(round(top_share * n)))
|
| 341 |
-
top_cer_sum = sum(sorted_cer[:n_top])
|
| 342 |
-
share_of_total = top_cer_sum / total_cer
|
| 343 |
-
if share_of_total < cer_share_threshold:
|
| 344 |
-
return []
|
| 345 |
-
importance = (
|
| 346 |
-
LeverImportance.HIGH if share_of_total >= 0.75
|
| 347 |
-
else LeverImportance.MEDIUM
|
| 348 |
-
)
|
| 349 |
-
return [Lever(
|
| 350 |
-
type=LeverType.PARETO_CONCENTRATION,
|
| 351 |
-
importance=importance,
|
| 352 |
-
payload={
|
| 353 |
-
"engine": leader_name,
|
| 354 |
-
"n_docs": n,
|
| 355 |
-
"n_docs_top": n_top,
|
| 356 |
-
"top_share_pct": round((n_top / n) * 100, 1),
|
| 357 |
-
"cer_share_of_total": share_of_total,
|
| 358 |
-
"cer_share_pct": round(share_of_total * 100, 1),
|
| 359 |
-
},
|
| 360 |
-
engines_involved=(leader_name,),
|
| 361 |
-
)]
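As a worked illustration of the concentration test above, here is a minimal standalone sketch. `pareto_share` is a hypothetical helper name, not part of the module; it reuses the same sort, the same `max(1, int(round(...)))` sizing of the top group, and the same ratio.

```python
from __future__ import annotations


def pareto_share(
    per_doc_cer: list[float], top_share: float = 0.20
) -> tuple[int, float] | None:
    """Return (n_top, share of total CER held by the n_top worst documents)."""
    total = sum(per_doc_cer)
    if not per_doc_cer or total <= 0:
        return None
    ranked = sorted(per_doc_cer, reverse=True)  # worst documents first
    n_top = max(1, int(round(top_share * len(ranked))))
    return n_top, sum(ranked[:n_top]) / total


# Five documents, one of them very bad: the single worst document
# (20 % of the corpus) carries about 80 % of the cumulative CER.
print(pareto_share([0.40, 0.05, 0.02, 0.01, 0.02]))
```

One detail worth keeping in mind when reading `n_docs_top` in the payload: Python's built-in `round` rounds exact halves to the nearest even integer, so with `top_share=0.25` and 10 documents the top group holds 2 documents, not 3.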
|
| 362 |
-
|
| 363 |
-
|
| 364 |
-
@register_lever(LeverType.COMPLEMENTARITY_OBSERVATION, priority=30)
|
| 365 |
-
def detect_complementarity_observation(
|
| 366 |
-
benchmark_data: dict,
|
| 367 |
-
*,
|
| 368 |
-
min_relative_gap: float = 0.20,
|
| 369 |
-
) -> list[Lever]:
|
| 370 |
-
"""Restates the `complementarity_gap` (Sprint 35) factually.
|
| 371 |
-
|
| 372 |
-
Reads `benchmark_data["inter_engine_analysis"]`. Safeguard: only
|
| 373 |
-
fires if `relative_gap` ≥ `min_relative_gap`. **No ensemble
|
| 374 |
-
recommendation**: the lever states factually that
|
| 375 |
-
"X points separate the oracle from the best engine", nothing more.
|
| 376 |
-
"""
|
| 377 |
-
inter = benchmark_data.get("inter_engine_analysis") or {}
|
| 378 |
-
cgap = inter.get("complementarity_gap") or {}
|
| 379 |
-
relative_gap = cgap.get("relative_gap")
|
| 380 |
-
absolute_gap = cgap.get("absolute_gap")
|
| 381 |
-
if relative_gap is None or absolute_gap is None:
|
| 382 |
-
return []
|
| 383 |
-
try:
|
| 384 |
-
rg = float(relative_gap)
|
| 385 |
-
ag = float(absolute_gap)
|
| 386 |
-
except (TypeError, ValueError):
|
| 387 |
-
return []
|
| 388 |
-
if rg < min_relative_gap:
|
| 389 |
-
return []
|
| 390 |
-
importance = (
|
| 391 |
-
LeverImportance.HIGH if rg >= 0.50 else LeverImportance.MEDIUM
|
| 392 |
-
)
|
| 393 |
-
payload: dict = {
|
| 394 |
-
"absolute_gap": ag,
|
| 395 |
-
"absolute_gap_pct": round(ag * 100, 1),
|
| 396 |
-
"relative_gap": rg,
|
| 397 |
-
"relative_gap_pct": round(rg * 100, 1),
|
| 398 |
-
}
|
| 399 |
-
best_engine = cgap.get("best_engine") or inter.get("best_engine")
|
| 400 |
-
best_recall = cgap.get("best_recall") or inter.get("best_engine_recall")
|
| 401 |
-
oracle_recall = cgap.get("oracle_recall") or inter.get("oracle_recall")
|
| 402 |
-
engines_involved: tuple[str, ...] = ()
|
| 403 |
-
if best_engine:
|
| 404 |
-
payload["best_engine"] = str(best_engine)
|
| 405 |
-
engines_involved = (str(best_engine),)
|
| 406 |
-
if isinstance(best_recall, (int, float)):
|
| 407 |
-
payload["best_recall"] = float(best_recall)
|
| 408 |
-
if isinstance(oracle_recall, (int, float)):
|
| 409 |
-
payload["oracle_recall"] = float(oracle_recall)
|
| 410 |
-
return [Lever(
|
| 411 |
-
type=LeverType.COMPLEMENTARITY_OBSERVATION,
|
| 412 |
-
importance=importance,
|
| 413 |
-
payload=payload,
|
| 414 |
-
engines_involved=engines_involved,
|
| 415 |
-
)]
|
| 416 |
-
|
| 417 |
-
|
| 418 |
-
@register_lever(LeverType.LEXICAL_MODERNIZATION_OBSERVATION, priority=40)
|
| 419 |
-
def detect_lexical_modernization_observation(
|
| 420 |
-
benchmark_data: dict,
|
| 421 |
-
*,
|
| 422 |
-
top_n: int = 3,
|
| 423 |
-
min_total: int = 3,
|
| 424 |
-
min_rate: float = 0.50,
|
| 425 |
-
) -> list[Lever]:
|
| 426 |
-
"""For each engine that has `lexical_modernization` data,
|
| 427 |
-
emits a lever listing the `top_n` most modernized GT tokens.
|
| 428 |
-
|
| 429 |
-
Reads `benchmark_data["engines"][i]["lexical_modernization"]`, which
|
| 430 |
-
follows the shape produced by `compute_lexical_modernization` from
|
| 431 |
-
Sprint 80 (`{"n_gt_tokens": int, "tokens": dict}`).
|
| 432 |
-
"""
|
| 433 |
-
out: list[Lever] = []
|
| 434 |
-
for engine in benchmark_data.get("engines") or []:
|
| 435 |
-
data = engine.get("lexical_modernization")
|
| 436 |
-
if not isinstance(data, dict):
|
| 437 |
-
continue
|
| 438 |
-
tokens = data.get("tokens") or {}
|
| 439 |
-
if not isinstance(tokens, dict) or not tokens:
|
| 440 |
-
continue
|
| 441 |
-
candidates: list[tuple[str, dict]] = []
|
| 442 |
-
for gt_token, slot in tokens.items():
|
| 443 |
-
if not isinstance(slot, dict):
|
| 444 |
-
continue
|
| 445 |
-
n_total = slot.get("n_total")
|
| 446 |
-
rate = slot.get("rate_modernized")
|
| 447 |
-
if not isinstance(n_total, (int, float)) or not isinstance(rate, (int, float)):
|
| 448 |
-
continue
|
| 449 |
-
if int(n_total) < min_total:
|
| 450 |
-
continue
|
| 451 |
-
if float(rate) < min_rate:
|
| 452 |
-
continue
|
| 453 |
-
candidates.append((gt_token, dict(slot)))
|
| 454 |
-
if not candidates:
|
| 455 |
-
continue
|
| 456 |
-
candidates.sort(
|
| 457 |
-
key=lambda kv: (-float(kv[1].get("rate_modernized", 0.0)),
|
| 458 |
-
-int(kv[1].get("n_total", 0)),
|
| 459 |
-
kv[0]),
|
| 460 |
-
)
|
| 461 |
-
top = candidates[:top_n]
|
| 462 |
-
engine_name = engine.get("name") or "?"
|
| 463 |
-
max_rate = max(float(slot.get("rate_modernized", 0.0)) for _, slot in top)
|
| 464 |
-
importance = (
|
| 465 |
-
LeverImportance.HIGH if max_rate >= 0.90 else LeverImportance.MEDIUM
|
| 466 |
-
)
|
| 467 |
-
out.append(Lever(
|
| 468 |
-
type=LeverType.LEXICAL_MODERNIZATION_OBSERVATION,
|
| 469 |
-
importance=importance,
|
| 470 |
-
payload={
|
| 471 |
-
"engine": engine_name,
|
| 472 |
-
"top_tokens": [
|
| 473 |
-
{
|
| 474 |
-
"gt_token": gt,
|
| 475 |
-
"n_total": int(slot.get("n_total", 0)),
|
| 476 |
-
"rate_modernized": float(slot.get("rate_modernized", 0.0)),
|
| 477 |
-
"rate_modernized_pct": round(
|
| 478 |
-
float(slot.get("rate_modernized", 0.0)) * 100, 1,
|
| 479 |
-
),
|
| 480 |
-
}
|
| 481 |
-
for gt, slot in top
|
| 482 |
-
],
|
| 483 |
-
},
|
| 484 |
-
engines_involved=(engine_name,),
|
| 485 |
-
))
|
| 486 |
-
return out
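The filtering and ordering applied to the token slots above can be sketched in isolation. `top_modernized` and the sample tokens are hypothetical; the thresholds and the three-part sort key (rate descending, then count descending, then token for a deterministic tie-break) follow the detector.

```python
def top_modernized(tokens, top_n=3, min_total=3, min_rate=0.50):
    """Keep slots passing both thresholds, then rank them like the detector."""
    candidates = [
        (tok, slot)
        for tok, slot in tokens.items()
        if slot["n_total"] >= min_total and slot["rate_modernized"] >= min_rate
    ]
    candidates.sort(
        key=lambda kv: (-kv[1]["rate_modernized"], -kv[1]["n_total"], kv[0])
    )
    return [tok for tok, _ in candidates[:top_n]]


# Made-up sample data, for illustration only.
sample = {
    "roy": {"n_total": 12, "rate_modernized": 0.75},
    "estoit": {"n_total": 5, "rate_modernized": 1.0},
    "faict": {"n_total": 8, "rate_modernized": 1.0},
    "moult": {"n_total": 2, "rate_modernized": 1.0},  # dropped: n_total < 3
    "dict": {"n_total": 10, "rate_modernized": 0.30},  # dropped: rate < 0.50
}
print(top_modernized(sample))
```

The two fully modernized tokens come first, ordered by how often they occur; the token with the lower rate follows even though it is more frequent.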
|
| 487 |
-
|
| 488 |
-
|
| 489 |
-
@register_lever(LeverType.ROBUSTNESS_PROJECTION_OBSERVATION, priority=50)
|
| 490 |
-
def detect_robustness_projection_observation(
|
| 491 |
-
benchmark_data: dict,
|
| 492 |
-
*,
|
| 493 |
-
min_total_deficit: float = 0.02,
|
| 494 |
-
) -> list[Lever]:
|
| 495 |
-
"""Reads the per-engine aggregation of the robustness projection
|
| 496 |
-
(Sprint 81). Emits a lever for each engine whose
|
| 497 |
-
`total_expected_deficit` is ≥ `min_total_deficit` (default:
|
| 498 |
-
2 CER points).
|
| 499 |
-
|
| 500 |
-
Reads `benchmark_data["robustness_projection_aggregated"]`, the
|
| 501 |
-
structure produced by `aggregate_projection_per_engine`.
|
| 502 |
-
"""
|
| 503 |
-
agg = benchmark_data.get("robustness_projection_aggregated") or {}
|
| 504 |
-
if not isinstance(agg, dict) or not agg:
|
| 505 |
-
return []
|
| 506 |
-
out: list[Lever] = []
|
| 507 |
-
for engine_name, info in agg.items():
|
| 508 |
-
if not isinstance(info, dict):
|
| 509 |
-
continue
|
| 510 |
-
total_deficit = info.get("total_expected_deficit")
|
| 511 |
-
worst_type = info.get("worst_degradation_type")
|
| 512 |
-
worst_deficit = info.get("worst_degradation_deficit")
|
| 513 |
-
if not isinstance(total_deficit, (int, float)):
|
| 514 |
-
continue
|
| 515 |
-
if float(total_deficit) < min_total_deficit:
|
| 516 |
-
continue
|
| 517 |
-
importance = (
|
| 518 |
-
LeverImportance.HIGH if float(total_deficit) >= 0.05
|
| 519 |
-
else LeverImportance.MEDIUM
|
| 520 |
-
)
|
| 521 |
-
payload: dict = {
|
| 522 |
-
"engine": engine_name,
|
| 523 |
-
"total_expected_deficit": float(total_deficit),
|
| 524 |
-
"total_expected_deficit_pct": round(float(total_deficit) * 100, 1),
|
| 525 |
-
"n_degradation_types": int(info.get("n_degradation_types") or 0),
|
| 526 |
-
}
|
| 527 |
-
if isinstance(worst_type, str):
|
| 528 |
-
payload["worst_degradation_type"] = worst_type
|
| 529 |
-
if isinstance(worst_deficit, (int, float)):
|
| 530 |
-
payload["worst_degradation_deficit"] = float(worst_deficit)
|
| 531 |
-
payload["worst_degradation_deficit_pct"] = round(
|
| 532 |
-
float(worst_deficit) * 100, 1,
|
| 533 |
-
)
|
| 534 |
-
out.append(Lever(
|
| 535 |
-
type=LeverType.ROBUSTNESS_PROJECTION_OBSERVATION,
|
| 536 |
-
importance=importance,
|
| 537 |
-
payload=payload,
|
| 538 |
-
engines_involved=(engine_name,),
|
| 539 |
-
))
|
| 540 |
-
# Sort by descending deficit for a stable display order.
|
| 541 |
-
out.sort(
|
| 542 |
-
key=lambda lv: -float(lv.payload.get("total_expected_deficit") or 0.0),
|
| 543 |
-
)
|
| 544 |
-
return out
|
| 545 |
-
|
| 546 |
|
| 547 |
-
|
| 548 |
-
|
| 549 |
-
"
|
| 550 |
-
|
| 551 |
-
"LeverDetectorEntry",
|
| 552 |
-
"register_lever",
|
| 553 |
-
"unregister_lever",
|
| 554 |
-
"iter_lever_detectors",
|
| 555 |
-
"detect_levers",
|
| 556 |
-
"detect_dominant_recoverable_class",
|
| 557 |
-
"detect_pareto_concentration",
|
| 558 |
-
"detect_complementarity_observation",
|
| 559 |
-
"detect_lexical_modernization_observation",
|
| 560 |
-
"detect_robustness_projection_observation",
|
| 561 |
-
]
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.levers`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring effort. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. The alias here lets legacy
|
| 6 |
+
imports (``from picarones.core.levers import ...``) keep
|
| 7 |
+
working without modification.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.levers import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.levers as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
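The `__all__` fallback used by this alias can be exercised on throwaway modules. The sketch below assumes nothing about `picarones` itself: `public_names` and the two synthetic modules are hypothetical, built with `types.ModuleType` purely to show which names the expression yields.

```python
import types


def public_names(module):
    """Same expression as the alias: prefer __all__, else non-underscore names."""
    return getattr(module, "__all__", [
        nm for nm in dir(module) if not nm.startswith("_")
    ])


# A module that declares __all__: only the declared names are re-exported.
with_all = types.ModuleType("with_all")
with_all.__all__ = ["detect_levers"]
with_all.detect_levers = lambda data: []
with_all.helper = object()

# A module without __all__: every name not starting with "_" is listed.
without_all = types.ModuleType("without_all")
without_all.compute = lambda: 1
without_all._private = 2

print(public_names(with_all))
print(public_names(without_all))
```

`from module import *` applies the same rule on the import side, so the alias ends up re-exporting exactly the names it lists. In the fallback case, that list also includes anything the target module itself imported under a public name.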
|
@@ -1,286 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
- Gini coefficient: concentration of errors (0 = uniform, 1 = fully concentrated)
|
| 9 |
-
- Heatmap: mean CER per position bin within the document
|
| 10 |
-
"""
|
| 11 |
-
|
| 12 |
-
from __future__ import annotations
|
| 13 |
-
|
| 14 |
-
import unicodedata
|
| 15 |
-
from dataclasses import dataclass
|
| 16 |
-
from typing import Optional
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
# ---------------------------------------------------------------------------
|
| 20 |
-
# CER for a pair of lines (normalized Levenshtein edit distance)
|
| 21 |
-
# ---------------------------------------------------------------------------
|
| 22 |
-
|
| 23 |
-
def _edit_distance(a: str, b: str) -> int:
|
| 24 |
-
"""Levenshtein distance between two strings."""
|
| 25 |
-
if not a:
|
| 26 |
-
return len(b)
|
| 27 |
-
if not b:
|
| 28 |
-
return len(a)
|
| 29 |
-
prev = list(range(len(b) + 1))
|
| 30 |
-
for i, ca in enumerate(a, 1):
|
| 31 |
-
curr = [i]
|
| 32 |
-
for j, cb in enumerate(b, 1):
|
| 33 |
-
cost = 0 if ca == cb else 1
|
| 34 |
-
curr.append(min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost))
|
| 35 |
-
prev = curr
|
| 36 |
-
return prev[-1]
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
def _line_cer(ref_line: str, hyp_line: str) -> float:
|
| 40 |
-
"""CER for a pair of lines. Returns 1.0 if the GT is empty and the hypothesis is not (0.0 if both are empty)."""
|
| 41 |
-
ref = unicodedata.normalize("NFC", ref_line.strip())
|
| 42 |
-
hyp = unicodedata.normalize("NFC", hyp_line.strip())
|
| 43 |
-
if not ref:
|
| 44 |
-
return 0.0 if not hyp else 1.0
|
| 45 |
-
dist = _edit_distance(ref, hyp)
|
| 46 |
-
return dist / len(ref)
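A standalone sketch of the two helpers above makes their edge cases easy to check. The function names drop the leading underscore to mark them as illustrative copies rather than the module's own functions; the logic is the same two-row Levenshtein and NFC-normalized ratio.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def line_cer(ref_line: str, hyp_line: str) -> float:
    """Edit distance divided by the length of the normalized reference."""
    ref = unicodedata.normalize("NFC", ref_line.strip())
    hyp = unicodedata.normalize("NFC", hyp_line.strip())
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


print(edit_distance("kitten", "sitting"))  # classic example: 3 edits
print(line_cer("maison", "maifon"))        # one substitution over six characters
print(line_cer("abc", "abcdefg"))          # four insertions over three characters
```

The last call returns a value above 1.0, because insertions are counted against a short reference. That is why the caller clamps each line with `min(..., 1.0)`. NFC normalization also means a precomposed "é" and "e" followed by a combining acute accent compare as equal.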
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
# ---------------------------------------------------------------------------
|
| 50 |
-
# Percentiles (pure-Python implementation, no numpy)
|
| 51 |
-
# ---------------------------------------------------------------------------
|
| 52 |
-
|
| 53 |
-
def _percentile(sorted_values: list[float], p: float) -> float:
|
| 54 |
-
"""Returns the p-th percentile (0 ≤ p ≤ 100) of a sorted list."""
|
| 55 |
-
if not sorted_values:
|
| 56 |
-
return 0.0
|
| 57 |
-
n = len(sorted_values)
|
| 58 |
-
index = p / 100 * (n - 1)
|
| 59 |
-
lo = int(index)
|
| 60 |
-
hi = min(lo + 1, n - 1)
|
| 61 |
-
frac = index - lo
|
| 62 |
-
return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])
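The interpolation above is easiest to follow on a tiny list. This is an illustrative copy of the helper (hypothetical name `percentile`), using linear interpolation between the two nearest ranks:

```python
def percentile(sorted_values: list[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100) of an already sorted list."""
    if not sorted_values:
        return 0.0
    n = len(sorted_values)
    index = p / 100 * (n - 1)  # fractional rank in [0, n - 1]
    lo = int(index)
    hi = min(lo + 1, n - 1)
    frac = index - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


values = [0.0, 1.0, 2.0, 4.0]
print(percentile(values, 50))   # rank 1.5: halfway between 1.0 and 2.0
print(percentile(values, 100))  # rank 3: the maximum
```

For `p=50` the fractional rank is 1.5, so the result is the midpoint of the second and third values. The function assumes its input is already sorted, which is why the caller sorts `cer_per_line` first.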
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
# ---------------------------------------------------------------------------
|
| 66 |
-
# Gini coefficient
|
| 67 |
-
# ---------------------------------------------------------------------------
|
| 68 |
-
|
| 69 |
-
def _gini(values: list[float]) -> float:
|
| 70 |
-
"""Gini coefficient of the errors (0 = uniform, 1 = fully concentrated).
|
| 71 |
-
|
| 72 |
-
Formula: G = (2 * Σ i*x_i) / (n * Σ x_i) - (n+1)/n
|
| 73 |
-
over the values sorted in ascending order (i = 1..n).
|
| 74 |
-
"""
|
| 75 |
-
if not values:
|
| 76 |
-
return 0.0
|
| 77 |
-
xs = sorted(max(v, 0.0) for v in values)
|
| 78 |
-
n = len(xs)
|
| 79 |
-
total = sum(xs)
|
| 80 |
-
if total == 0.0:
|
| 81 |
-
return 0.0
|
| 82 |
-
weighted_sum = sum((i + 1) * x for i, x in enumerate(xs))
|
| 83 |
-
return (2.0 * weighted_sum) / (n * total) - (n + 1) / n
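Two extreme inputs show what the formula above returns in practice. The sketch is an illustrative copy of the helper under the hypothetical name `gini`:

```python
def gini(values: list[float]) -> float:
    """Gini coefficient via the rank-weighted sum over ascending values."""
    if not values:
        return 0.0
    xs = sorted(max(v, 0.0) for v in values)  # negatives are clamped to 0
    n = len(xs)
    total = sum(xs)
    if total == 0.0:
        return 0.0
    weighted_sum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted_sum) / (n * total) - (n + 1) / n


print(gini([1.0, 1.0, 1.0, 1.0]))  # errors spread evenly
print(gini([0.0, 0.0, 0.0, 1.0]))  # all errors on one line out of four
```

With this formula the fully concentrated case gives (n - 1)/n rather than exactly 1, so four lines top out at 0.75. The "1 = fully concentrated" reading in the docstring is only reached asymptotically as the number of lines grows.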
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
# ---------------------------------------------------------------------------
|
| 87 |
-
# Structured result
|
| 88 |
-
# ---------------------------------------------------------------------------
|
| 89 |
-
|
| 90 |
-
@dataclass
|
| 91 |
-
class LineMetrics:
|
| 92 |
-
"""Distribution of per-line CER errors for one (GT, hypothesis) pair."""
|
| 93 |
-
|
| 94 |
-
cer_per_line: list[float]
|
| 95 |
-
"""CER of each line (length = number of GT lines)."""
|
| 96 |
-
|
| 97 |
-
percentiles: dict[str, float]
|
| 98 |
-
"""Percentiles: p50, p75, p90, p95, p99."""
|
| 99 |
-
|
| 100 |
-
catastrophic_rate: dict[str, float]
|
| 101 |
-
"""Rate of catastrophic lines for each threshold (e.g. {0.3: 0.12, 0.5: 0.07, 1.0: 0.02})."""
|
| 102 |
-
|
| 103 |
-
gini: float
|
| 104 |
-
"""Gini coefficient of the errors (0 → uniform, 1 → concentrated)."""
|
| 105 |
-
|
| 106 |
-
heatmap: list[float]
|
| 107 |
-
"""Mean CER per position bin within the document (length = heatmap_bins)."""
|
| 108 |
-
|
| 109 |
-
line_count: int
|
| 110 |
-
"""Number of GT lines processed."""
|
| 111 |
-
|
| 112 |
-
mean_cer: float
|
| 113 |
-
"""Mean CER over all lines."""
|
| 114 |
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
"catastrophic_rate": {str(k): round(v, 6) for k, v in self.catastrophic_rate.items()},
|
| 120 |
-
"gini": round(self.gini, 6),
|
| 121 |
-
"heatmap": [round(v, 6) for v in self.heatmap],
|
| 122 |
-
"line_count": self.line_count,
|
| 123 |
-
"mean_cer": round(self.mean_cer, 6),
|
| 124 |
-
}
|
| 125 |
-
|
| 126 |
-
@classmethod
|
| 127 |
-
def from_dict(cls, d: dict) -> "LineMetrics":
|
| 128 |
-
return cls(
|
| 129 |
-
cer_per_line=d.get("cer_per_line", []),
|
| 130 |
-
percentiles=d.get("percentiles", {}),
|
| 131 |
-
catastrophic_rate={float(k): v for k, v in d.get("catastrophic_rate", {}).items()},
|
| 132 |
-
gini=d.get("gini", 0.0),
|
| 133 |
-
heatmap=d.get("heatmap", []),
|
| 134 |
-
line_count=d.get("line_count", 0),
|
| 135 |
-
mean_cer=d.get("mean_cer", 0.0),
|
| 136 |
-
)
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
# ---------------------------------------------------------------------------
|
| 140 |
-
# Main computation
|
| 141 |
-
# ---------------------------------------------------------------------------
|
| 142 |
-
|
| 143 |
-
def compute_line_metrics(
|
| 144 |
-
reference: str,
|
| 145 |
-
hypothesis: str,
|
| 146 |
-
thresholds: Optional[list[float]] = None,
|
| 147 |
-
heatmap_bins: int = 10,
|
| 148 |
-
) -> LineMetrics:
|
| 149 |
-
"""Computes the line-by-line distribution of CER errors.
|
| 150 |
-
|
| 151 |
-
Parameters
|
| 152 |
-
----------
|
| 153 |
-
reference:
|
| 154 |
-
Ground-truth (GT) text, with line breaks.
|
| 155 |
-
hypothesis:
|
| 156 |
-
Text produced by the OCR engine.
|
| 157 |
-
thresholds:
|
| 158 |
-
CER thresholds for the catastrophic rate. Default: [0.30, 0.50, 1.00].
|
| 159 |
-
heatmap_bins:
|
| 160 |
-
Number of position bins for the heatmap.
|
| 161 |
-
|
| 162 |
-
Returns
|
| 163 |
-
-------
|
| 164 |
-
LineMetrics
|
| 165 |
-
"""
|
| 166 |
-
if thresholds is None:
|
| 167 |
-
thresholds = [0.30, 0.50, 1.00]
|
| 168 |
-
|
| 169 |
-
ref_lines = reference.splitlines()
|
| 170 |
-
hyp_lines = hypothesis.splitlines()
|
| 171 |
-
|
| 172 |
-
# Align GT / hypothesis lines; the GT line count sets how many lines are compared
|
| 173 |
-
n = len(ref_lines)
|
| 174 |
-
if n == 0:
|
| 175 |
-
# No lines: return neutral metrics
|
| 176 |
-
return LineMetrics(
|
| 177 |
-
cer_per_line=[],
|
| 178 |
-
percentiles={f"p{p}": 0.0 for p in (50, 75, 90, 95, 99)},
|
| 179 |
-
catastrophic_rate={t: 0.0 for t in thresholds},
|
| 180 |
-
gini=0.0,
|
| 181 |
-
heatmap=[0.0] * heatmap_bins,
|
| 182 |
-
line_count=0,
|
| 183 |
-
mean_cer=0.0,
|
| 184 |
-
)
|
| 185 |
-
|
| 186 |
-
# Align by ignoring any extra hypothesis lines
|
| 187 |
-
# If the hypothesis has fewer lines, the missing lines count as deleted (CER = 1.0)
|
| 188 |
-
cer_per_line: list[float] = []
|
| 189 |
-
for i, ref_line in enumerate(ref_lines):
|
| 190 |
-
hyp_line = hyp_lines[i] if i < len(hyp_lines) else ""
|
| 191 |
-
cer_per_line.append(min(_line_cer(ref_line, hyp_line), 1.0))
|
| 192 |
-
|
| 193 |
-
sorted_cer = sorted(cer_per_line)
|
| 194 |
-
|
| 195 |
-
# Percentiles
|
| 196 |
-
percentiles = {
|
| 197 |
-
f"p{p}": _percentile(sorted_cer, p)
|
| 198 |
-
for p in (50, 75, 90, 95, 99)
|
| 199 |
-
}
|
| 200 |
-
|
| 201 |
-
# Catastrophic rates
|
| 202 |
-
catastrophic_rate: dict[float, float] = {}
|
| 203 |
-
for t in thresholds:
|
| 204 |
-
count = sum(1 for v in cer_per_line if v > t)
|
| 205 |
-
catastrophic_rate[t] = count / n
|
| 206 |
-
|
| 207 |
-
# Gini
|
| 208 |
-
gini = _gini(cer_per_line)
|
| 209 |
-
|
| 210 |
-
# Heatmap by position bin
|
| 211 |
-
bins = heatmap_bins
|
| 212 |
-
heatmap: list[float] = []
|
| 213 |
-
for b in range(bins):
|
| 214 |
-
start = int(b * n / bins)
|
| 215 |
-
end = int((b + 1) * n / bins)
|
| 216 |
-
slice_ = cer_per_line[start:end]
|
| 217 |
-
heatmap.append(sum(slice_) / len(slice_) if slice_ else 0.0)
|
| 218 |
-
|
| 219 |
-
mean_cer = sum(cer_per_line) / n
|
| 220 |
-
|
| 221 |
-
return LineMetrics(
|
| 222 |
-
cer_per_line=cer_per_line,
|
| 223 |
-
percentiles=percentiles,
|
| 224 |
-
catastrophic_rate=catastrophic_rate,
|
| 225 |
-
gini=gini,
|
| 226 |
-
heatmap=heatmap,
|
| 227 |
-
line_count=n,
|
| 228 |
-
mean_cer=mean_cer,
|
| 229 |
-
)
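The position binning used for the heatmap behaves in a slightly surprising way when a document has fewer lines than bins. The following sketch isolates that loop; `position_heatmap` is a hypothetical name and the inputs are made up.

```python
def position_heatmap(cer_per_line: list[float], bins: int = 10) -> list[float]:
    """Mean CER per position bin, with the same integer bin edges as above."""
    n = len(cer_per_line)
    heatmap = []
    for b in range(bins):
        start = int(b * n / bins)
        end = int((b + 1) * n / bins)
        chunk = cer_per_line[start:end]
        heatmap.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return heatmap


# Ten lines into five bins: each bin averages two consecutive lines.
print(position_heatmap([0.0, 1.0] * 5, bins=5))

# Three lines into five bins: two bins receive no line at all.
print(position_heatmap([0.1, 0.2, 0.3], bins=5))
```

In the second call the empty bins are reported as 0.0, which cannot be told apart from a bin whose lines were all perfect. That is worth remembering when reading heatmaps of very short documents.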
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
# ---------------------------------------------------------------------------
|
| 233 |
-
# Aggregation over a corpus
|
| 234 |
-
# ---------------------------------------------------------------------------
|
| 235 |
-
|
| 236 |
-
def aggregate_line_metrics(results: list[LineMetrics]) -> dict:
|
| 237 |
-
"""Aggregates the per-line distribution metrics over a corpus.
|
| 238 |
-
|
| 239 |
-
Returns
|
| 240 |
-
-------
|
| 241 |
-
dict
|
| 242 |
-
Aggregated statistics: mean Gini, mean percentiles, mean catastrophic rates.
|
| 243 |
-
"""
|
| 244 |
-
if not results:
|
| 245 |
-
return {}
|
| 246 |
-
|
| 247 |
-
import statistics as _stats
|
| 248 |
-
|
| 249 |
-
gini_values = [r.gini for r in results]
|
| 250 |
-
mean_cer_values = [r.mean_cer for r in results]
|
| 251 |
-
|
| 252 |
-
# Mean percentiles
|
| 253 |
-
pct_keys = ["p50", "p75", "p90", "p95", "p99"]
|
| 254 |
-
avg_percentiles = {}
|
| 255 |
-
for k in pct_keys:
|
| 256 |
-
vals = [r.percentiles.get(k, 0.0) for r in results]
|
| 257 |
-
avg_percentiles[k] = round(sum(vals) / len(vals), 6) if vals else 0.0
|
| 258 |
-
|
| 259 |
-
# Mean catastrophic rates (union of thresholds)
|
| 260 |
-
all_thresholds: set[float] = set()
|
| 261 |
-
for r in results:
|
| 262 |
-
all_thresholds.update(r.catastrophic_rate.keys())
|
| 263 |
-
avg_catastrophic: dict[str, float] = {}
|
| 264 |
-
for t in sorted(all_thresholds):
|
| 265 |
-
vals = [r.catastrophic_rate.get(t, 0.0) for r in results]
|
| 266 |
-
avg_catastrophic[str(t)] = round(sum(vals) / len(vals), 6) if vals else 0.0
|
| 267 |
|
| 268 |
-
|
| 269 |
-
if results and results[0].heatmap:
|
| 270 |
-
n_bins = len(results[0].heatmap)
|
| 271 |
-
heatmap_avg = []
|
| 272 |
-
for b in range(n_bins):
|
| 273 |
-
vals = [r.heatmap[b] for r in results if b < len(r.heatmap)]
|
| 274 |
-
heatmap_avg.append(round(sum(vals) / len(vals), 6) if vals else 0.0)
|
| 275 |
-
else:
|
| 276 |
-
heatmap_avg = []
|
| 277 |
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
|
| 282 |
-
"percentiles": avg_percentiles,
|
| 283 |
-
"catastrophic_rate": avg_catastrophic,
|
| 284 |
-
"heatmap": heatmap_avg,
|
| 285 |
-
"document_count": len(results),
|
| 286 |
-
}
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.line_metrics`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring effort. This measurement (Circle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. The alias here lets legacy
|
| 6 |
+
imports (``from picarones.core.line_metrics import ...``) keep
|
| 7 |
+
working without modification.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and orchestration (Circle 1).
|
| 12 |
+
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.line_metrics import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.line_metrics as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
@@ -1,373 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
results of every benchmark run, but no metric
|
| 9 |
-
derived from them appeared in the report. This module uses the
|
| 10 |
-
time series of an engine's CER values to answer two
|
| 11 |
-
questions:
|
| 12 |
-
|
| 13 |
-
1. **Is there a trend?** Simple linear regression
|
| 14 |
-
(ordinary least squares) on ``(t, CER)``: slope,
|
| 15 |
-
intercept, R², n_runs. A slope > 0 signals a
|
| 16 |
-
gradual regression; a slope < 0 signals an improvement.
|
| 17 |
-
|
| 18 |
-
2. **Is there a change point?** Pure-Python change-point
|
| 19 |
-
algorithm (maximum difference of means, a
|
| 20 |
-
simplified Pettitt variant). Identifies the index where the
|
| 21 |
-
series splits into the two segments whose means differ
|
| 22 |
-
the most, typically the run where a model changed its
|
| 23 |
-
behavior.
|
| 24 |
-
|
| 25 |
-
No scipy
|
| 26 |
-
--------
|
| 27 |
-
To avoid a heavy dependency, the module implements:
|
| 28 |
-
- linear regression in pure Python (closed-form OLS);
|
| 29 |
-
- change-point detection by exhaustive sweep (O(N) for small
|
| 30 |
-
N; an institution's history rarely exceeds a few
|
| 31 |
-
hundred runs).
|
| 32 |
"""
|
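The closed-form OLS mentioned in the docstring above can be sketched as follows. This is a minimal illustration, not the module's code: `ols_trend` is a hypothetical name, it takes already-numeric `(t, cer)` pairs, and returning `r_squared = 0.0` for a constant series is an assumption made here.

```python
from __future__ import annotations

import statistics


def ols_trend(points: list[tuple[float, float]]) -> tuple[float, float, float] | None:
    """Return (slope, intercept, r_squared), or None if no line can be fitted."""
    if len(points) < 2:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return None  # all timestamps identical: the slope is undefined
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Assumption of this sketch: a constant CER series gets r_squared = 0.0.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r_squared


# CER rising by 2 points per day over three daily runs.
print(ols_trend([(0.0, 0.10), (1.0, 0.12), (2.0, 0.14)]))
```

With time expressed in days, the slope reads directly as "CER change per day". In this example it is positive, which the module interprets as a gradual regression.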
| 33 |
|
| 34 |
-
from
|
| 35 |
-
|
| 36 |
-
import logging
|
| 37 |
-
import math
|
| 38 |
-
import statistics
|
| 39 |
-
from dataclasses import dataclass
|
| 40 |
-
from datetime import datetime
|
| 41 |
-
from typing import Iterable, Optional
|
| 42 |
-
|
| 43 |
-
logger = logging.getLogger(__name__)
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
@dataclass
|
| 47 |
-
class LinearTrend:
|
| 48 |
-
"""Result of a linear regression on a CER series."""
|
| 49 |
-
slope: float
|
| 50 |
-
"""Slope (CER per day). Positive = regression."""
|
| 51 |
-
intercept: float
|
| 52 |
-
"""Intercept."""
|
| 53 |
-
r_squared: float
|
| 54 |
-
"""Goodness of fit, ∈ [0, 1]."""
|
| 55 |
-
n_runs: int
|
| 56 |
-
"""Number of points used."""
|
| 57 |
-
|
| 58 |
-
def as_dict(self) -> dict:
|
| 59 |
-
return {
|
| 60 |
-
"slope": self.slope,
|
| 61 |
-
"intercept": self.intercept,
|
| 62 |
-
"r_squared": self.r_squared,
|
| 63 |
-
"n_runs": self.n_runs,
|
| 64 |
-
}
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
@dataclass
|
| 68 |
-
class ChangePointResult:
|
| 69 |
-
"""Result of a change-point detection."""
|
| 70 |
-
index: int
|
| 71 |
-
"""Index of the change point (0-based; segment 1 is [0:index],
|
| 72 |
-
segment 2 is [index:N])."""
|
| 73 |
-
timestamp: str
|
| 74 |
-
"""Timestamp of the run at the change point."""
|
| 75 |
-
mean_before: float
|
| 76 |
-
mean_after: float
|
| 77 |
-
delta: float
|
| 78 |
-
"""``mean_after - mean_before``. Positive = regression."""
|
| 79 |
-
n_before: int
|
| 80 |
-
n_after: int
|
| 81 |
-
|
| 82 |
-
def as_dict(self) -> dict:
|
| 83 |
-
return {
|
| 84 |
-
"index": self.index,
|
| 85 |
-
"timestamp": self.timestamp,
|
| 86 |
-
"mean_before": self.mean_before,
|
| 87 |
-
"mean_after": self.mean_after,
|
| 88 |
-
"delta": self.delta,
|
| 89 |
-
"n_before": self.n_before,
|
| 90 |
-
"n_after": self.n_after,
|
| 91 |
-
}
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
def _parse_timestamp(ts: str) -> Optional[float]:
|
| 95 |
-
"""Parses an ISO timestamp into a fractional ordinal day.
|
| 96 |
-
|
| 97 |
-
Accepts ``YYYY-MM-DD`` and ``YYYY-MM-DDTHH:MM:SS``. Returns
|
| 98 |
-
``None`` if the value cannot be parsed.
|
| 99 |
-
"""
|
| 100 |
-
if not ts:
|
| 101 |
-
return None
|
| 102 |
-
formats = (
|
| 103 |
-
"%Y-%m-%dT%H:%M:%S.%f",
|
| 104 |
-
"%Y-%m-%dT%H:%M:%S",
|
| 105 |
-
"%Y-%m-%d %H:%M:%S",
|
| 106 |
-
"%Y-%m-%d",
|
| 107 |
-
)
|
| 108 |
-
for fmt in formats:
|
| 109 |
-
try:
|
| 110 |
-
dt = datetime.strptime(ts.split("+")[0].split("Z")[0], fmt)
|
| 111 |
-
return dt.toordinal() + (
|
| 112 |
-
dt.hour * 3600 + dt.minute * 60 + dt.second
|
| 113 |
-
) / 86400.0
|
| 114 |
-
except ValueError:
|
| 115 |
-
continue
|
| 116 |
-
return None
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
def compute_linear_trend(
|
| 120 |
-
cer_series: Iterable[tuple[str, float]],
|
| 121 |
-
) -> Optional[LinearTrend]:
|
| 122 |
-
"""Régression linéaire OLS sur une série temporelle de CER.
|
| 123 |
-
|
| 124 |
-
Parameters
|
| 125 |
-
----------
|
| 126 |
-
cer_series:
|
| 127 |
-
Itérable de ``(timestamp_iso, cer)``. Au moins 2 points
|
| 128 |
-
valides requis.
|
| 129 |
-
|
| 130 |
-
Returns
|
| 131 |
-
-------
|
| 132 |
-
LinearTrend | None
|
| 133 |
-
``None`` si moins de 2 points ou si tous les timestamps
|
| 134 |
-
sont identiques (variance nulle sur t).
|
| 135 |
-
"""
|
| 136 |
-
points: list[tuple[float, float]] = []
|
| 137 |
-
for ts, cer in cer_series:
|
| 138 |
-
t = _parse_timestamp(ts)
|
| 139 |
-
if t is None or cer is None:
|
| 140 |
-
continue
|
| 141 |
-
try:
|
| 142 |
-
cer_f = float(cer)
|
| 143 |
-
except (TypeError, ValueError):
|
| 144 |
-
continue
|
| 145 |
-
points.append((t, cer_f))
|
| 146 |
-
n = len(points)
|
| 147 |
-
if n < 2:
|
| 148 |
-
return None
|
| 149 |
-
xs = [p[0] for p in points]
|
| 150 |
-
ys = [p[1] for p in points]
|
| 151 |
-
x_mean = statistics.fmean(xs)
|
| 152 |
-
y_mean = statistics.fmean(ys)
|
| 153 |
-
sxx = sum((x - x_mean) ** 2 for x in xs)
|
| 154 |
-
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
|
| 155 |
-
if sxx == 0:
|
| 156 |
-
return None
|
| 157 |
-
slope = sxy / sxx
|
| 158 |
-
intercept = y_mean - slope * x_mean
|
| 159 |
-
syy = sum((y - y_mean) ** 2 for y in ys)
|
| 160 |
-
if syy == 0:
|
| 161 |
-
# Tous les CER sont égaux → R² mathématiquement indéfini ;
|
| 162 |
-
# on retourne 1.0 (parfaite "non-tendance").
|
| 163 |
-
r_squared = 1.0
|
| 164 |
-
else:
|
| 165 |
-
ss_res = sum(
|
| 166 |
-
(y - (slope * x + intercept)) ** 2
|
| 167 |
-
for x, y in zip(xs, ys)
|
| 168 |
-
)
|
| 169 |
-
r_squared = max(0.0, 1.0 - ss_res / syy)
|
| 170 |
-
return LinearTrend(
|
| 171 |
-
slope=slope,
|
| 172 |
-
intercept=intercept,
|
| 173 |
-
r_squared=r_squared,
|
| 174 |
-
n_runs=n,
|
| 175 |
-
)
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
def detect_change_point(
|
| 179 |
-
cer_series: Iterable[tuple[str, float]],
|
| 180 |
-
min_segment_size: int = 3,
|
| 181 |
-
) -> Optional[ChangePointResult]:
|
| 182 |
-
"""Détecte le point de rupture maximisant l'écart de moyennes.
|
| 183 |
-
|
| 184 |
-
Algorithme : balayage des indices ``i`` où la série se
|
| 185 |
-
sépare en deux segments d'au moins ``min_segment_size``
|
| 186 |
-
points chacun ; on retient l'index où ``|mean_after -
|
| 187 |
-
mean_before|`` est maximal. Variante simplifiée de Pettitt.
|
| 188 |
-
|
| 189 |
-
Parameters
|
| 190 |
-
----------
|
| 191 |
-
cer_series:
|
| 192 |
-
Itérable de ``(timestamp_iso, cer)``.
|
| 193 |
-
min_segment_size:
|
| 194 |
-
Taille minimale des deux segments. Défaut 3.
|
| 195 |
-
|
| 196 |
-
Returns
|
| 197 |
-
-------
|
| 198 |
-
ChangePointResult | None
|
| 199 |
-
``None`` si la série a moins de ``2 × min_segment_size``
|
| 200 |
-
points valides.
|
| 201 |
-
"""
|
| 202 |
-
points: list[tuple[str, float, float]] = []
|
| 203 |
-
for ts, cer in cer_series:
|
| 204 |
-
t = _parse_timestamp(ts)
|
| 205 |
-
if t is None or cer is None:
|
| 206 |
-
continue
|
| 207 |
-
try:
|
| 208 |
-
cer_f = float(cer)
|
| 209 |
-
except (TypeError, ValueError):
|
| 210 |
-
continue
|
| 211 |
-
points.append((ts, t, cer_f))
|
| 212 |
-
if len(points) < 2 * min_segment_size:
|
| 213 |
-
return None
|
| 214 |
-
points.sort(key=lambda p: p[1])
|
| 215 |
-
n = len(points)
|
| 216 |
-
best_index = -1
|
| 217 |
-
best_abs_delta = -1.0
|
| 218 |
-
best_delta = 0.0
|
| 219 |
-
best_mean_before = 0.0
|
| 220 |
-
best_mean_after = 0.0
|
| 221 |
-
for i in range(min_segment_size, n - min_segment_size + 1):
|
| 222 |
-
before = [p[2] for p in points[:i]]
|
| 223 |
-
after = [p[2] for p in points[i:]]
|
| 224 |
-
mean_b = statistics.fmean(before)
|
| 225 |
-
mean_a = statistics.fmean(after)
|
| 226 |
-
delta = mean_a - mean_b
|
| 227 |
-
abs_delta = abs(delta)
|
| 228 |
-
if abs_delta > best_abs_delta:
|
| 229 |
-
best_abs_delta = abs_delta
|
| 230 |
-
best_index = i
|
| 231 |
-
best_delta = delta
|
| 232 |
-
best_mean_before = mean_b
|
| 233 |
-
best_mean_after = mean_a
|
| 234 |
-
if best_index < 0:
|
| 235 |
-
return None
|
| 236 |
-
return ChangePointResult(
|
| 237 |
-
index=best_index,
|
| 238 |
-
timestamp=points[best_index][0],
|
| 239 |
-
mean_before=best_mean_before,
|
| 240 |
-
mean_after=best_mean_after,
|
| 241 |
-
delta=best_delta,
|
| 242 |
-
n_before=best_index,
|
| 243 |
-
n_after=n - best_index,
|
| 244 |
-
)
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
def compute_engine_longitudinal(
|
| 248 |
-
history_entries: Iterable,
|
| 249 |
-
engine_name: str,
|
| 250 |
-
corpus_name: Optional[str] = None,
|
| 251 |
-
*,
|
| 252 |
-
min_runs_for_trend: int = 3,
|
| 253 |
-
min_segment_size: int = 3,
|
| 254 |
-
change_point_threshold: float = 0.01,
|
| 255 |
-
) -> Optional[dict]:
|
| 256 |
-
"""Calcule trend + change_point pour un moteur.
|
| 257 |
-
|
| 258 |
-
Parameters
|
| 259 |
-
----------
|
| 260 |
-
history_entries:
|
| 261 |
-
Liste de ``HistoryEntry`` (ou dicts compatibles).
|
| 262 |
-
engine_name:
|
| 263 |
-
Filtre sur le nom du moteur.
|
| 264 |
-
corpus_name:
|
| 265 |
-
Filtre optionnel sur le corpus. ``None`` (défaut) : tous
|
| 266 |
-
les corpus.
|
| 267 |
-
min_runs_for_trend:
|
| 268 |
-
Minimum de runs pour calculer une tendance.
|
| 269 |
-
min_segment_size:
|
| 270 |
-
Taille minimale des segments pour le change-point.
|
| 271 |
-
change_point_threshold:
|
| 272 |
-
Magnitude absolue minimale du delta (en CER) pour
|
| 273 |
-
retenir le change-point. Défaut 0.01 (1 point de CER).
|
| 274 |
-
|
| 275 |
-
Returns
|
| 276 |
-
-------
|
| 277 |
-
dict | None
|
| 278 |
-
``{
|
| 279 |
-
"engine_name", "corpus_name", "n_runs", "trend",
|
| 280 |
-
"change_point", # ou None
|
| 281 |
-
"first_timestamp", "last_timestamp",
|
| 282 |
-
"first_cer", "last_cer", "absolute_delta_pct",
|
| 283 |
-
}`` ou ``None`` si moins de ``min_runs_for_trend`` runs.
|
| 284 |
-
"""
|
| 285 |
-
series: list[tuple[str, float]] = []
|
| 286 |
-
for entry in history_entries:
|
| 287 |
-
if hasattr(entry, "as_dict"):
|
| 288 |
-
data = entry.as_dict()
|
| 289 |
-
else:
|
| 290 |
-
data = entry
|
| 291 |
-
if data.get("engine_name") != engine_name:
|
| 292 |
-
continue
|
| 293 |
-
if corpus_name is not None and data.get("corpus_name") != corpus_name:
|
| 294 |
-
continue
|
| 295 |
-
cer = data.get("cer_mean")
|
| 296 |
-
ts = data.get("timestamp")
|
| 297 |
-
if cer is None or ts is None:
|
| 298 |
-
continue
|
| 299 |
-
series.append((ts, float(cer)))
|
| 300 |
-
if len(series) < min_runs_for_trend:
|
| 301 |
-
return None
|
| 302 |
-
series.sort(key=lambda p: _parse_timestamp(p[0]) or 0.0)
|
| 303 |
-
trend = compute_linear_trend(series)
|
| 304 |
-
cp = detect_change_point(series, min_segment_size=min_segment_size)
|
| 305 |
-
if cp is not None and abs(cp.delta) < change_point_threshold:
|
| 306 |
-
cp = None
|
| 307 |
-
first_ts, first_cer = series[0]
|
| 308 |
-
last_ts, last_cer = series[-1]
|
| 309 |
-
return {
|
| 310 |
-
"engine_name": engine_name,
|
| 311 |
-
"corpus_name": corpus_name,
|
| 312 |
-
"n_runs": len(series),
|
| 313 |
-
"trend": trend.as_dict() if trend else None,
|
| 314 |
-
"change_point": cp.as_dict() if cp else None,
|
| 315 |
-
"first_timestamp": first_ts,
|
| 316 |
-
"last_timestamp": last_ts,
|
| 317 |
-
"first_cer": first_cer,
|
| 318 |
-
"last_cer": last_cer,
|
| 319 |
-
"absolute_delta": last_cer - first_cer,
|
| 320 |
-
"absolute_delta_pct": round((last_cer - first_cer) * 100, 2),
|
| 321 |
-
}
|
| 322 |
-
|
| 323 |
-
|
| 324 |
-
def compute_corpus_longitudinal(
|
| 325 |
-
history_entries: Iterable,
|
| 326 |
-
corpus_name: Optional[str] = None,
|
| 327 |
-
*,
|
| 328 |
-
min_runs_for_trend: int = 3,
|
| 329 |
-
min_segment_size: int = 3,
|
| 330 |
-
change_point_threshold: float = 0.01,
|
| 331 |
-
) -> list[dict]:
|
| 332 |
-
"""Pour chaque moteur présent dans l'historique sur ``corpus_name``,
|
| 333 |
-
calcule trend + change_point.
|
| 334 |
-
|
| 335 |
-
Returns
|
| 336 |
-
-------
|
| 337 |
-
list[dict]
|
| 338 |
-
Une entrée par moteur (filtrée), liste vide si rien.
|
| 339 |
-
"""
|
| 340 |
-
entries = list(history_entries)
|
| 341 |
-
engines: set[str] = set()
|
| 342 |
-
for entry in entries:
|
| 343 |
-
data = entry.as_dict() if hasattr(entry, "as_dict") else entry
|
| 344 |
-
if corpus_name is not None and data.get("corpus_name") != corpus_name:
|
| 345 |
-
continue
|
| 346 |
-
name = data.get("engine_name")
|
| 347 |
-
if name:
|
| 348 |
-
engines.add(name)
|
| 349 |
-
out: list[dict] = []
|
| 350 |
-
for engine in sorted(engines):
|
| 351 |
-
result = compute_engine_longitudinal(
|
| 352 |
-
entries, engine, corpus_name=corpus_name,
|
| 353 |
-
min_runs_for_trend=min_runs_for_trend,
|
| 354 |
-
min_segment_size=min_segment_size,
|
| 355 |
-
change_point_threshold=change_point_threshold,
|
| 356 |
-
)
|
| 357 |
-
if result is not None:
|
| 358 |
-
out.append(result)
|
| 359 |
-
return out
|
| 360 |
-
|
| 361 |
-
|
| 362 |
-
__all__ = [
|
| 363 |
-
"LinearTrend",
|
| 364 |
-
"ChangePointResult",
|
| 365 |
-
"compute_linear_trend",
|
| 366 |
-
"detect_change_point",
|
| 367 |
-
"compute_engine_longitudinal",
|
| 368 |
-
"compute_corpus_longitudinal",
|
| 369 |
-
]
|
| 370 |
-
|
| 371 |
|
| 372 |
-
|
| 373 |
-
|
|
|
|
|
|
|
|
|
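The scan described in the `detect_change_point` docstring above can be sketched in isolation. This is a minimal illustration, not the module's code: the name `scan_change_point` and the sample values are hypothetical, and the sketch works on bare CER values that are assumed to be already sorted by time.

```python
from statistics import fmean


def scan_change_point(values, min_segment_size=3):
    """Return (index, delta) maximising |mean(values[index:]) - mean(values[:index])|.

    Hypothetical helper: it mirrors the scan over split indices, where both
    segments must hold at least ``min_segment_size`` points.
    """
    n = len(values)
    if n < 2 * min_segment_size:
        return None  # not enough points for two valid segments
    best = None
    for i in range(min_segment_size, n - min_segment_size + 1):
        delta = fmean(values[i:]) - fmean(values[:i])
        # Strict comparison: on ties, the earliest split index is kept.
        if best is None or abs(delta) > abs(best[1]):
            best = (i, delta)
    return best


# Illustrative series: CER jumps from about 0.11 to about 0.20 at index 4.
cer_values = [0.10, 0.11, 0.10, 0.12, 0.20, 0.21, 0.19, 0.20]
result = scan_change_point(cer_values)
print(result)
```

A positive `delta` means the mean CER is higher after the split, which the docstring reads as a regression. In `compute_engine_longitudinal`, a split whose absolute delta is below `change_point_threshold` is discarded.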
| 1 |
+
"""Alias rétrocompat — module déplacé dans :mod:`picarones.measurements.longitudinal`.
|
| 2 |
|
| 3 |
+
Phase E du chantier de refonte en 3 cercles. Cette mesure (Cercle 2)
|
| 4 |
+
n'est plus dans ``picarones.core/`` ; elle vit dans
|
| 5 |
+
``picarones.measurements/``. L'alias ici permet aux imports
|
| 6 |
+
historiques (``from picarones.core.longitudinal import ...``) de continuer
|
| 7 |
+
à fonctionner sans modification.
|
| 8 |
|
| 9 |
+
Voir :doc:`docs/architecture-cercles.md` pour la cartographie des
|
| 10 |
+
3 cercles. Le ``core/`` strict ne contient plus que les abstractions
|
| 11 |
+
du domaine et l'orchestration (Cercle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.longitudinal import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.longitudinal as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
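As a reading aid for the relocated `compute_linear_trend`, the ordinary least squares fit it documents can be reproduced on a toy series. This is a sketch under assumptions: the function name `ols_trend` and the sample data are hypothetical, and only the `YYYY-MM-DD` format is parsed here.

```python
from datetime import datetime
from statistics import fmean


def ols_trend(series):
    """Fit cer = slope * day + intercept; return (slope, intercept, r_squared) or None."""
    if len(series) < 2:
        return None
    # Ordinal days on the x axis, so the slope reads as "CER per day".
    xs = [float(datetime.strptime(ts, "%Y-%m-%d").toordinal()) for ts, _ in series]
    ys = [float(cer) for _, cer in series]
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return None  # all timestamps identical: slope undefined
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    syy = sum((y - y_mean) ** 2 for y in ys)
    if syy == 0:
        r_squared = 1.0  # constant CER: same convention as the diffed code
    else:
        ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
        r_squared = max(0.0, 1.0 - ss_res / syy)
    return slope, intercept, r_squared


# Illustrative data: CER grows by 0.02 per day.
series = [
    ("2024-01-01", 0.10),
    ("2024-01-02", 0.12),
    ("2024-01-03", 0.14),
    ("2024-01-04", 0.16),
]
slope, intercept, r_squared = ols_trend(series)
print(slope, r_squared)
```

Because the x axis uses absolute ordinal days, the intercept is a large extrapolated value with little direct meaning. The slope and R² are the quantities worth reporting.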
@@ -1,142 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
quel surcoût est *raisonnable* pour quelle réduction d'erreur.
|
| 9 |
-
Une institution avec un budget contraint a besoin d'une
|
| 10 |
-
réponse opérationnelle :
|
| 11 |
-
|
| 12 |
-
*« Passer de Tesseract à Mistral OCR coûte 0,83 € par
|
| 13 |
-
erreur évitée — décider selon votre budget par millier
|
| 14 |
-
d'erreurs corrigées. »*
|
| 15 |
-
|
| 16 |
-
Formule
|
| 17 |
-
-------
|
| 18 |
-
Pour deux moteurs A et B où B fait **moins** d'erreurs que A
|
| 19 |
-
(donc B est plus précis) :
|
| 20 |
-
|
| 21 |
-
.. code::
|
| 22 |
-
|
| 23 |
-
coût_marginal = (coût_B − coût_A) / (errors_A − errors_B)
|
| 24 |
-
|
| 25 |
-
- Si ``cost_B > cost_A`` et ``errors_B < errors_A`` :
|
| 26 |
-
``cost_per_avoided_error > 0`` (cas standard, B coûte plus
|
| 27 |
-
pour moins d'erreurs).
|
| 28 |
-
- Si ``cost_B ≤ cost_A`` et ``errors_B < errors_A`` :
|
| 29 |
-
``cost_per_avoided_error ≤ 0`` (cas idéal, B est strictement
|
| 30 |
-
meilleur).
|
| 31 |
-
- Si ``errors_B ≥ errors_A`` : non comparable dans ce sens
|
| 32 |
-
(B n'évite pas d'erreur), retourne ``None``.
|
| 33 |
-
|
| 34 |
-
Sortie
|
| 35 |
-
------
|
| 36 |
-
``compute_marginal_cost(cost_a, errors_a, cost_b, errors_b)``
|
| 37 |
-
retourne ``{cost_per_avoided_error, n_errors_avoided,
|
| 38 |
-
cost_delta, dominated}`` ou ``None`` si non comparable.
|
| 39 |
-
|
| 40 |
-
``compute_marginal_cost_matrix(per_engine)`` retourne, pour
|
| 41 |
-
chaque paire ordonnée ``(A → B)`` où B est plus précis, le
|
| 42 |
-
coût marginal correspondant. Trié par coût marginal croissant
|
| 43 |
-
(meilleur ratio en tête).
|
| 44 |
"""
|
| 45 |
|
| 46 |
-
from
|
| 47 |
-
|
| 48 |
-
import logging
|
| 49 |
-
from typing import Optional
|
| 50 |
-
|
| 51 |
-
logger = logging.getLogger(__name__)
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
def compute_marginal_cost(
|
| 55 |
-
cost_a: float,
|
| 56 |
-
errors_a: float,
|
| 57 |
-
cost_b: float,
|
| 58 |
-
errors_b: float,
|
| 59 |
-
) -> Optional[dict]:
|
| 60 |
-
"""Coût marginal du passage A → B (B plus précis).
|
| 61 |
-
|
| 62 |
-
Retourne ``None`` si :
|
| 63 |
-
- ``errors_b >= errors_a`` (B n'évite pas d'erreur) ;
|
| 64 |
-
- les valeurs ne sont pas finies.
|
| 65 |
-
"""
|
| 66 |
-
try:
|
| 67 |
-
ca = float(cost_a)
|
| 68 |
-
cb = float(cost_b)
|
| 69 |
-
ea = float(errors_a)
|
| 70 |
-
eb = float(errors_b)
|
| 71 |
-
except (TypeError, ValueError):
|
| 72 |
-
return None
|
| 73 |
-
if ea <= eb:
|
| 74 |
-
# B ne fait pas mieux que A → pas de gain à mesurer.
|
| 75 |
-
return None
|
| 76 |
-
n_avoided = ea - eb
|
| 77 |
-
cost_delta = cb - ca
|
| 78 |
-
cost_per_avoided = cost_delta / n_avoided
|
| 79 |
-
dominated = cost_delta <= 0 # B aussi cher ou moins → cas idéal
|
| 80 |
-
return {
|
| 81 |
-
"cost_per_avoided_error": cost_per_avoided,
|
| 82 |
-
"n_errors_avoided": n_avoided,
|
| 83 |
-
"cost_delta": cost_delta,
|
| 84 |
-
"dominated": dominated,
|
| 85 |
-
}
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def compute_marginal_cost_matrix(
|
| 89 |
-
per_engine: dict[str, dict],
|
| 90 |
-
) -> Optional[dict]:
|
| 91 |
-
"""Pour chaque paire A → B où B fait moins d'erreurs, calcule
|
| 92 |
-
le coût marginal.
|
| 93 |
-
|
| 94 |
-
Parameters
|
| 95 |
-
----------
|
| 96 |
-
per_engine:
|
| 97 |
-
Map ``{engine_name: {"cost": float, "errors": float}}``.
|
| 98 |
-
|
| 99 |
-
Returns
|
| 100 |
-
-------
|
| 101 |
-
dict | None
|
| 102 |
-
``{
|
| 103 |
-
"pairs": list[
|
| 104 |
-
{"engine_a", "engine_b", "cost_per_avoided_error",
|
| 105 |
-
"n_errors_avoided", "cost_delta", "dominated"}
|
| 106 |
-
], # triée par cost_per_avoided_error croissant
|
| 107 |
-
}``
|
| 108 |
-
ou ``None`` si moins de 2 moteurs.
|
| 109 |
-
"""
|
| 110 |
-
if not per_engine or len(per_engine) < 2:
|
| 111 |
-
return None
|
| 112 |
-
engines = sorted(per_engine.keys())
|
| 113 |
-
pairs: list[dict] = []
|
| 114 |
-
for a in engines:
|
| 115 |
-
for b in engines:
|
| 116 |
-
if a == b:
|
| 117 |
-
continue
|
| 118 |
-
data_a = per_engine[a]
|
| 119 |
-
data_b = per_engine[b]
|
| 120 |
-
try:
|
| 121 |
-
ca = float(data_a.get("cost"))
|
| 122 |
-
ea = float(data_a.get("errors"))
|
| 123 |
-
cb = float(data_b.get("cost"))
|
| 124 |
-
eb = float(data_b.get("errors"))
|
| 125 |
-
except (TypeError, ValueError):
|
| 126 |
-
continue
|
| 127 |
-
result = compute_marginal_cost(ca, ea, cb, eb)
|
| 128 |
-
if result is None:
|
| 129 |
-
continue
|
| 130 |
-
entry = {"engine_a": a, "engine_b": b}
|
| 131 |
-
entry.update(result)
|
| 132 |
-
pairs.append(entry)
|
| 133 |
-
if not pairs:
|
| 134 |
-
return None
|
| 135 |
-
pairs.sort(key=lambda p: p["cost_per_avoided_error"])
|
| 136 |
-
return {"pairs": pairs}
|
| 137 |
-
|
| 138 |
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
"
|
| 142 |
-
]
|
|
|
|
| 1 |
+
"""Alias rétrocompat — module déplacé dans :mod:`picarones.measurements.marginal_cost`.
|
| 2 |
|
| 3 |
+
Phase E du chantier de refonte en 3 cercles. Cette mesure (Cercle 2)
|
| 4 |
+
n'est plus dans ``picarones.core/`` ; elle vit dans
|
| 5 |
+
``picarones.measurements/``. L'alias ici permet aux imports
|
| 6 |
+
historiques (``from picarones.core.marginal_cost import ...``) de continuer
|
| 7 |
+
à fonctionner sans modification.
|
| 8 |
|
| 9 |
+
Voir :doc:`docs/architecture-cercles.md` pour la cartographie des
|
| 10 |
+
3 cercles. Le ``core/`` strict ne contient plus que les abstractions
|
| 11 |
+
du domaine et l'orchestration (Cercle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.marginal_cost import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.marginal_cost as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
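The marginal-cost formula documented in the relocated module, `(cost_B − cost_A) / (errors_A − errors_B)`, can be checked with a small worked example. The sketch below is illustrative only: `marginal_cost` is a hypothetical name and the figures are invented to reproduce the 0.83 € per avoided error quoted in the docstring. The explicit `math.isfinite` guard is an assumption, added because the docstring mentions non-finite inputs.

```python
import math


def marginal_cost(cost_a, errors_a, cost_b, errors_b):
    """Cost per avoided error when switching from engine A to engine B."""
    values = (cost_a, errors_a, cost_b, errors_b)
    if not all(math.isfinite(v) for v in values):
        return None  # assumption: reject NaN and infinities explicitly
    if errors_a <= errors_b:
        return None  # B avoids no error, nothing to measure
    n_avoided = errors_a - errors_b
    cost_delta = cost_b - cost_a
    return {
        "cost_per_avoided_error": cost_delta / n_avoided,
        "n_errors_avoided": n_avoided,
        "cost_delta": cost_delta,
        "dominated": cost_delta <= 0,  # B is no more expensive and more accurate
    }


# Invented figures: A is free with 1000 errors, B costs 415 with 500 errors.
paid = marginal_cost(cost_a=0.0, errors_a=1000.0, cost_b=415.0, errors_b=500.0)
# B is both cheaper and more accurate, so the ratio is negative.
cheaper = marginal_cost(cost_a=100.0, errors_a=1000.0, cost_b=50.0, errors_b=500.0)
print(paid["cost_per_avoided_error"], cheaper["dominated"])
```

A negative or zero ratio flags a pair where switching costs nothing extra, which is why the matrix sorts by `cost_per_avoided_error` in ascending order.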
@@ -1,78 +1,15 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
------------
|
| 9 |
-
- ``Fact``, ``FactType``, ``FactImportance`` : modèle de données
|
| 10 |
-
- ``DetectorRegistry`` : registre des détecteurs
|
| 11 |
-
- ``detect_all(data)`` : applique le registre par défaut
|
| 12 |
-
- ``select_facts(facts, max_facts=5)`` : arbitre de sélection
|
| 13 |
-
- ``render_synthesis(facts, lang="fr")`` : rend en liste de phrases
|
| 14 |
-
- ``build_synthesis(data, lang="fr")`` : pipeline complet (Sprint 4)
|
| 15 |
"""
|
| 16 |
|
| 17 |
-
from picarones.
|
| 18 |
-
Fact,
|
| 19 |
-
FactType,
|
| 20 |
-
FactImportance,
|
| 21 |
-
DetectorRegistry,
|
| 22 |
-
detect_all,
|
| 23 |
-
_DEFAULT_REGISTRY,
|
| 24 |
-
)
|
| 25 |
-
from picarones.core.narrative.arbiter import select_facts
|
| 26 |
-
from picarones.core.narrative.renderer import (
|
| 27 |
-
render_fact,
|
| 28 |
-
render_synthesis,
|
| 29 |
-
extract_numbers,
|
| 30 |
-
)
|
| 31 |
-
from picarones.core.narrative.detectors import (
|
| 32 |
-
register_default_detectors,
|
| 33 |
-
DETECTORS_BY_TYPE,
|
| 34 |
-
)
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
# Activer le registre par défaut — Sprint 4
|
| 38 |
-
register_default_detectors(_DEFAULT_REGISTRY)
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
def build_synthesis(
|
| 42 |
-
benchmark_data: dict,
|
| 43 |
-
lang: str = "fr",
|
| 44 |
-
max_facts: int = 5,
|
| 45 |
-
) -> dict:
|
| 46 |
-
"""Pipeline complet : détection → arbitre → rendu.
|
| 47 |
-
|
| 48 |
-
Returns
|
| 49 |
-
-------
|
| 50 |
-
dict avec :
|
| 51 |
-
- ``sentences`` : liste de phrases prêtes à l'affichage
|
| 52 |
-
- ``facts`` : liste de dicts ``Fact.as_dict()`` pour traçabilité
|
| 53 |
-
- ``lang`` : langue utilisée
|
| 54 |
-
"""
|
| 55 |
-
all_facts = detect_all(benchmark_data)
|
| 56 |
-
selected = select_facts(all_facts, max_facts=max_facts)
|
| 57 |
-
sentences = render_synthesis(selected, lang=lang)
|
| 58 |
-
return {
|
| 59 |
-
"sentences": sentences,
|
| 60 |
-
"facts": [f.as_dict() for f in selected],
|
| 61 |
-
"lang": lang,
|
| 62 |
-
}
|
| 63 |
-
|
| 64 |
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
"
|
| 68 |
-
|
| 69 |
-
"DetectorRegistry",
|
| 70 |
-
"detect_all",
|
| 71 |
-
"select_facts",
|
| 72 |
-
"render_fact",
|
| 73 |
-
"render_synthesis",
|
| 74 |
-
"extract_numbers",
|
| 75 |
-
"build_synthesis",
|
| 76 |
-
"register_default_detectors",
|
| 77 |
-
"DETECTORS_BY_TYPE",
|
| 78 |
-
]
|
|
|
|
| 1 |
+
"""Alias rétrocompat — package déplacé dans :mod:`picarones.measurements.narrative`.
|
| 2 |
|
| 3 |
+
Phase E du chantier de refonte en 3 cercles. Le moteur narratif
|
| 4 |
+
(Cercle 2) vit désormais dans ``picarones.measurements.narrative``.
|
| 5 |
+
Cet alias maintient la rétrocompat des imports historiques :
|
| 6 |
+
``from picarones.core.narrative import build_synthesis``,
|
| 7 |
+
``from picarones.core.narrative.facts import Fact``, etc.
|
| 8 |
"""
|
| 9 |
|
| 10 |
+
from picarones.measurements.narrative import * # noqa: F401, F403
|
| 11 |
|
| 12 |
+
import picarones.measurements.narrative as _module
|
| 13 |
+
__all__ = getattr(_module, "__all__", [
|
| 14 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 15 |
+
])
|
|
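The alias modules in this commit all follow the same shape: a star import from the new location, then an `__all__` copied from the target module. The behaviour of that pattern can be observed without the real packages. In the sketch below, the module names `demo_measurements_mod` and `demo_core_mod` are hypothetical stand-ins built in memory. It illustrates the pattern and is not a test of `picarones` itself.

```python
import sys
import types

# Hypothetical stand-in for a relocated module that declares ``__all__``.
target = types.ModuleType("demo_measurements_mod")
exec(
    "__all__ = ['compute']\n"
    "def compute():\n"
    "    return 'ok'\n"
    "def _helper():\n"
    "    return 'private'\n",
    target.__dict__,
)
sys.modules[target.__name__] = target

# Shim written the same way as the alias modules shown in the diff.
shim = types.ModuleType("demo_core_mod")
exec(
    "from demo_measurements_mod import *\n"
    "import demo_measurements_mod as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n",
    shim.__dict__,
)

# Public names resolve to the very same objects through the alias.
print(shim.compute is target.compute)
# Names outside ``__all__`` are not re-exported by the star import.
print(hasattr(shim, "_helper"))
print(shim.__all__)
```

One consequence worth checking during review: if the relocated modules keep the `__all__` lists visible in the removed code, private helpers such as `_parse_timestamp` are no longer importable from the old `picarones.core` path, so any caller that relied on them would need to import from the new location.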
@@ -1,227 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
Règles de sélection :
|
| 7 |
-
1. Tri par importance décroissante, puis par type (ordre canonique).
|
| 8 |
-
2. Non-redondance : un seul fait par moteur, sauf si les types sont
|
| 9 |
-
complémentaires (ex. ``GLOBAL_LEADER_CER`` + ``SIGNIFICANT_GAP``
|
| 10 |
-
concernent le leader mais apportent une information différente).
|
| 11 |
-
3. Limite : au maximum ``max_facts`` faits retenus (défaut 5).
|
| 12 |
-
4. Déterminisme : tri stable sur (−importance, ordre canonique du type,
|
| 13 |
-
noms des moteurs) pour garantir une sortie bit-à-bit identique.
|
| 14 |
-
|
| 15 |
-
Les détecteurs peuvent émettre plusieurs faits du même type (ex. plusieurs
|
| 16 |
-
``STATISTICAL_TIE`` si plusieurs groupes distincts). L'arbitre ne fusionne
|
| 17 |
-
pas mais peut limiter par type.
|
| 18 |
"""
|
| 19 |
|
| 20 |
-
from
|
| 21 |
-
|
| 22 |
-
from typing import Iterable, Sequence
|
| 23 |
-
|
| 24 |
-
from picarones.core.narrative.facts import Fact, FactImportance, FactType
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
# Ordre canonique des types pour départager les ex-aequo à l'importance égale.
|
| 28 |
-
#
|
| 29 |
-
# Politique éditoriale — exposée et documentée dans
|
| 30 |
-
# ``docs/developer/narrative-engine.md`` § Editorial policy.
|
| 31 |
-
# L'ordre encode quels faits sont remontés en priorité quand plusieurs ont
|
| 32 |
-
# la même ``FactImportance``. Surchargeable via le paramètre ``type_order``
|
| 33 |
-
# de ``select_facts`` sans patcher le code.
|
| 34 |
-
#
|
| 35 |
-
# Sprint 29 : la valeur n'est plus codée en dur ici — elle est dérivée du
|
| 36 |
-
# registre déclaratif (``@register_detector(..., priority=N)``). Ajouter
|
| 37 |
-
# un détecteur en bonne position se fait donc en éditant **un seul**
|
| 38 |
-
# fichier (``detectors.py``) au lieu de quatre comme avant.
|
| 39 |
-
def _compute_default_type_order() -> tuple[FactType, ...]:
|
| 40 |
-
# Import local pour éviter la dépendance circulaire au chargement.
|
| 41 |
-
from picarones.core.narrative.registry import default_type_order
|
| 42 |
-
order = default_type_order()
|
| 43 |
-
# Filet de sécurité : tant que les détecteurs n'ont pas été importés
|
| 44 |
-
# (cas des tests qui mockent le registre), on retombe sur un ordre
|
| 45 |
-
# canonique gravé pour ne pas planter ``select_facts``.
|
| 46 |
-
if not order:
|
| 47 |
-
return _FALLBACK_TYPE_ORDER
|
| 48 |
-
return order
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
# Ordre statique gardé en mémoire : utilisé si jamais le registre est vide
|
| 52 |
-
# au moment où ``arbiter`` est chargé (chargement partiel par les tests).
|
| 53 |
-
_FALLBACK_TYPE_ORDER: tuple[FactType, ...] = (
|
| 54 |
-
FactType.GLOBAL_LEADER_CER,
|
| 55 |
-
FactType.STATISTICAL_TIE,
|
| 56 |
-
FactType.SIGNIFICANT_GAP,
|
| 57 |
-
FactType.STRATUM_WINNER,
|
| 58 |
-
# Sprint 46 — priority 45, juste après STRATUM_WINNER (40),
|
| 59 |
-
# avant STRATUM_COLLAPSE (50). La recommandation de stratification
|
| 60 |
-
# nuance directement les autres faits par strate.
|
| 61 |
-
FactType.STRATIFICATION_RECOMMENDED,
|
| 62 |
-
FactType.STRATUM_COLLAPSE,
|
| 63 |
-
FactType.ERROR_PROFILE_OUTLIER,
|
| 64 |
-
FactType.LLM_HALLUCINATION_FLAG,
|
| 65 |
-
FactType.ROBUSTNESS_FRAGILE,
|
| 66 |
-
FactType.PARETO_ALTERNATIVE,
|
| 67 |
-
FactType.SPEED_WINNER,
|
| 68 |
-
FactType.COST_OUTLIER,
|
| 69 |
-
FactType.CONFIDENCE_WARNING,
|
| 70 |
-
FactType.ENSEMBLE_OPPORTUNITY,
|
| 71 |
-
FactType.MEDIAN_MEAN_GAP_WARNING,
|
| 72 |
-
# Sprint 73 — priority 150, après MEDIAN_MEAN_GAP_WARNING (140).
|
| 73 |
-
# Le détecteur off-baseline donne le contexte historique, qui
|
| 74 |
-
# vient en fin de synthèse comme « note ».
|
| 75 |
-
FactType.ENGINE_OFF_BASELINE,
|
| 76 |
-
# Sprint 90 — priority 160, ferme la synthèse avec la mise en
|
| 77 |
-
# garde sur la reproductibilité. Une instabilité multi-runs
|
| 78 |
-
# discrédite toute autre conclusion sur ce moteur ; on la
|
| 79 |
-
# remonte en dernier pour ne pas l'enterrer.
|
| 80 |
-
FactType.ENGINE_UNSTABLE,
|
| 81 |
-
# Sprint 92 — priority 170, après ENGINE_UNSTABLE. La
|
| 82 |
-
# régression historique complète A.I.3 (off-baseline) en
|
| 83 |
-
# caractérisant la tendance : l'écart courant est-il une
|
| 84 |
-
# dégradation graduelle, une rupture brutale, ou un bruit ?
|
| 85 |
-
FactType.REGRESSION_IN_HISTORY,
|
| 86 |
-
)
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
# ``DEFAULT_TYPE_ORDER`` reste un attribut module accessible. On le calcule
|
| 90 |
-
# à l'import si possible, sinon on prend le fallback ; ``select_facts``
|
| 91 |
-
# recalcule à chaque appel pour absorber les ajouts de détecteurs après
|
| 92 |
-
# l'import initial (extensions tierces).
|
| 93 |
-
DEFAULT_TYPE_ORDER: tuple[FactType, ...] = _compute_default_type_order()
|
| 94 |
-
|
| 95 |
-
# Alias rétro-compatible.
|
| 96 |
-
_TYPE_ORDER = DEFAULT_TYPE_ORDER
|
| 97 |
-
_TYPE_INDEX: dict[FactType, int] = {t: i for i, t in enumerate(DEFAULT_TYPE_ORDER)}
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
# Paires de types qui ne sont PAS considérées comme redondantes même quand
|
| 101 |
-
# elles concernent le même moteur. Tout autre couple → un seul fait retenu
|
| 102 |
-
# pour le moteur (le plus important).
|
| 103 |
-
_COMPLEMENTARY_PAIRS: frozenset[frozenset[FactType]] = frozenset({
|
| 104 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.SIGNIFICANT_GAP}),
|
| 105 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.SPEED_WINNER}),
|
| 106 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.CONFIDENCE_WARNING}),
|
| 107 |
-
frozenset({FactType.STATISTICAL_TIE, FactType.SPEED_WINNER}),
|
| 108 |
-
# Sprint 44 — l'avertissement d'asymétrie nuance le leader
|
| 109 |
-
# plutôt que de le doubler : on veut les deux phrases ensemble.
|
| 110 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.MEDIAN_MEAN_GAP_WARNING}),
|
| 111 |
-
# Sprint 46 — la recommandation de stratification est un méta-conseil
|
| 112 |
-
# qui s'ajoute au leader sans le contredire ; les deux peuvent
|
| 113 |
-
# cohabiter même quand ils concernent le même moteur.
|
| 114 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.STRATIFICATION_RECOMMENDED}),
|
| 115 |
-
# Sprint 90 — l'instabilité multi-runs nuance les conclusions
|
| 116 |
-
# sur le moteur leader sans les contredire : un moteur peut être
|
| 117 |
-
# leader **et** instable, et c'est précisément l'information
|
| 118 |
-
# critique pour la reproductibilité scientifique.
|
| 119 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.ENGINE_UNSTABLE}),
|
| 120 |
-
# Sprint 92 — la régression historique caractérise la tendance
|
| 121 |
-
# du leader : un leader peut être en régression progressive,
|
| 122 |
-
# info critique pour décider quand re-tester.
|
| 123 |
-
frozenset({FactType.GLOBAL_LEADER_CER, FactType.REGRESSION_IN_HISTORY}),
|
| 124 |
-
# Off-baseline (Sprint 73) dit "écart anormal sur ce corpus" ;
|
| 125 |
-
# regression-in-history (Sprint 92) dit "tendance dans le
|
| 126 |
-
# temps" — les deux se complètent sans se redonder.
|
| 127 |
-
frozenset({FactType.ENGINE_OFF_BASELINE, FactType.REGRESSION_IN_HISTORY}),
|
| 128 |
-
})
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
def _sort_key(fact: Fact, type_index: dict[FactType, int]) -> tuple:
|
| 132 |
-
"""Clé de tri stable : importance (desc), type canonique, moteurs."""
|
| 133 |
-
return (
|
| 134 |
-
-int(fact.importance),
|
| 135 |
-
type_index.get(fact.type, len(type_index)),
|
| 136 |
-
tuple(sorted(fact.engines_involved)),
|
| 137 |
-
fact.stratum or "",
|
| 138 |
-
)
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
def _is_redundant(candidate: Fact, kept: Fact) -> bool:
|
| 142 |
-
"""Vrai si ``candidate`` apporte trop peu par rapport à ``kept``.
|
| 143 |
-
|
| 144 |
-
Deux faits sont redondants s'ils concernent exactement le même moteur,
|
| 145 |
-
ont le même type, et la même strate (s'il y en a une). Des types
|
| 146 |
-
différents sur le même moteur ne sont considérés redondants que s'ils
|
| 147 |
-
n'appartiennent pas aux paires complémentaires (ex : un leader peut
|
| 148 |
-
aussi être rapide ; c'est complémentaire).
|
| 149 |
-
"""
|
| 150 |
-
if candidate.type == kept.type and candidate.stratum == kept.stratum:
|
| 151 |
-
return set(candidate.engines_involved) == set(kept.engines_involved)
|
| 152 |
-
if set(candidate.engines_involved) == set(kept.engines_involved):
|
| 153 |
-
pair = frozenset({candidate.type, kept.type})
|
| 154 |
-
return pair not in _COMPLEMENTARY_PAIRS
|
| 155 |
-
return False
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
def _remove_contradictions(facts: list[Fact]) -> list[Fact]:
|
| 159 |
-
"""Supprime les faits incohérents sur le plan statistique.
|
| 160 |
-
|
| 161 |
-
Règle centrale : si Nemenyi (post-hoc corrigé pour comparaisons multiples)
|
| 162 |
-
place deux moteurs dans le même groupe d'ex-aequo, alors un ``SIGNIFICANT_GAP``
|
| 163 |
-
basé sur Wilcoxon non corrigé entre ces deux mêmes moteurs est trompeur
|
| 164 |
-
pour un lecteur non statisticien. Nemenyi l'emporte.
|
| 165 |
-
"""
|
| 166 |
-
tied_groups: list[set[str]] = []
|
| 167 |
-
for f in facts:
|
| 168 |
-
if f.type == FactType.STATISTICAL_TIE:
|
| 169 |
-
tied_groups.append(set(f.engines_involved))
|
| 170 |
-
|
| 171 |
-
def _is_contradicted(fact: Fact) -> bool:
|
| 172 |
-
if fact.type != FactType.SIGNIFICANT_GAP:
|
| 173 |
-
return False
|
| 174 |
-
pair = set(fact.engines_involved)
|
| 175 |
-
return any(pair <= group for group in tied_groups)
|
| 176 |
-
|
| 177 |
-
return [f for f in facts if not _is_contradicted(f)]
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
def select_facts(
|
| 181 |
-
facts: Iterable[Fact],
|
| 182 |
-
max_facts: int = 5,
|
| 183 |
-
min_importance: FactImportance = FactImportance.MEDIUM,
|
| 184 |
-
type_order: Sequence[FactType] | None = None,
|
| 185 |
-
) -> list[Fact]:
|
| 186 |
-
"""Sélectionne la synthèse finale à partir d'une liste brute de faits.
|
| 187 |
-
|
| 188 |
-
Parameters
|
| 189 |
-
----------
|
| 190 |
-
facts:
|
| 191 |
-
Liste de ``Fact`` brute issue de ``DetectorRegistry.run``.
|
| 192 |
-
max_facts:
|
| 193 |
-
Nombre maximal de faits retenus (défaut : 5).
|
| 194 |
-
min_importance:
|
| 195 |
-
Seuil minimal d'importance. Les faits ``LOW`` sont exclus par défaut.
|
| 196 |
-
type_order:
|
| 197 |
-
Surcharge optionnelle de l'ordre canonique des types pour départager
|
| 198 |
-
les faits d'égale importance. ``None`` (défaut) utilise
|
| 199 |
-
``DEFAULT_TYPE_ORDER``. Une institution peut passer son propre ordre
|
| 200 |
-
sans patcher le code — voir ``docs/developer/narrative-engine.md``.
|
| 201 |
-
|
| 202 |
-
Returns
|
| 203 |
-
-------
|
| 204 |
-
Liste ordonnée, prête à être rendue. Toujours ≤ ``max_facts``.
|
| 205 |
-
"""
|
| 206 |
-
if type_order is None:
|
| 207 |
-
# Sprint 29 — recalcul à chaque appel pour absorber les détecteurs
|
| 208 |
-
# enregistrés après l'import d'arbiter (extensions tierces qui
|
| 209 |
-
# font ``@register_detector`` dans un module utilisateur).
|
| 210 |
-
from picarones.core.narrative.registry import default_type_order
|
| 211 |
-
live_order = default_type_order() or _FALLBACK_TYPE_ORDER
|
| 212 |
-
type_index = {t: i for i, t in enumerate(live_order)}
|
| 213 |
-
else:
|
| 214 |
-
type_index = {t: i for i, t in enumerate(type_order)}
|
| 215 |
-
|
| 216 |
-
facts_list = [f for f in facts if int(f.importance) >= int(min_importance)]
|
| 217 |
-
facts_list = _remove_contradictions(facts_list)
|
| 218 |
-
ranked = sorted(facts_list, key=lambda f: _sort_key(f, type_index))
|
| 219 |
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
selected.append(fact)
|
| 225 |
-
if len(selected) >= max_facts:
|
| 226 |
-
break
|
| 227 |
-
return selected
|
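The ranking step in the removed `arbiter.py` above sorts facts by descending importance, then by canonical type order, then by engine names and stratum. The sketch below reproduces that key on simplified stand-in types. The `Fact` dataclass, the numeric values of `FactImportance`, and the string type names are assumptions made for illustration, not the project's real definitions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FactImportance(IntEnum):
    # Assumed numeric scale; only the relative order matters here.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Fact:
    # Hypothetical, reduced stand-in for the project's Fact type.
    type: str
    importance: FactImportance
    engines_involved: tuple[str, ...]
    stratum: Optional[str] = None


def sort_key(fact: Fact, type_index: dict[str, int]) -> tuple:
    """Importance (descending), canonical type rank, engines, stratum."""
    return (
        -int(fact.importance),
        # Types missing from the index are ranked after all known types.
        type_index.get(fact.type, len(type_index)),
        tuple(sorted(fact.engines_involved)),
        fact.stratum or "",
    )


type_index = {"global_leader": 0, "speed_winner": 1}
facts = [
    Fact("speed_winner", FactImportance.HIGH, ("b",)),
    Fact("unknown_type", FactImportance.HIGH, ("a",)),
    Fact("global_leader", FactImportance.MEDIUM, ("a",)),
    Fact("global_leader", FactImportance.HIGH, ("a",)),
]
ranked = sorted(facts, key=lambda f: sort_key(f, type_index))
ranked_types = [(f.type, f.importance.name) for f in ranked]
```

Because every component of the key is deterministic, two runs over the same facts give the same order, which is what makes the selection reproducible.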
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.arbiter`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.arbiter import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.arbiter as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
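The contradiction rule in the removed `_remove_contradictions` (a significant-gap fact is dropped when both engines already sit in the same statistical-tie group) can be sketched on its own. The `Fact` and `FactType` definitions below are reduced, hypothetical stand-ins. Only the filtering logic mirrors the code in the diff.

```python
from dataclasses import dataclass
from enum import Enum


class FactType(Enum):
    # Reduced, hypothetical subset of the project's fact types.
    STATISTICAL_TIE = "statistical_tie"
    SIGNIFICANT_GAP = "significant_gap"


@dataclass(frozen=True)
class Fact:
    type: FactType
    engines_involved: tuple[str, ...]


def remove_contradictions(facts: list[Fact]) -> list[Fact]:
    """Drop SIGNIFICANT_GAP facts whose engines share a tie group."""
    tied_groups = [
        set(f.engines_involved)
        for f in facts
        if f.type is FactType.STATISTICAL_TIE
    ]

    def is_contradicted(fact: Fact) -> bool:
        if fact.type is not FactType.SIGNIFICANT_GAP:
            return False
        pair = set(fact.engines_involved)
        # Subset test: every engine of the gap belongs to one tie group.
        return any(pair <= group for group in tied_groups)

    return [f for f in facts if not is_contradicted(f)]


facts = [
    Fact(FactType.STATISTICAL_TIE, ("a", "b", "c")),
    Fact(FactType.SIGNIFICANT_GAP, ("a", "b")),  # contradicted by the tie
    Fact(FactType.SIGNIFICANT_GAP, ("a", "d")),  # "d" is outside the group
]
kept = remove_contradictions(facts)
```

The gap between `a` and `d` survives because `d` is not part of any tie group, so the two statements do not conflict.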
@@ -1,129 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
(
|
| 5 |
-
|
| 6 |
-
détecteurs ont été regroupés par **famille thématique** :
|
| 7 |
-
|
| 8 |
-
- :mod:`ranking` — global leader, statistical tie, significant gap,
|
| 9 |
-
speed winner, median/mean gap warning (5 détecteurs)
|
| 10 |
-
- :mod:`pareto` — Pareto alternative, cost outlier (2 détecteurs)
|
| 11 |
-
- :mod:`stratum` — stratum winner / collapse, stratification
|
| 12 |
-
recommended (3 détecteurs)
|
| 13 |
-
- :mod:`quality` — error profile outlier, LLM hallucination flag,
|
| 14 |
-
robustness fragile, confidence warning (4 détecteurs)
|
| 15 |
-
- :mod:`history` — engine off baseline, engine unstable, regression
|
| 16 |
-
in history (3 détecteurs)
|
| 17 |
-
- :mod:`ensemble` — ensemble opportunity (1 détecteur)
|
| 18 |
-
|
| 19 |
-
Total : 18 détecteurs (≠ "12" mentionné dans CLAUDE.md historique —
|
| 20 |
-
le chantier 5 corrige ce comptage).
|
| 21 |
-
|
| 22 |
-
Rétrocompatibilité absolue
|
| 23 |
-
--------------------------
|
| 24 |
-
Tous les noms exportés par l'ancien fichier ``detectors.py``
|
| 25 |
-
(``detect_*``, ``DETECTORS_BY_TYPE``, ``register_default_detectors``)
|
| 26 |
-
restent accessibles via ``from picarones.core.narrative.detectors
|
| 27 |
-
import ...``. Les tests Sprints 20, 23, 29, 36, 44, 46, 73 importent
|
| 28 |
-
directement ces noms et continuent à fonctionner sans modification.
|
| 29 |
-
|
| 30 |
-
L'enregistrement automatique des détecteurs via ``@register_detector``
|
| 31 |
-
se fait à l'import de ce package — chaque sous-module est importé ici
|
| 32 |
-
en cascade.
|
| 33 |
"""
|
| 34 |
|
| 35 |
-
from
|
| 36 |
-
|
| 37 |
-
# Imports en cascade des 6 sous-modules : déclenche l'enregistrement
|
| 38 |
-
# automatique via les décorateurs ``@register_detector`` au chargement.
|
| 39 |
-
from picarones.core.narrative.detectors.ranking import (
|
| 40 |
-
detect_global_leader_cer,
|
| 41 |
-
detect_median_mean_gap_warning,
|
| 42 |
-
detect_significant_gap,
|
| 43 |
-
detect_speed_winner,
|
| 44 |
-
detect_statistical_tie,
|
| 45 |
-
)
|
| 46 |
-
from picarones.core.narrative.detectors.pareto import (
|
| 47 |
-
detect_cost_outlier,
|
| 48 |
-
detect_pareto_alternative,
|
| 49 |
-
)
|
| 50 |
-
from picarones.core.narrative.detectors.stratum import (
|
| 51 |
-
detect_stratification_recommended,
|
| 52 |
-
detect_stratum_collapse,
|
| 53 |
-
detect_stratum_winner,
|
| 54 |
-
)
|
| 55 |
-
from picarones.core.narrative.detectors.quality import (
|
| 56 |
-
detect_confidence_warning,
|
| 57 |
-
detect_error_profile_outlier,
|
| 58 |
-
detect_llm_hallucination_flag,
|
| 59 |
-
detect_robustness_fragile,
|
| 60 |
-
)
|
| 61 |
-
from picarones.core.narrative.detectors.history import (
|
| 62 |
-
detect_engine_off_baseline,
|
| 63 |
-
detect_engine_unstable,
|
| 64 |
-
detect_regression_in_history,
|
| 65 |
-
)
|
| 66 |
-
from picarones.core.narrative.detectors.ensemble import (
|
| 67 |
-
detect_ensemble_opportunity,
|
| 68 |
-
)
|
| 69 |
-
|
| 70 |
-
# Snapshot du registre + helper d'enregistrement legacy — déplacés
|
| 71 |
-
# verbatim depuis l'ancien ``detectors.py`` (lignes 1193-1229).
|
| 72 |
-
from picarones.core.narrative.facts import DetectorFn, FactType
|
| 73 |
-
from picarones.core.narrative.registry import (
|
| 74 |
-
iter_detectors as _iter_detectors,
|
| 75 |
-
populate_legacy_registry as _populate_legacy_registry,
|
| 76 |
-
)
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
def _build_detectors_by_type() -> dict[FactType, DetectorFn]:
|
| 80 |
-
"""Snapshot du registre déclaratif vers un dict ``{type: fn}``."""
|
| 81 |
-
return {entry.fact_type: entry.fn for entry in _iter_detectors()}
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
# Vue figée à l'import — utile pour les tests qui parcourent les types
|
| 85 |
-
# enregistrés sans instancier un ``DetectorRegistry``.
|
| 86 |
-
DETECTORS_BY_TYPE = _build_detectors_by_type()
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
def register_default_detectors(registry) -> None:
|
| 90 |
-
"""Enregistre les détecteurs du registre déclaratif dans un
|
| 91 |
-
``DetectorRegistry`` historique.
|
| 92 |
-
|
| 93 |
-
Sprint 29 : la source de vérité est maintenant le décorateur
|
| 94 |
-
``@register_detector`` ; cette fonction se contente de pousser
|
| 95 |
-
le contenu du registre vers l'objet ``DetectorRegistry`` que les
|
| 96 |
-
consommateurs externes (``DetectorRegistry.run``) instancient.
|
| 97 |
-
"""
|
| 98 |
-
_populate_legacy_registry(registry)
|
| 99 |
-
|
| 100 |
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
"
|
| 104 |
-
|
| 105 |
-
"detect_significant_gap",
|
| 106 |
-
"detect_speed_winner",
|
| 107 |
-
"detect_statistical_tie",
|
| 108 |
-
# pareto
|
| 109 |
-
"detect_cost_outlier",
|
| 110 |
-
"detect_pareto_alternative",
|
| 111 |
-
# stratum
|
| 112 |
-
"detect_stratification_recommended",
|
| 113 |
-
"detect_stratum_collapse",
|
| 114 |
-
"detect_stratum_winner",
|
| 115 |
-
# quality
|
| 116 |
-
"detect_confidence_warning",
|
| 117 |
-
"detect_error_profile_outlier",
|
| 118 |
-
"detect_llm_hallucination_flag",
|
| 119 |
-
"detect_robustness_fragile",
|
| 120 |
-
# history
|
| 121 |
-
"detect_engine_off_baseline",
|
| 122 |
-
"detect_engine_unstable",
|
| 123 |
-
"detect_regression_in_history",
|
| 124 |
-
# ensemble
|
| 125 |
-
"detect_ensemble_opportunity",
|
| 126 |
-
# legacy
|
| 127 |
-
"DETECTORS_BY_TYPE",
|
| 128 |
-
"register_default_detectors",
|
| 129 |
-
]
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: package moved to :mod:`picarones.measurements.narrative.detectors`.
|
| 2 |
|
| 3 |
+
Phase E of the refactoring. The 18 detectors in 6 families
|
| 4 |
+
(ranking, pareto, stratum, quality, history, ensemble) now live
|
| 5 |
+
in ``picarones.measurements.narrative.detectors/``.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
@@ -1,31 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
``picarones
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
Ces fonctions étaient privées (préfixe ``_``) au module historique.
|
| 8 |
-
Elles sont conservées telles quelles ici ; les sous-modules les
|
| 9 |
-
importent.
|
| 10 |
"""
|
| 11 |
|
| 12 |
-
from
|
| 13 |
-
|
| 14 |
-
from typing import Optional
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
def _engines_summary(data: dict) -> list[dict]:
|
| 18 |
-
"""Accès normalisé à la liste des résumés moteur."""
|
| 19 |
-
return data.get("engines", []) or []
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
def _engine_by_name(data: dict, name: str) -> Optional[dict]:
|
| 23 |
-
for e in _engines_summary(data):
|
| 24 |
-
if e.get("name") == name:
|
| 25 |
-
return e
|
| 26 |
-
return None
|
| 27 |
-
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
|
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors._helpers`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors._helpers import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors._helpers as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
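One point worth checking for this alias: the helpers shown in the removed file are all underscore-prefixed (`_engines_summary`, `_engine_by_name`), and a star import skips such names unless the target module lists them in `__all__`. Whether the relocated `_helpers` module defines `__all__` is not visible in this diff, so the sketch below only demonstrates the general Python behaviour on a throwaway module named `demo_helpers` (a hypothetical name).

```python
import sys
import types

# Build a throwaway module with one private and one public helper.
target = types.ModuleType("demo_helpers")
exec(
    "def _engines_summary(data):\n"
    "    return data.get('engines', []) or []\n"
    "def public_helper():\n"
    "    return 'ok'\n",
    target.__dict__,
)
sys.modules["demo_helpers"] = target

# Star import into a fresh namespace, as the alias module does.
alias_ns: dict = {}
exec("from demo_helpers import *", alias_ns)
star_names = {name for name in alias_ns if name != "__builtins__"}

# An explicit import still reaches the private name.
explicit_ns: dict = {}
exec("from demo_helpers import _engines_summary", explicit_ns)
```

If the real target module does not export its private helpers through `__all__`, an alias that needs to keep `from ..._helpers import _engines_summary` working would have to re-export those names explicitly.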
|
@@ -1,96 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
"""
|
| 7 |
|
| 8 |
-
from
|
| 9 |
-
|
| 10 |
-
import statistics as _stats
|
| 11 |
-
from typing import Optional
|
| 12 |
-
|
| 13 |
-
from picarones.core.narrative.facts import Fact, FactImportance, FactType
|
| 14 |
-
from picarones.core.narrative.registry import register_detector
|
| 15 |
-
|
| 16 |
-
from picarones.core.narrative.detectors._helpers import (
|
| 17 |
-
_engine_by_name,
|
| 18 |
-
_engines_summary,
|
| 19 |
-
_n_docs,
|
| 20 |
-
)
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
@register_detector(
|
| 24 |
-
FactType.ENSEMBLE_OPPORTUNITY,
|
| 25 |
-
priority=130,
|
| 26 |
-
importance=FactImportance.MEDIUM,
|
| 27 |
-
)
|
| 28 |
-
def detect_ensemble_opportunity(benchmark_data: dict) -> list[Fact]:
|
| 29 |
-
"""Deux moteurs très complémentaires : un voting majoritaire entre eux
|
| 30 |
-
pourrait améliorer significativement le CER token-level.
|
| 31 |
-
|
| 32 |
-
Lit la structure ``inter_engine_analysis`` produite par le runner
|
| 33 |
-
(Sprint 35-36) et déclenche si la fraction d'erreurs du meilleur
|
| 34 |
-
moteur récupérable par un ensemble dépasse 25 %.
|
| 35 |
-
|
| 36 |
-
L'importance monte à ``HIGH`` quand le gap relatif dépasse 50 %
|
| 37 |
-
(ensemble franchement profitable) — sinon reste à ``MEDIUM``.
|
| 38 |
-
"""
|
| 39 |
-
iea = benchmark_data.get("inter_engine_analysis") or {}
|
| 40 |
-
comp = iea.get("complementarity") or {}
|
| 41 |
-
if not comp:
|
| 42 |
-
return []
|
| 43 |
-
|
| 44 |
-
relative_gap = float(comp.get("relative_gap") or 0.0)
|
| 45 |
-
if relative_gap < 0.25:
|
| 46 |
-
# En deçà de 25 %, l'ensemble n'apporterait quasi rien — on ne
|
| 47 |
-
# remonte pas le fait pour ne pas bruiter la synthèse.
|
| 48 |
-
return []
|
| 49 |
-
|
| 50 |
-
best_engine = comp.get("best_engine") or ""
|
| 51 |
-
if not best_engine:
|
| 52 |
-
return []
|
| 53 |
-
|
| 54 |
-
payload: dict = {
|
| 55 |
-
"best_engine": best_engine,
|
| 56 |
-
"best_recall_pct": round(float(comp.get("best_single_recall") or 0.0) * 100, 2),
|
| 57 |
-
"oracle_recall_pct": round(float(comp.get("oracle_recall") or 0.0) * 100, 2),
|
| 58 |
-
"absolute_gap_pct": round(float(comp.get("absolute_gap") or 0.0) * 100, 2),
|
| 59 |
-
"relative_gap_pct": round(relative_gap * 100, 1),
|
| 60 |
-
"doc_count": int(comp.get("doc_count") or 0),
|
| 61 |
-
}
|
| 62 |
-
|
| 63 |
-
# Paire la plus complémentaire — la divergence taxonomique, quand
|
| 64 |
-
# disponible, fournit deux moteurs « candidats naturels ». Sinon on
|
| 65 |
-
# tombe sur le best + le second-best en recall individuel.
|
| 66 |
-
div = iea.get("taxonomy_divergence") or {}
|
| 67 |
-
pair = div.get("max_pair") or []
|
| 68 |
-
pair_a = ""
|
| 69 |
-
pair_b = ""
|
| 70 |
-
divergence_value: Optional[float] = None
|
| 71 |
-
if pair and len(pair) >= 3 and isinstance(pair[2], (int, float)) and pair[2] > 0:
|
| 72 |
-
pair_a, pair_b, divergence_value = str(pair[0]), str(pair[1]), float(pair[2])
|
| 73 |
-
else:
|
| 74 |
-
# Fallback : best engine + second-best engine par recall individuel
|
| 75 |
-
per_engine = comp.get("per_engine_recall") or {}
|
| 76 |
-
if len(per_engine) >= 2:
|
| 77 |
-
ranked = sorted(per_engine.items(), key=lambda kv: kv[1], reverse=True)
|
| 78 |
-
pair_a, pair_b = ranked[0][0], ranked[1][0]
|
| 79 |
-
|
| 80 |
-
payload["pair_a"] = pair_a
|
| 81 |
-
payload["pair_b"] = pair_b
|
| 82 |
-
payload["divergence"] = round(divergence_value, 3) if divergence_value is not None else 0.0
|
| 83 |
-
payload["divergence_metric"] = (div.get("metric") or "js")
|
| 84 |
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
)
|
| 88 |
-
|
| 89 |
-
(pair_a, pair_b) if pair_a and pair_b else (best_engine,)
|
| 90 |
-
)
|
| 91 |
-
return [Fact(
|
| 92 |
-
type=FactType.ENSEMBLE_OPPORTUNITY,
|
| 93 |
-
importance=importance,
|
| 94 |
-
payload=payload,
|
| 95 |
-
engines_involved=engines_involved,
|
| 96 |
-
)]
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors.ensemble`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors.ensemble import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors.ensemble as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
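The removed `detect_ensemble_opportunity` docstring describes two thresholds on the relative gap: no fact below 25 %, and `HIGH` importance beyond 50 %. The lines that compute the importance are not legible in this diff, so the sketch below is a reading of the docstring only. The function name `ensemble_importance`, the string return values, and the strict `>` comparison at 50 % are assumptions.

```python
from typing import Optional


def ensemble_importance(relative_gap: float) -> Optional[str]:
    """Map a relative gap (0.0 to 1.0) to an importance label.

    Returns None when the gap is too small to report.
    """
    if relative_gap < 0.25:
        # Below 25 %, an ensemble would bring almost nothing.
        return None
    # Assumed boundary: "exceeds 50 %" is read as strictly greater.
    return "HIGH" if relative_gap > 0.50 else "MEDIUM"


small = ensemble_importance(0.10)
medium = ensemble_importance(0.30)
high = ensemble_importance(0.80)
```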
@@ -1,280 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
- :func:`detect_engine_unstable` (Sprint 90)
|
| 7 |
-
- :func:`detect_regression_in_history` (Sprint 92)
|
| 8 |
"""
|
| 9 |
|
| 10 |
-
from
|
| 11 |
-
|
| 12 |
-
import statistics as _stats
|
| 13 |
-
from typing import Optional
|
| 14 |
-
|
| 15 |
-
from picarones.core.narrative.facts import Fact, FactImportance, FactType
|
| 16 |
-
from picarones.core.narrative.registry import register_detector
|
| 17 |
-
|
| 18 |
-
from picarones.core.narrative.detectors._helpers import (
|
| 19 |
-
_engine_by_name,
|
| 20 |
-
_engines_summary,
|
| 21 |
-
_n_docs,
|
| 22 |
-
)
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
@register_detector(
|
| 26 |
-
FactType.ENGINE_OFF_BASELINE,
|
| 27 |
-
priority=150,
|
| 28 |
-
importance=FactImportance.MEDIUM,
|
| 29 |
-
)
|
| 30 |
-
def detect_engine_off_baseline(benchmark_data: dict) -> list[Fact]:
|
| 31 |
-
"""Émet un Fact pour chaque moteur dont le CER courant s'écarte
|
| 32 |
-
significativement de sa moyenne historique sur le **même corpus**.
|
| 33 |
-
|
| 34 |
-
Lit ``benchmark_data["baseline_comparisons"]`` (liste de dicts
|
| 35 |
-
produits par ``compute_engine_baseline`` du module
|
| 36 |
-
``baseline_comparison`` Sprint 73). Si la clé est absente ou
|
| 37 |
-
vide, le détecteur reste silencieux — typiquement le cas quand
|
| 38 |
-
aucun historique SQLite n'a été chargé.
|
| 39 |
-
|
| 40 |
-
Garde-fous :
|
| 41 |
-
|
| 42 |
-
- Si ``n_runs < 5`` (déjà filtré par ``compute_engine_baseline``
|
| 43 |
-
qui retourne ``None`` dans ce cas).
|
| 44 |
-
- Si ``relative_delta`` n'est pas calculable (baseline = 0).
|
| 45 |
-
- Importance ``HIGH`` si ``|relative_delta| ≥ 50 %``, sinon
|
| 46 |
-
``MEDIUM``.
|
| 47 |
-
"""
|
| 48 |
-
comparisons = benchmark_data.get("baseline_comparisons") or []
|
| 49 |
-
if not isinstance(comparisons, (list, tuple)):
|
| 50 |
-
return []
|
| 51 |
-
facts: list[Fact] = []
|
| 52 |
-
for comp in comparisons:
|
| 53 |
-
if not isinstance(comp, dict):
|
| 54 |
-
continue
|
| 55 |
-
if not comp.get("off_baseline"):
|
| 56 |
-
continue
|
| 57 |
-
rel = comp.get("relative_delta")
|
| 58 |
-
if rel is None:
|
| 59 |
-
continue
|
| 60 |
-
engine = comp.get("engine_name")
|
| 61 |
-
cer_current = comp.get("cer_current")
|
| 62 |
-
cer_hist_mean = comp.get("cer_historical_mean")
|
| 63 |
-
n_runs = comp.get("n_runs")
|
| 64 |
-
if engine is None or cer_current is None or cer_hist_mean is None:
|
| 65 |
-
continue
|
| 66 |
-
importance = (
|
| 67 |
-
FactImportance.HIGH if abs(float(rel)) >= 0.50
|
| 68 |
-
else FactImportance.MEDIUM
|
| 69 |
-
)
|
| 70 |
-
facts.append(Fact(
|
| 71 |
-
type=FactType.ENGINE_OFF_BASELINE,
|
| 72 |
-
importance=importance,
|
| 73 |
-
payload={
|
| 74 |
-
"engine": engine,
|
| 75 |
-
"cer_current_pct": round(float(cer_current) * 100, 2),
|
| 76 |
-
"cer_historical_mean_pct": round(
|
| 77 |
-
float(cer_hist_mean) * 100, 2,
|
| 78 |
-
),
|
| 79 |
-
"n_runs": int(n_runs or 0),
|
| 80 |
-
"relative_delta_pct": round(float(rel) * 100, 1),
|
| 81 |
-
"direction": "higher" if float(rel) > 0 else "lower",
|
| 82 |
-
},
|
| 83 |
-
engines_involved=(engine,),
|
| 84 |
-
))
|
| 85 |
-
return facts
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
@register_detector(
|
| 89 |
-
FactType.ENGINE_UNSTABLE,
|
| 90 |
-
priority=160,
|
| 91 |
-
importance=FactImportance.HIGH,
|
| 92 |
-
)
|
| 93 |
-
def detect_engine_unstable(benchmark_data: dict) -> list[Fact]:
|
| 94 |
-
"""Émet un Fact pour chaque moteur dont la stabilité multi-runs
|
| 95 |
-
est insuffisante (Sprint 83 + 90).
|
| 96 |
-
|
| 97 |
-
Lit ``benchmark_data["multirun_stability"]`` : liste de dicts
|
| 98 |
-
avec ``engine_name`` + champs de ``compute_multirun_stability``
|
| 99 |
-
(cer_cv, identical_run_rate, n_runs, etc.). Si la clé est
|
| 100 |
-
absente ou vide, le détecteur reste silencieux — typiquement
|
| 101 |
-
le cas quand l'utilisateur n'a pas exécuté `--repeats N`.
|
| 102 |
-
|
| 103 |
-
Garde-fous :
|
| 104 |
-
|
| 105 |
-
- ``n_runs ≥ 2`` (déjà filtré par
|
| 106 |
-
``compute_multirun_stability`` qui retourne ``None``).
|
| 107 |
-
- Déclenche si ``cer_cv > 0.10`` (variance relative > 10 % du
|
| 108 |
-
CER moyen) **ou** ``identical_run_rate < 0.50`` (moins
|
| 109 |
-
d'une paire de runs sur deux est identique).
|
| 110 |
-
- Importance ``HIGH`` (l'instabilité discrédite les
|
| 111 |
-
conclusions).
|
| 112 |
-
"""
|
| 113 |
-
stabilities = benchmark_data.get("multirun_stability") or []
|
| 114 |
-
if not isinstance(stabilities, (list, tuple)):
|
| 115 |
-
return []
|
| 116 |
-
facts: list[Fact] = []
|
| 117 |
-
for stab in stabilities:
|
| 118 |
-
if not isinstance(stab, dict):
|
| 119 |
-
continue
|
| 120 |
-
engine = stab.get("engine_name") or stab.get("engine")
|
| 121 |
-
if not engine:
|
| 122 |
-
continue
|
| 123 |
-
n_runs = stab.get("n_runs")
|
| 124 |
-
if not isinstance(n_runs, int) or n_runs < 2:
|
| 125 |
-
continue
|
| 126 |
-
cer_cv = stab.get("cer_cv")
|
| 127 |
-
identical_rate = stab.get("identical_run_rate")
|
| 128 |
-
# Critères de déclenchement
|
| 129 |
-
cv_high = (
|
| 130 |
-
isinstance(cer_cv, (int, float)) and float(cer_cv) > 0.10
|
| 131 |
-
)
|
| 132 |
-
runs_diverge = (
|
| 133 |
-
isinstance(identical_rate, (int, float))
|
| 134 |
-
and float(identical_rate) < 0.50
|
| 135 |
-
)
|
| 136 |
-
if not (cv_high or runs_diverge):
|
| 137 |
-
continue
|
| 138 |
-
payload: dict = {
|
| 139 |
-
"engine": engine,
|
| 140 |
-
"n_runs": int(n_runs),
|
| 141 |
-
}
|
| 142 |
-
if isinstance(cer_cv, (int, float)):
|
| 143 |
-
payload["cer_cv"] = float(cer_cv)
|
| 144 |
-
payload["cer_cv_pct"] = round(float(cer_cv) * 100, 1)
|
| 145 |
-
if isinstance(identical_rate, (int, float)):
|
| 146 |
-
payload["identical_run_rate"] = float(identical_rate)
|
| 147 |
-
payload["identical_run_rate_pct"] = round(
|
| 148 |
-
float(identical_rate) * 100, 1,
|
| 149 |
-
)
|
| 150 |
-
# Champs additionnels pour la phrase de synthèse
|
| 151 |
-
cer_mean = stab.get("cer_mean")
|
| 152 |
-
cer_stdev = stab.get("cer_stdev")
|
| 153 |
-
if isinstance(cer_mean, (int, float)):
|
| 154 |
-
payload["cer_mean_pct"] = round(float(cer_mean) * 100, 2)
|
| 155 |
-
if isinstance(cer_stdev, (int, float)):
|
| 156 |
-
payload["cer_stdev_pct"] = round(float(cer_stdev) * 100, 2)
|
| 157 |
-
n_distinct = stab.get("n_distinct_outputs")
|
| 158 |
-
if isinstance(n_distinct, int):
|
| 159 |
-
payload["n_distinct_outputs"] = int(n_distinct)
|
| 160 |
-
facts.append(Fact(
|
| 161 |
-
type=FactType.ENGINE_UNSTABLE,
|
| 162 |
-
importance=FactImportance.HIGH,
|
| 163 |
-
payload=payload,
|
| 164 |
-
engines_involved=(engine,),
|
| 165 |
-
))
|
| 166 |
-
return facts
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
@register_detector(
|
| 170 |
-
FactType.REGRESSION_IN_HISTORY,
|
| 171 |
-
priority=170,
|
| 172 |
-
importance=FactImportance.MEDIUM,
|
| 173 |
-
)
|
| 174 |
-
def detect_regression_in_history(benchmark_data: dict) -> list[Fact]:
|
| 175 |
-
"""Émet un Fact pour chaque moteur dont l'historique montre
|
| 176 |
-
une dégradation : pente positive significative ou rupture
|
| 177 |
-
brutale (Sprint 92).
|
| 178 |
-
|
| 179 |
-
Lit ``benchmark_data["longitudinal_trends"]`` : liste de
|
| 180 |
-
dicts produits par ``compute_corpus_longitudinal`` du module
|
| 181 |
-
``longitudinal``. Si la clé est absente ou vide, le
|
| 182 |
-
détecteur reste silencieux — typiquement le cas quand
|
| 183 |
-
aucun historique n'a été chargé ou que la série est trop
|
| 184 |
-
courte.
|
| 185 |
-
|
| 186 |
-
Garde-fous :
|
| 187 |
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
défaut équivalent à +1 point CER sur 365 jours), **soit**
|
| 193 |
-
``change_point.delta > change_threshold`` (défaut
|
| 194 |
-
0.01 = +1 point de CER d'un segment à l'autre).
|
| 195 |
-
- Importance ``HIGH`` si la dégradation cumulée
|
| 196 |
-
``absolute_delta`` ≥ 5 points de CER.
|
| 197 |
-
"""
|
| 198 |
-
trends = benchmark_data.get("longitudinal_trends") or []
|
| 199 |
-
if not isinstance(trends, (list, tuple)):
|
| 200 |
-
return []
|
| 201 |
-
slope_threshold = (
|
| 202 |
-
0.01 / 365.0 # +1 point de CER sur 365 jours minimum
|
| 203 |
-
)
|
| 204 |
-
change_threshold = 0.01
|
| 205 |
-
facts: list[Fact] = []
|
| 206 |
-
for entry in trends:
|
| 207 |
-
if not isinstance(entry, dict):
|
| 208 |
-
continue
|
| 209 |
-
engine = entry.get("engine_name")
|
| 210 |
-
if not engine:
|
| 211 |
-
continue
|
| 212 |
-
n_runs = entry.get("n_runs")
|
| 213 |
-
if not isinstance(n_runs, int) or n_runs < 3:
|
| 214 |
-
continue
|
| 215 |
-
trend = entry.get("trend") or {}
|
| 216 |
-
cp = entry.get("change_point")
|
| 217 |
-
slope = trend.get("slope")
|
| 218 |
-
slope_high = (
|
| 219 |
-
isinstance(slope, (int, float))
|
| 220 |
-
and float(slope) > slope_threshold
|
| 221 |
-
)
|
| 222 |
-
cp_high = (
|
| 223 |
-
isinstance(cp, dict)
|
| 224 |
-
and isinstance(cp.get("delta"), (int, float))
|
| 225 |
-
and float(cp["delta"]) > change_threshold
|
| 226 |
-
)
|
| 227 |
-
if not (slope_high or cp_high):
|
| 228 |
-
continue
|
| 229 |
-
absolute_delta = entry.get("absolute_delta") or 0.0
|
| 230 |
-
importance = (
|
| 231 |
-
FactImportance.HIGH
|
| 232 |
-
if isinstance(absolute_delta, (int, float))
|
| 233 |
-
and abs(float(absolute_delta)) >= 0.05
|
| 234 |
-
else FactImportance.MEDIUM
|
| 235 |
-
)
|
| 236 |
-
payload: dict = {
|
| 237 |
-
"engine": engine,
|
| 238 |
-
"n_runs": int(n_runs),
|
| 239 |
-
"absolute_delta_pct": round(
|
| 240 |
-
float(absolute_delta) * 100, 2,
|
| 241 |
-
) if isinstance(absolute_delta, (int, float)) else 0.0,
|
| 242 |
-
"first_cer_pct": round(
|
| 243 |
-
float(entry.get("first_cer") or 0.0) * 100, 2,
|
| 244 |
-
),
|
| 245 |
-
"last_cer_pct": round(
|
| 246 |
-
float(entry.get("last_cer") or 0.0) * 100, 2,
|
| 247 |
-
),
|
| 248 |
-
}
|
| 249 |
-
if slope_high:
|
| 250 |
-
payload["slope_per_year_pct"] = round(
|
| 251 |
-
float(slope) * 365 * 100, 2,
|
| 252 |
-
)
|
| 253 |
-
payload["r_squared"] = round(
|
| 254 |
-
float(trend.get("r_squared") or 0.0), 3,
|
| 255 |
-
)
|
| 256 |
-
payload["pattern"] = "trend"
|
| 257 |
-
if cp_high:
|
| 258 |
-
payload["change_point_timestamp"] = str(
|
| 259 |
-
cp.get("timestamp") or "?",
|
| 260 |
-
)
|
| 261 |
-
payload["change_delta_pct"] = round(
|
| 262 |
-
float(cp["delta"]) * 100, 2,
|
| 263 |
-
)
|
| 264 |
-
payload["mean_before_pct"] = round(
|
| 265 |
-
float(cp.get("mean_before") or 0.0) * 100, 2,
|
| 266 |
-
)
|
| 267 |
-
payload["mean_after_pct"] = round(
|
| 268 |
-
float(cp.get("mean_after") or 0.0) * 100, 2,
|
| 269 |
-
)
|
| 270 |
-
# Si on a aussi une rupture, le pattern domine
|
| 271 |
-
payload["pattern"] = (
|
| 272 |
-
"trend_and_change_point" if slope_high else "change_point"
|
| 273 |
-
)
|
| 274 |
-
facts.append(Fact(
|
| 275 |
-
type=FactType.REGRESSION_IN_HISTORY,
|
| 276 |
-
importance=importance,
|
| 277 |
-
payload=payload,
|
| 278 |
-
engines_involved=(engine,),
|
| 279 |
-
))
|
| 280 |
-
return facts
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors.history`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors.history import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors.history as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
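The decision logic of the removed `detect_regression_in_history` combines a per-day slope threshold, a change-point threshold, and a cumulative-delta threshold for importance. The sketch below isolates that logic with the constants shown in the diff (`0.01 / 365.0`, `0.01`, `0.05`). The function name `classify_regression`, its flat arguments, and the tuple return value are simplifications introduced here; the real detector reads nested dicts and builds `Fact` objects.

```python
from typing import Optional

SLOPE_THRESHOLD = 0.01 / 365.0  # +1 CER point over 365 days, per day
CHANGE_THRESHOLD = 0.01         # +1 CER point between two segments
HIGH_DELTA = 0.05               # 5 CER points of cumulative degradation


def classify_regression(
    slope_per_day: Optional[float],
    change_delta: Optional[float],
    absolute_delta: float,
) -> Optional[tuple[str, str]]:
    """Return (pattern, importance), or None when nothing triggers."""
    slope_high = slope_per_day is not None and slope_per_day > SLOPE_THRESHOLD
    cp_high = change_delta is not None and change_delta > CHANGE_THRESHOLD
    if not (slope_high or cp_high):
        return None
    if cp_high:
        # A change point takes precedence in the pattern label.
        pattern = "trend_and_change_point" if slope_high else "change_point"
    else:
        pattern = "trend"
    importance = "HIGH" if abs(absolute_delta) >= HIGH_DELTA else "MEDIUM"
    return pattern, importance


quiet = classify_regression(0.0, None, 0.0)
drift = classify_regression(0.02 / 365.0, None, 0.02)
jump = classify_regression(0.0, 0.03, 0.06)
both = classify_regression(0.02 / 365.0, 0.03, 0.06)
```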
@@ -1,136 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
- :func:`detect_cost_outlier` (Sprint 19) — moteur dont le coût est aberrant
|
| 7 |
"""
|
| 8 |
|
| 9 |
-
from
|
| 10 |
-
|
| 11 |
-
import statistics as _stats
|
| 12 |
-
from typing import Optional
|
| 13 |
-
|
| 14 |
-
from picarones.core.narrative.facts import Fact, FactImportance, FactType
|
| 15 |
-
from picarones.core.narrative.registry import register_detector
|
| 16 |
-
|
| 17 |
-
from picarones.core.narrative.detectors._helpers import (
|
| 18 |
-
_engine_by_name,
|
| 19 |
-
_engines_summary,
|
| 20 |
-
_n_docs,
|
| 21 |
-
)
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
@register_detector(
|
| 25 |
-
FactType.PARETO_ALTERNATIVE,
|
| 26 |
-
priority=90,
|
| 27 |
-
importance=FactImportance.HIGH,
|
| 28 |
-
)
|
| 29 |
-
def detect_pareto_alternative(benchmark_data: dict) -> list[Fact]:
|
| 30 |
-
"""Pareto-dominant engine that differs from the CER leader.
|
| 31 |
-
|
| 32 |
-
Reads ``benchmark_data["pareto"]["cost"]`` (Sprint 19) and emits a Fact if
|
| 33 |
-
the front contains an engine other than the CER leader, to highlight
|
| 34 |
-
that a worthwhile cost/quality trade-off exists.
|
| 35 |
-
"""
|
| 36 |
-
pareto = (benchmark_data.get("pareto") or {}).get("cost") or {}
|
| 37 |
-
front = pareto.get("front") or []
|
| 38 |
-
points = pareto.get("points") or []
|
| 39 |
-
if len(front) < 2:
|
| 40 |
-
return []
|
| 41 |
-
|
| 42 |
-
ranking = benchmark_data.get("ranking") or []
|
| 43 |
-
if not ranking:
|
| 44 |
-
return []
|
| 45 |
-
leader = ranking[0].get("engine")
|
| 46 |
-
|
| 47 |
-
# Cheapest engine on the front (excluding the leader)
|
| 48 |
-
alt: Optional[dict] = None
|
| 49 |
-
for p in points:
|
| 50 |
-
if p.get("engine") == leader:
|
| 51 |
-
continue
|
| 52 |
-
if p.get("engine") not in front:
|
| 53 |
-
continue
|
| 54 |
-
if alt is None or float(p.get("cost") or 0.0) < float(alt.get("cost") or 0.0):
|
| 55 |
-
alt = p
|
| 56 |
-
if alt is None:
|
| 57 |
-
return []
|
| 58 |
-
|
| 59 |
-
leader_point = next((p for p in points if p.get("engine") == leader), None)
|
| 60 |
-
if leader_point is None:
|
| 61 |
-
return []
|
| 62 |
-
|
| 63 |
-
alt_cer = float(alt.get("cer") or 0.0)
|
| 64 |
-
alt_cost = float(alt.get("cost") or 0.0)
|
| 65 |
-
leader_cer = float(leader_point.get("cer") or 0.0)
|
| 66 |
-
leader_cost = float(leader_point.get("cost") or 0.0)
|
| 67 |
-
if alt_cost >= leader_cost or alt_cost <= 0:
|
| 68 |
-
return [] # not actually cheaper, so not worth reporting
|
| 69 |
-
|
| 70 |
-
return [Fact(
|
| 71 |
-
type=FactType.PARETO_ALTERNATIVE,
|
| 72 |
-
importance=FactImportance.HIGH,
|
| 73 |
-
payload={
|
| 74 |
-
"engine": alt["engine"],
|
| 75 |
-
"leader": leader,
|
| 76 |
-
"cer": round(alt_cer, 4),
|
| 77 |
-
"cer_pct": round(alt_cer * 100, 2),
|
| 78 |
-
"cost": round(alt_cost, 2),
|
| 79 |
-
"leader_cer": round(leader_cer, 4),
|
| 80 |
-
"leader_cer_pct": round(leader_cer * 100, 2),
|
| 81 |
-
"leader_cost": round(leader_cost, 2),
|
| 82 |
-
"cost_saving_ratio": round(leader_cost / alt_cost, 1) if alt_cost > 0 else None,
|
| 83 |
-
"delta_cer_pct": round((alt_cer - leader_cer) * 100, 2),
|
| 84 |
-
# Cost unit, propagated for traceability (the template no
|
| 85 |
-
# longer hardcodes "1000 pages").
|
| 86 |
-
"cost_unit_pages": 1000,
|
| 87 |
-
},
|
| 88 |
-
engines_involved=(alt["engine"],),
|
| 89 |
-
)]
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
@register_detector(
|
| 93 |
-
FactType.COST_OUTLIER,
|
| 94 |
-
priority=110,
|
| 95 |
-
importance=FactImportance.MEDIUM,
|
| 96 |
-
)
|
| 97 |
-
def detect_cost_outlier(benchmark_data: dict) -> list[Fact]:
|
| 98 |
-
"""Engine whose cost is far out of proportion to what it delivers.
|
| 99 |
-
|
| 100 |
-
Flags an engine whose cost is ≥ 5× the median AND that is not on the
|
| 101 |
-
Pareto front (hence dominated by a cheaper engine OR one with a better CER).
|
| 102 |
-
"""
|
| 103 |
-
pareto = (benchmark_data.get("pareto") or {}).get("cost") or {}
|
| 104 |
-
points = pareto.get("points") or []
|
| 105 |
-
front = set(pareto.get("front") or [])
|
| 106 |
-
if len(points) < 3:
|
| 107 |
-
return []
|
| 108 |
-
|
| 109 |
-
costs = [float(p["cost"]) for p in points if p.get("cost") is not None]
|
| 110 |
-
if not costs:
|
| 111 |
-
return []
|
| 112 |
-
median_cost = _stats.median(costs)
|
| 113 |
-
if median_cost <= 0:
|
| 114 |
-
return []
|
| 115 |
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
continue
|
| 121 |
-
if p["engine"] in front:
|
| 122 |
-
continue # on the front, so the cost is justified by unique quality
|
| 123 |
-
facts.append(Fact(
|
| 124 |
-
type=FactType.COST_OUTLIER,
|
| 125 |
-
importance=FactImportance.MEDIUM,
|
| 126 |
-
payload={
|
| 127 |
-
"engine": p["engine"],
|
| 128 |
-
"cost": round(c, 2),
|
| 129 |
-
"median_cost": round(median_cost, 2),
|
| 130 |
-
"ratio_to_median": round(c / median_cost, 1),
|
| 131 |
-
"cer_pct": round(float(p.get("cer") or 0.0) * 100, 2),
|
| 132 |
-
"cost_unit_pages": 1000,
|
| 133 |
-
},
|
| 134 |
-
engines_involved=(p["engine"],),
|
| 135 |
-
))
|
| 136 |
-
return facts
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors.pareto`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors.pareto import * # noqa: F401, F403
|
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors.pareto as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
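The body of the removed `detect_cost_outlier` loop is partly empty in this diff view (rows 115 to 119). The rule stated in its docstring (cost at least 5× the median, and not on the Pareto front) can be sketched on plain dictionaries as follows. The function name `cost_outliers`, its `ratio` parameter and the sample data are assumptions made for illustration, and the threshold test inside the loop is a reconstruction rather than the original code.

```python
import statistics


def cost_outliers(points, front, ratio=5.0):
    """Flag engines costing >= ratio x the median cost that are off the Pareto front."""
    costs = [float(p["cost"]) for p in points if p.get("cost") is not None]
    if len(points) < 3 or not costs:
        return []
    median_cost = statistics.median(costs)
    if median_cost <= 0:
        return []
    flagged = []
    for p in points:
        cost = p.get("cost")
        # Assumed reconstruction: skip engines below the threshold.
        if cost is None or float(cost) < ratio * median_cost:
            continue
        if p["engine"] in front:
            continue  # on the front: the cost buys a unique quality level
        flagged.append({
            "engine": p["engine"],
            "ratio_to_median": round(float(cost) / median_cost, 1),
        })
    return flagged


# Hypothetical sample: median cost is 3.0, so the threshold is 15.0.
points = [
    {"engine": "a", "cost": 1.0},
    {"engine": "b", "cost": 2.0},
    {"engine": "c", "cost": 3.0},
    {"engine": "x", "cost": 20.0},
    {"engine": "d", "cost": 30.0},
]
result = cost_outliers(points, front={"a", "d"})
```

With this sample, engine `d` is more expensive than `x` but is not flagged, because it sits on the front. Only `x` is reported.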
@@ -1,251 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
- :func:`detect_llm_hallucination_flag` (Sprint 4)
|
| 7 |
-
- :func:`detect_robustness_fragile` (Sprint 4)
|
| 8 |
-
- :func:`detect_confidence_warning` (Sprint 4)
|
| 9 |
"""
|
| 10 |
|
| 11 |
-
from
|
| 12 |
-
|
| 13 |
-
import statistics as _stats
|
| 14 |
-
from typing import Optional
|
| 15 |
-
|
| 16 |
-
from picarones.core.narrative.facts import Fact, FactImportance, FactType
|
| 17 |
-
from picarones.core.narrative.registry import register_detector
|
| 18 |
-
|
| 19 |
-
from picarones.core.narrative.detectors._helpers import (
|
| 20 |
-
_engine_by_name,
|
| 21 |
-
_engines_summary,
|
| 22 |
-
_n_docs,
|
| 23 |
-
)
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
@register_detector(
|
| 27 |
-
FactType.ERROR_PROFILE_OUTLIER,
|
| 28 |
-
priority=60,
|
| 29 |
-
importance=FactImportance.MEDIUM,
|
| 30 |
-
)
|
| 31 |
-
def detect_error_profile_outlier(benchmark_data: dict) -> list[Fact]:
|
| 32 |
-
"""Engine with an atypical taxonomic error profile.
|
| 33 |
-
|
| 34 |
-
Emits a Fact if, for a given engine and error class, the relative share
|
| 35 |
-
is at least 2× the median of the other engines (and > 15 %
|
| 36 |
-
of the total, to skip marginal strata).
|
| 37 |
-
"""
|
| 38 |
-
engines = _engines_summary(benchmark_data)
|
| 39 |
-
# {engine: {class_name: proportion}}
|
| 40 |
-
profiles: dict[str, dict[str, float]] = {}
|
| 41 |
-
for e in engines:
|
| 42 |
-
tax = e.get("aggregated_taxonomy") or {}
|
| 43 |
-
distribution = tax.get("distribution") or tax.get("proportions") or {}
|
| 44 |
-
if not distribution:
|
| 45 |
-
continue
|
| 46 |
-
profiles[e["name"]] = {k: float(v) for k, v in distribution.items()}
|
| 47 |
-
if len(profiles) < 2:
|
| 48 |
-
return []
|
| 49 |
-
|
| 50 |
-
# Collect every class encountered
|
| 51 |
-
all_classes: set[str] = set()
|
| 52 |
-
for p in profiles.values():
|
| 53 |
-
all_classes.update(p.keys())
|
| 54 |
-
|
| 55 |
-
facts: list[Fact] = []
|
| 56 |
-
for cls in all_classes:
|
| 57 |
-
values = [(name, p.get(cls, 0.0)) for name, p in profiles.items()]
|
| 58 |
-
props = [v for _, v in values]
|
| 59 |
-
if not props:
|
| 60 |
-
continue
|
| 61 |
-
median_prop = _stats.median(props)
|
| 62 |
-
for name, v in values:
|
| 63 |
-
if v < 0.15: # too marginal to be notable
|
| 64 |
-
continue
|
| 65 |
-
if median_prop <= 0:
|
| 66 |
-
continue
|
| 67 |
-
if v >= 2.0 * median_prop:
|
| 68 |
-
facts.append(Fact(
|
| 69 |
-
type=FactType.ERROR_PROFILE_OUTLIER,
|
| 70 |
-
importance=FactImportance.HIGH,
|
| 71 |
-
payload={
|
| 72 |
-
"engine": name,
|
| 73 |
-
"error_class": cls,
|
| 74 |
-
"proportion": round(v, 4),
|
| 75 |
-
"proportion_pct": round(v * 100, 1),
|
| 76 |
-
"median_proportion": round(median_prop, 4),
|
| 77 |
-
"median_proportion_pct": round(median_prop * 100, 1),
|
| 78 |
-
"ratio_to_median": round(v / median_prop, 2) if median_prop else None,
|
| 79 |
-
},
|
| 80 |
-
engines_involved=(name,),
|
| 81 |
-
))
|
| 82 |
-
return facts
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
@register_detector(
|
| 86 |
-
FactType.LLM_HALLUCINATION_FLAG,
|
| 87 |
-
priority=70,
|
| 88 |
-
importance=FactImportance.HIGH,
|
| 89 |
-
)
|
| 90 |
-
def detect_llm_hallucination_flag(benchmark_data: dict) -> list[Fact]:
|
| 91 |
-
"""LLM/VLM with a notably high hallucination rate.
|
| 92 |
-
|
| 93 |
-
Triggered if ``hallucinating_doc_rate`` > 30 % OR ``anchor_score_mean`` < 0.6
|
| 94 |
-
for an engine whose ``is_pipeline`` or ``is_vlm`` field is ``True``.
|
| 95 |
-
"""
|
| 96 |
-
facts: list[Fact] = []
|
| 97 |
-
for e in _engines_summary(benchmark_data):
|
| 98 |
-
agg = e.get("aggregated_hallucination") or {}
|
| 99 |
-
if not agg:
|
| 100 |
-
continue
|
| 101 |
-
rate = agg.get("hallucinating_doc_rate")
|
| 102 |
-
anchor = agg.get("anchor_score_mean")
|
| 103 |
-
length_ratio = agg.get("length_ratio_mean")
|
| 104 |
-
# Only flag LLM pipelines or VLMs
|
| 105 |
-
is_llm = bool(e.get("is_pipeline")) or bool(e.get("is_vlm"))
|
| 106 |
-
if not is_llm:
|
| 107 |
-
continue
|
| 108 |
-
|
| 109 |
-
flagged = False
|
| 110 |
-
reasons = []
|
| 111 |
-
if rate is not None and float(rate) > 0.30:
|
| 112 |
-
flagged = True
|
| 113 |
-
reasons.append("taux de documents hallucinés")
|
| 114 |
-
if anchor is not None and float(anchor) < 0.60:
|
| 115 |
-
flagged = True
|
| 116 |
-
reasons.append("ancrage faible")
|
| 117 |
-
if length_ratio is not None and float(length_ratio) > 1.30:
|
| 118 |
-
flagged = True
|
| 119 |
-
reasons.append("sortie anormalement longue")
|
| 120 |
-
if not flagged:
|
| 121 |
-
continue
|
| 122 |
-
|
| 123 |
-
facts.append(Fact(
|
| 124 |
-
type=FactType.LLM_HALLUCINATION_FLAG,
|
| 125 |
-
importance=FactImportance.HIGH,
|
| 126 |
-
payload={
|
| 127 |
-
"engine": e["name"],
|
| 128 |
-
"hallucinating_rate": round(float(rate or 0.0), 4),
|
| 129 |
-
"hallucinating_rate_pct": round(float(rate or 0.0) * 100, 1),
|
| 130 |
-
"anchor_score": round(float(anchor), 3) if anchor is not None else None,
|
| 131 |
-
"length_ratio": round(float(length_ratio), 3) if length_ratio is not None else None,
|
| 132 |
-
"reasons": reasons,
|
| 133 |
-
"reasons_list": ", ".join(reasons),
|
| 134 |
-
},
|
| 135 |
-
engines_involved=(e["name"],),
|
| 136 |
-
))
|
| 137 |
-
return facts
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
@register_detector(
|
| 141 |
-
FactType.ROBUSTNESS_FRAGILE,
|
| 142 |
-
priority=80,
|
| 143 |
-
importance=FactImportance.MEDIUM,
|
| 144 |
-
)
|
| 145 |
-
def detect_robustness_fragile(benchmark_data: dict) -> list[Fact]:
|
| 146 |
-
"""Engine that degrades sharply above a noise/blur threshold.
|
| 147 |
-
|
| 148 |
-
Active when robustness data is embedded in
|
| 149 |
-
``benchmark_data["robustness"]`` (outside the scope of the standard benchmark,
|
| 150 |
-
produced by ``picarones robustness`` and optionally injected).
|
| 151 |
-
"""
|
| 152 |
-
robustness = benchmark_data.get("robustness")
|
| 153 |
-
if not robustness:
|
| 154 |
-
return []
|
| 155 |
-
|
| 156 |
-
facts: list[Fact] = []
|
| 157 |
-
curves = robustness.get("curves") or robustness.get("engines") or []
|
| 158 |
-
# Expected structure: [{engine, degradation_type, points: [{level, cer}]}]
|
| 159 |
-
# Flag: CER at the highest level > 3× CER at the lowest level.
|
| 160 |
-
for entry in curves:
|
| 161 |
-
engine = entry.get("engine")
|
| 162 |
-
dtype = entry.get("degradation_type")
|
| 163 |
-
points = entry.get("points") or []
|
| 164 |
-
if not engine or not points or len(points) < 2:
|
| 165 |
-
continue
|
| 166 |
-
try:
|
| 167 |
-
sorted_pts = sorted(points, key=lambda p: float(p["level"]))
|
| 168 |
-
except (KeyError, TypeError, ValueError):
|
| 169 |
-
continue
|
| 170 |
-
first, last = sorted_pts[0], sorted_pts[-1]
|
| 171 |
-
c0 = float(first.get("cer") or 0.0)
|
| 172 |
-
c1 = float(last.get("cer") or 0.0)
|
| 173 |
-
if c0 <= 0.01: # avoid dividing by a near-zero value
|
| 174 |
-
continue
|
| 175 |
-
if c1 >= 3.0 * c0 and c1 > 0.15:
|
| 176 |
-
facts.append(Fact(
|
| 177 |
-
type=FactType.ROBUSTNESS_FRAGILE,
|
| 178 |
-
importance=FactImportance.HIGH,
|
| 179 |
-
payload={
|
| 180 |
-
"engine": engine,
|
| 181 |
-
"degradation": dtype,
|
| 182 |
-
"cer_baseline": round(c0, 4),
|
| 183 |
-
"cer_baseline_pct": round(c0 * 100, 1),
|
| 184 |
-
"cer_degraded": round(c1, 4),
|
| 185 |
-
"cer_degraded_pct": round(c1 * 100, 1),
|
| 186 |
-
"ratio": round(c1 / c0, 1),
|
| 187 |
-
"level_max": float(last.get("level") or 0),
|
| 188 |
-
},
|
| 189 |
-
engines_involved=(engine,),
|
| 190 |
-
))
|
| 191 |
-
return facts
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
@register_detector(
|
| 195 |
-
FactType.CONFIDENCE_WARNING,
|
| 196 |
-
priority=120,
|
| 197 |
-
importance=FactImportance.MEDIUM,
|
| 198 |
-
)
|
| 199 |
-
def detect_confidence_warning(benchmark_data: dict) -> list[Fact]:
|
| 200 |
-
"""Wide confidence interval → unreliable ranking.
|
| 201 |
-
|
| 202 |
-
Triggered if, for the leader or the runner-up, the width of the 95 % CI
|
| 203 |
-
is more than three times the gap |leader − runner-up| OR > 5 CER points.
|
| 204 |
-
"""
|
| 205 |
-
stats = benchmark_data.get("statistics", {}) or {}
|
| 206 |
-
cis = stats.get("bootstrap_cis") or []
|
| 207 |
-
if len(cis) < 2:
|
| 208 |
-
return []
|
| 209 |
-
|
| 210 |
-
ranking = benchmark_data.get("ranking") or []
|
| 211 |
-
valid = [r for r in ranking if r.get("mean_cer") is not None]
|
| 212 |
-
if len(valid) < 2:
|
| 213 |
-
return []
|
| 214 |
-
|
| 215 |
-
by_name = {c["engine"]: c for c in cis if "engine" in c}
|
| 216 |
-
leader = valid[0]["engine"]
|
| 217 |
-
runner_up = valid[1]["engine"]
|
| 218 |
-
leader_ci = by_name.get(leader)
|
| 219 |
-
runner_ci = by_name.get(runner_up)
|
| 220 |
-
if not leader_ci or not runner_ci:
|
| 221 |
-
return []
|
| 222 |
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
for
|
| 226 |
-
|
| 227 |
-
hi = float(ci.get("ci_upper") or 0.0)
|
| 228 |
-
width = hi - lo
|
| 229 |
-
wide_vs_gap = gap > 0 and width > 3.0 * gap
|
| 230 |
-
wide_absolute = width > 0.05
|
| 231 |
-
if wide_vs_gap or wide_absolute:
|
| 232 |
-
facts.append(Fact(
|
| 233 |
-
type=FactType.CONFIDENCE_WARNING,
|
| 234 |
-
importance=FactImportance.MEDIUM,
|
| 235 |
-
payload={
|
| 236 |
-
"engine": engine_name,
|
| 237 |
-
"ci_lower": round(lo, 4),
|
| 238 |
-
"ci_upper": round(hi, 4),
|
| 239 |
-
"ci_width": round(width, 4),
|
| 240 |
-
"ci_width_pct": round(width * 100, 2),
|
| 241 |
-
"mean_cer": round(float(ci.get("mean") or 0.0), 4),
|
| 242 |
-
"mean_cer_pct": round(float(ci.get("mean") or 0.0) * 100, 2),
|
| 243 |
-
"gap_to_runner_up_pct": round(gap * 100, 2),
|
| 244 |
-
# Confidence level of the bounds, propagated for anti-hallucination
|
| 245 |
-
# traceability (the template no longer hardcodes "95 %").
|
| 246 |
-
"confidence_level": 95,
|
| 247 |
-
},
|
| 248 |
-
engines_involved=(engine_name,),
|
| 249 |
-
))
|
| 250 |
-
break # a single warning is enough
|
| 251 |
-
return facts
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors.quality`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors.quality import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors.quality as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
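The width test used by the removed `detect_confidence_warning` (visible in rows 229 to 231 of the old file) can be isolated as a small predicate. This is a sketch under assumptions: the helper name `ci_is_too_wide` is hypothetical, and the gap is taken here as the absolute difference of the two mean CER values, which is what the docstring suggests but which is not visible in this diff view.

```python
def ci_is_too_wide(ci_lower, ci_upper, leader_cer, runner_up_cer):
    """Return True when a bootstrap CI is too wide to trust the ranking."""
    # Assumption: the gap is |leader mean CER - runner-up mean CER|.
    gap = abs(leader_cer - runner_up_cer)
    width = ci_upper - ci_lower
    wide_vs_gap = gap > 0 and width > 3.0 * gap   # wider than three times the gap
    wide_absolute = width > 0.05                  # wider than 5 CER points
    return wide_vs_gap or wide_absolute


# Hypothetical values: a narrow interval, but the two engines are very close.
close_race = ci_is_too_wide(0.040, 0.048, leader_cer=0.044, runner_up_cer=0.046)
# A narrow interval with a comfortable gap between the two engines.
clear_lead = ci_is_too_wide(0.040, 0.044, leader_cer=0.042, runner_up_cer=0.060)
```

The two criteria are complementary: the relative one catches near-ties even when the interval looks small, while the absolute one catches noisy estimates regardless of the gap.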
@@ -1,279 +1,13 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
- :func:`detect_statistical_tie` (Sprint 18)
|
| 7 |
-
- :func:`detect_significant_gap` (Sprint 4)
|
| 8 |
-
- :func:`detect_speed_winner` (Sprint 4)
|
| 9 |
-
- :func:`detect_median_mean_gap_warning` (Sprint 44)
|
| 10 |
-
|
| 11 |
-
Behaviour and signatures are unchanged. All detectors are still registered
|
| 12 |
-
automatically via ``@register_detector`` at import time.
|
| 13 |
"""
|
| 14 |
|
| 15 |
-
from
|
| 16 |
-
|
| 17 |
-
import statistics as _stats
|
| 18 |
-
from typing import Optional
|
| 19 |
-
|
| 20 |
-
from picarones.core.narrative.facts import Fact, FactImportance, FactType
|
| 21 |
-
from picarones.core.narrative.registry import register_detector
|
| 22 |
-
|
| 23 |
-
from picarones.core.narrative.detectors._helpers import (
|
| 24 |
-
_engine_by_name,
|
| 25 |
-
_engines_summary,
|
| 26 |
-
_n_docs,
|
| 27 |
-
)
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
@register_detector(
|
| 31 |
-
FactType.GLOBAL_LEADER_CER,
|
| 32 |
-
priority=10,
|
| 33 |
-
importance=FactImportance.CRITICAL,
|
| 34 |
-
)
|
| 35 |
-
def detect_global_leader_cer(benchmark_data: dict) -> list[Fact]:
|
| 36 |
-
"""Engine with the lowest mean CER over the whole corpus.
|
| 37 |
-
|
| 38 |
-
Emits a CRITICAL Fact if at least 2 engines are compared, also attaching
|
| 39 |
-
the runner-up so the arbiter can merge it with ``significant_gap``.
|
| 40 |
-
"""
|
| 41 |
-
ranking = benchmark_data.get("ranking") or []
|
| 42 |
-
# Drop entries with no computed CER
|
| 43 |
-
valid = [r for r in ranking if r.get("mean_cer") is not None]
|
| 44 |
-
if len(valid) < 1:
|
| 45 |
-
return []
|
| 46 |
-
|
| 47 |
-
leader = valid[0]
|
| 48 |
-
runner_up = valid[1] if len(valid) >= 2 else None
|
| 49 |
-
|
| 50 |
-
payload = {
|
| 51 |
-
"engine": leader["engine"],
|
| 52 |
-
"cer": float(leader["mean_cer"]),
|
| 53 |
-
"cer_pct": round(float(leader["mean_cer"]) * 100, 2),
|
| 54 |
-
"n_engines": len(valid),
|
| 55 |
-
"n_docs": _n_docs(benchmark_data),
|
| 56 |
-
}
|
| 57 |
-
if runner_up is not None:
|
| 58 |
-
payload["runner_up"] = runner_up["engine"]
|
| 59 |
-
payload["runner_up_cer"] = float(runner_up["mean_cer"])
|
| 60 |
-
payload["runner_up_cer_pct"] = round(float(runner_up["mean_cer"]) * 100, 2)
|
| 61 |
-
|
| 62 |
-
return [Fact(
|
| 63 |
-
type=FactType.GLOBAL_LEADER_CER,
|
| 64 |
-
importance=FactImportance.CRITICAL,
|
| 65 |
-
payload=payload,
|
| 66 |
-
engines_involved=(leader["engine"],),
|
| 67 |
-
)]
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
@register_detector(
|
| 71 |
-
FactType.STATISTICAL_TIE,
|
| 72 |
-
priority=20,
|
| 73 |
-
importance=FactImportance.CRITICAL,
|
| 74 |
-
)
|
| 75 |
-
def detect_statistical_tie(benchmark_data: dict) -> list[Fact]:
|
| 76 |
-
"""Groups of statistically indistinguishable engines (Nemenyi)."""
|
| 77 |
-
nemenyi = benchmark_data.get("statistics", {}).get("nemenyi", {})
|
| 78 |
-
if not nemenyi or nemenyi.get("error"):
|
| 79 |
-
return []
|
| 80 |
-
|
| 81 |
-
tied_groups = nemenyi.get("tied_groups", [])
|
| 82 |
-
mean_ranks = nemenyi.get("mean_ranks", {})
|
| 83 |
-
cd = nemenyi.get("critical_distance", 0.0)
|
| 84 |
-
alpha = nemenyi.get("alpha", 0.05)
|
| 85 |
-
n_blocks = nemenyi.get("n_blocks", 0)
|
| 86 |
-
|
| 87 |
-
facts: list[Fact] = []
|
| 88 |
-
for group in tied_groups:
|
| 89 |
-
if len(group) < 2:
|
| 90 |
-
continue
|
| 91 |
-
is_leader_tie = min(mean_ranks.get(n, 999) for n in group) == min(
|
| 92 |
-
mean_ranks.values(), default=0
|
| 93 |
-
)
|
| 94 |
-
importance = FactImportance.CRITICAL if is_leader_tie else FactImportance.HIGH
|
| 95 |
-
|
| 96 |
-
facts.append(Fact(
|
| 97 |
-
type=FactType.STATISTICAL_TIE,
|
| 98 |
-
importance=importance,
|
| 99 |
-
payload={
|
| 100 |
-
"engines": list(group),
|
| 101 |
-
"engines_list": ", ".join(group),
|
| 102 |
-
"mean_ranks": {n: mean_ranks.get(n) for n in group},
|
| 103 |
-
"critical_distance": round(cd, 3),
|
| 104 |
-
"alpha": alpha,
|
| 105 |
-
"n_blocks": n_blocks,
|
| 106 |
-
"includes_leader": is_leader_tie,
|
| 107 |
-
"n_tied": len(group),
|
| 108 |
-
},
|
| 109 |
-
engines_involved=tuple(group),
|
| 110 |
-
))
|
| 111 |
-
return facts
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
@register_detector(
|
| 115 |
-
FactType.SIGNIFICANT_GAP,
|
| 116 |
-
priority=30,
|
| 117 |
-
importance=FactImportance.HIGH,
|
| 118 |
-
)
|
| 119 |
-
def detect_significant_gap(benchmark_data: dict) -> list[Fact]:
|
| 120 |
-
"""Statistically significant gap between first and second place in the ranking.
|
| 121 |
-
|
| 122 |
-
Reads the pairwise Wilcoxon matrix and checks whether the (leader,
|
| 123 |
-
runner-up) pair appears in it with ``significant = True``.
|
| 124 |
-
"""
|
| 125 |
-
ranking = benchmark_data.get("ranking") or []
|
| 126 |
-
valid = [r for r in ranking if r.get("mean_cer") is not None]
|
| 127 |
-
if len(valid) < 2:
|
| 128 |
-
return []
|
| 129 |
-
|
| 130 |
-
leader = valid[0]["engine"]
|
| 131 |
-
runner_up = valid[1]["engine"]
|
| 132 |
-
|
| 133 |
-
pairwise = benchmark_data.get("statistics", {}).get("pairwise_wilcoxon") or []
|
| 134 |
-
match = None
|
| 135 |
-
for p in pairwise:
|
| 136 |
-
names = {p.get("engine_a"), p.get("engine_b")}
|
| 137 |
-
if names == {leader, runner_up}:
|
| 138 |
-
match = p
|
| 139 |
-
break
|
| 140 |
-
if match is None:
|
| 141 |
-
return []
|
| 142 |
-
|
| 143 |
-
if not match.get("significant"):
|
| 144 |
-
return [] # no significant gap, nothing to report here
|
| 145 |
-
|
| 146 |
-
delta_cer = abs(float(valid[0]["mean_cer"]) - float(valid[1]["mean_cer"]))
|
| 147 |
-
return [Fact(
|
| 148 |
-
type=FactType.SIGNIFICANT_GAP,
|
| 149 |
-
importance=FactImportance.CRITICAL,
|
| 150 |
-
payload={
|
| 151 |
-
"leader": leader,
|
| 152 |
-
"runner_up": runner_up,
|
| 153 |
-
"p_value": float(match.get("p_value", 0.0)),
|
| 154 |
-
"delta_cer": round(delta_cer, 4),
|
| 155 |
-
"delta_cer_pct": round(delta_cer * 100, 2),
|
| 156 |
-
"n_pairs": int(match.get("n_pairs", 0)),
|
| 157 |
-
},
|
| 158 |
-
engines_involved=(leader, runner_up),
|
| 159 |
-
)]
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
@register_detector(
|
| 163 |
-
FactType.SPEED_WINNER,
|
| 164 |
-
priority=100,
|
| 165 |
-
importance=FactImportance.MEDIUM,
|
| 166 |
-
)
|
| 167 |
-
def detect_speed_winner(benchmark_data: dict) -> list[Fact]:
|
| 168 |
-
"""Significantly faster engine at comparable quality.
|
| 169 |
-
|
| 170 |
-
Triggered if an engine is at least 3× faster than the median AND
|
| 171 |
-
its CER is not significantly worse (in the same Nemenyi group as
|
| 172 |
-
the leader OR CER ≤ 1.1 × the leader's CER).
|
| 173 |
-
"""
|
| 174 |
-
durations = _mean_duration_per_engine(benchmark_data)
|
| 175 |
-
if len(durations) < 2:
|
| 176 |
-
return []
|
| 177 |
-
|
| 178 |
-
values = list(durations.values())
|
| 179 |
-
median_dur = _stats.median(values)
|
| 180 |
-
if median_dur <= 0:
|
| 181 |
-
return []
|
| 182 |
-
|
| 183 |
-
ranking = benchmark_data.get("ranking") or []
|
| 184 |
-
valid = [r for r in ranking if r.get("mean_cer") is not None]
|
| 185 |
-
if not valid:
|
| 186 |
-
return []
|
| 187 |
-
leader_cer = float(valid[0]["mean_cer"])
|
| 188 |
-
quality_ceiling = max(0.01, leader_cer * 1.10)
|
| 189 |
-
|
| 190 |
-
tied_groups = benchmark_data.get("statistics", {}).get("nemenyi", {}).get("tied_groups") or []
|
| 191 |
-
leader_group: set[str] = set()
|
| 192 |
-
for g in tied_groups:
|
| 193 |
-
if valid[0]["engine"] in g:
|
| 194 |
-
leader_group = set(g)
|
| 195 |
-
break
|
| 196 |
-
|
| 197 |
-
facts: list[Fact] = []
|
| 198 |
-
candidates = sorted(durations.items(), key=lambda kv: kv[1])
|
| 199 |
-
for engine, dur in candidates:
|
| 200 |
-
if dur * 3.0 > median_dur:
|
| 201 |
-
break # the remaining ones are even slower
|
| 202 |
-
summary = _engine_by_name(benchmark_data, engine) or {}
|
| 203 |
-
engine_cer = summary.get("cer")
|
| 204 |
-
if engine_cer is None:
|
| 205 |
-
continue
|
| 206 |
-
acceptable_quality = (
|
| 207 |
-
engine in leader_group or float(engine_cer) <= quality_ceiling
|
| 208 |
-
)
|
| 209 |
-
if not acceptable_quality:
|
| 210 |
-
continue
|
| 211 |
-
facts.append(Fact(
|
| 212 |
-
type=FactType.SPEED_WINNER,
|
| 213 |
-
importance=FactImportance.MEDIUM,
|
| 214 |
-
payload={
|
| 215 |
-
"engine": engine,
|
| 216 |
-
"mean_duration": round(dur, 3),
|
| 217 |
-
"median_duration": round(median_dur, 3),
|
| 218 |
-
"speedup": round(median_dur / dur, 1) if dur > 0 else None,
|
| 219 |
-
"cer": round(float(engine_cer), 4),
|
| 220 |
-
"cer_pct": round(float(engine_cer) * 100, 2),
|
| 221 |
-
},
|
| 222 |
-
engines_involved=(engine,),
|
| 223 |
-
))
|
| 224 |
-
return facts[:1] # only the fastest, to avoid noise
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
@register_detector(
|
| 228 |
-
FactType.MEDIAN_MEAN_GAP_WARNING,
|
| 229 |
-
priority=140,
|
| 230 |
-
importance=FactImportance.MEDIUM,
|
| 231 |
-
)
|
| 232 |
-
def detect_median_mean_gap_warning(benchmark_data: dict) -> list[Fact]:
|
| 233 |
-
"""Warns when the leader's ``|mean - median| / median`` ratio
|
| 234 |
-
exceeds 30 %, which indicates a strongly skewed distribution
|
| 235 |
-
where the mean hides actual performance.
|
| 236 |
-
|
| 237 |
-
Sprint 44, item A.I.2 of the evolution plan. Consistent with the switch of the
|
| 238 |
-
default sort to the median: if the leader's mean diverges
|
| 239 |
-
sharply from the median, the user needs to know in order to
|
| 240 |
-
interpret the figures correctly.
|
| 241 |
-
"""
|
| 242 |
-
ranking = benchmark_data.get("ranking") or []
|
| 243 |
-
valid = [
|
| 244 |
-
r for r in ranking
|
| 245 |
-
if r.get("median_cer") is not None
|
| 246 |
-
and r.get("mean_cer") is not None
|
| 247 |
-
]
|
| 248 |
-
if not valid:
|
| 249 |
-
return []
|
| 250 |
-
|
| 251 |
-
leader = valid[0]
|
| 252 |
-
median_cer = float(leader["median_cer"])
|
| 253 |
-
mean_cer = float(leader["mean_cer"])
|
| 254 |
-
|
| 255 |
-
if median_cer <= 0:
|
| 256 |
-
# Zero median (corpus very easy for this engine): the relative
|
| 257 |
-
# gap cannot be computed in a useful way, so we abstain.
|
| 258 |
-
return []
|
| 259 |
-
|
| 260 |
-
relative_gap = abs(mean_cer - median_cer) / median_cer
|
| 261 |
-
if relative_gap < 0.30:
|
| 262 |
-
return []
|
| 263 |
-
|
| 264 |
-
importance = (
|
| 265 |
-
FactImportance.HIGH if relative_gap >= 1.0 else FactImportance.MEDIUM
|
| 266 |
-
)
|
| 267 |
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
-
|
| 272 |
-
"engine": leader["engine"],
|
| 273 |
-
"median_cer_pct": round(median_cer * 100, 2),
|
| 274 |
-
"mean_cer_pct": round(mean_cer * 100, 2),
|
| 275 |
-
"relative_gap_pct": round(relative_gap * 100, 1),
|
| 276 |
-
"n_docs": int(leader.get("documents") or 0),
|
| 277 |
-
},
|
| 278 |
-
engines_involved=(leader["engine"],),
|
| 279 |
-
)]
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors.ranking`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.detectors.ranking import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.detectors.ranking as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
|
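The arithmetic behind the removed `detect_median_mean_gap_warning` is small enough to show on its own. The sketch below assumes a hypothetical helper name, `median_mean_gap_level`, and returns plain strings instead of `FactImportance` members so that it runs without the Picarones package. The thresholds (30 % to trigger, 100 % for the higher level) are the ones visible in the diff.

```python
def median_mean_gap_level(mean_cer, median_cer):
    """Classify how far the leader's mean CER drifts from its median CER.

    Returns None when no warning applies, otherwise "MEDIUM" or "HIGH".
    """
    if median_cer <= 0:
        return None  # relative gap is not meaningful with a zero median
    relative_gap = abs(mean_cer - median_cer) / median_cer
    if relative_gap < 0.30:
        return None
    return "HIGH" if relative_gap >= 1.0 else "MEDIUM"


# Hypothetical leader figures: median CER 4 %, mean CER pulled up to 10 %
# by a few very bad documents, i.e. a relative gap of 150 %.
level = median_mean_gap_level(mean_cer=0.10, median_cer=0.04)
```

A relative rather than absolute gap keeps the rule meaningful across corpora of different difficulty, at the price of having to abstain when the median is zero.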
@@ -1,203 +1,13 @@
-"""
 
-
-``
-
-- :func:`detect_stratum_winner` (Sprint 4)
-- :func:`detect_stratum_collapse` (Sprint 4)
-- :func:`detect_stratification_recommended` (Sprint 45)
 """
 
-from
-
-import statistics as _stats
-from typing import Optional
-
-from picarones.core.narrative.facts import Fact, FactImportance, FactType
-from picarones.core.narrative.registry import register_detector
-
-from picarones.core.narrative.detectors._helpers import (
-    _engine_by_name,
-    _engines_summary,
-    _n_docs,
-)
-
-
-def _stratum_cer_by_engine(benchmark_data: dict) -> dict[str, dict[str, list[float]]]:
-    """Aggregate CER values by (engine, stratum).
-
-    Stratum = ``document["script_type"]`` when present. Returns ``{}`` if no
-    document exposes a stratum (nothing can be emitted).
-    """
-    out: dict[str, dict[str, list[float]]] = {}
-    for doc in benchmark_data.get("documents") or []:
-        stratum = doc.get("script_type")
-        if not stratum:
-            continue
-        for er in doc.get("engine_results") or []:
-            if er.get("error"):
-                continue
-            cer = er.get("cer")
-            if cer is None:
-                continue
-            name = er.get("engine")
-            out.setdefault(name, {}).setdefault(stratum, []).append(float(cer))
-    return out
-
-
-@register_detector(
-    FactType.STRATUM_WINNER,
-    priority=40,
-    importance=FactImportance.MEDIUM,
-)
-def detect_stratum_winner(benchmark_data: dict) -> list[Fact]:
-    """Engine that clearly dominates a stratum (≥ 3 documents, CER at
-    least 25 % lower than the runner-up on that stratum).
-    """
-    agg = _stratum_cer_by_engine(benchmark_data)
-    if not agg:
-        return []
-
-    # Invert: {stratum: {engine: mean_cer}}
-    by_stratum: dict[str, dict[str, float]] = {}
-    for engine, strata in agg.items():
-        for stratum, vals in strata.items():
-            if len(vals) < 3:
-                continue
-            by_stratum.setdefault(stratum, {})[engine] = sum(vals) / len(vals)
-
-    facts: list[Fact] = []
-    for stratum, engine_cer in by_stratum.items():
-        if len(engine_cer) < 2:
-            continue
-        ordered = sorted(engine_cer.items(), key=lambda kv: kv[1])
-        best_name, best_cer = ordered[0]
-        second_cer = ordered[1][1]
-        if second_cer == 0:
-            continue
-        if best_cer < second_cer * 0.75:  # dominance ≥ 25 %
-            facts.append(Fact(
-                type=FactType.STRATUM_WINNER,
-                importance=FactImportance.HIGH,
-                payload={
-                    "engine": best_name,
-                    "stratum": stratum,
-                    "cer": round(best_cer, 4),
-                    "cer_pct": round(best_cer * 100, 2),
-                    "second_engine": ordered[1][0],
-                    "second_cer": round(second_cer, 4),
-                    "second_cer_pct": round(second_cer * 100, 2),
-                    "n_docs_stratum": len(agg[best_name][stratum]),
-                },
-                engines_involved=(best_name,),
-                stratum=stratum,
-            ))
-    return facts
-
-
-@register_detector(
-    FactType.STRATUM_COLLAPSE,
-    priority=50,
-    importance=FactImportance.HIGH,
-)
-def detect_stratum_collapse(benchmark_data: dict) -> list[Fact]:
-    """Globally competitive engine that collapses on one stratum.
-
-    Triggered when, for an engine, the mean CER on a stratum of ≥ 3 documents
-    is more than twice that engine's global CER and exceeds it by more than 5 points.
-    """
-    agg = _stratum_cer_by_engine(benchmark_data)
-    if not agg:
-        return []
-
-    facts: list[Fact] = []
-    for engine_name, strata in agg.items():
-        summary = _engine_by_name(benchmark_data, engine_name) or {}
-        global_cer = summary.get("cer")
-        if global_cer is None:
-            continue
-        global_cer = float(global_cer)
-        if global_cer <= 0:
-            continue
-        for stratum, vals in strata.items():
-            if len(vals) < 3:
-                continue
-            local_cer = sum(vals) / len(vals)
-            if local_cer > 2.0 * global_cer and (local_cer - global_cer) > 0.05:
-                facts.append(Fact(
-                    type=FactType.STRATUM_COLLAPSE,
-                    importance=FactImportance.HIGH,
-                    payload={
-                        "engine": engine_name,
-                        "stratum": stratum,
-                        "local_cer": round(local_cer, 4),
-                        "local_cer_pct": round(local_cer * 100, 2),
-                        "global_cer": round(global_cer, 4),
-                        "global_cer_pct": round(global_cer * 100, 2),
-                        "delta_cer_pct": round((local_cer - global_cer) * 100, 2),
-                        "n_docs_stratum": len(vals),
-                    },
-                    engines_involved=(engine_name,),
-                    stratum=stratum,
-                ))
-    return facts
-
-
-@register_detector(
-    FactType.STRATIFICATION_RECOMMENDED,
-    priority=45,  # right after STRATUM_WINNER (40), before STRATUM_COLLAPSE (50)
-    importance=FactImportance.HIGH,
-)
-def detect_stratification_recommended(benchmark_data: dict) -> list[Fact]:
-    """Warn when the corpus is heterogeneous and the stratified view
-    gives a qualitatively different picture from the global ranking.
-
-    Criterion: ``corpus_homogeneity.max_inter_strata_gap`` ≥ 5 points of
-    median CER on the leading engine. From 10 points upward, importance
-    ``HIGH`` (a very heterogeneous situation where the global ranking
-    alone would be misleading).
-
-    Reads ``benchmark_data["corpus_homogeneity"]`` exposed by
-    ``BenchmarkResult.as_dict()`` (Sprint 45).
-    """
-    homog = benchmark_data.get("corpus_homogeneity")
-    if not homog:
-        return []
-
-    gap = homog.get("max_inter_strata_gap")
-    if gap is None:
-        return []
-
-    gap = float(gap)
-    if gap < 0.05:
-        return []  # 5 CER points: editorial relevance threshold
-
-    leader = str(homog.get("leader") or "")
-    n_strata = int(homog.get("n_strata") or 0)
-    pair = homog.get("leader_max_gap_strata") or ["", ""]
-    if len(pair) < 2:
-        return []
-    min_strat, max_strat = str(pair[0]), str(pair[1])
-
-    leader_per_stratum = homog.get("leader_per_stratum_median") or {}
-    min_med = float(leader_per_stratum.get(min_strat, 0.0))
-    max_med = float(leader_per_stratum.get(max_strat, 0.0))
-
-    importance = (
-        FactImportance.HIGH if gap >= 0.10 else FactImportance.MEDIUM
-    )
 
-
-
-
-
-            "leader": leader,
-            "n_strata": n_strata,
-            "gap_pct": round(gap * 100, 1),
-            "min_stratum": min_strat,
-            "max_stratum": max_strat,
-            "min_stratum_cer_pct": round(min_med * 100, 2),
-            "max_stratum_cer_pct": round(max_med * 100, 2),
-        },
-        engines_involved=(leader,) if leader else (),
-    )]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.detectors.stratum`.
 
+Phase E of the three-circle refactoring. The narrative engine
+(Circle 2, measurements/) has left ``picarones.core.narrative``.
+This alias keeps historical imports working.
 """
 
+from picarones.measurements.narrative.detectors.stratum import *  # noqa: F401, F403
 
+import picarones.measurements.narrative.detectors.stratum as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
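Reviewer note: the relocated `detect_stratum_winner` combines two thresholds, at least 3 documents per (engine, stratum) cell and a best mean CER below 75 % of the runner-up's. The sketch below restates that rule on plain dicts so it can be run without picarones installed. The names `stratum_winners`, `MIN_DOCS` and `DOMINANCE` are hypothetical, and the return value is a simplified dict rather than the project's `Fact` object.

```python
from collections import defaultdict

MIN_DOCS = 3       # minimum documents per (engine, stratum), as in the moved code
DOMINANCE = 0.75   # best mean CER must be below 75 % of the runner-up's


def stratum_winners(benchmark_data: dict) -> list[dict]:
    """Return one plain dict per stratum where an engine clearly dominates."""
    # {engine: {stratum: [cer, ...]}}
    agg: dict = defaultdict(lambda: defaultdict(list))
    for doc in benchmark_data.get("documents") or []:
        stratum = doc.get("script_type")
        if not stratum:
            continue
        for er in doc.get("engine_results") or []:
            if er.get("error") or er.get("cer") is None:
                continue
            agg[er.get("engine")][stratum].append(float(er["cer"]))

    # Invert to {stratum: {engine: mean_cer}}, dropping under-sampled cells.
    by_stratum: dict = defaultdict(dict)
    for engine, strata in agg.items():
        for stratum, vals in strata.items():
            if len(vals) >= MIN_DOCS:
                by_stratum[stratum][engine] = sum(vals) / len(vals)

    winners = []
    for stratum, engine_cer in by_stratum.items():
        if len(engine_cer) < 2:
            continue  # nobody to compare against
        (best, best_cer), (second, second_cer) = sorted(
            engine_cer.items(), key=lambda kv: kv[1]
        )[:2]
        if second_cer > 0 and best_cer < second_cer * DOMINANCE:
            winners.append({
                "engine": best,
                "stratum": stratum,
                "cer": round(best_cer, 4),
                "second_engine": second,
            })
    return winners


# Three printed documents where engine A is five times better than engine B.
docs = [
    {"script_type": "print", "engine_results": [
        {"engine": "A", "cer": 0.02},
        {"engine": "B", "cer": 0.10},
    ]}
    for _ in range(3)
]
result = stratum_winners({"documents": docs})
```

With only two documents per stratum the function returns an empty list, which is the guard that keeps a single lucky page from producing a "winner".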
@@ -1,212 +1,13 @@
-"""
 
-
-
-
-
-Golden rule (to be enforced by tests): every numeric value or entity name
-present in ``payload`` must come directly from the input JSON, never from
-generation. This is what makes the synthesis bit-for-bit reproducible and
-immune to hallucination by construction.
 """
 
-from
-
-from dataclasses import dataclass, field
-from enum import Enum
-from typing import Callable, Optional
-
-
-class FactType(str, Enum):
-    """Detectable fact types.
-
-    A new type is added here, plus a detector in ``detectors.py``
-    and a template in ``narrative/templates_{lang}.yaml`` (Sprint 4).
-    """
-
-    GLOBAL_LEADER_CER = "global_leader_cer"
-    """Engine with the lowest median CER over the whole corpus."""
-
-    STATISTICAL_TIE = "statistical_tie"
-    """Top-N engines that are statistically indistinguishable (Nemenyi, Sprint 3)."""
-
-    SIGNIFICANT_GAP = "significant_gap"
-    """Statistically significant gap between the 1st and 2nd in the ranking."""
-
-    PARETO_ALTERNATIVE = "pareto_alternative"
-    """Engine on the Pareto frontier that differs from the pure CER leader (Sprint 5)."""
-
-    STRATUM_WINNER = "stratum_winner"
-    """Engine that dominates a specific stratum (century, language, type)."""
-
-    STRATUM_COLLAPSE = "stratum_collapse"
-    """Globally good engine that collapses on a specific stratum."""
-
-    ERROR_PROFILE_OUTLIER = "error_profile_outlier"
-    """Engine with an atypical taxonomic profile (e.g. 3× more abbreviation errors)."""
-
-    LLM_HALLUCINATION_FLAG = "llm_hallucination_flag"
-    """LLM with a hallucination rate notably higher than the others."""
-
-    ROBUSTNESS_FRAGILE = "robustness_fragile"
-    """Engine that degrades sharply above a noise/blur threshold."""
-
-    COST_OUTLIER = "cost_outlier"
-    """Engine with a very unfavourable cost/quality ratio (Sprint 5)."""
-
-    SPEED_WINNER = "speed_winner"
-    """Engine that is significantly faster at comparable quality."""
-
-    CONFIDENCE_WARNING = "confidence_warning"
-    """Very wide confidence interval: unreliable ranking."""
-
-    ENSEMBLE_OPPORTUNITY = "ensemble_opportunity"
-    """Two engines are strongly complementary: majority voting
-    could significantly improve the CER (Sprint 36)."""
-
-    MEDIAN_MEAN_GAP_WARNING = "median_mean_gap_warning"
-    """CER distribution strongly skewed over the corpus:
-    the leader's mean is pulled by a few catastrophic documents
-    and masks actual performance. The median (used for the default
-    sort since Sprint 44) is more representative."""
-
-    STRATIFICATION_RECOMMENDED = "stratification_recommended"
-    """The corpus is heterogeneous with respect to script_type: the leading
-    engine varies strongly by stratum. The reader should consult
-    the stratified view rather than rely on the global ranking alone
-    (Sprint 46)."""
-
-    ENGINE_OFF_BASELINE = "engine_off_baseline"
-    """An engine's current CER deviates significantly from its
-    historical mean on the same corpus (read from the SQLite
-    history, Sprint 8). Reads ``BenchmarkHistory`` through the
-    ``baseline_comparison`` module (Sprint 73). Safeguards: ≥ 5 historical
-    runs on the same corpus and |relative_delta| > 20 %."""
-
-    ENGINE_UNSTABLE = "engine_unstable"
-    """An LLM/VLM engine run several times on the same
-    documents produces outputs that differ beyond a variance
-    threshold (Sprint 90). Reads ``compute_multirun_stability``
-    (Sprint 83). Safeguards: ≥ 2 runs and a threshold on the CER
-    coefficient of variation (>10 % by default) or on the rate of
-    identical runs (<50 %)."""
-
-    REGRESSION_IN_HISTORY = "regression_in_history"
-    """An engine shows an unfavourable trend or break
-    in the SQLite history: its mean CER has degraded over
-    the last N runs (Sprint 92). Reads
-    ``compute_corpus_longitudinal`` from the ``longitudinal`` module.
-    Safeguards: ≥ 3 historical runs and either slope > threshold
-    (gradual regression) or a change-point with delta >
-    threshold (abrupt break)."""
-
-
-class FactImportance(int, Enum):
-    """Importance score of a fact: drives ordering and selection."""
-
-    CRITICAL = 100
-    """Always surfaced in the synthesis (e.g. leader + significant gap)."""
-
-    HIGH = 70
-    """Surfaced unless already redundant with a critical fact."""
-
-    MEDIUM = 40
-    """Surfaced if the synthesis still has room."""
-
-    LOW = 10
-    """Informational, surfaced only in the detailed view."""
-
-
-@dataclass
-class Fact:
-    """Structured observation extracted from a benchmark.
-
-    Attributes
-    ----------
-    type:
-        Fact type (see ``FactType``).
-    importance:
-        Selection priority (see ``FactImportance``).
-    payload:
-        Dict of serialisable raw data. **Every value must come from
-        the input JSON**: this is the anti-hallucination safeguard.
-    engines_involved:
-        Names of the engines concerned. Used by the arbiter to detect
-        redundancy (two facts about the same engine = merge or select).
-    stratum:
-        Stratum concerned (e.g. "XVIIe siècle", "latin médiéval") or None.
-    """
-
-    type: FactType
-    importance: FactImportance
-    payload: dict
-    engines_involved: tuple[str, ...] = ()
-    stratum: Optional[str] = None
-
-    def as_dict(self) -> dict:
-        return {
-            "type": self.type.value,
-            "importance": int(self.importance),
-            "payload": self.payload,
-            "engines_involved": list(self.engines_involved),
-            "stratum": self.stratum,
-        }
-
-
-# ---------------------------------------------------------------------------
-# Detector registry
-# ---------------------------------------------------------------------------
-
-# Detector signature: takes the benchmark JSON dict, returns a list
-# of Fact (possibly empty). Must be pure and deterministic.
-DetectorFn = Callable[[dict], list[Fact]]
-
-
-@dataclass
-class DetectorRegistry:
-    """Central registry of fact detectors.
-
-    A detector is registered through ``register(fact_type, fn)``. ``detect_all``
-    calls every registered detector and returns the consolidated list.
-    """
-
-    _detectors: dict[FactType, DetectorFn] = field(default_factory=dict)
-
-    def register(self, fact_type: FactType, fn: DetectorFn) -> None:
-        self._detectors[fact_type] = fn
-
-    def unregister(self, fact_type: FactType) -> None:
-        self._detectors.pop(fact_type, None)
-
-    def registered_types(self) -> tuple[FactType, ...]:
-        return tuple(self._detectors.keys())
-
-    def run(self, benchmark_data: dict) -> list[Fact]:
-        facts: list[Fact] = []
-        for fact_type, fn in self._detectors.items():
-            try:
-                result = fn(benchmark_data)
-            except Exception as e:
-                import logging
-                logging.getLogger(__name__).warning(
-                    "[narrative.detector.%s] fonctionnalité dégradée : %s",
-                    fact_type.value, e,
-                )
-                continue
-            if result:
-                facts.extend(result)
-        return facts
-
-
-def detect_all(benchmark_data: dict, registry: Optional[DetectorRegistry] = None) -> list[Fact]:
-    """Apply every registered detector to the given benchmark.
-
-    Entry point for Sprint 4. For Sprint 1 the default registry is empty:
-    concrete detectors are added sprint by sprint.
-    """
-    if registry is None:
-        registry = _DEFAULT_REGISTRY
-    return registry.run(benchmark_data)
-
 
-
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.facts`.
 
+Phase E of the three-circle refactoring. The narrative engine
+(Circle 2, measurements/) has left ``picarones.core.narrative``.
+This alias keeps historical imports working.
 """
 
+from picarones.measurements.narrative.facts import *  # noqa: F401, F403
 
+import picarones.measurements.narrative.facts as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
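Reviewer note: two properties of the relocated `facts` module are easy to lose sight of in a pure file move. Detectors copy payload values from the input rather than generating them, and `DetectorRegistry.run` logs and skips a detector that raises instead of aborting the whole synthesis. The sketch below shows both with simplified stand-ins. `MiniFact`, `MiniRegistry`, `leader` and `broken` are hypothetical names, and the input shape (`{"engines": [...]}`) is an assumption made for the example, not the project's benchmark schema.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MiniFact:
    type: str
    payload: dict


DetectorFn = Callable[[dict], list]


@dataclass
class MiniRegistry:
    _detectors: dict = field(default_factory=dict)

    def register(self, fact_type: str, fn: DetectorFn) -> None:
        self._detectors[fact_type] = fn

    def run(self, data: dict) -> list:
        facts = []
        for fact_type, fn in self._detectors.items():
            try:
                result = fn(data)
            except Exception as exc:
                # One failing detector degrades the report, it does not abort it.
                logging.getLogger(__name__).warning(
                    "detector %s failed: %s", fact_type, exc
                )
                continue
            facts.extend(result or [])
        return facts


def leader(data: dict) -> list:
    # Payload values are copied from the input, never computed or invented.
    best = min(data["engines"], key=lambda e: e["cer"])
    return [MiniFact("global_leader_cer", {"engine": best["name"], "cer": best["cer"]})]


def broken(data: dict) -> list:
    raise KeyError("missing field")


registry = MiniRegistry()
registry.register("global_leader_cer", leader)
registry.register("cost_outlier", broken)
facts = registry.run({"engines": [
    {"name": "A", "cer": 0.04},
    {"name": "B", "cer": 0.09},
]})
```

Only the fact from `leader` comes back. The exception raised by `broken` ends up in the log, which matches the "degraded feature" warning in the moved `run` method.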
@@ -1,217 +1,13 @@
-"""
 
-
-
-
-1. ``facts.py``: add a value to ``FactType``;
-2. ``detectors.py``: write ``def detect_xxx(data) -> list[Fact]``;
-3. ``detectors.py``: add it to the ``DETECTORS_BY_TYPE`` dict;
-4. ``arbiter.py``: add the type to the ``DEFAULT_TYPE_ORDER`` sequence
-   at the right place for editorial priority.
-
-Sprint 29 brings the number of changes down to **two**:
-
-1. ``facts.py``: still required for the enumerated type;
-2. ``detectors.py``: decorate the function with ``@register_detector(...)``.
-
-The decorator:
-- registers the function in a global registry sorted by ``priority``;
-- checks that no detector re-registers on the same ``FactType``;
-- leaves the function usable as is (backward compatibility);
-- automatically feeds ``arbiter.DEFAULT_TYPE_ORDER``.
-
-Priority conventions (the report's "editorial policy")
--------------------------------------------------------
-The smaller the value, the earlier the fact appears in the synthesis at
-equal importance. To keep the historical order of Sprint 23, a step of
-10 is used to leave room for future insertions:
-
-    10  GLOBAL_LEADER_CER        who wins overall
-    20  STATISTICAL_TIE          is there a tie
-    30  SIGNIFICANT_GAP          how solid the gap is
-    40  STRATUM_WINNER           who dominates which sub-corpus
-    50  STRATUM_COLLAPSE         who collapses on what
-    60  ERROR_PROFILE_OUTLIER    who errs differently
-    70  LLM_HALLUCINATION_FLAG   VLM hallucinations
-    80  ROBUSTNESS_FRAGILE       sensitivity to degradations
-    90  PARETO_ALTERNATIVE       cost/quality trade-off
-    100 SPEED_WINNER             speed
-    110 COST_OUTLIER             abnormal cost
-    120 CONFIDENCE_WARNING       warning about reliability
-
-The decorator does **not** enforce a step: a third-party detector can
-perfectly well use ``priority=42`` to slot in between STRATUM_WINNER and
-STRATUM_COLLAPSE, for example.
 """
 
-from
-
-import logging
-import threading
-from dataclasses import dataclass
-from typing import Callable, Optional
-
-from picarones.core.narrative.facts import (
-    DetectorFn,
-    DetectorRegistry,
-    FactImportance,
-    FactType,
-)
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Detector metadata
-# ---------------------------------------------------------------------------
-
-@dataclass(frozen=True)
-class DetectorEntry:
-    """Metadata of a registered detector."""
-    fact_type: FactType
-    fn: DetectorFn
-    priority: int
-    importance: FactImportance
-
-
-# ---------------------------------------------------------------------------
-# Global registry
-# ---------------------------------------------------------------------------
-
-_REGISTRY: dict[FactType, DetectorEntry] = {}
-_REGISTRY_LOCK = threading.Lock()
-
-
-def register_detector(
-    fact_type: FactType,
-    *,
-    priority: int,
-    importance: FactImportance = FactImportance.MEDIUM,
-) -> Callable[[DetectorFn], DetectorFn]:
-    """Registration decorator.
-
-    Usage::
-
-        @register_detector(FactType.GLOBAL_LEADER_CER, priority=10,
-                           importance=FactImportance.CRITICAL)
-        def detect_global_leader_cer(data: dict) -> list[Fact]:
-            ...
-
-    The decorator:
-    - checks that no other detector is already registered on
-      ``fact_type`` (otherwise ``ValueError``);
-    - coerces ``priority`` to an integer;
-    - returns the function unchanged so that existing imports
-      do not break.
-
-    The ``importance`` stored here is registry **metadata**:
-    each detector remains free to emit ``Fact`` objects with a
-    different importance depending on context (e.g. CRITICAL if the gap
-    is huge, HIGH otherwise).
-    """
-    def _decorator(fn: DetectorFn) -> DetectorFn:
-        with _REGISTRY_LOCK:
-            if fact_type in _REGISTRY:
-                raise ValueError(
-                    f"Détecteur déjà enregistré pour {fact_type.value!r} : "
-                    f"{_REGISTRY[fact_type].fn.__name__}. Désenregistrer "
-                    "explicitement avant de réassigner."
-                )
-            entry = DetectorEntry(
-                fact_type=fact_type,
-                fn=fn,
-                priority=int(priority),
-                importance=importance,
-            )
-            _REGISTRY[fact_type] = entry
-        logger.debug(
-            "[narrative.registry] enregistré %s priority=%s importance=%s",
-            fact_type.value, priority, importance.name,
-        )
-        return fn
-
-    return _decorator
-
-
-def unregister(fact_type: FactType) -> None:
-    """Remove a detector from the registry (used by tests)."""
-    with _REGISTRY_LOCK:
-        _REGISTRY.pop(fact_type, None)
-
-
-def iter_detectors() -> list[DetectorEntry]:
-    """Return all registered detectors, sorted by ``priority``.
-
-    The sort is stable: at equal ``priority``, registration order
-    is preserved (useful when third-party extensions are present).
-    """
-    with _REGISTRY_LOCK:
-        entries = list(_REGISTRY.values())
-    entries.sort(key=lambda e: e.priority)
-    return entries
-
-
-def detector_for(fact_type: FactType) -> Optional[DetectorEntry]:
-    with _REGISTRY_LOCK:
-        return _REGISTRY.get(fact_type)
-
-
-def clear_registry() -> None:
-    """Empty the registry (reserved for isolation tests)."""
-    with _REGISTRY_LOCK:
-        _REGISTRY.clear()
-
-
-def default_type_order() -> tuple[FactType, ...]:
-    """Compute the canonical type order from the current registry.
-
-    Source of truth for ``arbiter.DEFAULT_TYPE_ORDER`` since Sprint 29.
-    """
-    return tuple(e.fact_type for e in iter_detectors())
-
-
-# ---------------------------------------------------------------------------
-# Bridge to the legacy ``DetectorRegistry``
-# ---------------------------------------------------------------------------
-
-def populate_legacy_registry(registry: DetectorRegistry) -> None:
-    """Synchronise the legacy ``DetectorRegistry`` from the decorator.
-
-    The ``DetectorRegistry`` object remains the public API for
-    external consumers (cf. ``DetectorRegistry.run``); this
-    function feeds it from the current declarative registry.
-    """
-    for entry in iter_detectors():
-        registry.register(entry.fact_type, entry.fn)
-
-
-__all__ = [
-    "DetectorEntry",
-    "register_detector",
-    "unregister",
-    "iter_detectors",
-    "detector_for",
-    "clear_registry",
-    "default_type_order",
-    "populate_legacy_registry",
-]
-
-
-# ---------------------------------------------------------------------------
-# Sentinel, not used directly: checks at build time that no duplicate
-# ``priority`` value is accidentally introduced among the builtins.
-# ---------------------------------------------------------------------------
 
-
-
-for
-
-        logger.warning(
-            "[narrative.registry] priority %s dupliquée : "
-            "%s et %s — ordre indéterministe à priorité égale.",
-            entry.priority,
-            seen[entry.priority].value,
-            entry.fact_type.value,
-        )
-    else:
-        seen[entry.priority] = entry.fact_type
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.registry`.
 
+Phase E of the three-circle refactoring. The narrative engine
+(Circle 2, measurements/) has left ``picarones.core.narrative``.
+This alias keeps historical imports working.
 """
 
+from picarones.measurements.narrative.registry import *  # noqa: F401, F403
 
+import picarones.measurements.narrative.registry as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
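Reviewer note: the registry module's docstring describes a decorator that rejects duplicate registrations, returns the function unchanged, and derives the editorial order from `priority`. The following is a minimal, self-contained sketch of that mechanism, assuming string fact types instead of the `FactType` enum. `Entry`, `_LOCK` and the three demo detectors are hypothetical names, and the real module carries more metadata.

```python
import threading
from dataclasses import dataclass
from typing import Callable

_REGISTRY: dict = {}
_LOCK = threading.Lock()


@dataclass(frozen=True)
class Entry:
    fact_type: str
    fn: Callable
    priority: int


def register_detector(fact_type: str, *, priority: int):
    def _decorator(fn):
        with _LOCK:
            if fact_type in _REGISTRY:
                raise ValueError(f"detector already registered for {fact_type!r}")
            _REGISTRY[fact_type] = Entry(fact_type, fn, int(priority))
        return fn  # returned unchanged, so direct calls and imports keep working
    return _decorator


def default_type_order() -> tuple:
    with _LOCK:
        entries = list(_REGISTRY.values())
    entries.sort(key=lambda e: e.priority)  # list.sort is stable
    return tuple(e.fact_type for e in entries)


@register_detector("stratum_collapse", priority=50)
def detect_collapse(data):
    return []


@register_detector("stratum_winner", priority=40)
def detect_winner(data):
    return []


# A third-party detector can slot in between the two builtins.
@register_detector("third_party", priority=42)
def detect_third_party(data):
    return []


order = default_type_order()
```

Registration order does not matter here: the order comes out sorted by `priority`, so the detector registered last with `priority=42` lands between the other two.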
@@ -1,105 +1,13 @@
-"""
 
-
-
-
-values coming strictly from the input JSON.
 """
 
-from
 
-import
-
-
-
-
-import yaml
-
-from picarones.core.narrative.facts import Fact
-
-logger = logging.getLogger(__name__)
-
-_TEMPLATES_DIR = Path(__file__).parent / "templates"
-_TEMPLATES_CACHE: dict[str, dict[str, str]] = {}
-
-
-def _load_templates(lang: str) -> dict[str, str]:
-    """Load and cache the templates for the requested language.
-
-    Fallback: if the language does not exist, return the FR templates. If FR
-    is missing too (installation problem), return an empty dict.
-    """
-    if lang in _TEMPLATES_CACHE:
-        return _TEMPLATES_CACHE[lang]
-
-    path = _TEMPLATES_DIR / f"{lang}.yaml"
-    if not path.exists():
-        if lang != "fr":
-            return _load_templates("fr")
-        _TEMPLATES_CACHE[lang] = {}
-        return _TEMPLATES_CACHE[lang]
-
-    try:
-        with path.open(encoding="utf-8") as fh:
-            data = yaml.safe_load(fh) or {}
-        if not isinstance(data, dict):
-            logger.warning("[narrative] %s n'est pas un dict YAML — ignoré", path)
-            _TEMPLATES_CACHE[lang] = {}
-        else:
-            _TEMPLATES_CACHE[lang] = {str(k): str(v).strip() for k, v in data.items()}
-    except yaml.YAMLError as e:
-        logger.warning("[narrative] échec parsing %s : %s", path, e)
-        _TEMPLATES_CACHE[lang] = {}
-
-    return _TEMPLATES_CACHE[lang]
-
-
|
| 57 |
-
class _SafeFormatMap(dict):
|
| 58 |
-
"""Dict qui retourne ``'?'`` pour les clés manquantes dans un template.
|
| 59 |
-
|
| 60 |
-
Évite qu'un détecteur mal documenté fasse crasher le rendu. En pratique
|
| 61 |
-
les tests couvrent les clés attendues, mais la robustesse prévaut.
|
| 62 |
-
"""
|
| 63 |
-
|
| 64 |
-
def __missing__(self, key: str) -> str:
|
| 65 |
-
logger.warning("[narrative] clé manquante dans payload : %r", key)
|
| 66 |
-
return "?"
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
def render_fact(fact: Fact, lang: str = "fr") -> str:
|
| 70 |
-
"""Rend un Fact en une phrase selon la langue.
|
| 71 |
-
|
| 72 |
-
Retourne ``""`` si le template est absent pour ce type.
|
| 73 |
-
"""
|
| 74 |
-
templates = _load_templates(lang)
|
| 75 |
-
tpl = templates.get(fact.type.value)
|
| 76 |
-
if not tpl:
|
| 77 |
-
return ""
|
| 78 |
-
|
| 79 |
-
try:
|
| 80 |
-
return tpl.format_map(_SafeFormatMap(fact.payload))
|
| 81 |
-
except (ValueError, KeyError) as e:
|
| 82 |
-
logger.warning(
|
| 83 |
-
"[narrative] rendu impossible pour %s : %s", fact.type.value, e,
|
| 84 |
-
)
|
| 85 |
-
return ""
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def render_synthesis(facts: Iterable[Fact], lang: str = "fr") -> list[str]:
|
| 89 |
-
"""Rend une liste de Fact en liste de phrases (ordre préservé)."""
|
| 90 |
-
out: list[str] = []
|
| 91 |
-
for fact in facts:
|
| 92 |
-
phrase = render_fact(fact, lang)
|
| 93 |
-
phrase = re.sub(r"\s+", " ", phrase).strip()
|
| 94 |
-
if phrase:
|
| 95 |
-
out.append(phrase)
|
| 96 |
-
return out
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
def extract_numbers(text: str) -> list[str]:
|
| 100 |
-
"""Extrait les nombres (décimaux ou entiers) présents dans une phrase.
|
| 101 |
-
|
| 102 |
-
Utilisé par le test de traçabilité : chaque nombre remonté en synthèse
|
| 103 |
-
doit être présent dans le JSON d'entrée.
|
| 104 |
-
"""
|
| 105 |
-
return re.findall(r"\d+(?:[.,]\d+)?", text)
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.narrative.renderer`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring. The narrative engine
|
| 4 |
+
(Circle 2, measurements/) has left ``picarones.core.narrative``.
|
| 5 |
+
This alias keeps historical imports working.
|
|
|
|
| 6 |
"""
|
| 7 |
|
| 8 |
+
from picarones.measurements.narrative.renderer import * # noqa: F401, F403
|
| 9 |
|
| 10 |
+
import picarones.measurements.narrative.renderer as _module
|
| 11 |
+
__all__ = getattr(_module, "__all__", [
|
| 12 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 13 |
+
])
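The removed renderer relied on two small techniques that are easy to check in isolation: a ``dict`` subclass with ``__missing__`` so that ``str.format_map`` does not raise on an absent key, and a regex that pulls numbers out of a rendered sentence for the traceability test. The sketch below reproduces both with the standard library only; the template string and payload are made-up examples, not taken from ``templates/fr.yaml``.

```python
import re


class SafeFormatMap(dict):
    """Return '?' for any placeholder missing from the payload."""

    def __missing__(self, key: str) -> str:
        return "?"


def extract_numbers(text: str) -> list[str]:
    """Integers or decimals, with '.' or ',' as decimal separator."""
    return re.findall(r"\d+(?:[.,]\d+)?", text)


# Hypothetical template and payload, for illustration only.
template = "{engine} reaches a CER of {cer} % on {n_docs} documents ({note})."
payload = {"engine": "tesseract", "cer": "4,2", "n_docs": 120}

# The missing {note} placeholder is rendered as '?' instead of raising KeyError.
sentence = template.format_map(SafeFormatMap(payload))
numbers = extract_numbers(sentence)

# Traceability check: every number in the sentence comes from the payload.
allowed = {str(v) for v in payload.values()}
untraceable = [n for n in numbers if n not in allowed]
```

A placeholder with a numeric format spec (for example ``{cer:.2f}``) would still raise ``ValueError`` when the fallback ``"?"`` is substituted, which is presumably why the original ``render_fact`` also catches ``ValueError``.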
|
|
|
|
@@ -1,309 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
the downstream usefulness of an OCR is not measured by CER alone; what
|
| 9 |
-
matters is whether the **named entities** (people, places,
|
| 10 |
-
dates, organisations) survived the transcription. A 5% CER
|
| 11 |
-
that misses 80% of proper names is unusable for prosopographic
|
| 12 |
-
indexing.
|
| 13 |
-
|
| 14 |
-
Sprint breakdown strategy
|
| 15 |
-
-------------------------
|
| 16 |
-
As with the taxonomic divergence (Sprints 35-37), the work is split:
|
| 17 |
-
|
| 18 |
-
- **Sprint 38** (here), pure computation layer: IoU alignment between
|
| 19 |
-
two entity lists, Precision/Recall/F1 per category
|
| 20 |
-
and overall, detection of entity hallucinations. No external
|
| 21 |
-
dependency (no spaCy, no Stanza); the entity lists are
|
| 22 |
-
supplied as input. A test of the registration in the Sprint 34
|
| 23 |
-
typed registry guarantees the integration.
|
| 24 |
-
- **Upcoming sprint**, extractor backend (spaCy / Stanza / HIPE) and
|
| 25 |
-
runner+narrative+HTML wiring.
|
| 26 |
-
|
| 27 |
-
Entity format
|
| 28 |
-
-------------
|
| 29 |
-
Compatible with ``EntitiesGT`` from Sprint 32: each entity is a
|
| 30 |
-
dictionary ``{"label": str, "start": int, "end": int, "text": str}``
|
| 31 |
-
where ``start``/``end`` are character offsets.
|
| 32 |
-
|
| 33 |
-
Alignment convention
|
| 34 |
-
--------------------
|
| 35 |
-
A hypothesis entity matches a reference entity if:
|
| 36 |
-
|
| 37 |
-
1. the **labels are identical** (case-insensitive),
|
| 38 |
-
2. the **Intersection-over-Union** (IoU) ratio of their character
|
| 39 |
-
spans is ``≥ iou_threshold`` (default: 0.5).
|
| 40 |
-
|
| 41 |
-
An unmatched reference entity → false negative (recall penalised).
|
| 42 |
-
An unmatched hypothesis entity → false positive (precision penalised).
|
| 43 |
-
A false positive is also counted as an **entity hallucination**, which
|
| 44 |
-
is useful for VLMs/LLMs that make things up.
|
| 45 |
-
|
| 46 |
-
Limitations
|
| 47 |
-
-----------
|
| 48 |
-
- Bag-of-spans alignment: an entity can be matched by at most
|
| 49 |
-
one entity on the other side (otherwise double counting).
|
| 50 |
-
- NER models (spaCy, etc.) hallucinate themselves. The metric
|
| 51 |
-
jointly measures OCR + NER. Document this explicitly.
|
| 52 |
"""
|
| 53 |
|
| 54 |
-
from
|
| 55 |
-
|
| 56 |
-
import logging
|
| 57 |
-
from dataclasses import dataclass
|
| 58 |
-
from typing import Iterable
|
| 59 |
-
|
| 60 |
-
from picarones.core.metric_registry import register_metric
|
| 61 |
-
from picarones.core.modules import ArtifactType
|
| 62 |
-
|
| 63 |
-
logger = logging.getLogger(__name__)
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 67 |
-
# Modèle de données
|
| 68 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
@dataclass(frozen=True)
|
| 72 |
-
class Entity:
|
| 73 |
-
"""Entité nommée alignée sur un texte.
|
| 74 |
-
|
| 75 |
-
Attributs
|
| 76 |
-
---------
|
| 77 |
-
label:
|
| 78 |
-
Catégorie de l'entité (ex. ``"PER"``, ``"LOC"``, ``"DATE"``).
|
| 79 |
-
La comparaison se fait en *case-insensitive*.
|
| 80 |
-
start, end:
|
| 81 |
-
Offsets caractère (inclus, exclu) sur le texte de référence.
|
| 82 |
-
text:
|
| 83 |
-
Forme de surface — informative, **non utilisée pour
|
| 84 |
-
l'alignement** (deux entités peuvent matcher même si leur
|
| 85 |
-
forme de surface diffère, du moment que leurs spans
|
| 86 |
-
chevauchent suffisamment).
|
| 87 |
-
"""
|
| 88 |
-
|
| 89 |
-
label: str
|
| 90 |
-
start: int
|
| 91 |
-
end: int
|
| 92 |
-
text: str = ""
|
| 93 |
-
|
| 94 |
-
def __post_init__(self) -> None:
|
| 95 |
-
if self.start > self.end:
|
| 96 |
-
raise ValueError(
|
| 97 |
-
f"Entity span invalide : start={self.start} > end={self.end}"
|
| 98 |
-
)
|
| 99 |
-
|
| 100 |
-
@property
|
| 101 |
-
def length(self) -> int:
|
| 102 |
-
return max(0, self.end - self.start)
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
def _to_entity(obj: Entity | dict) -> Entity:
|
| 106 |
-
"""Coerce un dict (format EntitiesGT) en ``Entity``."""
|
| 107 |
-
if isinstance(obj, Entity):
|
| 108 |
-
return obj
|
| 109 |
-
return Entity(
|
| 110 |
-
label=str(obj["label"]),
|
| 111 |
-
start=int(obj["start"]),
|
| 112 |
-
end=int(obj["end"]),
|
| 113 |
-
text=str(obj.get("text", "")),
|
| 114 |
-
)
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 118 |
-
# Alignement par IoU
|
| 119 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
def _iou(a: Entity, b: Entity) -> float:
|
| 123 |
-
"""Intersection-over-Union sur les spans caractère."""
|
| 124 |
-
inter_start = max(a.start, b.start)
|
| 125 |
-
inter_end = min(a.end, b.end)
|
| 126 |
-
inter = max(0, inter_end - inter_start)
|
| 127 |
-
union = a.length + b.length - inter
|
| 128 |
-
if union <= 0:
|
| 129 |
-
return 0.0
|
| 130 |
-
return inter / union
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
def _align(
|
| 134 |
-
references: list[Entity],
|
| 135 |
-
hypotheses: list[Entity],
|
| 136 |
-
iou_threshold: float,
|
| 137 |
-
) -> tuple[list[tuple[int, int, float]], set[int], set[int]]:
|
| 138 |
-
"""Aligne deux listes d'entités par IoU décroissant (greedy).
|
| 139 |
-
|
| 140 |
-
Returns
|
| 141 |
-
-------
|
| 142 |
-
matches:
|
| 143 |
-
Liste de triplets ``(idx_ref, idx_hyp, iou)`` triés par IoU
|
| 144 |
-
décroissant — chaque entité n'apparaît qu'une fois.
|
| 145 |
-
unmatched_refs:
|
| 146 |
-
Indices des entités GT non matchées (faux négatifs).
|
| 147 |
-
unmatched_hyps:
|
| 148 |
-
Indices des entités hypothèse non matchées (faux positifs).
|
| 149 |
-
"""
|
| 150 |
-
candidates: list[tuple[float, int, int]] = []
|
| 151 |
-
for i, r in enumerate(references):
|
| 152 |
-
for j, h in enumerate(hypotheses):
|
| 153 |
-
if r.label.casefold() != h.label.casefold():
|
| 154 |
-
continue
|
| 155 |
-
score = _iou(r, h)
|
| 156 |
-
if score >= iou_threshold:
|
| 157 |
-
candidates.append((score, i, j))
|
| 158 |
-
|
| 159 |
-
# Tri par IoU décroissant ; à IoU égale, on prend l'ordre des paires
|
| 160 |
-
# pour garantir un tri stable et déterministe.
|
| 161 |
-
candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
|
| 162 |
-
|
| 163 |
-
matched_refs: set[int] = set()
|
| 164 |
-
matched_hyps: set[int] = set()
|
| 165 |
-
matches: list[tuple[int, int, float]] = []
|
| 166 |
-
for score, i, j in candidates:
|
| 167 |
-
if i in matched_refs or j in matched_hyps:
|
| 168 |
-
continue
|
| 169 |
-
matched_refs.add(i)
|
| 170 |
-
matched_hyps.add(j)
|
| 171 |
-
matches.append((i, j, score))
|
| 172 |
-
|
| 173 |
-
unmatched_refs = set(range(len(references))) - matched_refs
|
| 174 |
-
unmatched_hyps = set(range(len(hypotheses))) - matched_hyps
|
| 175 |
-
return matches, unmatched_refs, unmatched_hyps
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 179 |
-
# Calcul des métriques
|
| 180 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
def _prf(tp: int, fp: int, fn: int) -> dict[str, float]:
|
| 184 |
-
"""Précision / rappel / F1 à partir des comptes."""
|
| 185 |
-
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
|
| 186 |
-
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
|
| 187 |
-
f1 = (
|
| 188 |
-
2 * precision * recall / (precision + recall)
|
| 189 |
-
if (precision + recall) > 0
|
| 190 |
-
else 0.0
|
| 191 |
-
)
|
| 192 |
-
return {
|
| 193 |
-
"precision": precision,
|
| 194 |
-
"recall": recall,
|
| 195 |
-
"f1": f1,
|
| 196 |
-
"support": tp + fn,
|
| 197 |
-
}
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
def compute_ner_metrics(
|
| 201 |
-
reference_entities: Iterable[Entity | dict],
|
| 202 |
-
hypothesis_entities: Iterable[Entity | dict],
|
| 203 |
-
iou_threshold: float = 0.5,
|
| 204 |
-
) -> dict:
|
| 205 |
-
"""Calcule la précision/rappel/F1 sur entités nommées.
|
| 206 |
-
|
| 207 |
-
Parameters
|
| 208 |
-
----------
|
| 209 |
-
reference_entities:
|
| 210 |
-
Liste d'entités GT (format ``Entity`` ou dict de
|
| 211 |
-
``EntitiesGT``).
|
| 212 |
-
hypothesis_entities:
|
| 213 |
-
Liste d'entités produites par le NER sur la sortie OCR.
|
| 214 |
-
iou_threshold:
|
| 215 |
-
Seuil de chevauchement caractère pour qu'un appariement
|
| 216 |
-
soit valide (défaut : 0,5 — convention CoNLL/HIPE).
|
| 217 |
-
|
| 218 |
-
Returns
|
| 219 |
-
-------
|
| 220 |
-
dict
|
| 221 |
-
``{
|
| 222 |
-
"global": {"precision", "recall", "f1", "support"},
|
| 223 |
-
"per_category": {label: {"precision", ...}},
|
| 224 |
-
"true_positives": int,
|
| 225 |
-
"false_positives": int,
|
| 226 |
-
"false_negatives": int,
|
| 227 |
-
"hallucinated_entities": list[dict], # entités OCR sans GT
|
| 228 |
-
"missed_entities": list[dict], # entités GT non détectées
|
| 229 |
-
"iou_threshold": float,
|
| 230 |
-
}``
|
| 231 |
-
"""
|
| 232 |
-
refs = [_to_entity(e) for e in reference_entities]
|
| 233 |
-
hyps = [_to_entity(e) for e in hypothesis_entities]
|
| 234 |
-
|
| 235 |
-
matches, unmatched_refs, unmatched_hyps = _align(refs, hyps, iou_threshold)
|
| 236 |
-
|
| 237 |
-
tp = len(matches)
|
| 238 |
-
fn = len(unmatched_refs)
|
| 239 |
-
fp = len(unmatched_hyps)
|
| 240 |
-
|
| 241 |
-
# Comptes par catégorie
|
| 242 |
-
cat_tp: dict[str, int] = {}
|
| 243 |
-
cat_fn: dict[str, int] = {}
|
| 244 |
-
cat_fp: dict[str, int] = {}
|
| 245 |
-
for i, _j, _score in matches:
|
| 246 |
-
cat = refs[i].label
|
| 247 |
-
cat_tp[cat] = cat_tp.get(cat, 0) + 1
|
| 248 |
-
for i in unmatched_refs:
|
| 249 |
-
cat = refs[i].label
|
| 250 |
-
cat_fn[cat] = cat_fn.get(cat, 0) + 1
|
| 251 |
-
for j in unmatched_hyps:
|
| 252 |
-
cat = hyps[j].label
|
| 253 |
-
cat_fp[cat] = cat_fp.get(cat, 0) + 1
|
| 254 |
-
|
| 255 |
-
all_categories = sorted(set(cat_tp) | set(cat_fn) | set(cat_fp))
|
| 256 |
-
per_category = {
|
| 257 |
-
cat: _prf(cat_tp.get(cat, 0), cat_fp.get(cat, 0), cat_fn.get(cat, 0))
|
| 258 |
-
for cat in all_categories
|
| 259 |
-
}
|
| 260 |
-
|
| 261 |
-
return {
|
| 262 |
-
"global": _prf(tp, fp, fn),
|
| 263 |
-
"per_category": per_category,
|
| 264 |
-
"true_positives": tp,
|
| 265 |
-
"false_positives": fp,
|
| 266 |
-
"false_negatives": fn,
|
| 267 |
-
"hallucinated_entities": [
|
| 268 |
-
{"label": hyps[j].label, "start": hyps[j].start,
|
| 269 |
-
"end": hyps[j].end, "text": hyps[j].text}
|
| 270 |
-
for j in sorted(unmatched_hyps)
|
| 271 |
-
],
|
| 272 |
-
"missed_entities": [
|
| 273 |
-
{"label": refs[i].label, "start": refs[i].start,
|
| 274 |
-
"end": refs[i].end, "text": refs[i].text}
|
| 275 |
-
for i in sorted(unmatched_refs)
|
| 276 |
-
],
|
| 277 |
-
"iou_threshold": iou_threshold,
|
| 278 |
-
}
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 282 |
-
# Enregistrement dans le registre typé (Sprint 34)
|
| 283 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 284 |
-
|
| 285 |
-
|
| 286 |
-
@register_metric(
|
| 287 |
-
name="ner_f1",
|
| 288 |
-
input_types=(ArtifactType.ENTITIES, ArtifactType.ENTITIES),
|
| 289 |
-
description=(
|
| 290 |
-
"F1 global sur les entités nommées (alignement IoU ≥ 0,5, "
|
| 291 |
-
"labels case-insensitive). Pour le détail par catégorie, "
|
| 292 |
-
"utiliser compute_ner_metrics directement."
|
| 293 |
-
),
|
| 294 |
-
higher_is_better=True,
|
| 295 |
-
tags={"downstream", "ner", "structure"},
|
| 296 |
-
)
|
| 297 |
-
def ner_f1(
|
| 298 |
-
reference_entities: Iterable[Entity | dict],
|
| 299 |
-
hypothesis_entities: Iterable[Entity | dict],
|
| 300 |
-
) -> float:
|
| 301 |
-
"""F1 global ; raccourci enregistré pour les jonctions ``(ENTITIES, ENTITIES)``."""
|
| 302 |
-
return compute_ner_metrics(reference_entities, hypothesis_entities)["global"]["f1"]
|
| 303 |
-
|
| 304 |
|
| 305 |
-
|
| 306 |
-
|
| 307 |
-
"
|
| 308 |
-
|
| 309 |
-
]
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.ner`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
is no longer in ``picarones.core/``; it lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets historical
|
| 6 |
+
imports (``from picarones.core.ner import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.ner import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.ner as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
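The matching rule described in the removed ``ner`` module (same label ignoring case, character-span IoU at or above a threshold, each entity used at most once) can be illustrated with a compact sketch. This is a simplified restatement for reading purposes, not the module's code: the ``Span`` tuple and the ``greedy_match`` name are illustrative choices, and the offsets refer to the sample text "Marie de Bourgogne, en 1477.".

```python
from typing import NamedTuple


class Span(NamedTuple):
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def iou(a: Span, b: Span) -> float:
    """Intersection-over-Union of two character spans."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(refs: list[Span], hyps: list[Span], threshold: float = 0.5):
    """Return (tp, fp, fn) after one-to-one greedy matching by decreasing IoU."""
    candidates = [
        (iou(r, h), i, j)
        for i, r in enumerate(refs)
        for j, h in enumerate(hyps)
        if r.label.casefold() == h.label.casefold() and iou(r, h) >= threshold
    ]
    # Highest IoU first; ties broken by index order for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    used_refs: set[int] = set()
    used_hyps: set[int] = set()
    for _score, i, j in candidates:
        if i not in used_refs and j not in used_hyps:
            used_refs.add(i)
            used_hyps.add(j)
    tp = len(used_refs)
    return tp, len(hyps) - tp, len(refs) - tp


refs = [Span("PER", 0, 18), Span("DATE", 23, 27)]
hyps = [Span("per", 0, 15), Span("LOC", 9, 18), Span("DATE", 40, 44)]
tp, fp, fn = greedy_match(refs, hyps)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
```

Here only the ``PER`` pair matches (IoU 15/18); the ``LOC`` hypothesis has no reference with the same label, and the two ``DATE`` spans do not overlap, so both count as false positives and the reference date as a false negative.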
|
|
|
|
@@ -1,227 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
|
|
|
| 7 |
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
callable ``(text: str) -> list[dict]`` is a valid extractor. The
|
| 12 |
-
output format is compatible with ``EntitiesGT`` (Sprint 32) and
|
| 13 |
-
``compute_ner_metrics`` (Sprint 38).
|
| 14 |
-
- ``SpacyEntityExtractor``: default implementation, lazy import of
|
| 15 |
-
spaCy. If spaCy is not installed OR the model has not been
|
| 16 |
-
downloaded, it returns ``[]`` with an explicit ``logger.warning``
|
| 17 |
-
(cf. the CLAUDE.md rule: no ``except: pass``).
|
| 18 |
-
- ``SPACY_PROFILES``: dict mapping named profiles to spaCy model
|
| 19 |
-
names (FR, EN, multilingual, HIPE for historical corpora).
|
| 20 |
-
- ``get_extractor(profile)``: factory returning the extractor
|
| 21 |
-
for the requested profile.
|
| 22 |
-
|
| 23 |
-
Runner ↔ backend decoupling
|
| 24 |
-
---------------------------
|
| 25 |
-
The runner receives an ``EntityExtractor`` as a parameter; it
|
| 26 |
-
**never** imports spaCy directly. This makes it possible to:
|
| 27 |
-
|
| 28 |
-
1. **test** without any external dependency (the test injects a callable
|
| 29 |
-
that simulates extraction);
|
| 30 |
-
2. **plug in** alternative backends (Stanza, custom HIPE,
|
| 31 |
-
in-house fine-tuned model) without modifying the runner;
|
| 32 |
-
3. **disable** the metric by passing ``None``, the default
|
| 33 |
-
behaviour, strictly backward compatible.
|
| 34 |
"""
|
| 35 |
|
| 36 |
-
from
|
| 37 |
-
|
| 38 |
-
import logging
|
| 39 |
-
from typing import Any, Protocol
|
| 40 |
-
|
| 41 |
-
logger = logging.getLogger(__name__)
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 45 |
-
# Interface
|
| 46 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
class EntityExtractor(Protocol):
|
| 50 |
-
"""Tout callable ``(text) -> list[dict]`` est un extracteur valide.
|
| 51 |
-
|
| 52 |
-
Format de sortie attendu : liste de dicts
|
| 53 |
-
``{"label": str, "start": int, "end": int, "text": str}``
|
| 54 |
-
compatibles avec ``compute_ner_metrics`` (Sprint 38) et
|
| 55 |
-
``EntitiesGT`` (Sprint 32).
|
| 56 |
-
"""
|
| 57 |
-
|
| 58 |
-
def __call__(self, text: str) -> list[dict[str, Any]]: ...
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 62 |
-
# Profils spaCy nommés
|
| 63 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
SPACY_PROFILES: dict[str, str] = {
|
| 67 |
-
"fr": "fr_core_news_sm",
|
| 68 |
-
"fr_lg": "fr_core_news_lg",
|
| 69 |
-
"en": "en_core_web_sm",
|
| 70 |
-
"en_lg": "en_core_web_lg",
|
| 71 |
-
"multilingual": "xx_ent_wiki_sm",
|
| 72 |
-
# HIPE 2022 — modèle historique multilingue (Hugging Face). Pas
|
| 73 |
-
# toujours disponible via ``spacy.load`` direct ; documenté pour
|
| 74 |
-
# mémoire, l'utilisateur peut le wrapper dans un EntityExtractor
|
| 75 |
-
# custom si besoin.
|
| 76 |
-
"hipe": "fr_core_news_lg",
|
| 77 |
-
}
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 81 |
-
# Backend spaCy
|
| 82 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
class SpacyEntityExtractor:
|
| 86 |
-
"""Extracteur d'entités basé sur spaCy.
|
| 87 |
-
|
| 88 |
-
Lazy-import : ``spacy`` n'est importé qu'au premier appel. Le
|
| 89 |
-
modèle est chargé une seule fois et mis en cache sur l'instance.
|
| 90 |
-
|
| 91 |
-
Si spaCy n'est pas installé OU si le modèle demandé n'est pas
|
| 92 |
-
téléchargé, l'extracteur tombe en mode dégradé (retourne ``[]``
|
| 93 |
-
pour chaque appel) et émet un ``logger.warning`` au premier
|
| 94 |
-
appel.
|
| 95 |
-
|
| 96 |
-
Parameters
|
| 97 |
-
----------
|
| 98 |
-
model_name:
|
| 99 |
-
Nom du modèle spaCy à charger (ex. ``"fr_core_news_sm"``).
|
| 100 |
-
label_mapping:
|
| 101 |
-
Dict optionnel ``{spacy_label: target_label}`` pour
|
| 102 |
-
normaliser les labels (ex. spaCy utilise ``"PERSON"``,
|
| 103 |
-
on veut ``"PER"``). Si ``None``, garde les labels tels
|
| 104 |
-
quels.
|
| 105 |
-
|
| 106 |
-
Examples
|
| 107 |
-
--------
|
| 108 |
-
>>> extractor = SpacyEntityExtractor("fr_core_news_sm")
|
| 109 |
-
>>> entities = extractor("Marie de Bourgogne, en 1477.")
|
| 110 |
-
>>> # liste de dicts {label, start, end, text}, ou [] si spaCy absent
|
| 111 |
-
"""
|
| 112 |
-
|
| 113 |
-
# Mapping par défaut spaCy → conventions HIPE/CoNLL courtes
|
| 114 |
-
DEFAULT_LABEL_MAPPING: dict[str, str] = {
|
| 115 |
-
"PERSON": "PER",
|
| 116 |
-
"PER": "PER",
|
| 117 |
-
"LOC": "LOC",
|
| 118 |
-
"GPE": "LOC", # Geo-Political Entity → LOC
|
| 119 |
-
"ORG": "ORG",
|
| 120 |
-
"DATE": "DATE",
|
| 121 |
-
"TIME": "DATE",
|
| 122 |
-
"MISC": "MISC",
|
| 123 |
-
}
|
| 124 |
-
|
| 125 |
-
def __init__(
|
| 126 |
-
self,
|
| 127 |
-
model_name: str = "fr_core_news_sm",
|
| 128 |
-
label_mapping: dict[str, str] | None = None,
|
| 129 |
-
) -> None:
|
| 130 |
-
self.model_name = model_name
|
| 131 |
-
self.label_mapping = (
|
| 132 |
-
dict(label_mapping)
|
| 133 |
-
if label_mapping is not None
|
| 134 |
-
else dict(self.DEFAULT_LABEL_MAPPING)
|
| 135 |
-
)
|
| 136 |
-
self._nlp: Any | None = None
|
| 137 |
-
self._loaded: bool = False
|
| 138 |
-
self._available: bool = False
|
| 139 |
-
|
| 140 |
-
def _load(self) -> None:
|
| 141 |
-
"""Charge spaCy + modèle au premier appel. Idempotent."""
|
| 142 |
-
if self._loaded:
|
| 143 |
-
return
|
| 144 |
-
self._loaded = True
|
| 145 |
-
try:
|
| 146 |
-
import spacy # type: ignore[import-untyped]
|
| 147 |
-
except ImportError as exc:
|
| 148 |
-
logger.warning(
|
| 149 |
-
"[ner_backends] spaCy non installé (%s) — extraction NER "
|
| 150 |
-
"désactivée. Installer avec `pip install picarones[ner]`.",
|
| 151 |
-
exc,
|
| 152 |
-
)
|
| 153 |
-
return
|
| 154 |
-
try:
|
| 155 |
-
self._nlp = spacy.load(self.model_name)
|
| 156 |
-
self._available = True
|
| 157 |
-
except OSError as exc:
|
| 158 |
-
logger.warning(
|
| 159 |
-
"[ner_backends] Modèle spaCy %r introuvable (%s) — extraction "
|
| 160 |
-
"NER désactivée. Télécharger avec `python -m spacy download %s`.",
|
| 161 |
-
self.model_name, exc, self.model_name,
|
| 162 |
-
)
|
| 163 |
-
|
| 164 |
-
@property
|
| 165 |
-
def available(self) -> bool:
|
| 166 |
-
"""``True`` si spaCy + le modèle sont chargés et utilisables."""
|
| 167 |
-
if not self._loaded:
|
| 168 |
-
self._load()
|
| 169 |
-
return self._available
|
| 170 |
-
|
| 171 |
-
def __call__(self, text: str) -> list[dict[str, Any]]:
|
| 172 |
-
if not text:
|
| 173 |
-
return []
|
| 174 |
-
if not self.available or self._nlp is None:
|
| 175 |
-
return []
|
| 176 |
-
doc = self._nlp(text)
|
| 177 |
-
results: list[dict[str, Any]] = []
|
| 178 |
-
for ent in doc.ents:
|
| 179 |
-
label = self.label_mapping.get(ent.label_, ent.label_)
|
| 180 |
-
results.append({
|
| 181 |
-
"label": label,
|
| 182 |
-
"start": int(ent.start_char),
|
| 183 |
-
"end": int(ent.end_char),
|
| 184 |
-
"text": ent.text,
|
| 185 |
-
})
|
| 186 |
-
return results
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 190 |
-
# Factory
|
| 191 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
def get_extractor(profile: str = "fr") -> SpacyEntityExtractor:
|
| 195 |
-
"""Return a spaCy extractor for the requested profile.
|
| 196 |
-
|
| 197 |
-
The profile can be:
|
| 198 |
-
|
| 199 |
-
- a key of ``SPACY_PROFILES`` (e.g. ``"fr"``, ``"en"``,
|
| 200 |
-
``"multilingual"``)
|
| 201 |
-
- a spaCy model name given directly (e.g. ``"fr_core_news_lg"``)
|
| 202 |
-
|
| 203 |
-
The extractor is instantiated lazily (the model is only loaded
|
| 204 |
-
on the first call). If the model is not available,
|
| 205 |
-
the extractor falls back to degraded mode (logs one warning, then returns ``[]``).
|
| 206 |
-
"""
|
| 207 |
-
model_name = SPACY_PROFILES.get(profile, profile)
|
| 208 |
-
return SpacyEntityExtractor(model_name=model_name)
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
def is_spacy_available() -> bool:
|
| 212 |
-
"""``True`` si la librairie ``spacy`` est importable, sans charger
|
| 213 |
-
de modèle."""
|
| 214 |
-
try:
|
| 215 |
-
import spacy # noqa: F401
|
| 216 |
-
except ImportError:
|
| 217 |
-
return False
|
| 218 |
-
return True
|
| 219 |
-
|
| 220 |
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
"
|
| 224 |
-
|
| 225 |
-
"get_extractor",
|
| 226 |
-
"is_spacy_available",
|
| 227 |
-
]
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.ner_backends`.
|
| 2 |
|
| 3 |
+
Phase E of the 3-circle refactoring. This measurement (Circle 2)
|
| 4 |
+
is no longer in ``picarones.core/``; it lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets historical
|
| 6 |
+
imports (``from picarones.core.ner_backends import ...``) keep
|
| 7 |
+
working unchanged.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
3 circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Circle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.ner_backends import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.ner_backends as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
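The decoupling described in the removed ``ner_backends`` docstring (the runner accepts any callable ``(text) -> list[dict]`` and never imports spaCy itself) can be sketched with a fake extractor. The ``run_ner`` function and ``regex_year_extractor`` below are hypothetical names used for illustration; only the shape of the ``EntityExtractor`` protocol comes from the diff.

```python
import re
from typing import Any, Optional, Protocol


class EntityExtractor(Protocol):
    """Any callable (text) -> list of {label, start, end, text} dicts."""

    def __call__(self, text: str) -> list[dict[str, Any]]: ...


def regex_year_extractor(text: str) -> list[dict[str, Any]]:
    """Toy backend: tags every standalone 4-digit number as a DATE entity."""
    return [
        {"label": "DATE", "start": m.start(), "end": m.end(), "text": m.group()}
        for m in re.finditer(r"\b\d{4}\b", text)
    ]


def run_ner(
    text: str, extractor: Optional[EntityExtractor] = None
) -> Optional[list[dict[str, Any]]]:
    """Hypothetical runner step: passing None disables the metric."""
    if extractor is None:
        return None
    return extractor(text)


entities = run_ner("Marie de Bourgogne, en 1477.", regex_year_extractor)
disabled = run_ner("Marie de Bourgogne, en 1477.")
```

Because the runner depends only on the call signature, a test can inject a function like the one above, and a Stanza- or HIPE-based backend could be swapped in without touching ``run_ner``.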
|
|
|
|
|
|
|
|
|
|
@@ -1,420 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
|
|
|
|
|
|
|
|
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
Three normalization levels are available:
|
| 11 |
-
|
| 12 |
-
1. NFC : canonical Unicode normalization (decomposition+recomposition)
|
| 13 |
-
2. caseless : NFC + case folding (casefold)
|
| 14 |
-
3. diplomatic: NFC + configurable table of historical correspondences
|
| 15 |
-
|
| 16 |
-
The preconfigured profiles cover common heritage-document use cases.
|
| 17 |
-
They can also be loaded from a YAML file.
|
| 18 |
-
|
| 19 |
-
YAML example
|
| 20 |
-
------------
|
| 21 |
-
name: medieval_custom
|
| 22 |
-
caseless: false
|
| 23 |
-
diplomatic:
|
| 24 |
-
ſ: s
|
| 25 |
-
u: v
|
| 26 |
-
i: j
|
| 27 |
-
y: i
|
| 28 |
-
æ: ae
|
| 29 |
-
œ: oe
|
| 30 |
"""
|
| 31 |
|
| 32 |
-
from
|
| 33 |
-
|
| 34 |
-
import unicodedata
|
| 35 |
-
from dataclasses import dataclass, field
|
| 36 |
-
from pathlib import Path
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
# ---------------------------------------------------------------------------
|
| 40 |
-
# Tables de correspondances diplomatiques préconfigurées
|
| 41 |
-
# ---------------------------------------------------------------------------
|
| 42 |
-
|
| 43 |
-
#: Français médiéval (XIIe–XVe siècle)
|
| 44 |
-
DIPLOMATIC_FR_MEDIEVAL: dict[str, str] = {
|
| 45 |
-
"ſ": "s", # s long → s
|
| 46 |
-
"u": "v", # u/v interchangeables en position initiale
|
| 47 |
-
"i": "j", # i/j interchangeables
|
| 48 |
-
"y": "i", # y vocalique → i
|
| 49 |
-
"æ": "ae", # ligature æ
|
| 50 |
-
"œ": "oe", # ligature œ
|
| 51 |
-
"ꝑ": "per", # abréviation per/par
|
| 52 |
-
"ꝓ": "pro", # abréviation pro
|
| 53 |
-
"\u0026": "et", # & → et
|
| 54 |
-
}
|
| 55 |
-
|
| 56 |
-
#: Français moderne / imprimés anciens (XVIe–XVIIIe siècle)
|
| 57 |
-
DIPLOMATIC_FR_EARLY_MODERN: dict[str, str] = {
|
| 58 |
-
"ſ": "s", # s long
|
| 59 |
-
"æ": "ae",
|
| 60 |
-
"œ": "oe",
|
| 61 |
-
"\u0026": "et",
|
| 62 |
-
"ỹ": "yn", # y tilde
|
| 63 |
-
}
|
| 64 |
-
|
| 65 |
-
#: Latin médiéval
|
| 66 |
-
DIPLOMATIC_LATIN_MEDIEVAL: dict[str, str] = {
|
| 67 |
-
"ſ": "s",
|
| 68 |
-
"u": "v",
|
| 69 |
-
"i": "j",
|
| 70 |
-
"y": "i",
|
| 71 |
-
"æ": "ae",
|
| 72 |
-
"œ": "oe",
|
| 73 |
-
"ꝑ": "per",
|
| 74 |
-
"ꝓ": "pro",
|
| 75 |
-
"ꝗ": "que", # q barré → que
|
| 76 |
-
"\u0026": "et",
|
| 77 |
-
}
|
| 78 |
-
|
| 79 |
-
#: Profil minimal — uniquement NFC + s long
|
| 80 |
-
DIPLOMATIC_MINIMAL: dict[str, str] = {
|
| 81 |
-
"ſ": "s",
|
| 82 |
-
}
|
| 83 |
-
|
| 84 |
-
#: Anglais moderne / imprimés anciens (XVIe–XVIIIe siècle)
|
| 85 |
-
#: Orthographe «early modern» : ſ=s, u/v, i/j, vv=w, þ=th, ð=th, ȝ=y
|
| 86 |
-
DIPLOMATIC_EN_EARLY_MODERN: dict[str, str] = {
|
| 87 |
-
"ſ": "s", # s long → s
|
| 88 |
-
"u": "v", # u/v interchangeables (vpon → upon)
|
| 89 |
-
"i": "j", # i/j interchangeables (ioy → joy)
|
| 90 |
-
"vv": "w", # vv → w (vvhich → which)
|
| 91 |
-
"þ": "th", # thorn → th
|
| 92 |
-
"ð": "th", # eth → th
|
| 93 |
-
"ȝ": "y", # yogh → y
|
| 94 |
-
"æ": "ae", # ligature æ
|
| 95 |
-
"œ": "oe", # ligature œ
|
| 96 |
-
"\u0026": "and", # & → and
|
| 97 |
-
}
|
| 98 |
-
|
| 99 |
-
#: Anglais médiéval (XIIe–XVe siècle) — abréviations manuscrites incluses
|
| 100 |
-
DIPLOMATIC_EN_MEDIEVAL: dict[str, str] = {
|
| 101 |
-
"ſ": "s",
|
| 102 |
-
"u": "v",
|
| 103 |
-
"i": "j",
|
| 104 |
-
"vv": "w",
|
| 105 |
-
"þ": "th",
|
| 106 |
-
"ð": "th",
|
| 107 |
-
"ȝ": "y",
|
| 108 |
-
"æ": "ae",
|
| 109 |
-
"œ": "oe",
|
| 110 |
-
"\u0026": "and",
|
| 111 |
-
# Abréviations courantes dans les manuscrits anglais médiévaux
|
| 112 |
-
"ꝑ": "per", # p barré → per/par
|
| 113 |
-
"ꝓ": "pro", # p crocheté → pro
|
| 114 |
-
"ꝗ": "que", # q barré → que
|
| 115 |
-
"\ua75b": "r", # lettre r rotunda → r
|
| 116 |
-
}
|
| 117 |
-
|
| 118 |
-
#: Écriture secrétaire (XVIe–XVIIe siècle) — secretary hand
|
| 119 |
-
#: Confusions visuelles propres à l'écriture cursive anglaise
|
| 120 |
-
DIPLOMATIC_EN_SECRETARY: dict[str, str] = {
|
| 121 |
-
"ſ": "s",
|
| 122 |
-
"u": "v",
|
| 123 |
-
"i": "j",
|
| 124 |
-
"vv": "w",
|
| 125 |
-
"þ": "th",
|
| 126 |
-
"ð": "th",
|
| 127 |
-
"ȝ": "y",
|
| 128 |
-
"\u0026": "and",
|
| 129 |
-
# Confusions visuelles typiques : e/c, n/u, m/w en secrétaire
|
| 130 |
-
# Note : ne pas normaliser e/c automatiquement (trop agressif) ;
|
| 131 |
-
# on se limite aux substituts graphiques historiquement documentés
|
| 132 |
-
}
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
# ---------------------------------------------------------------------------
|
| 136 |
-
# Profil de normalisation
|
| 137 |
-
# ---------------------------------------------------------------------------
|
| 138 |
-
|
| 139 |
-
@dataclass
|
| 140 |
-
class NormalizationProfile:
|
| 141 |
-
"""Describes a normalization strategy used to compute the diplomatic CER.

Parameters
----------
name:
    Human-readable profile identifier (e.g. ``"medieval_french"``).
nfc:
    Apply Unicode NFC normalization (recommended, enabled by default).
caseless:
    Case folding (``str.casefold``), applied after NFC.
diplomatic_table:
    Table of historical graphic correspondences applied to both texts
    before the CER is computed. Multi-character keys (e.g. ``"vv"``)
    are replaced before single characters.
exclude_chars:
    Set of characters removed from both texts (GT and OCR) before any
    metric is computed (CER, WER, MER, WIL and diplomatic CER).
    Useful for ignoring punctuation or apostrophes.
description:
    Short description of the profile (shown in the HTML report).
"""
|
| 161 |
-
|
| 162 |
-
name: str
|
| 163 |
-
nfc: bool = True
|
| 164 |
-
caseless: bool = False
|
| 165 |
-
diplomatic_table: dict[str, str] = field(default_factory=dict)
|
| 166 |
-
exclude_chars: frozenset = field(default_factory=frozenset)
|
| 167 |
-
description: str = ""
|
| 168 |
-
|
| 169 |
-
def normalize(self, text: str) -> str:
|
| 170 |
-
"""Applique le profil de normalisation à un texte."""
|
| 171 |
-
if self.exclude_chars:
|
| 172 |
-
text = "".join(c for c in text if c not in self.exclude_chars)
|
| 173 |
-
if self.nfc:
|
| 174 |
-
text = unicodedata.normalize("NFC", text)
|
| 175 |
-
if self.caseless:
|
| 176 |
-
text = text.casefold()
|
| 177 |
-
if self.diplomatic_table:
|
| 178 |
-
text = _apply_diplomatic_table(text, self.diplomatic_table)
|
| 179 |
-
return text
|
| 180 |
-
|
| 181 |
-
def as_dict(self) -> dict:
|
| 182 |
-
return {
|
| 183 |
-
"name": self.name,
|
| 184 |
-
"nfc": self.nfc,
|
| 185 |
-
"caseless": self.caseless,
|
| 186 |
-
"diplomatic_table": self.diplomatic_table,
|
| 187 |
-
"exclude_chars": sorted(self.exclude_chars),
|
| 188 |
-
"description": self.description,
|
| 189 |
-
}
|
| 190 |
-
|
| 191 |
-
@classmethod
|
| 192 |
-
def from_yaml(cls, path: str | Path) -> "NormalizationProfile":
|
| 193 |
-
"""Load a profile from a YAML file.

All keys are optional: ``name`` (defaults to the file stem), ``nfc``,
``caseless``, ``description``, ``diplomatic`` (dict str→str) and
``exclude_chars`` (list or string of characters to ignore).

Example
-------
.. code-block:: yaml

    name: medieval_custom
    caseless: false
    description: Français médiéval personnalisé
    exclude_chars: ".,;:!?"
    diplomatic:
      ſ: s
      u: v
"""
|
| 211 |
-
try:
|
| 212 |
-
import yaml
|
| 213 |
-
except ImportError as exc:
|
| 214 |
-
raise RuntimeError(
|
| 215 |
-
"Le package 'pyyaml' est requis pour charger les profils YAML. "
|
| 216 |
-
"Installez-le avec : pip install pyyaml"
|
| 217 |
-
) from exc
|
| 218 |
-
|
| 219 |
-
data = yaml.safe_load(Path(path).read_text(encoding="utf-8")) or {}  # an empty file yields None; fall back to defaults
|
| 220 |
-
return cls(
|
| 221 |
-
name=data.get("name", Path(path).stem),
|
| 222 |
-
nfc=bool(data.get("nfc", True)),
|
| 223 |
-
caseless=bool(data.get("caseless", False)),
|
| 224 |
-
diplomatic_table=data.get("diplomatic", {}),
|
| 225 |
-
exclude_chars=_parse_exclude_chars(data.get("exclude_chars", "")),
|
| 226 |
-
description=data.get("description", ""),
|
| 227 |
-
)
|
| 228 |
-
|
| 229 |
-
@classmethod
|
| 230 |
-
def from_dict(cls, data: dict) -> "NormalizationProfile":
|
| 231 |
-
"""Charge un profil depuis un dictionnaire (ex : section YAML inline)."""
|
| 232 |
-
return cls(
|
| 233 |
-
name=data.get("name", "custom"),
|
| 234 |
-
nfc=bool(data.get("nfc", True)),
|
| 235 |
-
caseless=bool(data.get("caseless", False)),
|
| 236 |
-
diplomatic_table=data.get("diplomatic", {}),
|
| 237 |
-
exclude_chars=_parse_exclude_chars(data.get("exclude_chars", "")),
|
| 238 |
-
description=data.get("description", ""),
|
| 239 |
-
)
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
# ---------------------------------------------------------------------------
|
| 243 |
-
# Profils préconfigurés
|
| 244 |
-
# ---------------------------------------------------------------------------
|
| 245 |
-
|
| 246 |
-
NORMALIZATION_PROFILES: dict[str, NormalizationProfile] = {
|
| 247 |
-
"nfc": NormalizationProfile(
|
| 248 |
-
name="nfc",
|
| 249 |
-
nfc=True,
|
| 250 |
-
caseless=False,
|
| 251 |
-
diplomatic_table={},
|
| 252 |
-
description="Normalisation NFC uniquement",
|
| 253 |
-
),
|
| 254 |
-
"caseless": NormalizationProfile(
|
| 255 |
-
name="caseless",
|
| 256 |
-
nfc=True,
|
| 257 |
-
caseless=True,
|
| 258 |
-
diplomatic_table={},
|
| 259 |
-
description="NFC + insensible à la casse",
|
| 260 |
-
),
|
| 261 |
-
"minimal": NormalizationProfile(
|
| 262 |
-
name="minimal",
|
| 263 |
-
nfc=True,
|
| 264 |
-
caseless=False,
|
| 265 |
-
diplomatic_table=DIPLOMATIC_MINIMAL,
|
| 266 |
-
description="Minimal : NFC + s long seulement",
|
| 267 |
-
),
|
| 268 |
-
"medieval_french": NormalizationProfile(
|
| 269 |
-
name="medieval_french",
|
| 270 |
-
nfc=True,
|
| 271 |
-
caseless=False,
|
| 272 |
-
diplomatic_table=DIPLOMATIC_FR_MEDIEVAL,
|
| 273 |
-
description="Français médiéval (XIIe–XVe) : ſ=s, u=v, i=j, æ=ae, œ=oe",
|
| 274 |
-
),
|
| 275 |
-
"early_modern_french": NormalizationProfile(
|
| 276 |
-
name="early_modern_french",
|
| 277 |
-
nfc=True,
|
| 278 |
-
caseless=False,
|
| 279 |
-
diplomatic_table=DIPLOMATIC_FR_EARLY_MODERN,
|
| 280 |
-
description="Imprimés anciens (XVIe–XVIIIe) : ſ=s, æ=ae, œ=oe",
|
| 281 |
-
),
|
| 282 |
-
"medieval_latin": NormalizationProfile(
|
| 283 |
-
name="medieval_latin",
|
| 284 |
-
nfc=True,
|
| 285 |
-
caseless=False,
|
| 286 |
-
diplomatic_table=DIPLOMATIC_LATIN_MEDIEVAL,
|
| 287 |
-
description="Latin médiéval : ſ=s, u=v, i=j, ꝑ=per, ꝓ=pro",
|
| 288 |
-
),
|
| 289 |
-
"early_modern_english": NormalizationProfile(
|
| 290 |
-
name="early_modern_english",
|
| 291 |
-
nfc=True,
|
| 292 |
-
caseless=False,
|
| 293 |
-
diplomatic_table=DIPLOMATIC_EN_EARLY_MODERN,
|
| 294 |
-
description="Early Modern English (XVIth–XVIIIth c.): ſ=s, u=v, i=j, vv=w, þ=th, ð=th, ȝ=y",
|
| 295 |
-
),
|
| 296 |
-
"medieval_english": NormalizationProfile(
|
| 297 |
-
name="medieval_english",
|
| 298 |
-
nfc=True,
|
| 299 |
-
caseless=False,
|
| 300 |
-
diplomatic_table=DIPLOMATIC_EN_MEDIEVAL,
|
| 301 |
-
description="Medieval English (XIIth–XVth c.): ſ=s, u=v, i=j, þ=th, ȝ=y, ꝑ=per, ꝓ=pro",
|
| 302 |
-
),
|
| 303 |
-
"secretary_hand": NormalizationProfile(
|
| 304 |
-
name="secretary_hand",
|
| 305 |
-
nfc=True,
|
| 306 |
-
caseless=False,
|
| 307 |
-
diplomatic_table=DIPLOMATIC_EN_SECRETARY,
|
| 308 |
-
description="Secretary hand (XVIth–XVIIth c.): ſ=s, u=v, i=j, vv=w, þ=th, ð=th, ȝ=y",
|
| 309 |
-
),
|
| 310 |
-
# ── Profils d'exclusion de caractères ────────────────────────────────
|
| 311 |
-
"sans_ponctuation": NormalizationProfile(
|
| 312 |
-
name="sans_ponctuation",
|
| 313 |
-
nfc=True,
|
| 314 |
-
caseless=False,
|
| 315 |
-
diplomatic_table={},
|
| 316 |
-
exclude_chars=frozenset(". , ; : ! ? ' \u2019 \" - \u2013 \u2014 ( ) [ ]".split()),
|
| 317 |
-
description="NFC + suppression de la ponctuation courante : . , ; : ! ? ' \" - – — ( ) [ ]",
|
| 318 |
-
),
|
| 319 |
-
"sans_apostrophes": NormalizationProfile(
|
| 320 |
-
name="sans_apostrophes",
|
| 321 |
-
nfc=True,
|
| 322 |
-
caseless=False,
|
| 323 |
-
diplomatic_table={},
|
| 324 |
-
exclude_chars=frozenset(["'", "\u2019"]), # apostrophe droite + apostrophe typographique
|
| 325 |
-
description="NFC + suppression des apostrophes droite (') et typographique (\u2019)",
|
| 326 |
-
),
|
| 327 |
-
}
|
| 328 |
-
|
| 329 |
-
|
| 330 |
-
def get_builtin_profile(name: str) -> NormalizationProfile:
|
| 331 |
-
"""Return a preconfigured profile by its identifier.

Available identifiers
---------------------
- ``"medieval_french"`` : medieval French, 12th–15th c. (ſ=s, u=v, i=j, æ=ae, œ=oe…)
- ``"early_modern_french"`` : early printed books, 16th–18th c. (ſ=s, œ=oe, æ=ae…)
- ``"medieval_latin"`` : medieval Latin (ſ=s, u=v, i=j, ꝑ=per, ꝓ=pro…)
- ``"early_modern_english"`` : printed English, 16th–18th c. (ſ=s, u=v, i=j, vv=w, þ=th, ð=th, ȝ=y)
- ``"medieval_english"`` : manuscript English, 12th–15th c. (+ abbreviations ꝑ, ꝓ…)
- ``"secretary_hand"`` : English secretary hand, 16th–17th c. (administrative cursive)
- ``"minimal"`` : NFC + long s only
- ``"nfc"`` : NFC only (no diplomatic table)
- ``"caseless"`` : NFC + case folding
- ``"sans_ponctuation"`` : NFC + removal of common punctuation
- ``"sans_apostrophes"`` : NFC + removal of straight and typographic apostrophes

Raises
------
KeyError
    If the name is not recognized.
"""
|
| 350 |
-
if name not in NORMALIZATION_PROFILES:
|
| 351 |
-
raise KeyError(
|
| 352 |
-
f"Profil de normalisation inconnu : '{name}'. "
|
| 353 |
-
f"Disponibles : {', '.join(NORMALIZATION_PROFILES)}"
|
| 354 |
-
)
|
| 355 |
-
return NORMALIZATION_PROFILES[name]
|
| 356 |
-
|
| 357 |
-
|
| 358 |
-
# ---------------------------------------------------------------------------
|
| 359 |
-
# Fonctions utilitaires
|
| 360 |
-
# ---------------------------------------------------------------------------
|
| 361 |
-
|
| 362 |
-
def _parse_exclude_chars(value: "str | list | None") -> frozenset:
|
| 363 |
-
"""Convert a collection of characters (str or list) into a frozenset.

Accepts:
- A string of characters separated by comma + space (e.g. ``"', -, –"``)
  or simply concatenated without a separator (e.g. ``".,;:!?"``)
- A Python/YAML list of strings (one character each)
- None or an empty string → empty frozenset

Disambiguation rule: if the string contains the sequence ``", "``
(comma followed by a space), it is split on commas and each item is
stripped of surrounding whitespace. Otherwise, every Unicode character
is a separate item.
"""
|
| 375 |
-
if not value:
|
| 376 |
-
return frozenset()
|
| 377 |
-
if isinstance(value, (list, tuple)):
|
| 378 |
-
return frozenset(str(c) for c in value if c)
|
| 379 |
-
raw = str(value)
|
| 380 |
-
# Désambiguïsation : séparer par ", " si présent (format lisible)
|
| 381 |
-
if ", " in raw:
|
| 382 |
-
return frozenset(c.strip() for c in raw.split(",") if c.strip())
|
| 383 |
-
# Sinon, chaque caractère Unicode est un item distinct
|
| 384 |
-
return frozenset(raw)
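The disambiguation rule is easier to judge on concrete inputs. The following is a minimal sketch that restates the logic of `_parse_exclude_chars` under a hypothetical name (`parse_exclude_chars_sketch`); the input strings are illustrative and not taken from the project's tests.

```python
def parse_exclude_chars_sketch(value):
    """Hypothetical re-statement of the rule used by _parse_exclude_chars."""
    if not value:
        return frozenset()
    if isinstance(value, (list, tuple)):
        return frozenset(str(c) for c in value if c)
    raw = str(value)
    if ", " in raw:
        # Readable form: split on commas, strip surrounding spaces.
        return frozenset(c.strip() for c in raw.split(",") if c.strip())
    # Compact form: every character is its own item.
    return frozenset(raw)


readable = parse_exclude_chars_sketch("', -, –")
compact = parse_exclude_chars_sketch(".,;:!?")
from_list = parse_exclude_chars_sketch(["'", "\u2019"])

# Caveat: in the readable form the comma acts as a separator, so it
# cannot itself be listed as an excluded character.
ambiguous = parse_exclude_chars_sketch("., ;")
```

One consequence of this design, visible in the last call: a profile that must exclude the comma should use the compact form or a YAML list rather than the comma-separated form.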
|
| 385 |
-
|
| 386 |
-
|
| 387 |
-
def _apply_diplomatic_table(text: str, table: dict[str, str]) -> str:
|
| 388 |
-
"""Apply a table of diplomatic correspondences without cascading.

Multi-character keys (e.g. ``"vv"`` → ``"w"``) are replaced first,
longest first, in one regex pass. Single-character keys are then
replaced in one character-by-character pass. Within each pass the
output is never rescanned, which avoids cascading replacements
(e.g. ``"ſ"→"s"`` followed by ``"s"→"z"`` would otherwise yield
``"z"`` instead of ``"s"``).
"""
|
| 395 |
-
if not table:
|
| 396 |
-
return text
|
| 397 |
-
|
| 398 |
-
import re
|
| 399 |
-
|
| 400 |
-
# Séparer les clés simples (1 char) des clés multi-chars
|
| 401 |
-
multi_keys = sorted(
|
| 402 |
-
(k for k in table if len(k) > 1), key=len, reverse=True
|
| 403 |
-
)
|
| 404 |
-
simple_table = {k: v for k, v in table.items() if len(k) == 1}
|
| 405 |
-
|
| 406 |
-
if multi_keys:
|
| 407 |
-
# Single-pass : construire un pattern regex avec toutes les clés multi-chars
|
| 408 |
-
# triées par longueur décroissante pour matcher les plus longues d'abord
|
| 409 |
-
pattern = re.compile("|".join(re.escape(k) for k in multi_keys))
|
| 410 |
-
text = pattern.sub(lambda m: table[m.group(0)], text)
|
| 411 |
-
|
| 412 |
-
# Remplacements char par char (single-pass via itération)
|
| 413 |
-
if simple_table:
|
| 414 |
-
text = "".join(simple_table.get(c, c) for c in text)
|
| 415 |
-
|
| 416 |
-
return text
|
| 417 |
-
|
| 418 |
|
| 419 |
-
|
| 420 |
-
|
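To make the step order of `NormalizationProfile.normalize` concrete, here is a minimal self-contained sketch, assuming the same sequence as the removed module (character exclusion, then NFC, then case folding, then the diplomatic table). The names `apply_table` and `normalize_sketch` are hypothetical, and the table is a small subset of `DIPLOMATIC_EN_EARLY_MODERN`.

```python
import re
import unicodedata


def apply_table(text: str, table: dict[str, str]) -> str:
    """Replace multi-character keys first (longest first), then single characters."""
    multi = sorted((k for k in table if len(k) > 1), key=len, reverse=True)
    if multi:
        pattern = re.compile("|".join(re.escape(k) for k in multi))
        text = pattern.sub(lambda m: table[m.group(0)], text)
    single = {k: v for k, v in table.items() if len(k) == 1}
    return "".join(single.get(c, c) for c in text)


def normalize_sketch(text, table, exclude=frozenset(), caseless=False):
    """Same step order as the profile: exclude, NFC, casefold, table."""
    text = "".join(c for c in text if c not in exclude)
    text = unicodedata.normalize("NFC", text)
    if caseless:
        text = text.casefold()
    return apply_table(text, table)


# Subset of DIPLOMATIC_EN_EARLY_MODERN (illustrative).
table = {"ſ": "s", "u": "v", "vv": "w", "þ": "th"}

# Two spellings of the same phrase collapse to one normalized form,
# so a diplomatic CER computed on them would be zero.
gt = normalize_sketch("vpon which þe ſong", table)
ocr = normalize_sketch("upon vvhich þe song", table)

# Character exclusion runs before everything else.
stripped = normalize_sketch("l'an, 1450", {}, exclude=frozenset("',"))
```

Note that the normalized form is a comparison key and not a modernized reading: `u` is mapped to `v` everywhere, so both inputs become `vpon`, not `upon`.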
| 1 |
+
"""Alias rétrocompat — module déplacé dans :mod:`picarones.measurements.normalization`.
|
| 2 |
|
| 3 |
+
Phase E du chantier de refonte en 3 cercles. Cette mesure (Cercle 2)
|
| 4 |
+
n'est plus dans ``picarones.core/`` ; elle vit dans
|
| 5 |
+
``picarones.measurements/``. L'alias ici permet aux imports
|
| 6 |
+
historiques (``from picarones.core.normalization import ...``) de continuer
|
| 7 |
+
à fonctionner sans modification.
|
| 8 |
|
| 9 |
+
Voir :doc:`docs/architecture-cercles.md` pour la cartographie des
|
| 10 |
+
3 cercles. Le ``core/`` strict ne contient plus que les abstractions
|
| 11 |
+
du domaine et l'orchestration (Cercle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.normalization import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.normalization as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
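The alias pattern above (star import plus a computed `__all__`) can be checked in isolation. The sketch below uses two throwaway modules built with `types.ModuleType`; the module names `fake_measurements_normalization` and `fake_core_normalization` are hypothetical stand-ins, not the real Picarones packages.

```python
import sys
import types

# Hypothetical stand-in for picarones.measurements.normalization.
target = types.ModuleType("fake_measurements_normalization")
exec(
    "def get_builtin_profile(name):\n"
    "    return name\n"
    "def _apply_diplomatic_table(text, table):\n"
    "    return text\n",
    target.__dict__,
)
sys.modules[target.__name__] = target

# Hypothetical stand-in for the alias module, same three statements.
alias = types.ModuleType("fake_core_normalization")
exec(
    "from fake_measurements_normalization import *\n"
    "import fake_measurements_normalization as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n",
    alias.__dict__,
)

public_ok = hasattr(alias, "get_builtin_profile")
private_ok = hasattr(alias, "_apply_diplomatic_table")
```

In this sketch the public function is reachable through the alias, but the underscore-prefixed helper is not, because `import *` skips such names when the target defines no `__all__`. If any historical caller imported private helpers such as `_parse_exclude_chars` from the old path, that import would presumably need an explicit re-export; whether such callers exist is not stated in the commit.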
|
|
@@ -1,422 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
archivist, **fidelity to numerical sequences** is a direct proxy for
editorial quality. An OCR that misses *"1789"* in a revolutionary
charter or *"f. 12v"* in an archival shelfmark produces a corpus that
is unusable for fine-grained research, even if the overall CER is
respectable.

Categories covered
------------------
1. **Arabic-numeral dates**: ``1789``, ``1450``, ``1ᵉʳ janvier 1789``
   (the module detects 4-digit **years** in the range [1000-2099]).
2. **Roman numerals**: ``MDCLXVIII``, ``XIV``, ``Tome IV``.
   Reuses ``picarones.core.roman_numerals`` (Sprint 60).
3. **Foliation**: ``f. 12``, ``f. 12r``, ``fol. 24v``,
   ``p. 5``, ``pp. 12-15``, ``n° 42``.
4. **Amounts**: ``12 livres``, ``5 sols``, ``8 deniers``,
   ``100 £``, ``50 ₣``, ``20 €``, Ancien Régime forms
   (``l.``, ``s.``, ``d.``).
5. **Regnal years**: ``an III``, ``l'an V``, ``an de grâce 1450``,
   ``an de la République``.

Method
------
For each category, occurrences are extracted (with a dedicated regex)
from the GT and from the hypothesis. Each GT item is then assigned one
of **3 statuses**:

- ``strict_preserved``: the exact form is present in the hypothesis
  (case-sensitive only for foliation; otherwise the convention is
  documented per category);
- ``value_preserved``: the **value** appears even if the form differs
  (e.g. ``XIV`` in the GT and ``14`` in the hypothesis: the value is
  considered preserved, the form is not);
- ``lost``: no usable trace.

Output
------
``compute_numerical_sequence_metrics(reference, hypothesis)``
returns:

```
{
    "global_strict_score": float,   # ∈ [0, 1]
    "global_value_score": float,    # ∈ [0, 1]
    "n_total": int,
    "per_category": {
        "year": {"n_total": int, "strict": int, "value": int,
                 "strict_score": float, "value_score": float,
                 "lost_items": list[str]},
        "roman": {...},
        "foliation": {...},
        "currency": {...},
        "regnal": {...},
    },
}
```

Limitations
-----------
- The regexes are **conservative**: a few rare forms are missed rather
  than producing false positives (for example, ``mil cinq cens`` in
  medieval French is not detected as a year; the computation layer
  sticks to the most recognizable forms). For a specific corpus, the
  user can compose their own detectors and pass them via
  ``custom_detectors``.
- ``value_preserved`` requires equivalence of **numerical value**:
  ``XIV`` ↔ ``14`` is accepted for Roman numerals;
  ``f. 12v`` ↔ ``f. 12r`` is **not** accepted for foliation
  (recto/verso is a distinct piece of information).
|
| 79 |
"""
|
| 80 |
|
| 81 |
-
from
|
| 82 |
-
|
| 83 |
-
import logging
|
| 84 |
-
import re
|
| 85 |
-
from typing import Optional
|
| 86 |
-
|
| 87 |
-
from picarones.core.metric_registry import register_metric
|
| 88 |
-
from picarones.core.modules import ArtifactType
|
| 89 |
-
from picarones.core.roman_numerals import (
|
| 90 |
-
detect_roman_numerals,
|
| 91 |
-
roman_to_int,
|
| 92 |
-
)
|
| 93 |
-
|
| 94 |
-
logger = logging.getLogger(__name__)
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 98 |
-
# Constantes / catégories
|
| 99 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
CATEGORIES = ("year", "roman", "foliation", "currency", "regnal")
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
# Dates arabes — 4 chiffres dans la plage [1000-2099].
|
| 106 |
-
# On exige une frontière de mot pour ne pas attraper
|
| 107 |
-
# « 12345 » (volume) ou « 0001 » (numéro de page).
|
| 108 |
-
_RE_YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
# Foliotation : f. 12, f. 12r, fol. 24v, p. 5, pp. 12-15, n° 42
|
| 112 |
-
# La capture conserve la forme intégrale (avec ponctuation et
|
| 113 |
-
# r/v) parce que recto/verso est une information distincte.
|
| 114 |
-
_RE_FOLIATION = re.compile(
|
| 115 |
-
r"\b(?:fol\.?|f\.|pp\.|p\.|n\.°|n°)\s*" # préfixe : fol., f., pp., p., n°
|
| 116 |
-
r"(\d+(?:\s*-\s*\d+)?)" # nombre ou plage (12 / 12-15)
|
| 117 |
-
r"(?:\s*([rvRV])\b)?", # optional r/v suffix; \b avoids capturing the first letter of a following word
|
| 118 |
-
re.UNICODE,
|
| 119 |
-
)
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
# Montants : nombre suivi d'une unité monétaire.
|
| 123 |
-
# On accepte espaces multiples mais pas de saut de ligne.
|
| 124 |
-
_RE_CURRENCY = re.compile(
|
| 125 |
-
r"\b(\d+(?:[.,]\d+)?)\s*" # montant (entier ou décimal)
|
| 126 |
-
r"(livres?|sols?|deniers?|écus?|florins?|francs?|"
|
| 127 |
-
r"l\.|s\.|d\.|£|€|₣)" # unité
|
| 128 |
-
r"(?=\b|[\s,;.!?:]|$)", # frontière souple post-symbole
|
| 129 |
-
re.UNICODE | re.IGNORECASE,
|
| 130 |
-
)
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
# Années régnales : « an III », « an de grâce 1450 »,
|
| 134 |
-
# « l'an V de la République ».
|
| 135 |
-
# Capture le numéral (romain ou arabe).
|
| 136 |
-
_RE_REGNAL = re.compile(
|
| 137 |
-
r"\b(?:l['’]\s*)?an\s+(?:de\s+(?:grâce|la\s+R[eé]publique)\s+)?"
|
| 138 |
-
r"([IVXLCDMivxlcdm]+|\d{1,4})\b",
|
| 139 |
-
re.UNICODE,
|
| 140 |
-
)
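The word-boundary behaviour described in the comments can be checked on a short string. In this sketch the two patterns are copied from the module above under the hypothetical names `RE_YEAR` and `RE_CURRENCY`; the sample sentence is invented for illustration.

```python
import re

# Patterns copied from the module above.
RE_YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
RE_CURRENCY = re.compile(
    r"\b(\d+(?:[.,]\d+)?)\s*"
    r"(livres?|sols?|deniers?|écus?|florins?|francs?|"
    r"l\.|s\.|d\.|£|€|₣)"
    r"(?=\b|[\s,;.!?:]|$)",
    re.UNICODE | re.IGNORECASE,
)

# Invented sample: one real year, a 5-digit number, a zero-padded page
# number, and three amounts in different notations.
sample = "En 1789, vol. 12345, page 0001 : 12 livres 5 s. et 100 £"

years = RE_YEAR.findall(sample)        # 4-digit years only
amounts = RE_CURRENCY.findall(sample)  # (amount, raw unit) pairs
```

The 5-digit and zero-padded numbers are rejected by the `\b` anchors and the leading-digit constraint, while the trailing lookahead in the currency pattern is what lets symbol units such as `£` match at the end of the string.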
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 144 |
-
# Détection par catégorie
|
| 145 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
def _detect_years(text: str) -> list[tuple[str, int]]:
|
| 149 |
-
"""Retourne [(forme, valeur)] pour chaque année 4 chiffres."""
|
| 150 |
-
if not text:
|
| 151 |
-
return []
|
| 152 |
-
return [(m.group(0), int(m.group(0))) for m in _RE_YEAR.finditer(text)]
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
def _detect_romans_with_values(text: str) -> list[tuple[str, int]]:
|
| 156 |
-
"""Numéraux romains accompagnés de leur valeur entière.
|
| 157 |
-
Délègue à ``roman_numerals.detect_roman_numerals`` (Sprint 60),
|
| 158 |
-
qui retourne ``(start, form, value)``.
|
| 159 |
-
"""
|
| 160 |
-
if not text:
|
| 161 |
-
return []
|
| 162 |
-
out: list[tuple[str, int]] = []
|
| 163 |
-
for _start, form, value in detect_roman_numerals(text, min_length=2):
|
| 164 |
-
if value is not None:
|
| 165 |
-
out.append((form, value))
|
| 166 |
-
return out
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
def _detect_foliations(text: str) -> list[tuple[str, str]]:
|
| 170 |
-
"""Foliotation. Retourne [(forme_complète, clé_normalisée)] où la
|
| 171 |
-
clé inclut le suffixe r/v normalisé (recto/verso).
|
| 172 |
-
"""
|
| 173 |
-
if not text:
|
| 174 |
-
return []
|
| 175 |
-
out: list[tuple[str, str]] = []
|
| 176 |
-
for m in _RE_FOLIATION.finditer(text):
|
| 177 |
-
full = m.group(0).strip()
|
| 178 |
-
nums = re.sub(r"\s+", "", m.group(1)) # ex : "12-15"
|
| 179 |
-
suffix = (m.group(2) or "").lower()
|
| 180 |
-
key = f"{nums}{suffix}"
|
| 181 |
-
out.append((full, key))
|
| 182 |
-
return out
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
def _detect_currencies(text: str) -> list[tuple[str, tuple[str, str]]]:
|
| 186 |
-
"""Montants. Clé = (montant_normalisé, unité_canonique).
|
| 187 |
-
|
| 188 |
-
L'unité canonique compresse les variantes (« livres » et
|
| 189 |
-
« livre » → « livre » ; « £ » reste « £ »).
|
| 190 |
-
"""
|
| 191 |
-
if not text:
|
| 192 |
-
return []
|
| 193 |
-
canon = {
|
| 194 |
-
"livre": "livre", "livres": "livre", "l.": "livre",
|
| 195 |
-
"sol": "sol", "sols": "sol", "s.": "sol",
|
| 196 |
-
"denier": "denier", "deniers": "denier", "d.": "denier",
|
| 197 |
-
"écu": "écu", "écus": "écu",
|
| 198 |
-
"florin": "florin", "florins": "florin",
|
| 199 |
-
"franc": "franc", "francs": "franc",
|
| 200 |
-
"£": "£", "€": "€", "₣": "₣",
|
| 201 |
-
}
|
| 202 |
-
out: list[tuple[str, tuple[str, str]]] = []
|
| 203 |
-
for m in _RE_CURRENCY.finditer(text):
|
| 204 |
-
amount = m.group(1).replace(",", ".")
|
| 205 |
-
unit_raw = m.group(2).lower()
|
| 206 |
-
unit = canon.get(unit_raw, unit_raw)
|
| 207 |
-
out.append((m.group(0), (amount, unit)))
|
| 208 |
-
return out
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
def _detect_regnal(text: str) -> list[tuple[str, int]]:
|
| 212 |
-
"""Années régnales. Retourne [(forme, valeur_int)] avec la
|
| 213 |
-
valeur extraite (romain → int ou arabe → int).
|
| 214 |
-
"""
|
| 215 |
-
if not text:
|
| 216 |
-
return []
|
| 217 |
-
out: list[tuple[str, int]] = []
|
| 218 |
-
for m in _RE_REGNAL.finditer(text):
|
| 219 |
-
numeral = m.group(1)
|
| 220 |
-
value: Optional[int]
|
| 221 |
-
if numeral.isdigit():
|
| 222 |
-
value = int(numeral)
|
| 223 |
-
else:
|
| 224 |
-
value = roman_to_int(numeral)
|
| 225 |
-
if value is not None:
|
| 226 |
-
out.append((m.group(0), value))
|
| 227 |
-
return out
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
_DETECTORS = {
|
| 231 |
-
"year": _detect_years,
|
| 232 |
-
"roman": _detect_romans_with_values,
|
| 233 |
-
"foliation": _detect_foliations,
|
| 234 |
-
"currency": _detect_currencies,
|
| 235 |
-
"regnal": _detect_regnal,
|
| 236 |
-
}
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 240 |
-
# Calcul principal
|
| 241 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
def _classify_per_category(
|
| 245 |
-
gt_items: list,
|
| 246 |
-
hyp_items: list,
|
| 247 |
-
*,
|
| 248 |
-
form_extractor,
|
| 249 |
-
value_extractor,
|
| 250 |
-
) -> dict:
|
| 251 |
-
"""Pour chaque item GT, le classe en strict_preserved /
|
| 252 |
-
value_preserved / lost.
|
| 253 |
-
|
| 254 |
-
Multiplicité respectée : un item hypothèse ne peut servir
|
| 255 |
-
qu'à un seul match (forme prioritaire sur valeur).
|
| 256 |
-
"""
|
| 257 |
-
hyp_used = [False] * len(hyp_items)
|
| 258 |
-
n_strict = 0
|
| 259 |
-
n_value = 0
|
| 260 |
-
lost: list[str] = []
|
| 261 |
-
# Première passe : matchs stricts (forme exacte)
|
| 262 |
-
matched: list[bool] = [False] * len(gt_items)
|
| 263 |
-
for gi, gt_item in enumerate(gt_items):
|
| 264 |
-
gt_form = form_extractor(gt_item)
|
| 265 |
-
for hi, hyp_item in enumerate(hyp_items):
|
| 266 |
-
if hyp_used[hi]:
|
| 267 |
-
continue
|
| 268 |
-
if form_extractor(hyp_item) == gt_form:
|
| 269 |
-
hyp_used[hi] = True
|
| 270 |
-
matched[gi] = True
|
| 271 |
-
n_strict += 1
|
| 272 |
-
break
|
| 273 |
-
# Deuxième passe : matchs sur valeur (forme différente)
|
| 274 |
-
for gi, gt_item in enumerate(gt_items):
|
| 275 |
-
if matched[gi]:
|
| 276 |
-
n_value += 1 # strict implique value
|
| 277 |
-
continue
|
| 278 |
-
gt_val = value_extractor(gt_item)
|
| 279 |
-
for hi, hyp_item in enumerate(hyp_items):
|
| 280 |
-
if hyp_used[hi]:
|
| 281 |
-
continue
|
| 282 |
-
if value_extractor(hyp_item) == gt_val:
|
| 283 |
-
hyp_used[hi] = True
|
| 284 |
-
matched[gi] = True
|
| 285 |
-
n_value += 1
|
| 286 |
-
break
|
| 287 |
-
if not matched[gi]:
|
| 288 |
-
lost.append(form_extractor(gt_item))
|
| 289 |
-
n_total = len(gt_items)
|
| 290 |
-
return {
|
| 291 |
-
"n_total": n_total,
|
| 292 |
-
"strict": n_strict,
|
| 293 |
-
"value": n_value,
|
| 294 |
-
"strict_score": n_strict / n_total if n_total else 0.0,
|
| 295 |
-
"value_score": n_value / n_total if n_total else 0.0,
|
| 296 |
-
"lost_items": lost,
|
| 297 |
-
}
|
| 298 |
-
|
| 299 |
-
|
| 300 |
-
def compute_numerical_sequence_metrics(
|
| 301 |
-
reference: Optional[str],
|
| 302 |
-
hypothesis: Optional[str],
|
| 303 |
-
) -> dict:
|
| 304 |
-
"""Compute the accuracy on numerical sequences.
-
-    Returns
-    -------
-    dict
-        See the module docstring. If ``reference`` is empty
-        or contains no detected sequence, returns
-        ``{n_total: 0, ...}`` with scores at 0 (not None).
-    """
-    ref = reference or ""
-    hyp = hypothesis or ""
-
-    # Per-category specifications: (gt_items, hyp_items,
-    # form extractor, value extractor).
-    specs: dict[str, dict] = {}
-    # year: (form="1789", value=1789)
-    specs["year"] = {
-        "gt": _detect_years(ref),
-        "hyp": _detect_years(hyp),
-        "form": lambda it: it[0],
-        "value": lambda it: it[1],
-    }
-    # roman: (form="MDCLXVIII", value=1668)
-    specs["roman"] = {
-        "gt": _detect_romans_with_values(ref),
-        "hyp": _detect_romans_with_values(hyp),
-        "form": lambda it: it[0],
-        "value": lambda it: it[1],
-    }
-    # foliation: (form="f. 12r", value="12r")
-    specs["foliation"] = {
-        "gt": _detect_foliations(ref),
-        "hyp": _detect_foliations(hyp),
-        "form": lambda it: it[0],
-        "value": lambda it: it[1],
-    }
-    # currency: (form="12 livres", value=("12", "livre"))
-    specs["currency"] = {
-        "gt": _detect_currencies(ref),
-        "hyp": _detect_currencies(hyp),
-        "form": lambda it: it[0],
-        "value": lambda it: it[1],
-    }
-    # regnal: (form="an III", value=3)
-    specs["regnal"] = {
-        "gt": _detect_regnal(ref),
-        "hyp": _detect_regnal(hyp),
-        "form": lambda it: it[0],
-        "value": lambda it: it[1],
-    }
-
-    per_category: dict[str, dict] = {}
-    total = 0
-    total_strict = 0
-    total_value = 0
-    for cat, spec in specs.items():
-        breakdown = _classify_per_category(
-            spec["gt"], spec["hyp"],
-            form_extractor=spec["form"],
-            value_extractor=spec["value"],
-        )
-        per_category[cat] = breakdown
-        total += breakdown["n_total"]
-        total_strict += breakdown["strict"]
-        total_value += breakdown["value"]
-
-    return {
-        "n_total": total,
-        "global_strict_score": (
-            total_strict / total if total else 0.0
-        ),
-        "global_value_score": (
-            total_value / total if total else 0.0
-        ),
-        "per_category": per_category,
-    }
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Typed registry registration
-# ──────────────────────────────────────────────────────────────────────────
-
-
-@register_metric(
-    name="numerical_sequence_strict_score",
-    input_types=(ArtifactType.TEXT, ArtifactType.TEXT),
-    description=(
-        "Accuracy on numerical sequences in strict mode (form "
-        "preserved). Covers Arabic-numeral years, Roman numerals, "
-        "foliation, Ancien Régime monetary amounts, regnal years."
-    ),
-)
-def numerical_sequence_strict_score(reference: str, hypothesis: str) -> float:
-    return compute_numerical_sequence_metrics(
-        reference, hypothesis,
-    )["global_strict_score"]
-
-
-@register_metric(
-    name="numerical_sequence_value_score",
-    input_types=(ArtifactType.TEXT, ArtifactType.TEXT),
-    description=(
-        "Accuracy on numerical sequences in value mode "
-        "(the value is preserved even if the form differs, "
-        "e.g. XIV → 14)."
-    ),
-)
-def numerical_sequence_value_score(reference: str, hypothesis: str) -> float:
-    return compute_numerical_sequence_metrics(
-        reference, hypothesis,
-    )["global_value_score"]
-
 
-
-
-"
-
-    "numerical_sequence_value_score",
-]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.numerical_sequences`.
 
+Phase E of the three-circle refactoring. This measurement (Cercle 2)
+no longer lives in ``picarones.core/``; it now lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.numerical_sequences import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Cercle 1).
 """
 
+from picarones.measurements.numerical_sequences import *  # noqa: F401, F403
 
+import picarones.measurements.numerical_sequences as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
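The re-export mechanism used by this alias module can be tried in isolation. The sketch below is only an illustration under stated assumptions: the module names `demo_pkg_measurements_metric` and `demo_pkg_core_metric` and the helper `make_alias` are hypothetical stand-ins, not part of picarones. It reproduces the same `__all__` fallback (use the target's `__all__` when it exists, otherwise every name without a leading underscore).

```python
import sys
import types

# Hypothetical stand-in for the relocated module (not the real picarones code).
new_mod = types.ModuleType("demo_pkg_measurements_metric")
new_mod.compute = lambda x: x * 2
new_mod._private = "hidden"
sys.modules[new_mod.__name__] = new_mod


def make_alias(old_name: str, target: types.ModuleType) -> types.ModuleType:
    """Build a shim module that re-exports the public names of ``target``."""
    alias = types.ModuleType(old_name)
    # Same fallback as the alias file: prefer the target's __all__, otherwise
    # take every attribute whose name does not start with an underscore.
    public = getattr(target, "__all__", [
        nm for nm in dir(target) if not nm.startswith("_")
    ])
    for nm in public:
        setattr(alias, nm, getattr(target, nm))
    alias.__all__ = list(public)
    sys.modules[old_name] = alias
    return alias


alias = make_alias("demo_pkg_core_metric", new_mod)

# The legacy import path still resolves to the relocated function.
from demo_pkg_core_metric import compute

print(compute(21), alias.__all__)
```

One design consequence worth noting: private helpers (names starting with an underscore) are not re-exported, so legacy code importing them from the old path would need to switch to the new module.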
@@ -1,102 +1,19 @@
-"""
 
-
 
-
-
-
-
-Adaptive masking
-----------------
-The result is stored only if the GT contains at least one
-detected numerical sequence; otherwise the module does not
-appear in the report.
 """
 
-from
-
-import logging
-from typing import Iterable, Optional
-
-from picarones.core.numerical_sequences import (
-    CATEGORIES,
-    compute_numerical_sequence_metrics,
-)
-
-logger = logging.getLogger(__name__)
-
-
-def compute_numerical_sequence_metrics_adaptive(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-) -> Optional[dict]:
-    """Compute numerical-sequence metrics with adaptive
-    masking: returns ``None`` if the GT contains no such
-    sequence."""
-    if not reference:
-        return None
-    result = compute_numerical_sequence_metrics(reference, hypothesis or "")
-    if (result.get("n_total") or 0) == 0:
-        return None
-    return result
-
-
-def aggregate_numerical_sequence_metrics(
-    per_doc: Iterable[Optional[dict]],
-) -> Optional[dict]:
-    """Aggregate per engine: sum the per-category counters and
-    recompute the global and per-category scores.
-
-    The output format matches ``compute_numerical_sequence_metrics``
-    to make symmetric HTML rendering easier.
-    """
-    docs = [d for d in per_doc if d]
-    if not docs:
-        return None
-    total_n = 0
-    total_strict = 0
-    total_value = 0
-    per_cat: dict[str, dict] = {}
-    for cat in CATEGORIES:
-        per_cat[cat] = {
-            "n_total": 0,
-            "strict": 0,
-            "value": 0,
-            "lost_items": [],
-        }
-    for d in docs:
-        for cat in CATEGORIES:
-            cat_data = (d.get("per_category") or {}).get(cat) or {}
-            per_cat[cat]["n_total"] += int(cat_data.get("n_total") or 0)
-            per_cat[cat]["strict"] += int(cat_data.get("strict") or 0)
-            per_cat[cat]["value"] += int(cat_data.get("value") or 0)
-            per_cat[cat]["lost_items"].extend(
-                cat_data.get("lost_items") or [],
-            )
-        total_n += int(d.get("n_total") or 0)
-    # Recompute the scores
-    for cat, slot in per_cat.items():
-        n = slot["n_total"]
-        slot["strict_score"] = slot["strict"] / n if n else 0.0
-        slot["value_score"] = slot["value"] / n if n else 0.0
-        # Cap lost_items at 50 per category
-        slot["lost_items"] = slot["lost_items"][:50]
-        total_strict += slot["strict"]
-        total_value += slot["value"]
-    return {
-        "n_docs": len(docs),
-        "n_total": total_n,
-        "global_strict_score": (
-            total_strict / total_n if total_n else 0.0
-        ),
-        "global_value_score": (
-            total_value / total_n if total_n else 0.0
-        ),
-        "per_category": per_cat,
-    }
-
 
-
-
-"
-]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.numerical_sequences_runner`.
 
+Phase E of the three-circle refactoring. This measurement (Cercle 2)
+no longer lives in ``picarones.core/``; it now lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.numerical_sequences_runner import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Cercle 1).
 """
 
+from picarones.measurements.numerical_sequences_runner import *  # noqa: F401, F403
 
+import picarones.measurements.numerical_sequences_runner as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
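The aggregation performed by the relocated runner sums raw counters across documents and recomputes the ratios once (a micro-average), rather than averaging per-document scores. The sketch below is a deliberately simplified illustration of that idea: it uses flat `n_total` / `strict` counters instead of the per-category structure, and the function name `micro_average` is hypothetical.

```python
from typing import Iterable, Optional


def micro_average(per_doc: Iterable[Optional[dict]]) -> Optional[dict]:
    """Sum raw counters over documents, then recompute the ratio once."""
    docs = [d for d in per_doc if d]  # drop None (masked) documents
    if not docs:
        return None
    n_total = sum(int(d.get("n_total") or 0) for d in docs)
    strict = sum(int(d.get("strict") or 0) for d in docs)
    return {
        "n_docs": len(docs),
        "n_total": n_total,
        "strict_score": strict / n_total if n_total else 0.0,
    }


docs = [
    {"n_total": 10, "strict": 9},  # 0.9 on its own
    None,                          # masked: no numerical sequence in the GT
    {"n_total": 2, "strict": 0},   # 0.0 on its own
]
result = micro_average(docs)
print(result)
```

With these made-up counters the micro-average is 9 / 12, whereas a plain mean of the two per-document scores would be 0.45: documents with many sequences weigh more, and masked documents do not count at all.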
@@ -1,309 +1,19 @@
-"""
 
-
-
-
 
-
-
-
-- Cost expressed per **1,000 pages** processed.
-- Local cost = mean inference time × hourly rate (configurable).
-- Optional carbon footprint: kWh × grid intensity (g CO₂/kWh) of the
-  execution grid (low-carbon French mix by default for local runs,
-  hyperscaler cloud average for APIs).
 """
 
-from
 
-import
-
-
-
-
-import yaml
-
-logger = logging.getLogger(__name__)
-
-_DEFAULT_PRICING_PATH = Path(__file__).parent.parent / "data" / "pricing.yaml"
-
-
-@dataclass(frozen=True)
-class PricingDefaults:
-    """Default values of the pricing file (``meta`` section)."""
-
-    last_updated: Optional[str] = None
-    currency: str = "EUR"
-    hourly_rate_local_cpu_eur: float = 0.08
-    hourly_rate_local_gpu_eur: float = 1.20
-    grid_intensity_local: float = 58.0
-    grid_intensity_cloud: float = 380.0
-
-
-@dataclass
-class EngineCost:
-    """Estimated cost of an engine per 1,000 pages, with traceable assumptions.
-
-    The representation is immutable after construction: once the user
-    has chosen a local hourly rate, all instances share that
-    assumption through explicit injection in ``build_costs_for_benchmark``.
-    """
-
-    engine_key: str
-    """Name or model used as the key in the table (e.g. ``"gpt-4o"``, ``"tesseract"``)."""
-
-    type: str  # "local" | "cloud_api" | "unknown"
-
-    cost_per_1k_pages_eur: Optional[float] = None
-    """Cost per 1,000 pages in euros. ``None`` if the data is insufficient."""
-
-    currency: str = "EUR"
-
-    # Source / date
-    pricing_source_url: Optional[str] = None
-    pricing_date: Optional[str] = None
-
-    # For cloud APIs: raw price
-    api_price_per_1k_pages: Optional[float] = None
-
-    # For local engines: inference time and hourly rate used
-    local_mean_seconds_per_page: Optional[float] = None
-    hourly_rate_eur: Optional[float] = None
-
-    # Carbon footprint (estimate, labeled "experimental" in the report)
-    kwh_per_1k_pages: Optional[float] = None
-    grid_intensity_g_co2_per_kwh: Optional[float] = None
-    co2_per_1k_pages_g: Optional[float] = None
-
-    notes: Optional[str] = None
-
-    assumptions: list[str] = field(default_factory=list)
-    """List of textual assumptions to display below the chart."""
-
-    def as_dict(self) -> dict:
-        return {
-            "engine_key": self.engine_key,
-            "type": self.type,
-            "cost_per_1k_pages_eur": self.cost_per_1k_pages_eur,
-            "currency": self.currency,
-            "pricing_source_url": self.pricing_source_url,
-            "pricing_date": self.pricing_date,
-            "api_price_per_1k_pages": self.api_price_per_1k_pages,
-            "local_mean_seconds_per_page": self.local_mean_seconds_per_page,
-            "hourly_rate_eur": self.hourly_rate_eur,
-            "kwh_per_1k_pages": self.kwh_per_1k_pages,
-            "grid_intensity_g_co2_per_kwh": self.grid_intensity_g_co2_per_kwh,
-            "co2_per_1k_pages_g": self.co2_per_1k_pages_g,
-            "notes": self.notes,
-            "assumptions": list(self.assumptions),
-        }
-
-
-def load_pricing_database(path: Optional[Path] = None) -> tuple[PricingDefaults, dict]:
-    """Load the YAML pricing table.
-
-    Returns ``(defaults, engines_table)`` where ``engines_table`` is a dict
-    ``{engine_key: raw_entry}``.
-    """
-    path = Path(path) if path else _DEFAULT_PRICING_PATH
-    if not path.exists():
-        logger.warning("[pricing] file %s not found", path)
-        return PricingDefaults(), {}
-    try:
-        with path.open(encoding="utf-8") as fh:
-            data = yaml.safe_load(fh) or {}
-    except yaml.YAMLError as e:
-        logger.warning("[pricing] failed to parse %s: %s", path, e)
-        return PricingDefaults(), {}
-
-    meta = data.get("meta", {}) or {}
-    defaults = PricingDefaults(
-        last_updated=meta.get("last_updated"),
-        currency=meta.get("currency", "EUR"),
-        hourly_rate_local_cpu_eur=float(meta.get("default_hourly_rate_local_cpu_eur", 0.08)),
-        hourly_rate_local_gpu_eur=float(meta.get("default_hourly_rate_local_gpu_eur", 1.20)),
-        grid_intensity_local=float(meta.get("default_grid_intensity_g_co2_per_kwh", 58.0)),
-        grid_intensity_cloud=float(meta.get("cloud_grid_intensity_g_co2_per_kwh", 380.0)),
-    )
-    engines_table = data.get("engines", {}) or {}
-    return defaults, engines_table
-
-
-def _match_key(engine_name: str, llm_model: Optional[str], table: dict) -> Optional[str]:
-    """Find the best key for this engine in the table.
-
-    Strategy: first the LLM model name (for pipelines), then the
-    OCR name, then a partial (substring) match as a safety net.
-    """
-    candidates = [llm_model, engine_name]
-    for c in candidates:
-        if c and c in table:
-            return c
-    # Partial matching: useful for "tesseract → gpt-4o" or "gpt-4o-vision"
-    for c in candidates:
-        if not c:
-            continue
-        for key in table:
-            if key in c:
-                return key
-    return None
-
-
-def estimate_cost(
-    engine_name: str,
-    *,
-    llm_model: Optional[str] = None,
-    is_pipeline: bool = False,
-    measured_seconds_per_page: Optional[float] = None,
-    table: Optional[dict] = None,
-    defaults: Optional[PricingDefaults] = None,
-    hourly_rate_override_eur: Optional[float] = None,
-) -> EngineCost:
-    """Compute the ``EngineCost`` for a given engine.
-
-    Parameters
-    ----------
-    engine_name:
-        Public name of the engine (e.g. ``"tesseract"``, ``"tesseract → gpt-4o"``).
-    llm_model:
-        For an OCR+LLM pipeline, the LLM model used; it takes priority in
-        the lookup because it dominates the cost.
-    is_pipeline:
-        Marks an OCR+LLM pipeline (changes the lookup semantics).
-    measured_seconds_per_page:
-        Mean time observed on the current benchmark. Replaces the
-        indicative value from the table when provided (more reliable).
-    table, defaults:
-        Overrides for tests or institutional use.
-    hourly_rate_override_eur:
-        Hourly rate to use for the local computation (otherwise the table
-        value or the default).
-    """
-    if table is None or defaults is None:
-        _defaults, _table = load_pricing_database()
-        defaults = defaults or _defaults
-        table = table or _table
-
-    key = _match_key(engine_name, llm_model if is_pipeline else None, table)
-    if key is None:
-        return EngineCost(
-            engine_key=engine_name,
-            type="unknown",
-            assumptions=["No entry in the pricing table for this engine."],
-        )
-
-    entry = table[key]
-    etype = str(entry.get("type", "unknown"))
-    notes = entry.get("notes")
-    assumptions: list[str] = []
-    currency = defaults.currency
-
-    cost_eur: Optional[float] = None
-    api_price: Optional[float] = None
-    local_seconds = measured_seconds_per_page
-    hourly_rate = None
-
-    if etype == "cloud_api":
-        api_price = entry.get("api_price_per_1k_pages")
-        if api_price is not None:
-            cost_eur = float(api_price)
-            assumptions.append(
-                f"Indicative API price: {cost_eur:.2f} €/1000 pages "
-                f"(source: {entry.get('pricing_source_url', '—')}, {entry.get('pricing_date', 'unknown date')})."
-            )
-    elif etype == "local":
-        indicative_seconds = entry.get("local_mean_seconds_per_page")
-        if local_seconds is None and indicative_seconds is not None:
-            local_seconds = float(indicative_seconds)
-            assumptions.append(
-                f"Indicative inference time: {local_seconds:.1f} s/page (not measured on this benchmark)."
-            )
-        elif local_seconds is not None:
-            assumptions.append(
-                f"Measured inference time: {local_seconds:.1f} s/page (corpus average)."
-            )
-
-        hourly_rate = (
-            hourly_rate_override_eur
-            if hourly_rate_override_eur is not None
-            else entry.get("hourly_rate_override_eur")
-        )
-        if hourly_rate is None:
-            # Heuristic: GPU rate if the entry's notes mention a GPU, otherwise CPU
-            hourly_rate = (
-                defaults.hourly_rate_local_gpu_eur
-                if "gpu" in str(notes or "").lower()
-                else defaults.hourly_rate_local_cpu_eur
-            )
-        hourly_rate = float(hourly_rate)
-
-        if local_seconds is not None and hourly_rate is not None:
-            cost_eur = (local_seconds / 3600.0) * hourly_rate * 1000.0
-            assumptions.append(
-                f"Hourly rate applied: {hourly_rate:.2f} €/h "
-                f"(default {'GPU' if hourly_rate >= 0.5 else 'CPU'})."
-            )
-
-    # Optional carbon footprint
-    kwh_1k = entry.get("kwh_per_1k_pages")
-    grid = (
-        entry.get("grid_intensity_g_co2_per_kwh")
-        or (defaults.grid_intensity_cloud if etype == "cloud_api" else defaults.grid_intensity_local)
-    )
-    co2_g = None
-    if kwh_1k is not None and grid is not None:
-        co2_g = float(kwh_1k) * float(grid)
-
-    return EngineCost(
-        engine_key=key,
-        type=etype,
-        cost_per_1k_pages_eur=cost_eur,
-        currency=currency,
-        pricing_source_url=entry.get("pricing_source_url"),
-        pricing_date=entry.get("pricing_date"),
-        api_price_per_1k_pages=api_price,
-        local_mean_seconds_per_page=local_seconds,
-        hourly_rate_eur=hourly_rate,
-        kwh_per_1k_pages=float(kwh_1k) if kwh_1k is not None else None,
-        grid_intensity_g_co2_per_kwh=float(grid) if grid is not None else None,
-        co2_per_1k_pages_g=co2_g,
-        notes=notes,
-        assumptions=assumptions,
-    )
-
-
-def build_costs_for_benchmark(
-    engines_summary: list[dict],
-    durations_by_engine: dict[str, float],
-    *,
-    hourly_rate_local_eur: Optional[float] = None,
-    pricing_path: Optional[Path] = None,
-) -> dict[str, dict]:
-    """Compute the cost of each engine in a benchmark.
-
-    Returns
-    -------
-    dict ``{engine_name: EngineCost.as_dict()}``.
-    """
-    defaults, table = load_pricing_database(pricing_path)
-    out: dict[str, dict] = {}
-    for e in engines_summary:
-        name = e.get("name")
-        if not name:
-            continue
-        measured = durations_by_engine.get(name)
-        llm_model = None
-        pipeline_info = e.get("pipeline_info") or {}
-        if pipeline_info:
-            llm_model = pipeline_info.get("llm_model")
-        cost = estimate_cost(
-            engine_name=name,
-            llm_model=llm_model,
-            is_pipeline=bool(e.get("is_pipeline")),
-            measured_seconds_per_page=measured,
-            table=table,
-            defaults=defaults,
-            hourly_rate_override_eur=hourly_rate_local_eur,
-        )
-        out[name] = cost.as_dict()
-    return out
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.pricing`.
 
+Phase E of the three-circle refactoring. This measurement (Cercle 2)
+no longer lives in ``picarones.core/``; it now lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.pricing import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Cercle 1).
 """
 
+from picarones.measurements.pricing import *  # noqa: F401, F403
 
+import picarones.measurements.pricing as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
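For reference, the two formulas at the heart of the relocated pricing module are small enough to check by hand: local cost per 1,000 pages is `(seconds / 3600) × hourly rate × 1000`, and the carbon estimate is `kWh × grid intensity`. The sketch below restates them as standalone helpers; the function names are hypothetical, the 4.5 s/page and 0.5 kWh inputs are invented for illustration, and the 0.08 €/h and 58 g CO₂/kWh values are the defaults visible in ``PricingDefaults``.

```python
import math


def local_cost_per_1k_pages(seconds_per_page: float, hourly_rate_eur: float) -> float:
    """Cost in euros of processing 1,000 pages on local hardware."""
    hours_per_page = seconds_per_page / 3600.0
    return hours_per_page * hourly_rate_eur * 1000.0


def co2_per_1k_pages_g(kwh_per_1k_pages: float, grid_intensity_g_per_kwh: float) -> float:
    """Grams of CO2 emitted for 1,000 pages, given the grid intensity."""
    return kwh_per_1k_pages * grid_intensity_g_per_kwh


# Illustrative inputs: 4.5 s/page at the default CPU rate of 0.08 EUR/h.
cost = local_cost_per_1k_pages(4.5, 0.08)
# Illustrative inputs: 0.5 kWh per 1,000 pages on the default local grid (58 g/kWh).
co2 = co2_per_1k_pages_g(0.5, 58.0)

print(f"{cost:.2f} EUR per 1,000 pages, {co2:.1f} g CO2")
```

Because the cost scales linearly with both inputs, a measured time per page from the benchmark replaces the indicative table value without changing anything else in the calculation.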
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,254 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
récurrentes mais pas dominantes. Pour un usage prosopographique
|
| 9 |
-
(indexation de noms, recherche généalogique), ce sont précisément
|
| 10 |
-
ces tokens-là qui comptent.
|
| 11 |
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
Hypothèse à valider expérimentalement
|
| 17 |
-
-------------------------------------
|
| 18 |
-
La conjecture du plan A.I.1 : *« cette métrique discrimine plus
|
| 19 |
-
les moteurs que le CER global »*. Si confirmée sur un corpus
|
| 20 |
-
patrimonial réel, elle gagne sa place dans le tableau de
|
| 21 |
-
classement principal — décision laissée au chercheur après
|
| 22 |
-
observation.
|
| 23 |
-
|
| 24 |
-
Stratégie de découpage
|
| 25 |
-
----------------------
|
| 26 |
-
Cohérente avec NER (38), Flesch (52), philologie (55-60) : couche
|
| 27 |
-
de calcul pure d'abord, sans intégration runner. La vue HTML
|
| 28 |
-
« worst lines / rare tokens manqués » suit dans un sprint dédié.
|
| 29 |
-
|
| 30 |
-
Pas d'enregistrement dans le registre typé Sprint 34
|
| 31 |
-
----------------------------------------------------
|
| 32 |
-
La métrique exige **trois entrées** (reference, hypothesis, set
|
| 33 |
-
des tokens rares) et le set des rares est calculé corpus-wide
|
| 34 |
-
(donc connu seulement après itération sur tout le corpus). La
|
| 35 |
-
signature ne rentre pas dans ``(TEXT, TEXT)``. L'utilisateur
|
| 36 |
-
appelle explicitement ``compute_rare_token_recall`` avec le set
|
| 37 |
-
qu'il a calculé.
|
| 38 |
"""
|
| 39 |
|
| 40 |
-
from
|
| 41 |
-
|
| 42 |
-
import logging
|
| 43 |
-
import re
|
| 44 |
-
from collections import Counter
|
| 45 |
-
from typing import Iterable, Optional
|
| 46 |
-
|
| 47 |
-
logger = logging.getLogger(__name__)
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 51 |
-
# Tokenisation Unicode-aware
|
| 52 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 53 |
-
|
| 54 |
-
# Token = séquence maximale de caractères de mot Unicode (\w en
|
| 55 |
-
# Python 3 utilise déjà la table Unicode), incluant l'apostrophe
|
| 56 |
-
# typographique '’' à l'intérieur (« l'an », « d’une ») et les
|
| 57 |
-
# tirets internes (« peut-être »). La ponctuation isolée et les
|
| 58 |
-
# espaces sont des séparateurs.
|
| 59 |
-
|
| 60 |
-
_TOKEN_RE = re.compile(
|
| 61 |
-
r"\w+(?:[’'\-]\w+)*",
|
| 62 |
-
flags=re.UNICODE,
|
| 63 |
-
)
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
def tokenize(text: Optional[str]) -> list[str]:
|
| 67 |
-
"""Tokenisation Unicode-aware.
|
| 68 |
-
|
| 69 |
-
Conserve les contractions (``l'an``, ``d’une``) et les mots
|
| 70 |
-
composés (``peut-être``, ``c'est-à-dire``) comme un seul token.
|
| 71 |
-
Casse préservée — l'utilisateur normalise lui-même via
|
| 72 |
-
``case_sensitive=False`` dans les fonctions aval s'il le veut.
|
| 73 |
-
"""
|
| 74 |
-
if not text:
|
| 75 |
-
return []
|
| 76 |
-
return _TOKEN_RE.findall(text)
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 80 |
-
# Distribution de fréquence corpus-wide
|
| 81 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
def frequency_distribution(
|
| 85 |
-
documents: Iterable[str],
|
| 86 |
-
*,
|
| 87 |
-
case_sensitive: bool = False,
|
| 88 |
-
) -> Counter[str]:
|
| 89 |
-
"""Calcule ``{token: count}`` sur l'ensemble du corpus.
|
| 90 |
-
|
| 91 |
-
Parameters
|
| 92 |
-
----------
|
| 93 |
-
documents:
|
| 94 |
-
Itérable de textes (typiquement les ``ground_truth`` des
|
| 95 |
-
documents du corpus).
|
| 96 |
-
case_sensitive:
|
| 97 |
-
Si ``False`` (défaut), tous les tokens sont mis en
|
| 98 |
-
minuscule avant comptage.
|
| 99 |
-
"""
|
| 100 |
-
counter: Counter[str] = Counter()
|
| 101 |
-
for doc in documents:
|
| 102 |
-
tokens = tokenize(doc)
|
| 103 |
-
if not case_sensitive:
|
| 104 |
-
tokens = [t.lower() for t in tokens]
|
| 105 |
-
counter.update(tokens)
|
| 106 |
-
return counter
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
def extract_rare_tokens(
|
| 110 |
-
documents: Iterable[str],
|
| 111 |
-
*,
|
| 112 |
-
max_freq: int = 2,
|
| 113 |
-
case_sensitive: bool = False,
|
| 114 |
-
) -> frozenset[str]:
|
| 115 |
-
"""Retourne l'ensemble des tokens dont la fréquence
|
| 116 |
-
corpus-wide est ``≤ max_freq``.
|
| 117 |
-
|
| 118 |
-
Convention de lexicométrie : ``max_freq=1`` retourne uniquement
|
| 119 |
-
les hapax legomena (1 occurrence) ; ``max_freq=2`` retourne
|
| 120 |
-
hapax + dis legomena (≤ 2 occurrences) — défaut.
|
| 121 |
-
|
| 122 |
-
Les tokens qui n'apparaissent **jamais** dans le corpus ne sont
|
| 123 |
-
évidemment pas inclus (le ``Counter`` ne les liste pas).
|
| 124 |
-
"""
|
| 125 |
-
if max_freq < 1:
|
| 126 |
-
raise ValueError("max_freq doit être ≥ 1")
|
| 127 |
-
counter = frequency_distribution(
|
| 128 |
-
documents, case_sensitive=case_sensitive,
|
| 129 |
-
)
|
| 130 |
-
return frozenset(t for t, c in counter.items() if c <= max_freq)
|
| 131 |
-
|
| 132 |
-
-# ──────────────────────────────────────────────────────────────────────────
-# Per-document recall computation
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def compute_rare_token_recall(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-    rare_tokens: Iterable[str],
-    *,
-    case_sensitive: bool = False,
-) -> dict:
-    """Compute recall over the rare tokens present in the GT.
-
-    Parameters
-    ----------
-    reference:
-        GT text of the document.
-    hypothesis:
-        Text produced by the OCR.
-    rare_tokens:
-        Iterable of rare tokens, typically the result of
-        ``extract_rare_tokens`` on the full corpus.
-    case_sensitive:
-        If ``False`` (default), the comparison is done on
-        lowercase forms.
-
-    Returns
-    -------
-    dict
-        ``{
-            "n_rare_tokens_in_reference": int,
-                # number of rare-token **occurrences** in the GT
-                # (multiplicity preserved: a rare token present
-                # twice counts as 2)
-            "n_rare_tokens_recalled": int,
-                # number of occurrences correctly present in hyp
-                # (bag-of-tokens alignment: min(count_ref, count_hyp))
-            "recall": float,
-                # ratio in [0, 1], or 0.0 if the GT has no rare token
-            "missed_tokens": list[str],
-                # list of **missed** rare tokens (with multiplicity,
-                # e.g. "Dupont" present twice in the GT and once in hyp →
-                # missed_tokens contains ["Dupont"] once)
-        }``
-
-    Degenerate cases
-    ----------------
-    - Empty GT or no rare token present → recall = 0.0, empty
-      lists (convention: the absence of rare tokens is not
-      rewarded).
-    - Empty hyp with rare tokens in the GT → all missed, recall = 0.0.
-    """
-    ref = reference or ""
-    hyp = hypothesis or ""
-
-    if case_sensitive:
-        rare_set = frozenset(rare_tokens)
-        ref_tokens = tokenize(ref)
-        hyp_tokens = tokenize(hyp)
-    else:
-        rare_set = frozenset(t.lower() for t in rare_tokens)
-        ref_tokens = [t.lower() for t in tokenize(ref)]
-        hyp_tokens = [t.lower() for t in tokenize(hyp)]
-
-    # Multiplicity: count only the rare tokens present in the GT
-    ref_rare_counts: Counter[str] = Counter(
-        t for t in ref_tokens if t in rare_set
-    )
-    n_rare_in_ref = sum(ref_rare_counts.values())
-    if n_rare_in_ref == 0:
-        return {
-            "n_rare_tokens_in_reference": 0,
-            "n_rare_tokens_recalled": 0,
-            "recall": 0.0,
-            "missed_tokens": [],
-        }
-
-    # Bag-of-tokens over hyp, restricted to the rare tokens
-    hyp_rare_counts: Counter[str] = Counter(
-        t for t in hyp_tokens if t in rare_set
-    )
-    # Multiplicity-aware recall: for each token, min(ref_count, hyp_count)
-    n_recalled = 0
-    missed: list[str] = []
-    for token, ref_count in ref_rare_counts.items():
-        hyp_count = hyp_rare_counts.get(token, 0)
-        recalled = min(ref_count, hyp_count)
-        n_recalled += recalled
-        missed_count = ref_count - recalled
-        if missed_count > 0:
-            missed.extend([token] * missed_count)
-
-    return {
-        "n_rare_tokens_in_reference": n_rare_in_ref,
-        "n_rare_tokens_recalled": n_recalled,
-        "recall": n_recalled / n_rare_in_ref,
-        "missed_tokens": missed,
-    }
-
-
-def rare_token_recall(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-    rare_tokens: Iterable[str],
-    *,
-    case_sensitive: bool = False,
-) -> float:
-    """Shortcut: return only the recall ∈ [0, 1]."""
-    return compute_rare_token_recall(
-        reference, hypothesis, rare_tokens,
-        case_sensitive=case_sensitive,
-    )["recall"]
-
 
-
-
-"
-
-    "compute_rare_token_recall",
-    "rare_token_recall",
-]
+"""Backward-compat alias: module moved to :mod:`picarones.measurements.rare_tokens`.
 
+Phase E of the 3-circle refactoring effort. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.rare_tokens import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.rare_tokens import *  # noqa: F401, F403
 
+import picarones.measurements.rare_tokens as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
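
The multiplicity-aware recall removed from ``core/rare_tokens.py`` above (and now living under ``measurements/``) can be illustrated with a minimal, self-contained sketch. The whitespace `tokenize` and the name `bag_recall` are assumptions made for this example only; the project's real tokenizer is not visible in this diff, and the sketch always lowercases (the `case_sensitive=False` path).

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in tokenizer: plain whitespace split."""
    return text.split()


def bag_recall(reference: str, hypothesis: str, rare_tokens) -> tuple[float, list[str]]:
    """Case-insensitive, multiplicity-aware recall over rare tokens."""
    rare = frozenset(t.lower() for t in rare_tokens)
    ref_counts = Counter(t.lower() for t in tokenize(reference) if t.lower() in rare)
    hyp_counts = Counter(t.lower() for t in tokenize(hypothesis) if t.lower() in rare)
    total = sum(ref_counts.values())
    if total == 0:
        # Same convention as the diff: no reward when the GT has no rare token.
        return 0.0, []
    recalled, missed = 0, []
    for token, ref_n in ref_counts.items():
        hit = min(ref_n, hyp_counts.get(token, 0))  # bag-of-tokens alignment
        recalled += hit
        missed.extend([token] * (ref_n - hit))  # keep multiplicity of misses
    return recalled / total, missed


recall, missed = bag_recall(
    "Dupont et Dupont notaires",
    "Dupont et Durand notaires",
    rare_tokens={"Dupont", "notaires"},
)
print(recall, missed)
```

Here the GT holds three rare occurrences ("Dupont" twice, "notaires" once) and the hypothesis recovers two of them, so one "dupont" is reported as missed.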
@@ -1,252 +1,19 @@
-"""
 
-
-
 
-
-
-
-historical. This tendency toward modernisation is measurable as the
-difference in readability score between the GT and the OCR/LLM output,
-**independently of the taxonomic classes** and **without character/word
-alignment**. This is the key advantage of the Flesch score: it works
-even when the OCR is badly degraded (e.g. an LLM that invents plausible
-modern text disconnected from the GT).
-
-Split strategy
---------------
-As with NER (Sprint 38) and calibration (Sprint 39), the work is
-split:
-
-- **Sprint 52** (here): pure computation layer, ``flesch_score`` and
-  ``flesch_delta``. No external dependency; the syllable-counting
-  heuristics are pure Python, deterministic, tested.
-- **Later sprints**: runner wiring to compute ``flesch_delta``
-  per document and aggregate it per engine, then the HTML view.
-
-Formulas
---------
-- **English** (original Flesch 1948):
-  ``206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words)``
-- **French** (Kandel-Moles 1958):
-  ``207 - 1.015 × (words/sentences) - 73.6 × (syllables/words)``
-
-The score is clamped to ``[0, 100]``: 100 ↔ "very easy to read",
-0 ↔ "very hard". An **increase** in the score when going from the
-GT to the OCR signals a simplification (typical of modernising
-LLMs). A **drop** signals OCR degradation.
-
-Documented limits
------------------
-- Syllable counting is heuristic. In French, rules such as
-  "non-final -ier = 2 syllables" are not applied finely.
-  Acceptable for a **relative comparison** metric
-  (GT vs OCR delta), not for publishing an absolute value.
-- On very short texts (< 20 words), the formula becomes less
-  reliable. The minimum threshold is documented.
 """
 
-from
-
-import logging
-import re
-from typing import Literal
-
-from picarones.core.metric_registry import register_metric
-from picarones.core.modules import ArtifactType
-
-logger = logging.getLogger(__name__)
-
-
-Language = Literal["fr", "en"]
-
-# Coefficients of the Flesch formula, per language.
-_FLESCH_COEFFS: dict[str, tuple[float, float, float]] = {
-    "en": (206.835, 1.015, 84.6),  # Flesch 1948
-    "fr": (207.0, 1.015, 73.6),  # Kandel-Moles 1958
-}
-
-# Vowels used by the syllable-counting heuristic.
-# The set includes the diacritics common in FR/EN.
-_VOWELS = set("aeiouyàâäéèêëîïôöùûüÿæœAEIOUYÀÂÄÉÈÊËÎÏÔÖÙÛÜŸÆŒ")
-
-# Sentence-splitting regex: final punctuation + whitespace or end.
-# Tolerates repeated dots ("...") and keeps the split robust.
-_SENTENCE_SPLIT_RE = re.compile(r"[.!?…]+(?:\s+|$)")
-
-# Simple (word) tokenisation regex: sequences of "letter" characters.
-_WORD_RE = re.compile(r"[\w'-]+", re.UNICODE)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Basic counters
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def count_words(text: str) -> int:
-    """Number of words (alphanumeric tokens) in ``text``."""
-    if not text:
-        return 0
-    return len(_WORD_RE.findall(text))
-
-
-def count_sentences(text: str) -> int:
-    """Number of sentences in ``text``.
-
-    Splits on final punctuation (``.``, ``!``, ``?``, ``…``).
-    Returns at least 1 if ``text`` contains at least one word, to
-    avoid a division by zero in the Flesch formula on texts
-    with no final punctuation.
-    """
-    if not text:
-        return 0
-    parts = [p for p in _SENTENCE_SPLIT_RE.split(text) if p.strip()]
-    n = len(parts)
-    if n == 0 and count_words(text) > 0:
-        return 1
-    return n
-
-
-def count_syllables_word(word: str) -> int:
-    """Syllable-counting heuristic for a single word.
-
-    Rule: count the **groups of consecutive vowels** (including
-    ``y`` and the common diacritics). It is a rough
-    approximation, but deterministic and testable.
-
-    Edge cases:
-    - empty word → 0
-    - word with no vowel → 1 (by convention, e.g. acronyms such as ``BNF``)
-    - word made of a single isolated vowel → 1
-    """
-    if not word:
-        return 0
-    word = word.lower()
-    in_vowel_group = False
-    count = 0
-    for ch in word:
-        if ch in _VOWELS:
-            if not in_vowel_group:
-                count += 1
-            in_vowel_group = True
-        else:
-            in_vowel_group = False
-    return count or 1
-
-
-def count_syllables(text: str) -> int:
-    """Sum of the syllables of all the words in ``text``."""
-    if not text:
-        return 0
-    return sum(count_syllables_word(w) for w in _WORD_RE.findall(text))
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Flesch score
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def flesch_score(text: str, lang: Language = "fr") -> float:
-    """Compute the Flesch readability score for ``text``.
-
-    Parameters
-    ----------
-    text:
-        Text to evaluate. May contain punctuation, accents, etc.
-    lang:
-        ``"fr"`` (Kandel-Moles 1958, default) or ``"en"`` (Flesch 1948).
-
-    Returns
-    -------
-    float
-        Score clamped to ``[0, 100]``. Returns ``0.0`` on an empty
-        text or one with no usable word.
-
-    Notes
-    -----
-    The score drops sharply with:
-    - long sentences (high words/sentences ratio)
-    - polysyllabic words (high syllables/words ratio)
-    A rise in the score when going GT → OCR signals that an LLM has
-    "smoothed" the language (shorter sentences, more common words).
-    """
-    if lang not in _FLESCH_COEFFS:
-        raise ValueError(f"Langue non supportée : {lang!r}. Choisir 'fr' ou 'en'.")
-
-    n_words = count_words(text)
-    if n_words == 0:
-        return 0.0
-    n_sentences = max(1, count_sentences(text))
-    n_syllables = count_syllables(text)
-    if n_syllables == 0:
-        return 0.0
-
-    base, k_words, k_syll = _FLESCH_COEFFS[lang]
-    raw = base - k_words * (n_words / n_sentences) - k_syll * (n_syllables / n_words)
-    return max(0.0, min(100.0, raw))
-
-
-def flesch_delta(
-    reference: str,
-    hypothesis: str,
-    lang: Language = "fr",
-) -> float:
-    """Difference ``flesch_score(hypothesis) - flesch_score(reference)``.
-
-    Interpretation
-    --------------
-    - **Positive**: the OCR hypothesis is more readable than the GT,
-      a sign of **over-normalisation** (typical of LLMs that modernise
-      old texts).
-    - **Negative**: the OCR is less readable, a sign of degradation
-      (misrecognised characters break the fluency).
-    - **≈ 0**: OCR faithful to the GT in terms of linguistic complexity.
-
-    Bounded in ``[-100, +100]``.
-    """
-    return flesch_score(hypothesis, lang=lang) - flesch_score(reference, lang=lang)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Registration in the typed registry (Sprint 34)
-# ──────────────────────────────────────────────────────────────────────────
-
-
-@register_metric(
-    name="flesch_delta_fr",
-    input_types=(ArtifactType.TEXT, ArtifactType.TEXT),
-    description=(
-        "Différence de score Flesch (Kandel-Moles, FR) entre la sortie "
-        "OCR et la GT. Positif = OCR plus lisible (signal "
-        "d'over-normalisation LLM). Aucun alignement requis."
-    ),
-    higher_is_better=False,  # a delta near 0 = fidelity; positive = smoothing LLM
-    tags={"text", "readability", "over_normalization"},
-)
-def _registered_flesch_delta_fr(reference: str, hypothesis: str) -> float:
-    return flesch_delta(reference, hypothesis, lang="fr")
-
-
-@register_metric(
-    name="flesch_delta_en",
-    input_types=(ArtifactType.TEXT, ArtifactType.TEXT),
-    description=(
-        "Flesch reading ease delta (Flesch 1948, EN) between OCR and GT. "
-        "Positive = OCR easier to read than GT (LLM smoothing signal). "
-        "No alignment required."
-    ),
-    higher_is_better=False,
-    tags={"text", "readability", "over_normalization"},
-)
-def _registered_flesch_delta_en(reference: str, hypothesis: str) -> float:
-    return flesch_delta(reference, hypothesis, lang="en")
-
 
-
-
-"
-
-    "count_sentences",
-    "count_syllables",
-    "count_syllables_word",
-]
+"""Backward-compat alias: module moved to :mod:`picarones.measurements.readability`.
 
+Phase E of the 3-circle refactoring effort. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.readability import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.readability import *  # noqa: F401, F403
 
+import picarones.measurements.readability as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
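
To make the Kandel-Moles formula and the vowel-group syllable heuristic from the removed ``readability`` module concrete, the following sketch recomputes a French Flesch score and a GT-to-hypothesis delta. It is a simplified stand-alone version, not the project's code: the names `syllables` and `flesch_fr` are hypothetical, only the French coefficients are included, and the two sentences are made up for the illustration.

```python
import re

WORD_RE = re.compile(r"[\w'-]+")
SENTENCE_SPLIT_RE = re.compile(r"[.!?…]+(?:\s+|$)")
VOWELS = set("aeiouyàâäéèêëîïôöùûüÿæœ")


def syllables(word: str) -> int:
    """Count groups of consecutive vowels; at least 1 per word."""
    count, in_group = 0, False
    for ch in word.lower():
        if ch in VOWELS:
            if not in_group:
                count += 1
            in_group = True
        else:
            in_group = False
    return count or 1


def flesch_fr(text: str) -> float:
    """Kandel-Moles score clamped to [0, 100]; 0.0 for a text with no word."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    n_sentences = max(1, sum(1 for p in SENTENCE_SPLIT_RE.split(text) if p.strip()))
    n_syllables = sum(syllables(w) for w in words)
    raw = 207.0 - 1.015 * (len(words) / n_sentences) - 73.6 * (n_syllables / len(words))
    return max(0.0, min(100.0, raw))


# Made-up example: an archaic-looking GT and a modernised hypothesis.
reference = "Ledit seigneur recognoissant humblement lesdites obligations."
hypothesis = "Le seigneur admet ses dettes."
delta = flesch_fr(hypothesis) - flesch_fr(reference)
print(flesch_fr(reference), flesch_fr(hypothesis), delta)
```

The reference averages three syllables per word, so its raw score is negative and gets clamped to 0; the shorter hypothesis scores much higher, giving a strongly positive delta, which is the over-normalisation signal the docstrings describe.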
@@ -1,114 +1,19 @@
-"""
 
-
-
 
-
-
-
-GT and the OCR output. A positive score signals an
-*over-normalisation* typical of LLMs/VLMs that modernise an old
-text (the Flesch score rises because the words are simpler);
-a negative score signals a severe OCR degradation.
-
-This metric is computed **automatically** by the runner
-on each document, aggregated per engine, and presented in the
-report.
-
-Adaptive masking
-----------------
-The metric is computed only if the GT contains ≥ 5 words: below that,
-the Flesch score is too unstable to be informative.
-
-Language
---------
-Read from ``corpus.metadata.get("language", "fr")``. For
-mixed corpora, the user can pass an explicit language
-to the orchestrator.
 """
 
-from
-
-import logging
-import statistics
-from typing import Iterable, Optional
-
-from picarones.core.readability import (
-    Language,
-    count_words,
-    flesch_delta,
-    flesch_score,
-)
-
-logger = logging.getLogger(__name__)
-
-
-_MIN_WORDS_FOR_FLESCH = 5
-
-
-def compute_readability_metrics(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-    *,
-    lang: Language = "fr",
-) -> Optional[dict]:
-    """Compute the Flesch delta of a document with adaptive masking.
-
-    Returns ``None`` if the GT contains fewer than
-    ``_MIN_WORDS_FOR_FLESCH`` words.
-    """
-    ref = reference or ""
-    n_ref_words = count_words(ref)
-    if n_ref_words < _MIN_WORDS_FOR_FLESCH:
-        return None
-    hyp = hypothesis or ""
-    flesch_ref = flesch_score(ref, lang=lang)
-    flesch_hyp = flesch_score(hyp, lang=lang) if hyp else None
-    delta = (
-        flesch_delta(ref, hyp, lang=lang) if hyp else None
-    )
-    return {
-        "lang": lang,
-        "flesch_reference": flesch_ref,
-        "flesch_hypothesis": flesch_hyp,
-        "flesch_delta": delta,
-        "n_words_reference": n_ref_words,
-    }
-
-
-def aggregate_readability_metrics(
-    per_doc: Iterable[Optional[dict]],
-) -> Optional[dict]:
-    """Aggregate: mean/median of the deltas + share of
-    "over-normalised" docs (delta > +5 points).
-    """
-    docs = [d for d in per_doc if d]
-    if not docs:
-        return None
-    deltas = [
-        float(d["flesch_delta"]) for d in docs
-        if isinstance(d.get("flesch_delta"), (int, float))
-    ]
-    if not deltas:
-        return None
-    over_norm = sum(1 for d in deltas if d > 5.0)
-    under_norm = sum(1 for d in deltas if d < -5.0)
-    lang = docs[0].get("lang") or "fr"
-    return {
-        "lang": lang,
-        "n_docs": len(docs),
-        "n_docs_with_delta": len(deltas),
-        "delta_mean": statistics.fmean(deltas),
-        "delta_median": statistics.median(deltas),
-        "delta_min": min(deltas),
-        "delta_max": max(deltas),
-        "n_over_normalized": over_norm,
-        "n_under_normalized": under_norm,
-        "over_normalized_rate": over_norm / len(deltas),
-    }
-
 
-
-
-"
-]
+"""Backward-compat alias: module moved to :mod:`picarones.measurements.readability_runner`.
 
+Phase E of the 3-circle refactoring effort. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.readability_runner import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.readability_runner import *  # noqa: F401, F403
 
+import picarones.measurements.readability_runner as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
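
The alias modules in this commit all follow the same pattern: a star import from the new location, plus an ``__all__`` mirrored from the target module. The sketch below reproduces that pattern in a single runnable script, with two throwaway in-memory modules standing in for the real packages. The module names `demo_measurements_readability` and `demo_core_readability`, and the dummy `flesch_score`, are hypothetical and exist only for this demonstration.

```python
import sys
import types

# Hypothetical "new home" of the code (stands in for picarones.measurements.*).
new_mod = types.ModuleType("demo_measurements_readability")
exec(
    "def flesch_score(text):\n"
    "    return 0.0\n"
    "_private = 1\n"
    "__all__ = ['flesch_score']\n",
    new_mod.__dict__,
)
sys.modules[new_mod.__name__] = new_mod

# Hypothetical legacy location (stands in for picarones.core.*): same shape
# as the alias files added in this commit.
shim_source = (
    "from demo_measurements_readability import *  # noqa: F401, F403\n"
    "import demo_measurements_readability as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n"
)
shim = types.ModuleType("demo_core_readability")
exec(shim_source, shim.__dict__)
sys.modules[shim.__name__] = shim

# A legacy import path keeps resolving to the very same function object.
from demo_core_readability import flesch_score

print(flesch_score is new_mod.flesch_score, shim.__all__)
```

One consequence worth noting: a star import only re-exports the names listed in the target's ``__all__`` (or, without one, names that do not start with an underscore), so private helpers are not reachable through an alias of this shape.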
@@ -1,196 +1,19 @@
-"""
 
-
 
-
-
-
-complex parochial, the **ranking of engines by CER** can be
-misleading: an engine can have an excellent character CER and a
-**catastrophic reading order**. The result is unusable
-for full-text search (Elastic, Solr) or for reconstructing
-a linear narrative.
-
-The standard metric is defined by Antonacopoulos et al. in
-ICDAR 2015: F1 over the **relative-order pairs** between ALTO/PAGE
-regions. For each pair ``(a, b)`` such that ``a`` precedes
-``b`` in the GT:
-
-- **TP** if ``a`` also precedes ``b`` in the hypothesis,
-- **FN** if the pair is missing (regions absent or order
-  reversed) on the hypothesis side,
-- **FP** if a pair ``(a, b)`` appears in the hypothesis while
-  the GT does not have that order (hallucinated regions or inversion).
-
-The F1 is the harmonic mean of precision and recall.
-
-Split strategy
---------------
-Consistent with NER (Sprint 38), calibration (Sprint 39), Flesch
-(Sprint 52): pure computation layer first. The user provides
-two ordered lists of region IDs (typically extracted from
-ALTO/PAGE by an upstream parser). The runner wiring and the HTML view
-follow in dedicated sprints.
-
-Directly compatible with ``ReadingOrderGT`` from Sprint 32:
-``ReadingOrderGT.region_order`` is exactly the expected format.
-
-Convention on regions
----------------------
-- IDs are strings (``"r_1"``, ``"region_main"``, etc.).
-- **Duplicates** are ignored when computing the ordered pairs
-  (each ID counts once per sequence).
-- A region present in the GT but absent from the hypothesis
-  contributes to the FN pairs.
-- A region present in the hypothesis but absent from the GT
-  contributes to the FP pairs.
-- If a sequence has < 2 distinct regions, no pair is
-  emitted; the F1 returns ``0.0`` or ``1.0`` depending on whether
-  the two sequences are identical.
 """
 
-from
-
-import logging
-from itertools import combinations
-from typing import Iterable
-
-from picarones.core.metric_registry import register_metric
-from picarones.core.modules import ArtifactType
-
-logger = logging.getLogger(__name__)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Helpers
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def _ordered_pairs(sequence: list[str]) -> set[tuple[str, str]]:
-    """Return the set of pairs ``(a, b)`` such that ``a``
-    strictly precedes ``b`` in ``sequence``.
-
-    Duplicates: each ID is handled only once (first occurrence
-    in the sequence). Consistent with ICDAR 2015, where regions have
-    unique IDs.
-    """
-    seen: list[str] = []
-    seen_set: set[str] = set()
-    for r in sequence:
-        if r not in seen_set:
-            seen.append(r)
-            seen_set.add(r)
-    return set(combinations(seen, 2))
-
-
-def _normalize_input(value: Iterable[str] | None) -> list[str]:
-    """Coerce an input to list[str], filtering out empty values."""
-    if value is None:
-        return []
-    return [str(v) for v in value if v is not None and str(v).strip()]
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Main metric
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def compute_reading_order_metrics(
-    reference_order: Iterable[str] | None,
-    hypothesis_order: Iterable[str] | None,
-) -> dict:
-    """Compute precision / recall / F1 over the relative order of regions.
-
-    Parameters
-    ----------
-    reference_order:
-        Ordered sequence of region IDs from the GT (typically
-        ``ReadingOrderGT.region_order`` from Sprint 32).
-    hypothesis_order:
-        Ordered sequence of region IDs produced by an OCR/HTR
-        engine or an ALTO reconstructor.
-
-    Returns
-    -------
-    dict
-        ``{"precision", "recall", "f1", "true_positives",
-        "false_positives", "false_negatives", "n_ref_pairs",
-        "n_hyp_pairs", "common_regions", "ref_only_regions",
-        "hyp_only_regions"}``.
-
-    Boundary behaviour
-    ------------------
-    - Two identical sequences (same regions, same order) → F1 = 1.0.
-    - Strictly reversed order → F1 = 0.0 (all relative
-      pairs are wrong).
-    - An empty sequence vs a non-empty one → F1 = 0.0.
-    - Two empty sequences → F1 = 0.0 and all counters at 0
-      (convention: absence is not rewarded).
-    """
-    ref = _normalize_input(reference_order)
-    hyp = _normalize_input(hypothesis_order)
-
-    ref_pairs = _ordered_pairs(ref)
-    hyp_pairs = _ordered_pairs(hyp)
-
-    tp = len(ref_pairs & hyp_pairs)
-    fn = len(ref_pairs - hyp_pairs)
-    fp = len(hyp_pairs - ref_pairs)
-
-    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
-    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
-    f1 = (
-        2 * precision * recall / (precision + recall)
-        if (precision + recall) > 0
-        else 0.0
-    )
-
-    ref_set = set(ref)
-    hyp_set = set(hyp)
-    return {
-        "precision": precision,
-        "recall": recall,
-        "f1": f1,
-        "true_positives": tp,
-        "false_positives": fp,
-        "false_negatives": fn,
-        "n_ref_pairs": len(ref_pairs),
-        "n_hyp_pairs": len(hyp_pairs),
-        "common_regions": sorted(ref_set & hyp_set),
-        "ref_only_regions": sorted(ref_set - hyp_set),
-        "hyp_only_regions": sorted(hyp_set - ref_set),
-    }
-
-
|
| 165 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 166 |
-
# Enregistrement dans le registre typé (Sprint 34)
|
| 167 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
@register_metric(
|
| 171 |
-
name="reading_order_f1",
|
| 172 |
-
input_types=(ArtifactType.READING_ORDER, ArtifactType.READING_ORDER),
|
| 173 |
-
description=(
|
| 174 |
-
"F1 sur l'ordre relatif des régions ALTO/PAGE (ICDAR 2015, "
|
| 175 |
-
"Antonacopoulos). Pour chaque paire (a,b) où a précède b dans "
|
| 176 |
-
"la GT, vérifie que a précède aussi b dans l'hypothèse."
|
| 177 |
-
),
|
| 178 |
-
higher_is_better=True,
|
| 179 |
-
tags={"structure", "icdar", "alto", "page"},
|
| 180 |
-
)
|
| 181 |
-
def reading_order_f1(
|
| 182 |
-
reference: Iterable[str] | None,
|
| 183 |
-
hypothesis: Iterable[str] | None,
|
| 184 |
-
) -> float:
|
| 185 |
-
"""Shortcut: returns only the overall F1.
|
| 186 |
-
|
| 187 |
-
For the pairwise details (TP/FP/FN, common regions, etc.),
|
| 188 |
-
call ``compute_reading_order_metrics`` directly.
|
| 189 |
-
"""
|
| 190 |
-
return compute_reading_order_metrics(reference, hypothesis)["f1"]
|
| 191 |
-
|
| 192 |
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
"
|
| 196 |
-
]
|
|
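The removed `compute_reading_order_metrics` body above scores a hypothesis by comparing sets of ordered region pairs. A minimal sketch of that idea follows. `ordered_pairs` and `pairwise_f1` are illustrative names, and the pair construction (first occurrence of each region, every pair where one region precedes the other) is an assumption, since the private `_ordered_pairs` helper is not visible in this diff.

```python
from itertools import combinations


def ordered_pairs(order):
    """Return the set of (a, b) pairs where a precedes b in `order`."""
    # Hypothetical stand-in for the private `_ordered_pairs` helper:
    # duplicates are dropped, keeping the first occurrence of each region.
    unique = list(dict.fromkeys(order))
    return set(combinations(unique, 2))


def pairwise_f1(reference, hypothesis):
    """Precision, recall and F1 over ordered region pairs."""
    ref_pairs = ordered_pairs(reference)
    hyp_pairs = ordered_pairs(hypothesis)
    tp = len(ref_pairs & hyp_pairs)   # pairs ordered the same way in both
    fp = len(hyp_pairs - ref_pairs)   # pairs only the hypothesis asserts
    fn = len(ref_pairs - hyp_pairs)   # reference pairs the hypothesis misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Swapping the last two regions breaks one of the three reference pairs.
print(pairwise_f1(["r1", "r2", "r3"], ["r1", "r3", "r2"]))
```

With three regions and one swap, two of the three ordered pairs survive, so precision, recall and F1 all come out at two thirds.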
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.reading_order`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Cercle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy
|
| 6 |
+
imports (``from picarones.core.reading_order import ...``) keep
|
| 7 |
+
working without modification.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Cercle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.reading_order import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.reading_order as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
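The alias module above relies on `from ... import *` plus an `__all__` fallback. The sketch below reproduces that pattern with two throwaway in-memory modules to show what such a shim does and does not re-export. `demo_measurements_metric`, `demo_core_metric`, `public_metric` and `_private_helper` are hypothetical names used only for illustration; they are not part of Picarones.

```python
import sys
import types

# Hypothetical stand-in for the relocated module.
new_mod = types.ModuleType("demo_measurements_metric")
exec(
    "__all__ = ['public_metric']\n"
    "def public_metric():\n"
    "    return 1.0\n"
    "def _private_helper():\n"
    "    return 0.0\n",
    new_mod.__dict__,
)
sys.modules[new_mod.__name__] = new_mod

# Hypothetical stand-in for the legacy alias module; its body mirrors the shim.
alias = types.ModuleType("demo_core_metric")
exec(
    "from demo_measurements_metric import *\n"
    "import demo_measurements_metric as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n",
    alias.__dict__,
)

print(alias.public_metric())               # the public name is re-exported
print(hasattr(alias, "_private_helper"))   # underscore-prefixed names are not
print(alias.__all__)
```

Because a star import honors `__all__` (and otherwise skips underscore-prefixed names), private helpers are not reachable through such an alias; legacy code that imported private names from the old path would likely need to be updated.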
@@ -1,360 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
stability is methodologically weak. And a benchmark that
|
| 9 |
-
ignores the human ceiling ("two paleographers do not even
|
| 10 |
-
agree with each other") produces falsely optimistic rankings. This
|
| 11 |
-
module provides two complementary families:
|
| 12 |
-
|
| 13 |
-
1. **Inter-annotator agreement (IAA)**: when a document has
|
| 14 |
-
several GTs (two paleographers, for example), Cohen's κ and
|
| 15 |
-
Krippendorff's α measure agreement at the character level.
|
| 16 |
-
Reading: *"Pero's CER (4.2%) approaches the human
|
| 17 |
-
ceiling (κ = 0.89)."*
|
| 18 |
-
|
| 19 |
-
2. **Multi-run stability**: when the same LLM pipeline is
|
| 20 |
-
rerun N times on the same documents, we measure
|
| 21 |
-
the CER variance, the rate of tokens that diverge between runs,
|
| 22 |
-
and the mean pairwise CER.
|
| 23 |
-
|
| 24 |
-
Sprint 83 scope
|
| 25 |
-
---------------
|
| 26 |
-
**Computation layer only**: pure functions, no
|
| 27 |
-
runner integration and no HTML view. Extending the loader
|
| 28 |
-
to accept ``doc_001.gt.A.txt`` / ``doc_001.gt.B.txt`` is
|
| 29 |
-
documented as a future dependency; until the dedicated
|
| 30 |
-
sprint, two GT strings are taken as input.
|
| 31 |
-
|
| 32 |
-
Method
|
| 33 |
-
------
|
| 34 |
-
*Character-by-character IAA.* The two GTs are aligned with
|
| 35 |
-
``difflib.SequenceMatcher`` at the character level, and a
|
| 36 |
-
contingency table ``(annotator_a_char, annotator_b_char)`` is built
|
| 37 |
-
over the ``equal`` and ``replace`` positions. Cohen's κ uses
|
| 38 |
-
this table directly. Krippendorff's α uses the matrix
|
| 39 |
-
version (binary difference for the nominal mode).
|
| 40 |
-
|
| 41 |
-
*Multi-run stability.* ``compute_multirun_stability(runs)``
|
| 42 |
-
takes a list of N transcriptions of the **same** document and
|
| 43 |
-
returns the pairwise divergence rate (token-level Jaccard
|
| 44 |
-
distance); when a reference is provided, it also returns the
|
| 45 |
-
mean, standard deviation and coefficient of variation of the CER.
|
| 46 |
"""
|
| 47 |
|
| 48 |
-
from
|
| 49 |
-
|
| 50 |
-
import logging
|
| 51 |
-
import statistics
|
| 52 |
-
from typing import Optional, Sequence
|
| 53 |
-
|
| 54 |
-
logger = logging.getLogger(__name__)
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 58 |
-
# Character-by-character alignment helpers
|
| 59 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
def _aligned_char_pairs(
|
| 63 |
-
text_a: str, text_b: str,
|
| 64 |
-
) -> list[tuple[str, str]]:
|
| 65 |
-
"""Align ``text_a`` and ``text_b`` character by character.
|
| 66 |
-
|
| 67 |
-
Returns the list of aligned pairs over the ``equal`` and
|
| 68 |
-
``replace`` segments of ``SequenceMatcher`` (``insert`` and
|
| 69 |
-
``delete`` segments are ignored since they have no valid alignment).
|
| 70 |
-
"""
|
| 71 |
-
if not text_a and not text_b:
|
| 72 |
-
return []
|
| 73 |
-
import difflib
|
| 74 |
-
matcher = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False)
|
| 75 |
-
pairs: list[tuple[str, str]] = []
|
| 76 |
-
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
|
| 77 |
-
if tag == "equal":
|
| 78 |
-
for k in range(i2 - i1):
|
| 79 |
-
pairs.append((text_a[i1 + k], text_b[j1 + k]))
|
| 80 |
-
elif tag == "replace":
|
| 81 |
-
paired = min(i2 - i1, j2 - j1)
|
| 82 |
-
for k in range(paired):
|
| 83 |
-
pairs.append((text_a[i1 + k], text_b[j1 + k]))
|
| 84 |
-
# insert/delete: no usable two-sided alignment
|
| 85 |
-
return pairs
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
__all__: list[str] = []
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 92 |
-
# 1. Cohen's kappa (two annotators, nominal agreement)
|
| 93 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
def cohen_kappa(
|
| 97 |
-
annotations_a: Sequence,
|
| 98 |
-
annotations_b: Sequence,
|
| 99 |
-
) -> Optional[float]:
|
| 100 |
-
"""Cohen's κ between two annotators over paired
|
| 101 |
-
observations.
|
| 102 |
-
|
| 103 |
-
Definition:
|
| 104 |
-
|
| 105 |
-
κ = (po - pe) / (1 - pe)
|
| 106 |
-
|
| 107 |
-
where ``po`` is the observed agreement (proportion of equal pairs)
|
| 108 |
-
and ``pe`` is the agreement expected by chance (sum over the classes
|
| 109 |
-
of p_a(c) × p_b(c)).
|
| 110 |
-
|
| 111 |
-
Conventions:
|
| 112 |
-
- returns ``None`` if both sequences are empty or their
|
| 113 |
-
lengths differ;
|
| 114 |
-
- κ = 1.0 when agreement is perfect, 0.0 when it equals
|
| 115 |
-
chance, negative when worse than chance;
|
| 116 |
-
- when ``pe == 1`` (a single label across both sequences),
|
| 117 |
-
returns 1.0 if the sequences are identical, 0.0 otherwise
|
| 118 |
-
(κ is mathematically undefined, so a transparent,
|
| 119 |
-
documented convention is chosen).
|
| 120 |
-
"""
|
| 121 |
-
if len(annotations_a) != len(annotations_b):
|
| 122 |
-
return None
|
| 123 |
-
n = len(annotations_a)
|
| 124 |
-
if n == 0:
|
| 125 |
-
return None
|
| 126 |
-
# Observed agreement
|
| 127 |
-
agree = sum(1 for a, b in zip(annotations_a, annotations_b) if a == b)
|
| 128 |
-
p_o = agree / n
|
| 129 |
-
# Agreement expected by chance
|
| 130 |
-
from collections import Counter
|
| 131 |
-
count_a = Counter(annotations_a)
|
| 132 |
-
count_b = Counter(annotations_b)
|
| 133 |
-
classes = set(count_a) | set(count_b)
|
| 134 |
-
p_e = sum(
|
| 135 |
-
(count_a.get(c, 0) / n) * (count_b.get(c, 0) / n)
|
| 136 |
-
for c in classes
|
| 137 |
-
)
|
| 138 |
-
if p_e >= 1.0 - 1e-12:
|
| 139 |
-
# Undefined; convention: 1 if fully identical, 0 otherwise
|
| 140 |
-
return 1.0 if p_o >= 1.0 - 1e-12 else 0.0
|
| 141 |
-
return (p_o - p_e) / (1.0 - p_e)
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
__all__.append("cohen_kappa")
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 148 |
-
# 2. Krippendorff's alpha (generalization to N annotators)
|
| 149 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
def krippendorff_alpha(
|
| 153 |
-
annotations_per_unit: Sequence[Sequence],
|
| 154 |
-
) -> Optional[float]:
|
| 155 |
-
"""Krippendorff's α in nominal mode for N annotators.
|
| 156 |
-
|
| 157 |
-
Parameters
|
| 158 |
-
----------
|
| 159 |
-
annotations_per_unit:
|
| 160 |
-
List of units, each unit being the list of
|
| 161 |
-
annotations produced by the different annotators for
|
| 162 |
-
that unit. ``None`` in a cell means a missing
|
| 163 |
-
annotation (allowed).
|
| 164 |
-
|
| 165 |
-
Definition (Krippendorff 1980, equation for the nominal
|
| 166 |
-
metric):
|
| 167 |
-
|
| 168 |
-
α = 1 - D_o / D_e
|
| 169 |
-
|
| 170 |
-
where ``D_o`` is the observed disagreement (disagreeing pairs
|
| 171 |
-
within each unit, normalized) and ``D_e`` the disagreement expected
|
| 172 |
-
by chance. ``α = 1`` means perfect agreement, ``α = 0`` chance,
|
| 173 |
-
and negative values mean worse than chance.
|
| 174 |
-
|
| 175 |
-
Conventions:
|
| 176 |
-
- units with fewer than 2 valid annotations are ignored
|
| 177 |
-
(Krippendorff's convention);
|
| 178 |
-
- returns ``None`` if there is no usable unit or if
|
| 179 |
-
``D_e == 0`` (a single label in the whole corpus).
|
| 180 |
-
"""
|
| 181 |
-
from collections import Counter
|
| 182 |
-
# Observed values at the corpus level
|
| 183 |
-
value_counts: Counter = Counter()
|
| 184 |
-
pair_disagree = 0.0
|
| 185 |
-
pair_total = 0.0
|
| 186 |
-
for unit in annotations_per_unit:
|
| 187 |
-
valid = [v for v in unit if v is not None]
|
| 188 |
-
m = len(valid)
|
| 189 |
-
if m < 2:
|
| 190 |
-
continue
|
| 191 |
-
# within-unit pairs (ordered pairs, i != j)
|
| 192 |
-
for i in range(m):
|
| 193 |
-
for j in range(m):
|
| 194 |
-
if i == j:
|
| 195 |
-
continue
|
| 196 |
-
pair_total += 1.0 / (m - 1)
|
| 197 |
-
if valid[i] != valid[j]:
|
| 198 |
-
pair_disagree += 1.0 / (m - 1)
|
| 199 |
-
for v in valid:
|
| 200 |
-
value_counts[v] += 1
|
| 201 |
-
if pair_total == 0:
|
| 202 |
-
return None
|
| 203 |
-
n_total = sum(value_counts.values())
|
| 204 |
-
if n_total < 2:
|
| 205 |
-
return None
|
| 206 |
-
# Expected disagreement (random pairs drawn without replacement)
|
| 207 |
-
expected_disagree = 0.0
|
| 208 |
-
for v_a, c_a in value_counts.items():
|
| 209 |
-
for v_b, c_b in value_counts.items():
|
| 210 |
-
if v_a != v_b:
|
| 211 |
-
expected_disagree += c_a * c_b
|
| 212 |
-
expected_disagree /= n_total * (n_total - 1)
|
| 213 |
-
if expected_disagree <= 1e-12:
|
| 214 |
-
return None
|
| 215 |
-
d_o = pair_disagree / pair_total
|
| 216 |
-
return 1.0 - (d_o / expected_disagree)
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
__all__.append("krippendorff_alpha")
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 223 |
-
# 3. Character-level IAA helpers
|
| 224 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
def compute_iaa(
|
| 228 |
-
transcription_a: str,
|
| 229 |
-
transcription_b: str,
|
| 230 |
-
) -> Optional[dict]:
|
| 231 |
-
"""Compute κ and α at the character level between two
|
| 232 |
-
transcriptions of the same document.
|
| 233 |
-
|
| 234 |
-
Aligns via ``_aligned_char_pairs``, then computes:
|
| 235 |
-
- κ: over the list of aligned pairs;
|
| 236 |
-
- α: over units with 2 annotations (close to κ in this
|
| 237 |
-
two-annotator case, but the framework generalizes to N annotators).
|
| 238 |
-
|
| 239 |
-
Returns ``None`` when no alignment is possible (for example when
|
| 240 |
-
one of the transcriptions is empty).
|
| 241 |
-
"""
|
| 242 |
-
pairs = _aligned_char_pairs(transcription_a, transcription_b)
|
| 243 |
-
if not pairs:
|
| 244 |
-
return None
|
| 245 |
-
kappa = cohen_kappa([a for a, _ in pairs], [b for _, b in pairs])
|
| 246 |
-
alpha = krippendorff_alpha([[a, b] for a, b in pairs])
|
| 247 |
-
return {
|
| 248 |
-
"n_aligned_chars": len(pairs),
|
| 249 |
-
"cohen_kappa": kappa,
|
| 250 |
-
"krippendorff_alpha": alpha,
|
| 251 |
-
"agreement_rate": (
|
| 252 |
-
sum(1 for a, b in pairs if a == b) / len(pairs)
|
| 253 |
-
),
|
| 254 |
-
}
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
__all__.append("compute_iaa")
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 261 |
-
# 4. Multi-run stability (CER variance, pairwise divergence)
|
| 262 |
-
# ──────────────────────────────────────────────────────────────────────────
|
| 263 |
-
|
| 264 |
-
|
| 265 |
-
def _split_words(text: str) -> list[str]:
|
| 266 |
-
return text.split() if text else []
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
def compute_multirun_stability(
|
| 270 |
-
runs: Sequence[str],
|
| 271 |
-
*,
|
| 272 |
-
reference: Optional[str] = None,
|
| 273 |
-
) -> Optional[dict]:
|
| 274 |
-
"""Measure the stability of N successive runs of the same
|
| 275 |
-
pipeline (typically a non-deterministic LLM/VLM) on one
|
| 276 |
-
document.
|
| 277 |
-
|
| 278 |
-
Parameters
|
| 279 |
-
----------
|
| 280 |
-
runs:
|
| 281 |
-
List of the transcriptions produced by each run (≥ 2).
|
| 282 |
-
reference:
|
| 283 |
-
Reference transcription (GT). If provided, ``cer_per_run``,
|
| 284 |
-
its mean, its standard deviation and its coefficient of
|
| 285 |
-
variation are computed.
|
| 286 |
-
|
| 287 |
-
Returns
|
| 288 |
-
-------
|
| 289 |
-
dict | None
|
| 290 |
-
``{
|
| 291 |
-
"n_runs": int,
|
| 292 |
-
"pairwise_disagreement_mean": float, # mean divergence
|
| 293 |
-
"pairwise_disagreement_max": float,
|
| 294 |
-
"identical_run_rate": float, # identical pairs / total
|
| 295 |
-
"cer_per_run": Optional[list[float]],
|
| 296 |
-
"cer_mean": Optional[float],
|
| 297 |
-
"cer_stdev": Optional[float],
|
| 298 |
-
"cer_cv": Optional[float], # cv = stdev / mean
|
| 299 |
-
"n_distinct_outputs": int,
|
| 300 |
-
}``
|
| 301 |
-
or ``None`` if fewer than 2 runs.
|
| 302 |
-
"""
|
| 303 |
-
if len(runs) < 2:
|
| 304 |
-
return None
|
| 305 |
-
runs_list = list(runs)
|
| 306 |
-
# Pairwise divergence (token-level Jaccard distance)
|
| 307 |
-
n = len(runs_list)
|
| 308 |
-
n_pairs = 0
|
| 309 |
-
sum_disagree = 0.0
|
| 310 |
-
max_disagree = 0.0
|
| 311 |
-
n_identical = 0
|
| 312 |
-
for i in range(n):
|
| 313 |
-
for j in range(i + 1, n):
|
| 314 |
-
n_pairs += 1
|
| 315 |
-
tokens_i = set(_split_words(runs_list[i]))
|
| 316 |
-
tokens_j = set(_split_words(runs_list[j]))
|
| 317 |
-
union = tokens_i | tokens_j
|
| 318 |
-
if not union:
|
| 319 |
-
disagree = 0.0
|
| 320 |
-
else:
|
| 321 |
-
disagree = 1.0 - len(tokens_i & tokens_j) / len(union)
|
| 322 |
-
sum_disagree += disagree
|
| 323 |
-
if disagree > max_disagree:
|
| 324 |
-
max_disagree = disagree
|
| 325 |
-
if runs_list[i] == runs_list[j]:
|
| 326 |
-
n_identical += 1
|
| 327 |
-
pairwise_mean = sum_disagree / n_pairs if n_pairs else 0.0
|
| 328 |
-
identical_rate = n_identical / n_pairs if n_pairs else 0.0
|
| 329 |
-
distinct = len(set(runs_list))
|
| 330 |
-
|
| 331 |
-
cer_per_run: Optional[list[float]] = None
|
| 332 |
-
cer_mean: Optional[float] = None
|
| 333 |
-
cer_stdev: Optional[float] = None
|
| 334 |
-
cer_cv: Optional[float] = None
|
| 335 |
-
if reference is not None:
|
| 336 |
-
from picarones.core.metrics import _cer_from_strings
|
| 337 |
-
cer_per_run = [_cer_from_strings(reference, r) for r in runs_list]
|
| 338 |
-
cer_per_run = [v for v in cer_per_run if v is not None]
|
| 339 |
-
if cer_per_run:
|
| 340 |
-
cer_mean = statistics.fmean(cer_per_run)
|
| 341 |
-
if len(cer_per_run) >= 2:
|
| 342 |
-
cer_stdev = statistics.stdev(cer_per_run)
|
| 343 |
-
cer_cv = (
|
| 344 |
-
cer_stdev / cer_mean if cer_mean and cer_mean > 0
|
| 345 |
-
else None
|
| 346 |
-
)
|
| 347 |
-
return {
|
| 348 |
-
"n_runs": n,
|
| 349 |
-
"pairwise_disagreement_mean": pairwise_mean,
|
| 350 |
-
"pairwise_disagreement_max": max_disagree,
|
| 351 |
-
"identical_run_rate": identical_rate,
|
| 352 |
-
"n_distinct_outputs": distinct,
|
| 353 |
-
"cer_per_run": cer_per_run,
|
| 354 |
-
"cer_mean": cer_mean,
|
| 355 |
-
"cer_stdev": cer_stdev,
|
| 356 |
-
"cer_cv": cer_cv,
|
| 357 |
-
}
|
| 358 |
-
|
| 359 |
|
| 360 |
-
|
|
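As a worked check of the κ definition used by the removed `cohen_kappa`, the sketch below recomputes it on a four-item toy example. `kappa_sketch` is an illustrative name; it follows the conventions stated in the docstring above but is not the project's function.

```python
from collections import Counter


def kappa_sketch(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (toy version)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        return None
    # Observed agreement: share of positions where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(
        (count_a[c] / n) * (count_b[c] / n)
        for c in set(count_a) | set(count_b)
    )
    if p_e >= 1.0 - 1e-12:
        # Kappa is undefined here; same convention as the docstring above.
        return 1.0 if p_o >= 1.0 - 1e-12 else 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Annotator A reads "aabb", annotator B reads "abbb":
# p_o = 3/4 and p_e = (2/4)(1/4) + (2/4)(3/4) = 1/2, so kappa = 0.5.
print(kappa_sketch("aabb", "abbb"))
```

The example shows why raw agreement (75%) overstates reliability: half of that agreement is expected by chance given the label frequencies.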
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.reliability`.
|
| 2 |
|
| 3 |
+
Phase E of the three-circle refactoring. This measurement (Cercle 2)
|
| 4 |
+
no longer lives in ``picarones.core/``; it now lives in
|
| 5 |
+
``picarones.measurements/``. This alias lets legacy
|
| 6 |
+
imports (``from picarones.core.reliability import ...``) keep
|
| 7 |
+
working without modification.
|
| 8 |
|
| 9 |
+
See :doc:`docs/architecture-cercles.md` for the map of the
|
| 10 |
+
three circles. The strict ``core/`` now contains only the domain
|
| 11 |
+
abstractions and the orchestration (Cercle 1).
|
| 12 |
"""
|
| 13 |
|
| 14 |
+
from picarones.measurements.reliability import * # noqa: F401, F403
|
| 15 |
|
| 16 |
+
import picarones.measurements.reliability as _module
|
| 17 |
+
__all__ = getattr(_module, "__all__", [
|
| 18 |
+
nm for nm in dir(_module) if not nm.startswith("_")
|
| 19 |
+
])
|
|
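The pairwise divergence computed by the removed `compute_multirun_stability` in the reliability diff above is a token-level Jaccard distance between every pair of runs. A minimal sketch under that reading follows; `pairwise_jaccard_distance` is an illustrative name, and the sample transcriptions are made up for the example.

```python
from itertools import combinations


def pairwise_jaccard_distance(runs):
    """Jaccard distance between the token sets of every pair of runs."""
    distances = []
    for run_a, run_b in combinations(runs, 2):
        tokens_a, tokens_b = set(run_a.split()), set(run_b.split())
        union = tokens_a | tokens_b
        # Two empty transcriptions are treated as identical (distance 0).
        if not union:
            distances.append(0.0)
        else:
            distances.append(1.0 - len(tokens_a & tokens_b) / len(union))
    return distances


# Made-up runs: the third one spells one word differently.
runs = ["le roi dit", "le roi dit", "le roy dit"]
distances = pairwise_jaccard_distance(runs)
print(distances)
print(sum(distances) / len(distances))  # mean pairwise divergence
```

Since the comparison works on token sets, it ignores word order and repeated tokens; two runs that shuffle the same words would count as identical under this measure.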
@@ -1,731 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
- Rotation (increasing angle)
|
| 9 |
-
- Resolution reduction (downscaling factor)
|
| 10 |
-
- Binarization (Otsu or fixed thresholding)
|
| 11 |
-
2. Running the OCR engine on each degraded version
|
| 12 |
-
3. Computing the CER for each degradation level
|
| 13 |
-
4. Generating robustness curves (CER as a function of the level)
|
| 14 |
-
5. Identifying the critical threshold (level above which CER > threshold)
|
| 15 |
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
>>> analyzer = RobustnessAnalyzer(engine, degradation_types=["noise", "blur"])
|
| 20 |
-
>>> report = analyzer.analyze(corpus)
|
| 21 |
-
>>> print(report.critical_thresholds)
|
| 22 |
"""
|
| 23 |
|
| 24 |
-
from
|
| 25 |
|
| 26 |
-
import
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
from dataclasses import dataclass, field
|
| 31 |
-
from pathlib import Path
|
| 32 |
-
from typing import TYPE_CHECKING, Optional
|
| 33 |
-
|
| 34 |
-
if TYPE_CHECKING:
|
| 35 |
-
from picarones.core.corpus import Corpus, Document
|
| 36 |
-
from picarones.engines.base import BaseOCREngine
|
| 37 |
-
|
| 38 |
-
logger = logging.getLogger(__name__)
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
# ---------------------------------------------------------------------------
|
| 42 |
-
# Degradation parameters
|
| 43 |
-
# ---------------------------------------------------------------------------
|
| 44 |
-
|
| 45 |
-
# Degradation levels for each type
|
| 46 |
-
DEGRADATION_LEVELS: dict[str, list] = {
|
| 47 |
-
"noise": [0, 5, 15, 30, 50, 80], # sigma of the Gaussian noise
|
| 48 |
-
"blur": [0, 1, 2, 3, 5, 8], # radius of the Gaussian blur (pixels)
|
| 49 |
-
"rotation": [0, 1, 2, 5, 10, 20], # rotation angle (degrees)
|
| 50 |
-
"resolution": [1.0, 0.75, 0.5, 0.33, 0.25, 0.1], # resolution factor
|
| 51 |
-
"binarization": [0, 64, 96, 128, 160, 192], # binarization threshold (0 = Otsu)
|
| 52 |
-
}
|
| 53 |
-
|
| 54 |
-
DEGRADATION_LABELS: dict[str, list[str]] = {
|
| 55 |
-
"noise": ["original", "σ=5", "σ=15", "σ=30", "σ=50", "σ=80"],
|
| 56 |
-
"blur": ["original", "r=1", "r=2", "r=3", "r=5", "r=8"],
|
| 57 |
-
"rotation": ["0°", "1°", "2°", "5°", "10°", "20°"],
|
| 58 |
-
"resolution": ["100%", "75%", "50%", "33%", "25%", "10%"],
|
| 59 |
-
"binarization": ["original", "seuil=64", "seuil=96", "seuil=128", "seuil=160", "seuil=192"],
|
| 60 |
-
}
|
| 61 |
-
|
| 62 |
-
ALL_DEGRADATION_TYPES = list(DEGRADATION_LEVELS.keys())
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
# ---------------------------------------------------------------------------
|
| 66 |
-
# Image degradation (pure Python + stdlib, optionally Pillow/NumPy)
|
| 67 |
-
# ---------------------------------------------------------------------------
|
| 68 |
-
|
| 69 |
-
def _apply_gaussian_noise(pixels: list[list[list[int]]], sigma: float, rng_seed: int = 0) -> list[list[list[int]]]:
|
| 70 |
-
"""Apply Gaussian noise (pure Python)."""
|
| 71 |
-
import random
|
| 72 |
-
rng = random.Random(rng_seed)
|
| 73 |
-
h = len(pixels)
|
| 74 |
-
w = len(pixels[0]) if h > 0 else 0
|
| 75 |
-
result = []
|
| 76 |
-
for y in range(h):
|
| 77 |
-
row = []
|
| 78 |
-
for x in range(w):
|
| 79 |
-
pixel = []
|
| 80 |
-
for c in pixels[y][x]:
|
| 81 |
-
noise = rng.gauss(0, sigma)
|
| 82 |
-
val = int(c + noise)
|
| 83 |
-
pixel.append(max(0, min(255, val)))
|
| 84 |
-
row.append(pixel)
|
| 85 |
-
result.append(row)
|
| 86 |
-
return result
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
def _apply_box_blur(pixels: list[list[list[int]]], radius: int) -> list[list[list[int]]]:
|
| 90 |
-
"""Apply a box blur (approximation of a Gaussian blur, pure Python)."""
|
| 91 |
-
if radius <= 0:
|
| 92 |
-
return pixels
|
| 93 |
-
h = len(pixels)
|
| 94 |
-
w = len(pixels[0]) if h > 0 else 0
|
| 95 |
-
channels = len(pixels[0][0]) if h > 0 and w > 0 else 3
|
| 96 |
-
|
| 97 |
-
def blur_pass(data: list[list[list[int]]]) -> list[list[list[int]]]:
|
| 98 |
-
out = []
|
| 99 |
-
for y in range(h):
|
| 100 |
-
row = []
|
| 101 |
-
for x in range(w):
|
| 102 |
-
totals = [0] * channels
|
| 103 |
-
count = 0
|
| 104 |
-
for dy in range(-radius, radius + 1):
|
| 105 |
-
for dx in range(-radius, radius + 1):
|
| 106 |
-
ny, nx = y + dy, x + dx
|
| 107 |
-
if 0 <= ny < h and 0 <= nx < w:
|
| 108 |
-
for c in range(channels):
|
| 109 |
-
totals[c] += data[ny][nx][c]
|
| 110 |
-
count += 1
|
| 111 |
-
row.append([t // count for t in totals])
|
| 112 |
-
out.append(row)
|
| 113 |
-
return out
|
| 114 |
-
|
| 115 |
-
return blur_pass(pixels)
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
def _apply_rotation_simple(pixels: list[list[list[int]]], angle_deg: float) -> list[list[list[int]]]:
|
| 119 |
-
"""Rotation with nearest-neighbor interpolation (pure Python).
|
| 120 |
-
|
| 121 |
-
For small angles, the effect is realistic.
|
| 122 |
-
"""
|
| 123 |
-
if angle_deg == 0:
|
| 124 |
-
return pixels
|
| 125 |
-
h = len(pixels)
|
| 126 |
-
w = len(pixels[0]) if h > 0 else 0
|
| 127 |
-
channels = len(pixels[0][0]) if h > 0 and w > 0 else 3
|
| 128 |
-
|
| 129 |
-
angle_rad = math.radians(angle_deg)
|
| 130 |
-
cos_a = math.cos(angle_rad)
|
| 131 |
-
sin_a = math.sin(angle_rad)
|
| 132 |
-
cx, cy = w / 2, h / 2
|
| 133 |
-
|
| 134 |
-
result = [[[245, 240, 232][:channels] for _ in range(w)] for _ in range(h)]
|
| 135 |
-
for y in range(h):
|
| 136 |
-
for x in range(w):
|
| 137 |
-
# Source coordinates
|
| 138 |
-
sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
|
| 139 |
-
sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
|
| 140 |
-
ix, iy = int(round(sx)), int(round(sy))
|
| 141 |
-
if 0 <= ix < w and 0 <= iy < h:
|
| 142 |
-
result[y][x] = list(pixels[iy][ix])
|
| 143 |
-
return result
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
def _apply_resolution_reduction(
|
| 147 |
-
pixels: list[list[list[int]]], factor: float
|
| 148 |
-
) -> list[list[list[int]]]:
|
| 149 |
-
"""Reduce the resolution, then scale back up to the original size (pixelation)."""
|
| 150 |
-
if factor >= 1.0:
|
| 151 |
-
return pixels
|
| 152 |
-
h = len(pixels)
|
| 153 |
-
w = len(pixels[0]) if h > 0 else 0
|
| 154 |
-
new_h = max(1, int(h * factor))
|
| 155 |
-
new_w = max(1, int(w * factor))
|
| 156 |
-
|
| 157 |
-
# Downscale
|
| 158 |
-
small = []
|
| 159 |
-
for y in range(new_h):
|
| 160 |
-
row = []
|
| 161 |
-
src_y = int(y / factor)
|
| 162 |
-
for x in range(new_w):
|
| 163 |
-
src_x = int(x / factor)
|
| 164 |
-
row.append(list(pixels[min(src_y, h - 1)][min(src_x, w - 1)]))
|
| 165 |
-
small.append(row)
|
| 166 |
-
|
| 167 |
-
# Upscale (nearest-neighbor)
|
| 168 |
-
result = []
|
| 169 |
-
for y in range(h):
|
| 170 |
-
row = []
|
| 171 |
-
src_y = min(int(y * factor), new_h - 1)
|
| 172 |
-
for x in range(w):
|
| 173 |
-
src_x = min(int(x * factor), new_w - 1)
|
| 174 |
-
row.append(list(small[src_y][src_x]))
|
| 175 |
-
result.append(row)
|
| 176 |
-
return result
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
def _apply_binarization(
|
| 180 |
-
pixels: list[list[list[int]]], threshold: int
|
| 181 |
-
) -> list[list[list[int]]]:
|
| 182 |
-
"""Binarize the image (fixed luminance threshold, or Otsu when ``threshold == 0``)."""
|
| 183 |
-
h = len(pixels)
|
| 184 |
-
w = len(pixels[0]) if h > 0 else 0
|
| 185 |
-
result = []
|
| 186 |
-
|
| 187 |
-
# Compute the Otsu threshold when threshold == 0
|
| 188 |
-
if threshold == 0:
|
| 189 |
-
histogram = [0] * 256
|
| 190 |
-
total = h * w
|
| 191 |
-
for y in range(h):
|
| 192 |
-
for x in range(w):
|
| 193 |
-
p = pixels[y][x]
|
| 194 |
-
lum = int(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]) if len(p) >= 3 else p[0]
|
| 195 |
-
histogram[lum] += 1
|
| 196 |
-
# Simplified Otsu
|
| 197 |
-
best_thresh = 128
|
| 198 |
-
best_var = -1.0
|
| 199 |
-
total_sum = sum(i * histogram[i] for i in range(256))
|
| 200 |
-
w0, w1, sum0 = 0, total, 0.0
|
| 201 |
-
for t in range(256):
|
| 202 |
-
w0 += histogram[t]
|
| 203 |
-
if w0 == 0:
|
| 204 |
-
continue
|
| 205 |
-
w1 = total - w0
|
| 206 |
-
if w1 == 0:
|
| 207 |
-
break
|
| 208 |
-
sum0 += t * histogram[t]
|
| 209 |
-
mean0 = sum0 / w0
|
| 210 |
-
mean1 = (total_sum - sum0) / w1
|
| 211 |
-
var = w0 * w1 * (mean0 - mean1) ** 2
|
| 212 |
-
if var > best_var:
|
| 213 |
-
best_var = var
|
| 214 |
-
best_thresh = t
|
| 215 |
-
threshold = best_thresh
|
| 216 |
-
|
| 217 |
-
for y in range(h):
|
| 218 |
-
row = []
|
| 219 |
-
for x in range(w):
|
| 220 |
-
p = pixels[y][x]
|
| 221 |
-
lum = int(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]) if len(p) >= 3 else p[0]
|
| 222 |
-
val = 255 if lum >= threshold else 0
|
| 223 |
-
row.append([val] * len(p))
|
| 224 |
-
result.append(row)
|
| 225 |
-
return result
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
def degrade_image_bytes(
|
| 229 |
-
png_bytes: bytes,
|
| 230 |
-
degradation_type: str,
|
| 231 |
-
level: float,
|
| 232 |
-
) -> bytes:
|
| 233 |
-
"""Degrade a PNG image and return the modified PNG bytes.
|
| 234 |
-
|
| 235 |
-
Uses Pillow if available; otherwise falls back to the pure-Python stub, which returns the image unchanged.
|
| 236 |
-
|
| 237 |
-
Parameters
|
| 238 |
-
----------
|
| 239 |
-
png_bytes:
|
| 240 |
-
Bytes of the source PNG image.
|
| 241 |
-
degradation_type:
|
| 242 |
-
Degradation type (``"noise"``, ``"blur"``, ``"rotation"``,
|
| 243 |
-
``"resolution"``, ``"binarization"``).
|
| 244 |
-
level:
|
| 245 |
-
Degradation level (numeric value whose meaning depends on the type).
|
| 246 |
-
|
| 247 |
-
Returns
|
| 248 |
-
-------
|
| 249 |
-
bytes
|
| 250 |
-
Bytes of the degraded PNG image.
|
| 251 |
-
"""
|
| 252 |
-
try:
|
| 253 |
-
return _degrade_pillow(png_bytes, degradation_type, level)
|
| 254 |
-
except ImportError:
|
| 255 |
-
return _degrade_pure_python(png_bytes, degradation_type, level)
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
def _degrade_pillow(png_bytes: bytes, degradation_type: str, level: float) -> bytes:
|
| 259 |
-
"""Degradation with Pillow (better quality)."""
|
| 260 |
-
import io
|
| 261 |
-
from PIL import Image, ImageFilter
|
| 262 |
-
|
| 263 |
-
img = Image.open(io.BytesIO(png_bytes)).convert("RGB")
|
| 264 |
-
|
| 265 |
-
if degradation_type == "noise":
|
| 266 |
-
if level > 0:
|
| 267 |
-
import random
|
| 268 |
-
# RGB: 3 bytes per pixel; tobytes() stays stable from Pillow 10 to 14+
|
| 269 |
-
raw = img.tobytes()
|
| 270 |
-
rng = random.Random(0)
|
| 271 |
-
noisy = []
|
| 272 |
-
for i in range(0, len(raw), 3):
|
| 273 |
-
r, g, b = raw[i], raw[i + 1], raw[i + 2]
|
| 274 |
-
noisy.append((
|
| 275 |
-
max(0, min(255, int(r + rng.gauss(0, level)))),
|
| 276 |
-
max(0, min(255, int(g + rng.gauss(0, level)))),
|
| 277 |
-
max(0, min(255, int(b + rng.gauss(0, level)))),
|
| 278 |
-
))
|
| 279 |
-
img.putdata(noisy)
|
| 280 |
-
|
| 281 |
-
elif degradation_type == "blur":
|
| 282 |
-
if level > 0:
|
| 283 |
-
img = img.filter(ImageFilter.GaussianBlur(radius=level))
|
| 284 |
-
|
| 285 |
-
elif degradation_type == "rotation":
|
| 286 |
-
if level != 0:
|
| 287 |
-
img = img.rotate(-level, expand=False, fillcolor=(245, 240, 232))
|
| 288 |
-
|
| 289 |
-
elif degradation_type == "resolution":
|
| 290 |
-
if level < 1.0:
|
| 291 |
-
w, h = img.size
|
| 292 |
-
new_w, new_h = max(1, int(w * level)), max(1, int(h * level))
|
| 293 |
-
img = img.resize((new_w, new_h), Image.NEAREST)
|
| 294 |
-
img = img.resize((w, h), Image.NEAREST)
|
| 295 |
-
|
| 296 |
-
elif degradation_type == "binarization":
|
| 297 |
-
img = img.convert("L") # grayscale
|
| 298 |
-
if level == 0:
|
| 299 |
-
# Otsu thresholding: compute the optimal threshold
|
| 300 |
-
histogram = img.histogram()
|
| 301 |
-
total = img.size[0] * img.size[1]
|
| 302 |
-
best_thresh, best_var = 128, -1.0
|
| 303 |
-
total_sum = sum(i * histogram[i] for i in range(256))
|
| 304 |
-
w0, sum0 = 0, 0.0
|
| 305 |
-
for t in range(256):
|
| 306 |
-
w0 += histogram[t]
|
| 307 |
-
if w0 == 0:
|
| 308 |
-
continue
|
| 309 |
-
w1 = total - w0
|
| 310 |
-
if w1 == 0:
|
| 311 |
-
break
|
| 312 |
-
sum0 += t * histogram[t]
|
| 313 |
-
var = w0 * w1 * (sum0 / w0 - (total_sum - sum0) / w1) ** 2
|
| 314 |
-
if var > best_var:
|
| 315 |
-
best_var = var
|
| 316 |
-
best_thresh = t
|
| 317 |
-
threshold = best_thresh
|
| 318 |
-
else:
|
| 319 |
-
threshold = int(level)
|
| 320 |
-
img = img.point(lambda p: 255 if p >= threshold else 0, "1").convert("RGB")
|
| 321 |
-
|
| 322 |
-
buf = io.BytesIO()
|
| 323 |
-
img.save(buf, format="PNG")
|
| 324 |
-
return buf.getvalue()
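The Otsu branch of the binarization step can be isolated as a pure function over a 256-bin histogram. The sketch below is only an illustration under stated assumptions, not the module's code: `otsu_threshold` and `binarize` are hypothetical names, and the toy image is invented. It applies the same between-class variance criterion as the removed lines. One deliberate difference: because the dark class accumulates every gray level up to and including `t`, the sketch binarizes with `p > t`, whereas the `p >= threshold` test in the diff sends pixels exactly at `t` to white.

```python
def otsu_threshold(histogram: list[int]) -> int:
    """Return the gray level t that maximizes the between-class variance.

    Class 0 holds pixels with value <= t, class 1 holds pixels with value > t.
    """
    total = sum(histogram)
    total_sum = sum(i * n for i, n in enumerate(histogram))
    best_thresh, best_var = 128, -1.0
    w0, sum0 = 0, 0.0
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue  # no dark pixel yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in class 0
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_thresh = var, t
    return best_thresh


def binarize(pixels: list[int], threshold: int) -> list[int]:
    # Strict comparison: class 0 (value <= threshold) maps to black.
    return [255 if p > threshold else 0 for p in pixels]


# Toy bimodal "image": ten dark pixels and ten bright pixels (invented data).
pixels = [50] * 10 + [200] * 10
hist = [0] * 256
for p in pixels:
    hist[p] += 1

t = otsu_threshold(hist)
result = binarize(pixels, t)
```

On this two-valued image the threshold lands on the dark value itself, which is the case where the choice between `>` and `>=` changes the outcome.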
-
-
-def _degrade_pure_python(png_bytes: bytes, degradation_type: str, level: float) -> bytes:
-    """Pure-Python degradation (without Pillow).
-
-    Meant to decode the PNG, apply the transformation and re-encode as PNG.
-    Note: full PNG decoding is not implemented; this is a stub.
-    """
-    # A pure-Python implementation would apply minimal transformations
-    # to the raw bytes by creating a synthetic test image.
-    # In practice, Pillow is almost always available in the Picarones environment.
-    logger.warning(
-        "Pillow non disponible : dégradation '%s' appliquée en mode dégradé (stub)",
-        degradation_type,
-    )
-    # Stub: the original image bytes are returned unchanged
-    return png_bytes
-
-
-# ---------------------------------------------------------------------------
-# Result structures
-# ---------------------------------------------------------------------------
-
-@dataclass
-class DegradationCurve:
-    """CER vs. degradation level curve for one engine and one degradation type."""
-    engine_name: str
-    degradation_type: str
-    levels: list[float]
-    labels: list[str]
-    cer_values: list[Optional[float]]
-    """Mean CER (0-1) at each level. None if it could not be computed."""
-    critical_threshold_level: Optional[float] = None
-    """First level at which CER > cer_threshold."""
-    cer_threshold: float = 0.20
-    """CER threshold used to determine the critical level."""
-
-    def as_dict(self) -> dict:
-        return {
-            "engine_name": self.engine_name,
-            "degradation_type": self.degradation_type,
-            "levels": self.levels,
-            "labels": self.labels,
-            "cer_values": self.cer_values,
-            "critical_threshold_level": self.critical_threshold_level,
-            "cer_threshold": self.cer_threshold,
-        }
-
-
-@dataclass
-class RobustnessReport:
-    """Full robustness analysis report for one or more engines."""
-    engine_names: list[str]
-    corpus_name: str
-    degradation_types: list[str]
-    curves: list[DegradationCurve]
-    summary: dict = field(default_factory=dict)
-    """Summary: most robust engine per degradation type, critical thresholds…"""
-
-    def get_curves_for_engine(self, engine_name: str) -> list[DegradationCurve]:
-        return [c for c in self.curves if c.engine_name == engine_name]
-
-    def get_curves_for_type(self, degradation_type: str) -> list[DegradationCurve]:
-        return [c for c in self.curves if c.degradation_type == degradation_type]
-
-    def as_dict(self) -> dict:
-        return {
-            "engine_names": self.engine_names,
-            "corpus_name": self.corpus_name,
-            "degradation_types": self.degradation_types,
-            "curves": [c.as_dict() for c in self.curves],
-            "summary": self.summary,
-        }
-
-
-# ---------------------------------------------------------------------------
-# Robustness analyzer
-# ---------------------------------------------------------------------------
-
-class RobustnessAnalyzer:
-    """Runs a robustness analysis on a corpus.
-
-    Parameters
-    ----------
-    engines:
-        One or more OCR engines (``BaseOCREngine``).
-    degradation_types:
-        List of degradation types to test.
-        Default: all (``"noise"``, ``"blur"``, ``"rotation"``,
-        ``"resolution"``, ``"binarization"``).
-    cer_threshold:
-        CER threshold that defines the critical level (default: 0.20 = 20%).
-    custom_levels:
-        Custom levels per type (replace the default values).
-
-    Examples
-    --------
-    >>> from picarones.engines.tesseract import TesseractEngine
-    >>> from picarones.core.robustness import RobustnessAnalyzer
-    >>> engine = TesseractEngine(config={"lang": "fra"})
-    >>> analyzer = RobustnessAnalyzer([engine], degradation_types=["noise", "blur"])
-    >>> report = analyzer.analyze(corpus)
-    """
-
-    def __init__(
-        self,
-        engines: "list[BaseOCREngine]",
-        degradation_types: Optional[list[str]] = None,
-        cer_threshold: float = 0.20,
-        custom_levels: Optional[dict[str, list]] = None,
-    ) -> None:
-        if not isinstance(engines, list):
-            engines = [engines]
-        self.engines = engines
-        self.degradation_types = degradation_types or ALL_DEGRADATION_TYPES
-        self.cer_threshold = cer_threshold
-        self.levels = dict(DEGRADATION_LEVELS)
-        if custom_levels:
-            self.levels.update(custom_levels)
-
-    def analyze(
-        self,
-        corpus: "Corpus",
-        show_progress: bool = True,
-        max_docs: int = 10,
-    ) -> RobustnessReport:
-        """Runs the robustness analysis on the corpus.
-
-        Parameters
-        ----------
-        corpus:
-            Picarones corpus with images and ground truth (GT).
-        show_progress:
-            Display progress.
-        max_docs:
-            Maximum number of documents to process (for speed).
-
-        Returns
-        -------
-        RobustnessReport
-        """
-        from picarones.core.metrics import compute_metrics
-
-        docs = corpus.documents[:max_docs]
-        curves: list[DegradationCurve] = []
-
-        for engine in self.engines:
-            for deg_type in self.degradation_types:
-                levels = self.levels[deg_type]
-                labels = DEGRADATION_LABELS.get(deg_type, [str(lv) for lv in levels])
-
-                cer_per_level: list[Optional[float]] = []
-
-                if show_progress:
-                    try:
-                        from tqdm import tqdm
-                        level_iter = tqdm(
-                            list(enumerate(levels)),
-                            desc=f"{engine.name} / {deg_type}",
-                        )
-                    except ImportError:
-                        level_iter = enumerate(levels)
-                else:
-                    level_iter = enumerate(levels)
-
-                for lvl_idx, level in level_iter:
-                    doc_cers: list[float] = []
-
-                    for doc in docs:
-                        gt = doc.ground_truth.strip()
-                        if not gt:
-                            continue
-
-                        # Get the degraded image (source is a file or a data URI)
-                        degraded_bytes = self._get_degraded_image(
-                            doc, deg_type, level
-                        )
-                        if degraded_bytes is None:
-                            continue
-
-                        # Save to a temporary file and run OCR
-                        with tempfile.NamedTemporaryFile(
-                            suffix=".png", delete=False
-                        ) as tmp:
-                            tmp.write(degraded_bytes)
-                            tmp_path = tmp.name
-
-                        try:
-                            ocr_result = engine.run(tmp_path)
-                            hypothesis = ocr_result.text
-                            metrics = compute_metrics(gt, hypothesis)
-                            doc_cers.append(metrics.cer)
-                        except Exception as exc:
-                            logger.debug(
-                                "Erreur OCR %s niveau %s=%s: %s",
-                                engine.name, deg_type, level, exc
-                            )
-                        finally:
-                            try:
-                                os.unlink(tmp_path)
-                            except OSError:
-                                pass
-
-                    if doc_cers:
-                        cer_per_level.append(sum(doc_cers) / len(doc_cers))
-                    else:
-                        cer_per_level.append(None)
-
-                # Compute the critical level
-                critical = self._find_critical_level(
-                    levels, cer_per_level, self.cer_threshold
-                )
-
-                curves.append(DegradationCurve(
-                    engine_name=engine.name,
-                    degradation_type=deg_type,
-                    levels=levels,
-                    labels=labels[:len(levels)],
-                    cer_values=cer_per_level,
-                    critical_threshold_level=critical,
-                    cer_threshold=self.cer_threshold,
-                ))
-
-        summary = self._build_summary(curves)
-
-        return RobustnessReport(
-            engine_names=[e.name for e in self.engines],
-            corpus_name=corpus.name,
-            degradation_types=self.degradation_types,
-            curves=curves,
-            summary=summary,
-        )
-
-    def _get_degraded_image(
-        self,
-        doc: "Document",
-        degradation_type: str,
-        level: float,
-    ) -> Optional[bytes]:
-        """Returns the PNG bytes of the degraded image."""
-        # Load the original image
-        original_bytes = self._load_image(doc)
-        if original_bytes is None:
-            return None
-
-        # Level 0 = original image (except binarization, where 0 means Otsu)
-        if (degradation_type == "noise" and level == 0) or \
-           (degradation_type == "blur" and level == 0) or \
-           (degradation_type == "rotation" and level == 0) or \
-           (degradation_type == "resolution" and level >= 1.0):
-            return original_bytes
-
-        return degrade_image_bytes(original_bytes, degradation_type, level)
-
-    def _load_image(self, doc: "Document") -> Optional[bytes]:
-        """Loads the PNG bytes of a document's image."""
-        img_path = doc.image_path
-
-        # Data URI (base64)
-        if img_path.startswith("data:image/"):
-            import base64
-            try:
-                _, b64 = img_path.split(",", 1)
-                return base64.b64decode(b64)
-            except Exception as exc:
-                logger.debug("Impossible de décoder data URI: %s", exc)
-                return None
-
-        # Local file
-        path = Path(img_path)
-        if path.exists():
-            return path.read_bytes()
-
-        logger.debug("Image introuvable : %s", img_path)
-        return None
-
-    @staticmethod
-    def _find_critical_level(
-        levels: list[float],
-        cer_values: list[Optional[float]],
-        threshold: float,
-    ) -> Optional[float]:
-        """Finds the first level at which CER exceeds the threshold."""
-        for level, cer in zip(levels, cer_values):
-            if cer is not None and cer > threshold:
-                return level
-        return None
-
-    @staticmethod
-    def _build_summary(curves: list[DegradationCurve]) -> dict:
-        """Builds the analysis summary."""
-        summary: dict = {}
-
-        # Per degradation type: most robust engine
-        by_type: dict[str, dict[str, list]] = {}
-        for curve in curves:
-            dt = curve.degradation_type
-            if dt not in by_type:
-                by_type[dt] = {}
-            valid_cers = [c for c in curve.cer_values if c is not None]
-            if valid_cers:
-                by_type[dt][curve.engine_name] = valid_cers
-
-        for dt, engine_cers in by_type.items():
-            if not engine_cers:
-                continue
-            # Robustness = mean CER over all levels (lower = more robust)
-            best_engine = min(engine_cers, key=lambda e: sum(engine_cers[e]) / len(engine_cers[e]))
-            summary[f"most_robust_{dt}"] = best_engine
-
-        # Critical thresholds per engine
-        for curve in curves:
-            key = f"critical_{curve.engine_name}_{curve.degradation_type}"
-            summary[key] = curve.critical_threshold_level
-
-        return summary
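To make the two summary rules concrete (first level whose CER strictly exceeds the threshold, and lowest mean CER as the robustness ranking), here is a minimal standalone sketch. The helper names `find_critical_level` and `mean_cer`, the engine names and every number are assumptions made up for the illustration; only the rules themselves come from the diff.

```python
from typing import Optional, Sequence


def find_critical_level(
    levels: Sequence[float],
    cer_values: Sequence[Optional[float]],
    threshold: float,
) -> Optional[float]:
    """First level whose CER strictly exceeds the threshold, else None."""
    for level, cer in zip(levels, cer_values):
        if cer is not None and cer > threshold:
            return level
    return None


def mean_cer(cer_values: Sequence[Optional[float]]) -> float:
    """Mean over the levels that produced a CER (None entries are skipped)."""
    valid = [c for c in cer_values if c is not None]
    return sum(valid) / len(valid)


# Invented toy curves for one degradation type (noise sigma as the level).
levels = [0, 5, 15, 30]
curves = {
    "engine_a": [0.10, 0.14, 0.22, 0.35],
    "engine_b": [0.06, None, 0.12, 0.25],  # one level could not be measured
}

critical = {
    name: find_critical_level(levels, cers, threshold=0.20)
    for name, cers in curves.items()
}
most_robust = min(curves, key=lambda name: mean_cer(curves[name]))
```

Two consequences are worth keeping in mind when reading a report: a CER exactly equal to the threshold is not flagged as critical, and an engine with missing levels is averaged over fewer points than its competitors.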
-
-
-# ---------------------------------------------------------------------------
-# Robustness demo data
-# ---------------------------------------------------------------------------
-
-def generate_demo_robustness_report(
-    engine_names: Optional[list[str]] = None,
-    seed: int = 42,
-) -> RobustnessReport:
-    """Generates a fictitious but realistic robustness report for the demo.
-
-    Parameters
-    ----------
-    engine_names:
-        Names of the engines to simulate (default: tesseract, pero_ocr).
-    seed:
-        Random seed.
-
-    Returns
-    -------
-    RobustnessReport
-    """
-    import random
-    rng = random.Random(seed)
-
-    if engine_names is None:
-        engine_names = ["tesseract", "pero_ocr"]
-
-    # Baseline CER per engine
-    base_cer = {
-        "tesseract": 0.12,
-        "pero_ocr": 0.07,
-        "ancien_moteur": 0.25,
-    }
-
-    # Sensitivity per degradation type (CER increment added at each level step)
-    sensitivity = {
-        "tesseract": {
-            "noise": 0.04, "blur": 0.05, "rotation": 0.06,
-            "resolution": 0.12, "binarization": 0.03,
-        },
-        "pero_ocr": {
-            "noise": 0.02, "blur": 0.03, "rotation": 0.04,
-            "resolution": 0.08, "binarization": 0.02,
-        },
-        "ancien_moteur": {
-            "noise": 0.06, "blur": 0.08, "rotation": 0.10,
-            "resolution": 0.15, "binarization": 0.05,
-        },
-    }
-
-    deg_types = ALL_DEGRADATION_TYPES
-    curves: list[DegradationCurve] = []
-
-    for engine_name in engine_names:
-        cer_base = base_cer.get(engine_name, 0.15)
-        sens = sensitivity.get(engine_name, {dt: 0.05 for dt in deg_types})
-
-        for deg_type in deg_types:
-            levels = DEGRADATION_LEVELS[deg_type]
-            labels = DEGRADATION_LABELS[deg_type]
-            s = sens.get(deg_type, 0.05)
-
-            cer_values = []
-            for i, level in enumerate(levels):
-                noise = rng.gauss(0, 0.005)
-                cer = min(1.0, cer_base + s * i + noise)
-                cer_values.append(round(max(0.0, cer), 4))
-
-            critical = RobustnessAnalyzer._find_critical_level(levels, cer_values, 0.20)
-
-            curves.append(DegradationCurve(
-                engine_name=engine_name,
-                degradation_type=deg_type,
-                levels=list(levels),
-                labels=labels[:len(levels)],
-                cer_values=cer_values,
-                critical_threshold_level=critical,
-                cer_threshold=0.20,
-            ))
-
-    summary = RobustnessAnalyzer._build_summary(curves)
-
-    return RobustnessReport(
-        engine_names=engine_names,
-        corpus_name="Corpus de démonstration — Chroniques médiévales",
-        degradation_types=deg_types,
-        curves=curves,
-        summary=summary,
-    )
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.robustness`.
 
+Phase E of the 3-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it now lives in
+``picarones.measurements/``. The alias here lets legacy
+imports (``from picarones.core.robustness import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+3 circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.robustness import *  # noqa: F401, F403
 
+import picarones.measurements.robustness as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
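The alias module's behaviour can be checked without Picarones installed. The sketch below is an assumption-laden illustration: it registers two throwaway modules under the hypothetical names `demo_pkg_measurements_robustness` (standing in for the new location) and `demo_pkg_core_robustness` (standing in for the legacy one), then executes the same three-statement shim body in the legacy module. It shows that legacy imports resolve to the very same objects and that underscore-prefixed names are not re-exported.

```python
import sys
import types

# Hypothetical stand-in for the relocated module.
new_mod = types.ModuleType("demo_pkg_measurements_robustness")
new_mod.RobustnessAnalyzer = type("RobustnessAnalyzer", (), {})
new_mod._private_helper = lambda: None  # must not leak through the alias
sys.modules[new_mod.__name__] = new_mod

# Same shim body as in the diff, with the stand-in module name substituted.
shim_source = '''
from demo_pkg_measurements_robustness import *  # noqa: F401, F403

import demo_pkg_measurements_robustness as _module
__all__ = getattr(_module, "__all__", [
    nm for nm in dir(_module) if not nm.startswith("_")
])
'''

# Hypothetical stand-in for the legacy location.
old_mod = types.ModuleType("demo_pkg_core_robustness")
exec(shim_source, old_mod.__dict__)
sys.modules[old_mod.__name__] = old_mod

# A legacy import and a new-style import now return the same class object.
from demo_pkg_core_robustness import RobustnessAnalyzer as Legacy
from demo_pkg_measurements_robustness import RobustnessAnalyzer as Current

same_object = Legacy is Current
```

Because the target module here defines no `__all__`, the star import and the fallback list both skip underscore-prefixed names. If the real module does define `__all__`, the shim reuses it as-is.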
@@ -1,287 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Sprint 81 (A.I.8).
|
| 3 |
|
| 4 |
-
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
-CER curves vs. **synthetic** degradation level (noise, blur,
-rotation, resolution). ``picarones/core/image_quality.py`` measures
-the **actual** noise/blur/contrast of the corpus images. This
-sprint **projects** the actual characteristics onto the synthetic
-curves to estimate the **expected CER deficit** on the
-corpus in its current state.
-
-Concrete reading
-----------------
-*"30% of your documents have noise equivalent to σ=15, where
-Tesseract loses 8 CER points, i.e. an overall expected deficit
-of 2.4 points (30% × 8 points)."*
-
-Method
-------
-1. For each document, extract the actual quality value
-   (``noise_level``, ``blur_score``, ``contrast_score``…) from
-   ``ImageQualityResult``.
-2. For each degradation type, linearly interpolate the synthetic
-   ``DegradationCurve``: expected CER at that level.
-3. Aggregate: mean expected CER, % of docs above the curve's
-   critical threshold, projected deficit = expected_CER -
-   baseline_CER (zero level).
-
-Output
-------
-``project_robustness_on_corpus(curves, image_qualities)`` returns
-``{engine_name: {degradation_type: {expected_cer_mean,
-deficit_vs_baseline, n_docs_above_critical, n_docs}}}``.
-
-Limitations
------------
-- ``image_quality → degradation level`` mapping: we assume that
-  ``noise_level`` (ImageQualityResult) corresponds to σ
-  (DegradationCurve), and likewise for ``blur_score`` ↔ blur
-  radius. If a corpus exposes these values on a different
-  scale, the mapping is documented and the user can
-  pass a custom ``quality_to_level``.
-- **Linear** interpolation between the points of the curve.
-  Beyond the bounds, we **clip** to the extreme point (no
-  risky extrapolation).
 """
 
-from
-
-import logging
-import statistics
-from typing import Callable, Iterable, Optional
-
-logger = logging.getLogger(__name__)
-
-
-# Default mapping between ImageQualityResult attributes and
-# synthetic degradation types. The user can pass a custom
-# dict to change this mapping.
-_DEFAULT_QUALITY_FIELD: dict[str, str] = {
-    "noise": "noise_level",  # σ
-    "blur": "blur_score",  # Laplacian variance (inverse)
-    "contrast": "contrast_score",
-    "rotation": "rotation_angle",
-    "resolution": "resolution_score",  # may be absent
-}
-
-
-def _interpolate_cer(
-    levels: list[float],
-    cer_values: list[Optional[float]],
-    target_level: float,
-) -> Optional[float]:
-    """Linear interpolation: returns the expected CER at
-    ``target_level``.
-
-    - If ``target_level`` is below the minimum of levels,
-      returns the CER at the minimum (clip).
-    - If above the maximum, returns the CER at the maximum.
-    - Otherwise, linear interpolation between the two
-      bracketing points.
-    - Returns ``None`` if there is no valid ``cer_value``.
-    """
-    if not levels:
-        return None
-    # Drop the (level, cer) pairs where cer is None
-    pairs = [
-        (lvl, cer) for lvl, cer in zip(levels, cer_values)
-        if cer is not None
-    ]
-    if not pairs:
-        return None
-    pairs.sort(key=lambda p: p[0])
-    # Clip
-    if target_level <= pairs[0][0]:
-        return pairs[0][1]
-    if target_level >= pairs[-1][0]:
-        return pairs[-1][1]
-    # Interpolation
-    for i in range(len(pairs) - 1):
-        lo_lvl, lo_cer = pairs[i]
-        hi_lvl, hi_cer = pairs[i + 1]
-        if lo_lvl <= target_level <= hi_lvl:
-            if hi_lvl == lo_lvl:
-                return lo_cer
-            ratio = (target_level - lo_lvl) / (hi_lvl - lo_lvl)
-            return lo_cer + (hi_cer - lo_cer) * ratio
-    return None  # should not happen
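The clip-then-interpolate rule can be tried on a toy curve. The sketch below is an independent rewrite for illustration, not the module's function: `interpolate_clipped` is a hypothetical name, it swaps the linear scan for `bisect_right` from the standard library, and the curve values are invented. It keeps the same contract: unmeasured levels are ignored, targets outside the measured range are clipped to the nearest end point, and `None` is returned when no CER is available.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def interpolate_clipped(
    levels: Sequence[float],
    cer_values: Sequence[Optional[float]],
    target: float,
) -> Optional[float]:
    """Expected CER at `target`, linearly interpolated and clipped at the ends."""
    # Keep only measured points, ordered by level.
    pairs = sorted((lvl, cer) for lvl, cer in zip(levels, cer_values) if cer is not None)
    if not pairs:
        return None
    xs = [lvl for lvl, _ in pairs]
    if target <= xs[0]:
        return pairs[0][1]  # clip below the first measured level
    if target >= xs[-1]:
        return pairs[-1][1]  # clip above the last measured level
    hi = bisect_right(xs, target)  # first index whose level is > target
    lo_lvl, lo_cer = pairs[hi - 1]
    hi_lvl, hi_cer = pairs[hi]
    ratio = (target - lo_lvl) / (hi_lvl - lo_lvl)  # hi_lvl > lo_lvl here
    return lo_cer + (hi_cer - lo_cer) * ratio


# Invented curve: the level 5 measurement is missing.
levels = [0, 5, 15, 30]
cers = [0.10, None, 0.20, 0.40]

inside = interpolate_clipped(levels, cers, 7.5)   # halfway between levels 0 and 15
below = interpolate_clipped(levels, cers, -3.0)   # clipped to the first point
above = interpolate_clipped(levels, cers, 100.0)  # clipped to the last point
empty = interpolate_clipped(levels, [None] * 4, 7.5)
```

Clipping means a document noisier than the strongest synthetic level is credited with the CER of that strongest level, so the projection is a lower bound for such documents.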
-
-
-def _extract_quality_value(
-    quality: dict, degradation_type: str,
-    custom_mapping: Optional[dict[str, str]] = None,
-) -> Optional[float]:
-    """Extracts the quality value relevant to a degradation
-    type from an ``ImageQualityResult.as_dict()``."""
-    mapping = custom_mapping or _DEFAULT_QUALITY_FIELD
-    field = mapping.get(degradation_type)
-    if field is None:
-        return None
-    value = quality.get(field)
-    if value is None:
-        return None
-    try:
-        return float(value)
-    except (TypeError, ValueError):
-        return None
-
-
-def project_robustness_on_corpus(
-    curves: Iterable,
-    image_qualities: list[dict],
-    *,
-    quality_to_level: Optional[Callable[[dict, str], Optional[float]]] = None,
-    critical_threshold: Optional[float] = None,
-) -> dict:
-    """Projects the robustness curves onto the actual image qualities.
-
-    Parameters
-    ----------
-    curves:
-        Iterable of ``DegradationCurve`` (or compatible dicts
-        with ``engine_name``, ``degradation_type``, ``levels``,
-        ``cer_values``, ``critical_threshold_level``).
-    image_qualities:
-        List of ``ImageQualityResult.as_dict()`` dicts (one per
-        document). If empty, returns an empty projection.
-    quality_to_level:
-        Custom function ``(quality_dict, degradation_type) →
-        Optional[float]`` to adapt the quality→level mapping.
-        By default, uses ``_DEFAULT_QUALITY_FIELD``.
-    critical_threshold:
-        Override for the critical CER threshold (default: uses
-        ``DegradationCurve.cer_threshold``).
-
-    Returns
-    -------
-    dict
-        ``{
-            engine_name: {
-                degradation_type: {
-                    "n_docs": int,
-                    "n_docs_with_data": int,  # quality available
-                    "expected_cer_mean": float,  # mean expected CER
-                    "expected_cer_median": float,
-                    "baseline_cer": float,  # CER at the minimum level
-                    "deficit_vs_baseline": float,
-                    "n_docs_above_critical": int,
-                    "critical_threshold_level": float | None,
-                    "critical_threshold_cer": float,
-                },
-            },
-        }``
-    """
-    extractor = quality_to_level or (
-        lambda q, dt: _extract_quality_value(q, dt)
-    )
-    out: dict[str, dict] = {}
-
-    for curve in curves:
-        # Accept either a dict or a DegradationCurve
-        if hasattr(curve, "as_dict"):
-            data = curve.as_dict()
-        else:
-            data = curve
-        engine = data.get("engine_name")
-        deg_type = data.get("degradation_type")
-        levels = data.get("levels") or []
-        cer_values = data.get("cer_values") or []
-        crit_lvl = data.get("critical_threshold_level")
-        crit_cer = (
-            critical_threshold
-            if critical_threshold is not None
-            else data.get("cer_threshold", 0.20)
-        )
-        if not engine or not deg_type:
-            continue
-
-        per_doc_cer: list[float] = []
-        n_docs_with_data = 0
-        n_above_critical = 0
-        for quality in image_qualities:
-            level = extractor(quality, deg_type)
-            if level is None:
-                continue
-            n_docs_with_data += 1
-            cer = _interpolate_cer(levels, cer_values, level)
-            if cer is None:
-                continue
-            per_doc_cer.append(cer)
-            if cer > crit_cer:
-                n_above_critical += 1
-
-        if not per_doc_cer:
-            continue
-
-        # Baseline = CER at the minimum level (no degradation)
-        baseline = _interpolate_cer(
-            levels, cer_values,
-            min(levels) if levels else 0.0,
-        )
-        expected_mean = statistics.fmean(per_doc_cer)
-        expected_median = statistics.median(per_doc_cer)
-        deficit = (
-            expected_mean - baseline
-            if baseline is not None else None
-        )
-
-        out.setdefault(engine, {})[deg_type] = {
-            "n_docs": len(image_qualities),
-            "n_docs_with_data": n_docs_with_data,
-            "expected_cer_mean": expected_mean,
-            "expected_cer_median": expected_median,
-            "baseline_cer": baseline,
-            "deficit_vs_baseline": deficit,
-            "n_docs_above_critical": n_above_critical,
-            "critical_threshold_level": crit_lvl,
-            "critical_threshold_cer": crit_cer,
-        }
-    return out
-
-
-def aggregate_projection_per_engine(projection: dict) -> dict:
-    """For each engine, aggregates the projected deficit by summing
-    over all degradation types.
-
-    Reading: *"total expected deficit for Tesseract = 5.2 CER
-    points if the 4 degradations are considered independently"*.
-
-    Note: the summation **assumes independence** between the
-    degradations, which is not strictly true but remains
-    a useful approximation for diagnosis.
-    """
-    out: dict[str, dict] = {}
-    for engine, per_type in projection.items():
-        total_deficit = 0.0
-        n_types_with_data = 0
-        max_deficit_type: Optional[tuple[str, float]] = None
-        for deg_type, stats in per_type.items():
-            deficit = stats.get("deficit_vs_baseline")
-            if deficit is None:
-                continue
-            total_deficit += deficit
-            n_types_with_data += 1
-            if max_deficit_type is None or deficit > max_deficit_type[1]:
-                max_deficit_type = (deg_type, deficit)
-        out[engine] = {
-            "total_expected_deficit": total_deficit,
-            "n_degradation_types": n_types_with_data,
-            "worst_degradation_type": (
-                max_deficit_type[0] if max_deficit_type else None
-            ),
-            "worst_degradation_deficit": (
-                max_deficit_type[1] if max_deficit_type else None
-            ),
-        }
-    return out
|
| 282 |
-
|
| 283 |
|
| 284 |
-
|
| 285 |
-
|
| 286 |
-
"
|
| 287 |
-
]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.robustness_projection`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.robustness_projection import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.robustness_projection import *  # noqa: F401, F403
 
+import picarones.measurements.robustness_projection as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
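
The three statements shared by these alias files can be exercised in isolation. The sketch below is illustrative only: `demo_measurements_mod` and `demo_core_mod` are hypothetical in-memory stand-ins, not Picarones modules. It shows that the star import and the `__all__` fallback re-export public names, and that underscore-prefixed helpers are not forwarded when the target module defines no `__all__`.

```python
import sys
import types

# Hypothetical stand-in for a relocated module (names are illustrative).
target = types.ModuleType("demo_measurements_mod")
exec(
    "def public_metric():\n"
    "    return 1.0\n"
    "def _private_helper():\n"
    "    return 'internal'\n",
    target.__dict__,
)
sys.modules[target.__name__] = target

# The shim: the same three statements as the alias files, pointed at the stand-in.
shim = types.ModuleType("demo_core_mod")
exec(
    "from demo_measurements_mod import *\n"
    "import demo_measurements_mod as _module\n"
    "__all__ = getattr(_module, '__all__', [\n"
    "    nm for nm in dir(_module) if not nm.startswith('_')\n"
    "])\n",
    shim.__dict__,
)

public_names = shim.__all__                      # names the shim advertises
has_public = hasattr(shim, "public_metric")      # forwarded by the star import
has_private = hasattr(shim, "_private_helper")   # skipped by the star import
```

If legacy callers import private helpers through the old path (the removed `searchability_runner` imported `_split_words` from `picarones.core.searchability`), the relocated module would need to list them in `__all__`, or the shim would need an explicit import. Whether that applies here depends on the relocated modules, which this excerpt does not show.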
|
|
@@ -1,225 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
-a *full-text search* use case (what Elastic, Solr in fuzzy
-mode, or Gallica's full-text search do), the real question
-is:
-
-    *"How many words of my GT can be found in the OCR
-    output, up to approximate spelling?"*
-
-A CER of 8% can yield 95% findability if the errors are
-concentrated on non-significant characters or on a few
-aberrant words; conversely, a CER of 4% spread over all
-proper nouns makes the corpus unusable for prosopographic
-indexing.
-
-Method
-------
-For each GT token, we check whether at least one hypothesis
-token lies at Levenshtein distance ≤ ``max_distance`` (default
-2, the value Elastic ``fuzziness: AUTO`` applies to words longer
-than 5 characters). The **recall** is the proportion of GT tokens
-found this way.
-
-Multiplicity
-------------
-If the GT contains *"le"* twice and the hypothesis once,
-only one GT token is counted as found (multiset alignment,
-like ``rare_token_recall`` in Sprint 71).
-
-Output
-------
-``compute_searchability(reference, hypothesis)`` returns
-``{n_gt_tokens, n_searchable, recall, missed_tokens}``.
-
-Documented limitations
-----------------------
-- Tokenisation by whitespace split (consistent with the rest
-  of the codebase). No stemming or lemmatisation.
-- Unweighted Levenshtein: substitution = insertion = deletion
-  = 1. For a different weighting (e.g. a common diacritic
-  error = 0.5), pass a custom function.
-- No semantics: *"roi"* ≠ *"souverain"*. For semantic
-  similarity, see future modules (BERTScore).
 """
 
-from __future__ import annotations
-
-import logging
-from typing import Optional
-
-from picarones.core.metric_registry import register_metric
-from picarones.core.modules import ArtifactType
-
-logger = logging.getLogger(__name__)
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Tokenisation and edit distance
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def _split_words(text: Optional[str]) -> list[str]:
-    """Whitespace tokenisation, consistent with
-    ``lexical_modernization.py``, ``rare_tokens.py``, etc."""
-    if not text:
-        return []
-    return text.split()
-
-
-def levenshtein_distance(a: str, b: str) -> int:
-    """Levenshtein distance (substitution=insertion=deletion=1).
-
-    DP implementation in O(|a|·|b|) time and O(min(|a|,|b|)) memory.
-    """
-    if a == b:
-        return 0
-    if len(a) < len(b):
-        a, b = b, a
-    # |a| ≥ |b|
-    if not b:
-        return len(a)
-    previous = list(range(len(b) + 1))
-    for i, ca in enumerate(a, start=1):
-        current = [i] + [0] * len(b)
-        for j, cb in enumerate(b, start=1):
-            cost = 0 if ca == cb else 1
-            current[j] = min(
-                current[j - 1] + 1,        # insertion
-                previous[j] + 1,           # deletion
-                previous[j - 1] + cost,    # substitution
-            )
-        previous = current
-    return previous[-1]
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Main computation
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def compute_searchability(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-    *,
-    max_distance: int = 2,
-    case_sensitive: bool = False,
-) -> dict:
-    """Fuzzy searchability of ``reference`` within ``hypothesis``.
-
-    Parameters
-    ----------
-    reference, hypothesis:
-        GT and OCR transcriptions.
-    max_distance:
-        Levenshtein distance threshold (a token counts as found
-        when its distance is ≤ this value). Default 2, the Elastic
-        ``fuzziness: AUTO`` convention for words longer than 5 characters.
-    case_sensitive:
-        If False (default), matching is case-insensitive; the
-        ``missed_tokens`` output keeps the original GT
-        casing.
-
-    Returns
-    -------
-    dict
-        ``{
-            "n_gt_tokens": int,
-            "n_searchable": int,
-            "recall": float | None,  # None if n_gt_tokens == 0
-            "missed_tokens": list[str],
-            "max_distance": int,
-        }``
-    """
-    if max_distance < 0:
-        raise ValueError(f"max_distance doit être ≥ 0, reçu {max_distance}")
-    gt_tokens = _split_words(reference)
-    hyp_tokens = _split_words(hypothesis)
-    n_gt = len(gt_tokens)
-    if n_gt == 0:
-        return {
-            "n_gt_tokens": 0,
-            "n_searchable": 0,
-            "recall": None,
-            "missed_tokens": [],
-            "max_distance": max_distance,
-        }
-    # Multiset: a hypothesis token can be used only once.
-    # Sort by increasing length to match the shortest GT
-    # tokens first (where ε-errors are rarer).
-    if case_sensitive:
-        gt_for_match = list(gt_tokens)
-        hyp_for_match = list(hyp_tokens)
-    else:
-        gt_for_match = [t.lower() for t in gt_tokens]
-        hyp_for_match = [t.lower() for t in hyp_tokens]
-
-    hyp_used = [False] * len(hyp_for_match)
-    n_searchable = 0
-    missed: list[str] = []
-    for gi, gt_match in enumerate(gt_for_match):
-        # Short-circuit if an exact match is available
-        best_idx = -1
-        best_dist = max_distance + 1
-        for hi, used in enumerate(hyp_used):
-            if used:
-                continue
-            hyp_match = hyp_for_match[hi]
-            # Length short-circuit (Levenshtein ≥ |Δlen|)
-            if abs(len(hyp_match) - len(gt_match)) > max_distance:
-                continue
-            d = levenshtein_distance(gt_match, hyp_match)
-            if d < best_dist:
-                best_dist = d
-                best_idx = hi
-                if d == 0:
-                    break  # exact match, no need to look further
-        if best_idx >= 0 and best_dist <= max_distance:
-            hyp_used[best_idx] = True
-            n_searchable += 1
-        else:
-            missed.append(gt_tokens[gi])
-    recall = n_searchable / n_gt
-    return {
-        "n_gt_tokens": n_gt,
-        "n_searchable": n_searchable,
-        "recall": recall,
-        "missed_tokens": missed,
-        "max_distance": max_distance,
-    }
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# Typed registry registration (Sprint 34)
-# ──────────────────────────────────────────────────────────────────────────
-
-
-@register_metric(
-    name="searchability_recall",
-    input_types=(ArtifactType.TEXT, ArtifactType.TEXT),
-    description=(
-        "Recherchabilité fuzzy : proportion de tokens GT retrouvés "
-        "dans l'OCR à distance de Levenshtein ≤ 2. Proxy direct de "
-        "la qualité pour la recherche plein-texte (Elastic, Solr)."
-    ),
-)
-def searchability_recall_metric(reference: str, hypothesis: str) -> float:
-    """Scalar variant for the typed registry: returns the
-    recall in [0, 1], or ``0.0`` if the GT is empty (convention
-    consistent with rare_token_recall, Sprint 71).
-    """
-    result = compute_searchability(reference, hypothesis)
-    recall = result.get("recall")
-    return 0.0 if recall is None else recall
|
| 219 |
-
|
| 220 |
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
"
|
| 224 |
-
|
| 225 |
-
]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.searchability`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.searchability import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.searchability import *  # noqa: F401, F403
 
+import picarones.measurements.searchability as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
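
To make the multiset matching of the removed `compute_searchability` concrete, here is a condensed, self-contained sketch. The names `levenshtein` and `fuzzy_recall` are illustrative and are not the module's API; the matching is always case-insensitive here, which is a simplification of the original `case_sensitive` option.

```python
def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance (insert = delete = substitute = 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insertion
                previous[j] + 1,               # deletion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_recall(reference: str, hypothesis: str, max_distance: int = 2) -> dict:
    """Share of reference tokens that have an unused hypothesis token nearby."""
    gt = reference.lower().split()
    hyp = hypothesis.lower().split()
    used = [False] * len(hyp)
    missed = []
    for token in gt:
        best_idx, best_dist = -1, max_distance + 1
        for idx, cand in enumerate(hyp):
            # Skip consumed tokens and tokens whose length gap already
            # exceeds the threshold (the distance is at least that gap).
            if used[idx] or abs(len(cand) - len(token)) > max_distance:
                continue
            dist = levenshtein(token, cand)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx >= 0:
            used[best_idx] = True   # a hypothesis token serves only once
        else:
            missed.append(token)
    found = len(gt) - len(missed)
    return {
        "n_gt_tokens": len(gt),
        "n_searchable": found,
        "recall": found / len(gt) if gt else None,
        "missed_tokens": missed,
    }


result = fuzzy_recall("le roi le chateau", "le rio chateau")
```

In this example the second "le" is missed because the only exact candidate was already consumed, while "roi" is matched to "rio" at distance 2. With the default threshold, short words match quite loosely, which is worth keeping in mind when reading the recall.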
|
|
|
|
@@ -1,81 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
-Adaptive masking
-----------------
-As with the philological modules (Sprint 61), the recall is
-computed only if the GT contains at least one token: no
-empty computation that would add noise to the report.
 """
 
-from __future__ import annotations
-
-import logging
-from typing import Iterable, Optional
-
-from picarones.core.searchability import (
-    _split_words,
-    compute_searchability,
-)
-
-logger = logging.getLogger(__name__)
-
-
-def compute_searchability_metrics(
-    reference: Optional[str],
-    hypothesis: Optional[str],
-    *,
-    max_distance: int = 2,
-) -> Optional[dict]:
-    """Searchability of one document (adaptive).
-
-    Returns ``None`` if the GT is empty or contains no
-    token, which triggers adaptive masking on the HTML side.
-    """
-    if not reference or not _split_words(reference):
-        return None
-    return compute_searchability(
-        reference, hypothesis or "", max_distance=max_distance,
-    )
-
-
-def aggregate_searchability_metrics(
-    per_doc: Iterable[Optional[dict]],
-) -> Optional[dict]:
-    """Aggregate per-document metrics into a corpus-wide score.
-
-    Convention: ``n_gt_tokens`` and ``n_searchable`` are summed
-    and a **micro** recall is recomputed (consistent with ECE/MCE
-    in Sprint 39 and NER in Sprint 38).
-    """
-    docs = [d for d in per_doc if d]
-    if not docs:
-        return None
-    n_gt = sum(int(d.get("n_gt_tokens") or 0) for d in docs)
-    n_search = sum(int(d.get("n_searchable") or 0) for d in docs)
-    if n_gt == 0:
-        return None
-    # Keep the combined missed_tokens (capped so the JSON does
-    # not blow up on large corpora)
-    missed: list[str] = []
-    for d in docs:
-        missed.extend(d.get("missed_tokens") or [])
-    return {
-        "n_docs": len(docs),
-        "n_gt_tokens": n_gt,
-        "n_searchable": n_search,
-        "recall": n_search / n_gt,
-        "missed_tokens_sample": missed[:50],
-        "max_distance": docs[0].get("max_distance", 2),
-    }
|
| 76 |
-
|
| 77 |
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
"
|
| 81 |
-
]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.searchability_runner`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.searchability_runner import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.searchability_runner import *  # noqa: F401, F403
 
+import picarones.measurements.searchability_runner as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
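
The **micro** convention used by the removed `aggregate_searchability_metrics` weights every token equally, so long documents dominate the corpus score. The sketch below contrasts it with a macro average on made-up counts; `micro_recall` and `macro_recall` are illustrative helper names, and only the micro variant mirrors the runner's behaviour.

```python
from typing import Iterable, Optional


def micro_recall(per_doc: Iterable[Optional[dict]]) -> Optional[float]:
    """Sum the counts over documents, then divide once (token-weighted)."""
    docs = [d for d in per_doc if d]          # masked documents are None
    n_gt = sum(d["n_gt_tokens"] for d in docs)
    n_found = sum(d["n_searchable"] for d in docs)
    return n_found / n_gt if n_gt else None


def macro_recall(per_doc: Iterable[Optional[dict]]) -> Optional[float]:
    """Average the per-document recalls (document-weighted), for contrast."""
    recalls = [
        d["n_searchable"] / d["n_gt_tokens"]
        for d in per_doc
        if d and d["n_gt_tokens"]
    ]
    return sum(recalls) / len(recalls) if recalls else None


# Made-up counts: one long, well-recognised page and one short, poor one.
per_doc = [
    {"n_gt_tokens": 100, "n_searchable": 90},
    None,                                      # document with an empty GT
    {"n_gt_tokens": 10, "n_searchable": 2},
]
micro = micro_recall(per_doc)   # 92 / 110
macro = macro_recall(per_doc)   # (0.9 + 0.2) / 2
```

On these counts the micro recall is about 0.84 while the macro recall is 0.55, which illustrates why the choice of convention should be stated next to any corpus-wide figure.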
|
|
@@ -1,187 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
-``inter_engine.taxonomy_divergence_matrix``) answers *"how
-differently do these engines err?"*. This
-sprint turns it into a readable **specialization score**
-and complements the reading with:
-
-- a discrete **classification** (similar / distinct /
-  highly_specialized) that the researcher can consume without
-  having to interpret a distance;
-- a **top-N of the most specialized pairs**, which directly
-  answers the question *"which engines are the best
-  candidates for a voting ensemble?"*.
-
-This module **does not recommend** an ensemble pipeline: it
-provides the factual observation and lets the researcher decide.
-
-Score convention
-----------------
-We use the **Jensen-Shannon divergence** already computed by
-``inter_engine.jensen_shannon_divergence``: it is
-symmetric, bounded in [0, 1], and its interpretation is
-intuitive:
-
-- ≈ 0 → identical taxonomic profiles
-- 1   → totally disjoint distributions
-
-Dependencies
-------------
-Relies strictly on ``picarones.core.inter_engine`` (Sprint
-35): no duplicate computation, no new divergence
-logic.
 """
 
-from __future__ import annotations
-
-import logging
-from typing import Optional
-
-from picarones.core.inter_engine import jensen_shannon_divergence
-
-logger = logging.getLogger(__name__)
-
-
-# Thresholds set by editorial convention. The roadmap fixes nothing:
-# these thresholds are **reading guides**, not verdicts.
-# The researcher can override them via ``classify_specialization``.
-DEFAULT_THRESHOLDS = (
-    ("similar", 0.10),
-    ("distinct", 0.30),
-    ("highly_specialized", 1.01),  # any score ≥ 0.30
-)
-
-
-def compute_specialization_score(
-    taxonomy_a: dict[str, float],
-    taxonomy_b: dict[str, float],
-) -> float:
-    """Specialization score between two engines, in [0, 1].
-
-    0 = same errors, 1 = totally disjoint errors.
-    Delegates to ``jensen_shannon_divergence`` (Sprint 35).
-    """
-    return jensen_shannon_divergence(taxonomy_a, taxonomy_b)
-
-
-def classify_specialization(
-    score: float,
-    thresholds: Optional[tuple[tuple[str, float], ...]] = None,
-) -> str:
-    """Classify the score into a discrete category.
-
-    Convention:
-    - score < 0.10           → ``similar``
-    - 0.10 ≤ score < 0.30    → ``distinct``
-    - score ≥ 0.30           → ``highly_specialized``
-
-    The user can pass custom ``thresholds`` (a list of
-    ``(label, max_score)`` tuples sorted by increasing value).
-    """
-    rules = thresholds or DEFAULT_THRESHOLDS
-    for label, max_score in rules:
-        if score < max_score:
-            return label
-    # Safeguard: if no threshold matches, use the last category
-    return rules[-1][0]
-
-
-def compute_specialization_matrix(
-    taxonomies: dict[str, dict[str, float]],
-) -> Optional[dict]:
-    """Symmetric specialization matrix across all engines.
-
-    Parameters
-    ----------
-    taxonomies:
-        Map ``{engine_name: {error_class: count_or_proportion}}``.
-
-    Returns
-    -------
-    dict | None
-        ``{
-            "engines": list[str],
-            "matrix": list[list[float]],   # square, symmetric
-            "n_pairs": int,                # distinct pairs
-            "max_score": float,
-            "max_pair": (str, str) | None,
-        }``; ``None`` if fewer than 2 engines.
-    """
-    if not taxonomies or len(taxonomies) < 2:
-        return None
-    engines = sorted(taxonomies.keys())
-    n = len(engines)
-    matrix = [[0.0] * n for _ in range(n)]
-    n_pairs = 0
-    max_score = 0.0
-    max_pair: Optional[tuple[str, str]] = None
-    for i in range(n):
-        for j in range(i + 1, n):
-            score = compute_specialization_score(
-                taxonomies[engines[i]], taxonomies[engines[j]],
-            )
-            matrix[i][j] = score
-            matrix[j][i] = score
-            n_pairs += 1
-            if score > max_score:
-                max_score = score
-                max_pair = (engines[i], engines[j])
-    return {
-        "engines": engines,
-        "matrix": matrix,
-        "n_pairs": n_pairs,
-        "max_score": max_score,
-        "max_pair": max_pair,
-    }
-
-
-def top_specialized_pairs(
-    matrix_data: Optional[dict],
-    n: int = 5,
-    *,
-    min_score: float = 0.0,
-) -> list[dict]:
-    """Top-N engine pairs sorted by decreasing score.
-
-    Returns
-    -------
-    list[dict]
-        A list of ``{
-            "engine_a": str, "engine_b": str,
-            "score": float, "category": str,
-        }`` sorted by decreasing score. Empty list if
-        ``matrix_data`` is ``None`` or if all pairs
-        are below ``min_score``.
-    """
-    if not matrix_data:
-        return []
-    engines = matrix_data["engines"]
-    matrix = matrix_data["matrix"]
-    pairs: list[dict] = []
-    for i, engine_a in enumerate(engines):
-        for j in range(i + 1, len(engines)):
-            score = matrix[i][j]
-            if score < min_score:
-                continue
-            pairs.append({
-                "engine_a": engine_a,
-                "engine_b": engines[j],
-                "score": score,
-                "category": classify_specialization(score),
-            })
-    pairs.sort(key=lambda p: -p["score"])
-    return pairs[:n]
|
| 179 |
-
|
| 180 |
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
"
|
| 184 |
-
|
| 185 |
-
"compute_specialization_matrix",
|
| 186 |
-
"top_specialized_pairs",
|
| 187 |
-
]
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.specialization`.
 
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+is no longer in ``picarones.core/``; it lives in
+``picarones.measurements/``. The alias here lets legacy imports
+(``from picarones.core.specialization import ...``) keep
+working without modification.
 
+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
 """
 
+from picarones.measurements.specialization import *  # noqa: F401, F403
 
+import picarones.measurements.specialization as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
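
The specialization score above delegates to `inter_engine.jensen_shannon_divergence`, whose implementation is not part of this excerpt. The sketch below is therefore an assumption-laden stand-in: it normalises raw counts to probabilities and uses base-2 logarithms, which is what bounds the divergence in [0, 1]. `js_divergence`, `classify` and `THRESHOLDS` are illustrative names; the threshold values are copied from `DEFAULT_THRESHOLDS`.

```python
import math


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence with base-2 logs (assumed convention)."""
    keys = sorted(set(p) | set(q))
    total_p, total_q = sum(p.values()), sum(q.values())
    pn = [p.get(k, 0.0) / total_p for k in keys]   # normalise counts
    qn = [q.get(k, 0.0) / total_q for k in keys]
    mid = [(a + b) / 2 for a, b in zip(pn, qn)]    # mixture distribution

    def kl(dist: list[float], ref: list[float]) -> float:
        # Terms with zero mass contribute nothing (0 * log 0 := 0).
        return sum(a * math.log2(a / b) for a, b in zip(dist, ref) if a > 0)

    return 0.5 * kl(pn, mid) + 0.5 * kl(qn, mid)


# Same values as DEFAULT_THRESHOLDS in the removed module.
THRESHOLDS = (("similar", 0.10), ("distinct", 0.30), ("highly_specialized", 1.01))


def classify(score: float) -> str:
    """Return the first label whose upper bound exceeds the score."""
    for label, max_score in THRESHOLDS:
        if score < max_score:
            return label
    return THRESHOLDS[-1][0]


same = js_divergence({"accent": 3, "ligature": 1}, {"accent": 3, "ligature": 1})
disjoint = js_divergence({"accent": 1}, {"ligature": 1})
```

Identical profiles give a divergence of 0 and fully disjoint ones give 1, matching the two reference points listed in the module docstring. The 1.01 upper bound in the last threshold simply ensures that a score of exactly 1 still falls in `highly_specialized`.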
@@ -1,1127 +1,19 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
-- friedman_test(engine_cer_map)     : Friedman (k engines, n documents) [Sprint 17]
-- nemenyi_posthoc(engine_cer_map)   : Nemenyi post-hoc with critical distance [Sprint 17]
-- build_critical_difference_svg(...): SVG rendering of the CDD (Demšar 2006) [Sprint 17]
-- compute_pareto_front(points, ...) : multi-objective Pareto front [Sprint 19]
-- cluster_errors(...)               : clustering of error patterns
-- compute_correlation_matrix(...)   : correlation matrix of the metrics
-- compute_reliability_curve(...)    : CER vs. % of easiest documents curve
-- compute_venn_data(...)            : Venn diagram for 2/3 engines
-"""
-
-from __future__ import annotations
-
-import math
-import random
-import re
-from collections import defaultdict
-from dataclasses import dataclass
-from typing import Optional
-
-# Optional scipy import: used for the Wilcoxon test when available
-# (exact method for n ≤ 25, normal approximation for n > 25).
-# When it is absent, the native implementation (normal approximation for n ≥ 10)
-# is used automatically.
-try:
-    from scipy.stats import wilcoxon as _scipy_wilcoxon  # type: ignore[import-untyped]
-    _SCIPY_AVAILABLE = True
-except ImportError:
-    _SCIPY_AVAILABLE = False
-
-
-# ---------------------------------------------------------------------------
-# Bootstrap CI
-# ---------------------------------------------------------------------------
-
-def bootstrap_ci(
-    values: list[float],
-    n_iter: int = 1000,
-    ci: float = 0.95,
-    seed: int = 42,
-) -> tuple[float, float]:
-    """Bootstrap confidence interval.
-
-    Parameters
-    ----------
-    values : list of values (e.g. per-document CER)
-    n_iter : number of bootstrap iterations (default 1000)
-    ci     : confidence level (default 0.95 → 95%)
-    seed   : RNG seed for reproducibility
-
-    Returns
-    -------
-    (lower, upper): the bounds of the CI at level ``ci``
-    """
-    if not values:
-        return (0.0, 0.0)
-    rng = random.Random(seed)
-    n = len(values)
-    means = []
-    for _ in range(n_iter):
-        sample = [values[rng.randint(0, n - 1)] for _ in range(n)]
-        means.append(sum(sample) / n)
-    means.sort()
-    alpha = (1.0 - ci) / 2.0
-    lo_idx = max(0, int(alpha * n_iter))
-    hi_idx = min(n_iter - 1, int((1.0 - alpha) * n_iter))
-    return (means[lo_idx], means[hi_idx])
-
-
-# ---------------------------------------------------------------------------
-# Wilcoxon signed-rank test (pure Python implementation)
-# ---------------------------------------------------------------------------
-
-def wilcoxon_test(
-    a: list[float],
-    b: list[float],
-    zero_method: str = "wilcox",
-) -> dict:
-    """Wilcoxon signed-rank test between two paired CER series.
-
-    Returns a dict with:
-    - statistic   : W = min(W⁺, W⁻)
-    - p_value     : two-sided p-value
-    - significant : bool (p < 0.05)
-    - interpretation : human-readable sentence
-    - n_pairs     : number of pairs used (after removing zeros)
-    - W_plus      : sum of the ranks of positive differences
-    - W_minus     : sum of the ranks of negative differences
-
-    Assumptions and limitations
-    ---------------------------
-    * The observations are paired (same corpus, two different engines).
-    * The test is non-parametric: no normality assumption on the CERs.
-    * ``zero_method="wilcox"`` (default): pairs with no difference (aᵢ = bᵢ)
-      are simply excluded. The other methods (``"pratt"``, ``"zsplit"``)
-      require scipy.
-    * **Normal approximation** (native implementation, n ≥ 10):
-      the approximation is reasonable for n ≥ 10 and converges to the
-      exact distribution. For n < 10, a simplified critical table is
-      used (p ∈ {0.04, 0.20}); the result is **conservative**.
-    * **scipy** (if installed): ``scipy.stats.wilcoxon`` is used instead
-      of the native approximation. scipy uses the exact method for n ≤ 25
-      and the normal approximation for n > 25, which is more accurate.
-    * **Validity**: the test assumes the distribution of the differences
-      is symmetric. With very small n (< 5), the results are unreliable
-      whatever the method.
-
-    Parameters
-    ----------
-    a, b        : CER series (same length, same document order)
-    zero_method : handling of zero pairs (default: ``"wilcox"``)
-    """
-    if len(a) != len(b):
-        raise ValueError("Les deux listes doivent avoir la même longueur")
-
-    diffs = [x - y for x, y in zip(a, b)]
-
-    # Remove zeros ("wilcox" method)
-    if zero_method == "wilcox":
-        diffs = [d for d in diffs if d != 0.0]
-
-    n = len(diffs)
-    if n == 0:
-        return {
-            "statistic": 0.0,
-            "p_value": 1.0,
-            "significant": False,
-            "interpretation": "Aucune différence entre les deux concurrents.",
-            "n_pairs": 0,
-        }
-
-    # Ranks of the absolute values
-    abs_diffs = [abs(d) for d in diffs]
-    indexed = sorted(enumerate(abs_diffs), key=lambda x: x[1])
-
-    # Tie handling: average rank
-    ranks = [0.0] * n
-    i = 0
-    while i < n:
-        j = i
-        while j < n and abs_diffs[indexed[j][0]] == abs_diffs[indexed[i][0]]:
-            j += 1
-        avg_rank = (i + j + 1) / 2.0  # average rank (1-based)
-        for k in range(i, j):
-            ranks[indexed[k][0]] = avg_rank
-        i = j
-
-    W_plus = sum(ranks[k] for k in range(n) if diffs[k] > 0)
-    W_minus = sum(ranks[k] for k in range(n) if diffs[k] < 0)
-    W = min(W_plus, W_minus)
-
-    # p-value: scipy if available, otherwise the native approximation
-    if _SCIPY_AVAILABLE:
-        try:
-            scipy_res = _scipy_wilcoxon(diffs, zero_method=zero_method)
-            p_value = float(scipy_res.pvalue)
-        except Exception:
-            # Fall back to the native implementation on a scipy error
-            p_value = _native_p_value(n, W)
-    else:
-        p_value = _native_p_value(n, W)
-
-    significant = p_value < 0.05
-
-    if significant:
-        better = "premier" if W_plus < W_minus else "second"
|
| 173 |
-
interpretation = (
|
| 174 |
-
f"Différence statistiquement significative (p = {p_value:.4f} < 0.05). "
|
| 175 |
-
f"Le {better} concurrent obtient de meilleurs scores."
|
| 176 |
-
)
|
| 177 |
-
else:
|
| 178 |
-
interpretation = (
|
| 179 |
-
f"Différence non significative (p = {p_value:.4f} ≥ 0.05). "
|
| 180 |
-
"On ne peut pas conclure que l'un surpasse l'autre."
|
| 181 |
-
)
|
| 182 |
-
|
| 183 |
-
return {
|
| 184 |
-
"statistic": round(W, 4),
|
| 185 |
-
"p_value": round(p_value, 6),
|
| 186 |
-
"significant": significant,
|
| 187 |
-
"interpretation": interpretation,
|
| 188 |
-
"n_pairs": n,
|
| 189 |
-
"W_plus": round(W_plus, 4),
|
| 190 |
-
"W_minus": round(W_minus, 4),
|
| 191 |
-
}
|
| 192 |
-
|
| 193 |
-
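As a hedged illustration of the rank-sum step in the function above (not the project's code), the sketch below recomputes W⁺ and W⁻ for a tiny paired sample. The name `signed_rank_sums` and the integer data are assumptions made for the example; integers are used only to keep the arithmetic exact.

```python
def signed_rank_sums(a, b):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Indices of the differences, ordered by absolute value
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 1) / 2.0  # mean of the 1-based ranks i+1 .. j
        for pos in range(i, j):
            ranks[order[pos]] = avg
        i = j
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical per-document error counts for two engines
result = signed_rank_sums([10, 20, 30, 40], [12, 15, 30, 30])
```

Here the third pair is dropped (zero difference), the remaining absolute differences 2, 5 and 10 get ranks 1, 2 and 3, and the two sums always add up to n(n+1)/2 for the n retained pairs.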
-def _normal_sf(z: float) -> float:
-    """Survival function of the standard normal distribution (1 - CDF)."""
-    # Abramowitz & Stegun approximation 26.2.17
-    t = 1.0 / (1.0 + 0.2316419 * abs(z))
-    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
-           + t * (-1.821255978 + t * 1.330274429))))
-    phi_z = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
-    p = phi_z * poly
-    return p if z >= 0 else 1.0 - p
-
-
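A quick way to sanity-check this polynomial approximation is to compare it with the closed form based on `math.erfc` from the standard library. The sketch below is only an illustration; the names `normal_sf_as`, `normal_sf_erfc` and `max_gap` are assumptions, and the grid of z values is arbitrary.

```python
import math


def normal_sf_as(z: float) -> float:
    """Abramowitz & Stegun 26.2.17 approximation of 1 - Φ(z)."""
    t = 1.0 / (1.0 + 0.2316419 * abs(z))
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
           + t * (-1.821255978 + t * 1.330274429))))
    p = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * poly
    return p if z >= 0 else 1.0 - p


def normal_sf_erfc(z: float) -> float:
    """Reference value: 1 - Φ(z) = erfc(z / √2) / 2."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Largest absolute gap on a grid from -4.0 to 4.0 in steps of 0.1
max_gap = max(
    abs(normal_sf_as(z / 10) - normal_sf_erfc(z / 10)) for z in range(-40, 41)
)
```

Since `math.erfc` is always available, it could also serve as a drop-in replacement when scipy is not installed.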
-# Critical values of W for two-sided α = 0.05 (exact test, source: Wilcoxon tables).
-# For n < 6 no value of W can reach significance at this level, so those
-# sample sizes have no entry.
-_W_CRITICAL = {6: 0, 7: 2, 8: 3, 9: 5}
-
-
-def _wilcoxon_exact_p(n: int, w: float) -> float:
-    """Approximate p-value for small n (< 10) from a simplified critical-value table.
-
-    Note: the result is **conservative**; only two values are returned:
-    0.04 (significant at 5 %) or 0.20 (not significant).
-    Prefer scipy for exact p-values.
-    """
-    critical = _W_CRITICAL.get(n)
-    if critical is not None and w <= critical:
-        return 0.04  # significant at 5 %
-    return 0.20  # not significant (conservative approximation)
-
-
-def _native_p_value(n: int, W: float) -> float:
-    """Compute the p-value with the normal approximation (n ≥ 10) or the critical-value table (n < 10)."""
-    if n >= 10:
-        mu = n * (n + 1) / 4.0
-        sigma2 = n * (n + 1) * (2 * n + 1) / 24.0
-        if sigma2 <= 0:
-            return 1.0
-        z = abs((W + 0.5) - mu) / math.sqrt(sigma2)  # continuity correction
-        return 2.0 * _normal_sf(z)  # two-sided test
-    return _wilcoxon_exact_p(n, W)
-
-
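For small n, the exact null distribution of the signed-rank statistic can be enumerated cheaply instead of being read from a two-value table. The sketch below is an assumption-laden illustration (no ties, integer W, hypothetical name `exact_wilcoxon_p`), not the module's method: it counts, by dynamic programming, how many of the 2ⁿ sign assignments give each positive-rank sum.

```python
def exact_wilcoxon_p(n: int, w: int) -> float:
    """Two-sided exact p-value P(min(W+, W-) <= w) under H0, assuming no ties."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1..n} whose sum is s
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    lower_tail = sum(counts[: w + 1])
    # W+ and W- are symmetric under H0, hence the factor 2 (capped at 1)
    return min(1.0, 2.0 * lower_tail / 2 ** n)


p_n5 = exact_wilcoxon_p(5, 0)  # most extreme outcome with 5 pairs
p_n6 = exact_wilcoxon_p(6, 0)  # most extreme outcome with 6 pairs
```

With 5 non-zero pairs the most extreme outcome gives p = 2/32 = 0.0625, so no result can fall below 0.05; with 6 pairs it gives 2/64 = 0.03125. This is why tabulated two-sided critical values at α = 0.05 start at n = 6.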
-# ---------------------------------------------------------------------------
-# Pairwise test matrix
-# ---------------------------------------------------------------------------
-
-def compute_pairwise_stats(
-    engine_cer_map: dict[str, list[float]],
-) -> list[dict]:
-    """Run Wilcoxon tests between all pairs of competitors.
-
-    Parameters
-    ----------
-    engine_cer_map : dict {engine_name → [cer_doc1, cer_doc2, ...]}
-
-    Returns
-    -------
-    List of dicts, one per pair:
-    - engine_a, engine_b, statistic, p_value, significant, interpretation
-    """
-    names = list(engine_cer_map.keys())
-    results = []
-    for i in range(len(names)):
-        for j in range(i + 1, len(names)):
-            a_name, b_name = names[i], names[j]
-            a_vals = engine_cer_map[a_name]
-            b_vals = engine_cer_map[b_name]
-            # Align lengths
-            min_len = min(len(a_vals), len(b_vals))
-            if min_len < 2:
-                continue
-            res = wilcoxon_test(a_vals[:min_len], b_vals[:min_len])
-            results.append({
-                "engine_a": a_name,
-                "engine_b": b_name,
-                **res,
-            })
-    return results
-
-
-# ---------------------------------------------------------------------------
-# Friedman test + Nemenyi post-hoc (Sprint 17)
-# ---------------------------------------------------------------------------
-#
-# Reference: Demšar, J. (2006), "Statistical Comparisons of Classifiers over
-# Multiple Data Sets", Journal of Machine Learning Research 7:1-30. De facto
-# standard for comparing several systems on several datasets; here:
-# several OCR engines on several documents. The CDD (critical difference
-# diagram) derived from Nemenyi is the canonical rendering.
-
-# Critical values of the Studentized range distribution divided by √2,
-# for df = ∞ (usual approximation for Nemenyi). Source: Tukey tables.
-# Key: number of treatments k; value: q_α for α ∈ {0.05, 0.01}.
-_NEMENYI_Q_TABLE = {
-    # k   q_0.05  q_0.01
-    2: (1.960, 2.576),
-    3: (2.343, 2.913),
-    4: (2.569, 3.113),
-    5: (2.728, 3.255),
-    6: (2.850, 3.364),
-    7: (2.949, 3.452),
-    8: (3.031, 3.526),
-    9: (3.102, 3.590),
-    10: (3.164, 3.646),
-    11: (3.219, 3.696),
-    12: (3.268, 3.741),
-    13: (3.313, 3.781),
-    14: (3.354, 3.818),
-    15: (3.391, 3.853),
-    16: (3.426, 3.886),
-    17: (3.458, 3.916),
-    18: (3.489, 3.944),
-    19: (3.517, 3.970),
-    20: (3.544, 3.995),
-    25: (3.658, 4.095),
-    30: (3.739, 4.167),
-    40: (3.858, 4.272),
-    50: (3.945, 4.349),
-}
-
-
-def _chi_square_sf(x: float, df: int) -> float:
-    """Survival function of the chi-square distribution, 1 - CDF(x).
-
-    Uses scipy if available (exact method), otherwise Wilson-Hilferty
-    (normal approximation, accurate from df ≥ 3).
-    """
-    if x <= 0 or df <= 0:
-        return 1.0
-    try:
-        from scipy.stats import chi2 as _chi2  # type: ignore[import-untyped]
-        return float(_chi2.sf(x, df))
-    except ImportError:
-        pass
-    # Wilson-Hilferty: maps chi-square to a normal approximation
-    z = (((x / df) ** (1.0 / 3.0)) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
-    return _normal_sf(z)
-
-
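To get a feel for the accuracy of the Wilson-Hilferty fallback, it can be compared with a case where the chi-square survival function has a closed form. The sketch below uses df = 4, where sf(x) = exp(−x/2)·(1 + x/2); the function names are assumptions for the example, and `math.erfc` stands in for the normal survival function.

```python
import math


def chi2_sf_wilson_hilferty(x: float, df: int) -> float:
    """Wilson-Hilferty: (X/df)^(1/3) is roughly normal with mean 1 - 2/(9 df)."""
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi2_sf_exact_df4(x: float) -> float:
    """Closed form of the chi-square survival function for df = 4."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Absolute error at a few points, including the usual 5 % critical value 9.488
gaps = {x: abs(chi2_sf_wilson_hilferty(x, 4) - chi2_sf_exact_df4(x))
        for x in (2.0, 9.488, 15.0)}
```

On these points the approximation stays within a few thousandths of the exact value, which is enough to decide significance except when p sits very close to the threshold.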
-def _rank_row(values: list[float]) -> list[float]:
-    """Ranks of one row: smallest = rank 1. Ties get average ranks."""
-    n = len(values)
-    indexed = sorted(range(n), key=lambda i: values[i])
-    ranks = [0.0] * n
-    i = 0
-    while i < n:
-        j = i
-        while j < n and values[indexed[j]] == values[indexed[i]]:
-            j += 1
-        avg_rank = (i + j + 1) / 2.0  # 1-based
-        for k in range(i, j):
-            ranks[indexed[k]] = avg_rank
-        i = j
-    return ranks
-
-
-def _aligned_cer_matrix(
-    engine_cer_map: dict[str, list[float]],
-) -> tuple[list[str], list[list[float]]]:
-    """Build the (k engines × n documents) matrix aligned on the minimum
-    length. Returns ``(names, matrix)`` with one list per engine.
-
-    Friedman requires complete blocks (documents): if the engines were not
-    all run on the same documents, the series are truncated to the minimum
-    length, which is reported in the result as ``n_blocks``.
-    """
-    names = list(engine_cer_map.keys())
-    if not names:
-        return [], []
-    min_len = min(len(v) for v in engine_cer_map.values())
-    if min_len == 0:
-        return names, []
-    matrix = [engine_cer_map[n][:min_len] for n in names]
-    return names, matrix
-
-
-def friedman_test(engine_cer_map: dict[str, list[float]]) -> dict:
-    """Friedman test: k engines on n paired documents.
-
-    Non-parametric counterpart of repeated-measures ANOVA for ordinal
-    data. Null hypothesis: all engines have the same average
-    performance. Rejection means at least one engine differs from the others.
-
-    Parameters
-    ----------
-    engine_cer_map:
-        Dict ``{engine_name → [cer_doc1, cer_doc2, ...]}``. All engines
-        must have been evaluated on the same documents (in the same order).
-
-    Returns
-    -------
-    dict with:
-    - ``statistic``     : Q corrected for ties
-    - ``p_value``       : p-value (scipy if available, otherwise Wilson-Hilferty)
-    - ``significant``   : bool, p < 0.05
-    - ``df``            : degrees of freedom = k - 1
-    - ``n_blocks``      : number of documents (blocks) used
-    - ``n_engines``     : number of engines (k)
-    - ``mean_ranks``    : dict ``{engine: mean_rank}``
-    - ``interpretation``: human-readable sentence
-    - ``error``         : message if the test is not applicable
-    """
-    names, matrix = _aligned_cer_matrix(engine_cer_map)
-    k = len(names)
-    n = len(matrix[0]) if matrix else 0
-
-    if k < 2:
-        return {
-            "statistic": 0.0, "p_value": 1.0, "significant": False,
-            "df": 0, "n_blocks": n, "n_engines": k,
-            "mean_ranks": {names[0]: 1.0} if k == 1 else {},
-            "interpretation": "Test de Friedman non applicable : il faut au moins 2 moteurs.",
-            "error": "not_enough_engines",
-        }
-    if n < 2:
-        return {
-            "statistic": 0.0, "p_value": 1.0, "significant": False,
-            "df": k - 1, "n_blocks": n, "n_engines": k,
-            "mean_ranks": {name: 1.0 for name in names},
-            "interpretation": "Test de Friedman non applicable : il faut au moins 2 documents communs.",
-            "error": "not_enough_blocks",
-        }
-
-    # Ranks per block (document): for each document, rank the k engines
-    ranks_by_engine: list[list[float]] = [[] for _ in range(k)]
-    for j in range(n):
-        row = [matrix[i][j] for i in range(k)]
-        row_ranks = _rank_row(row)
-        for i in range(k):
-            ranks_by_engine[i].append(row_ranks[i])
-
-    rank_sums = [sum(r) for r in ranks_by_engine]
-    mean_ranks = {names[i]: rank_sums[i] / n for i in range(k)}
-
-    # Uncorrected Q statistic (no ties)
-    # Q = 12 / (n·k·(k+1)) · Σ R_j² − 3·n·(k+1)
-    Q = (12.0 / (n * k * (k + 1))) * sum(rs ** 2 for rs in rank_sums) - 3.0 * n * (k + 1)
-
-    # Tie correction: adjusts Q when ranks are shared within some blocks.
-    # Formula: Q_corr = Q / (1 - T/(n·(k³−k)))
-    # where T = Σ (tⱼ³ − tⱼ) over all groups of ties.
-    tie_correction = 0.0
-    for j in range(n):
-        row = [matrix[i][j] for i in range(k)]
-        sorted_row = sorted(row)
-        i = 0
-        while i < len(sorted_row):
-            count = 1
-            while i + count < len(sorted_row) and sorted_row[i + count] == sorted_row[i]:
-                count += 1
-            if count > 1:
-                tie_correction += count ** 3 - count
-            i += count
-    denom = 1.0 - tie_correction / (n * (k ** 3 - k)) if k >= 2 else 1.0
-    if denom > 0:
-        Q = Q / denom
-
-    df = k - 1
-    p_value = _chi_square_sf(Q, df)
-    significant = p_value < 0.05
-
-    if significant:
-        interpretation = (
-            f"Test de Friedman significatif (Q = {Q:.3f}, df = {df}, p = {p_value:.4f}). "
-            f"Au moins un moteur diffère des autres — utiliser le post-hoc Nemenyi "
-            f"pour identifier les paires distinguables."
-        )
-    else:
-        interpretation = (
-            f"Test de Friedman non significatif (Q = {Q:.3f}, df = {df}, p = {p_value:.4f}). "
-            f"Aucune différence globale détectée entre les moteurs sur ce corpus."
-        )
-
-    return {
-        "statistic": round(Q, 4),
-        "p_value": round(p_value, 6),
-        "significant": significant,
-        "df": df,
-        "n_blocks": n,
-        "n_engines": k,
-        "mean_ranks": {k_: round(v, 4) for k_, v in mean_ranks.items()},
-        "interpretation": interpretation,
-    }
-
-
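A small worked example may help to follow the Q formula. The sketch below is a simplified illustration that assumes no ties within a document (so no tie correction is needed); the function name `friedman_q` and the CER values are made up for the example.

```python
import math


def friedman_q(matrix):
    """matrix[i][j] = score of engine i on document j (no ties within a document)."""
    k, n = len(matrix), len(matrix[0])
    rank_sums = [0.0] * k
    for j in range(n):
        column = [matrix[i][j] for i in range(k)]
        # Rank the k engines on document j: smallest score gets rank 1
        order = sorted(range(k), key=lambda i: column[i])
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


# Hypothetical CER values: 3 engines (rows) on 4 documents (columns)
cer = [
    [0.05, 0.04, 0.03, 0.09],
    [0.10, 0.06, 0.08, 0.07],
    [0.20, 0.11, 0.05, 0.15],
]
q, rank_sums = friedman_q(cer)
# With df = k - 1 = 2 the chi-square survival function is exactly exp(-q / 2)
p_value = math.exp(-q / 2.0)
```

The rank sums are 5, 8 and 11, so Q = 12/48 · (25 + 64 + 121) − 48 = 4.5 and p ≈ 0.105: with only four documents, this ordering is not enough to reject the null hypothesis at 5 %.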
-def _nemenyi_critical_value(k: int, alpha: float = 0.05) -> Optional[float]:
-    """Critical value q_α for k treatments, df = ∞.
-
-    Returns ``None`` if k < 2. For k above the largest tabulated value (50)
-    the entry for k = 50 is returned; for k between two table keys the
-    value is linearly interpolated.
-    """
-    if k < 2:
-        return None
-    if k in _NEMENYI_Q_TABLE:
-        q05, q01 = _NEMENYI_Q_TABLE[k]
-        return q05 if alpha == 0.05 else q01 if alpha == 0.01 else q05
-    # Beyond the table: fall back to the largest tabulated k (this
-    # underestimates q_α, since q_α grows with k)
-    max_k = max(_NEMENYI_Q_TABLE.keys())
-    if k > max_k:
-        q05, q01 = _NEMENYI_Q_TABLE[max_k]
-        return q05 if alpha == 0.05 else q01
-    # Between two keys: linear interpolation
-    keys = sorted(_NEMENYI_Q_TABLE.keys())
-    for i in range(len(keys) - 1):
-        if keys[i] < k < keys[i + 1]:
-            lo, hi = keys[i], keys[i + 1]
-            q_lo = _NEMENYI_Q_TABLE[lo][0 if alpha == 0.05 else 1]
-            q_hi = _NEMENYI_Q_TABLE[hi][0 if alpha == 0.05 else 1]
-            frac = (k - lo) / (hi - lo)
-            return q_lo + frac * (q_hi - q_lo)
-    return None
-
-
-def nemenyi_posthoc(
-    engine_cer_map: dict[str, list[float]],
-    alpha: float = 0.05,
-) -> dict:
-    """Nemenyi post-hoc test: identifies the pairs of engines that are
-    statistically indistinguishable after a Friedman test.
-
-    Computes the *critical distance* CD = q_α · √(k·(k+1) / (6·n)). Two engines
-    whose mean ranks differ by less than CD are **not** statistically
-    distinguishable at level α.
-
-    Returns
-    -------
-    dict with:
-    - ``alpha``              : level used
-    - ``critical_distance``  : computed CD
-    - ``q_alpha``            : critical value q_α taken from the table
-    - ``n_blocks``, ``n_engines``
-    - ``mean_ranks``         : mean rank per engine (dict)
-    - ``engines_sorted``     : list of engines sorted by increasing rank
-    - ``significant_matrix`` : bool matrix (list[list[bool]]),
-      ``True`` = significantly different pair
-    - ``tied_groups``        : list of lists of indistinguishable engines
-      (maximal groups of practical ties)
-    - ``error``              : present if the test is not applicable
-    """
-    names, matrix = _aligned_cer_matrix(engine_cer_map)
-    k = len(names)
-    n = len(matrix[0]) if matrix else 0
-
-    if k < 2 or n < 2:
-        return {
-            "alpha": alpha,
-            "critical_distance": 0.0,
-            "q_alpha": 0.0,
-            "n_blocks": n,
-            "n_engines": k,
-            "mean_ranks": {name: 1.0 for name in names},
-            "engines_sorted": list(names),
-            "significant_matrix": [[False] * k for _ in range(k)],
-            "tied_groups": [list(names)] if names else [],
-            "error": "not_enough_data",
-        }
-
-    # Friedman provides the mean ranks; they are recomputed here to stay
-    # self-contained (without forcing the caller to chain both calls).
-    ranks_by_engine: list[list[float]] = [[] for _ in range(k)]
-    for j in range(n):
-        row = [matrix[i][j] for i in range(k)]
-        row_ranks = _rank_row(row)
-        for i in range(k):
-            ranks_by_engine[i].append(row_ranks[i])
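-
-    # Assumed reconstruction: these lines are blank in the extracted diff and
-    # are inferred from the names used below.
-    mean_ranks_list = [sum(r) / n for r in ranks_by_engine]
-    mean_ranks = {names[i]: round(mean_ranks_list[i], 4) for i in range(k)}
-    q_alpha = _nemenyi_critical_value(k, alpha) or 0.0
-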
-    critical_distance = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))
-
-    # Significance matrix: pair (i, j) is significant if |R_i - R_j| > CD
-    significant_matrix = [
-        [
-            (i != j) and (abs(mean_ranks_list[i] - mean_ranks_list[j]) > critical_distance)
-            for j in range(k)
-        ]
-        for i in range(k)
-    ]
-
-    # Practical tie groups: sliding window over the sorted ranks.
-    # Two engines are in the same group if their gap is ≤ CD.
-    order = sorted(range(k), key=lambda i: mean_ranks_list[i])
-    sorted_names = [names[i] for i in order]
-    sorted_ranks = [mean_ranks_list[i] for i in order]
-
-    tied_groups: list[list[str]] = []
-    i = 0
-    while i < len(sorted_names):
-        # extend the group while the next engine is within CD of the group's first engine
-        j = i
-        while j + 1 < len(sorted_names) and (sorted_ranks[j + 1] - sorted_ranks[i]) <= critical_distance:
-            j += 1
-        tied_groups.append(sorted_names[i:j + 1])
-        i = j + 1 if j > i else i + 1
-
-    return {
-        "alpha": alpha,
-        "critical_distance": round(critical_distance, 4),
-        "q_alpha": round(q_alpha, 4),
-        "n_blocks": n,
-        "n_engines": k,
-        "mean_ranks": mean_ranks,
-        "engines_sorted": sorted_names,
-        "significant_matrix": significant_matrix,
-        "tied_groups": tied_groups,
-    }
-
-
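The critical distance formula is easy to check by hand. The sketch below is a standalone illustration: the helper names and the mean ranks are hypothetical, and q_α = 2.569 is the tabulated value for k = 4 at α = 0.05.

```python
import math
from itertools import combinations


def critical_distance(q_alpha: float, k: int, n: int) -> float:
    """CD = q_alpha * sqrt(k (k + 1) / (6 n)) for k engines and n documents."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


def distinguishable_pairs(mean_ranks: dict, cd: float) -> list:
    """Pairs of engines whose mean ranks differ by more than CD."""
    return [
        (a, b)
        for a, b in combinations(sorted(mean_ranks), 2)
        if abs(mean_ranks[a] - mean_ranks[b]) > cd
    ]


# 4 engines on 20 documents; the mean ranks are made up for the example
cd = critical_distance(2.569, k=4, n=20)
pairs = distinguishable_pairs({"A": 1.4, "B": 2.1, "C": 2.9, "D": 3.6}, cd)
```

Here CD = 2.569 · √(20/120) ≈ 1.049, so only the pairs at least two positions apart (A/C, A/D, B/D) are separated. Adjacent engines would need more documents to be told apart, since CD shrinks as 1/√n.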
-# ---------------------------------------------------------------------------
-# Critical Difference Diagram: SVG rendering (Sprint 17)
-# ---------------------------------------------------------------------------
-
-def build_critical_difference_svg(
-    nemenyi_result: dict,
-    width: int = 780,
-    row_height: int = 22,
-) -> str:
-    """Generate the SVG of the Critical Difference Diagram (Demšar 2006).
-
-    The diagram shows:
-    * a horizontal axis of mean ranks (1 to k),
-    * each engine placed on the axis at its mean rank,
-    * thick horizontal bars joining statistically indistinguishable
-      engines (distance ≤ CD),
-    * the length of CD drawn above the axis for reference.
-
-    Parameters
-    ----------
-    nemenyi_result:
-        Result of ``nemenyi_posthoc``.
-    width:
-        Total SVG width in pixels.
-    row_height:
-        Height of each engine label row (self-adjusting).
-
-    Returns
-    -------
-    String containing the SVG (root element ``<svg>…</svg>``).
-    """
-    k = nemenyi_result.get("n_engines", 0)
-    if k < 2 or nemenyi_result.get("error"):
-        return (
-            '<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="40" '
-            'role="img" aria-label="Critical Difference Diagram indisponible">'
-            '<text x="10" y="24" font-family="sans-serif" font-size="12" fill="#666">'
-            'Critical Difference Diagram non calculable — données insuffisantes.'
-            '</text></svg>'
-        )
-
-    engines_sorted: list[str] = list(nemenyi_result.get("engines_sorted", []))
-    mean_ranks: dict[str, float] = dict(nemenyi_result.get("mean_ranks", {}))
-    tied_groups: list[list[str]] = list(nemenyi_result.get("tied_groups", []))
-    cd: float = float(nemenyi_result.get("critical_distance", 0.0))
-
-    # Dimensions
-    left_pad, right_pad = 40, 40
-    top_pad = 50  # room for the CD display
-    axis_y = top_pad + 10
-    bars_start_y = axis_y + 20  # first tie bar below the axis
-    # Stack one row per group + one row per label
-    label_rows = k  # each engine has its own label row
-    bars_count = len(tied_groups)
-    total_h = bars_start_y + bars_count * 10 + label_rows * row_height + 20
-
-    axis_x0, axis_x1 = left_pad, width - right_pad
-    axis_width = axis_x1 - axis_x0
-
-    def x_for_rank(r: float) -> float:
-        # Rank 1 on the left, rank k on the right
-        if k <= 1:
-            return axis_x0
-        return axis_x0 + (r - 1.0) / (k - 1.0) * axis_width
-
-    parts: list[str] = []
-    parts.append(
-        f'<svg xmlns="http://www.w3.org/2000/svg" width="100%" viewBox="0 0 {width} {total_h}" '
-        f'role="img" aria-label="Critical Difference Diagram (Friedman-Nemenyi)" '
-        f'font-family="system-ui, -apple-system, sans-serif">'
-    )
-    parts.append('<style>.cd-axis{stroke:#334155;stroke-width:1.5}.cd-tick{stroke:#334155;stroke-width:1}'
-                 '.cd-label{fill:#0f172a;font-size:11px}'
-                 '.cd-tie{stroke:#0f172a;stroke-width:4;stroke-linecap:round}'
-                 '.cd-cd-bar{stroke:#dc2626;stroke-width:2}'
-                 '.cd-cd-txt{fill:#dc2626;font-size:11px;font-weight:600}'
-                 '.cd-name{fill:#0f172a;font-size:12px}'
-                 '.cd-rank{fill:#64748b;font-size:10px}'
-                 '</style>')
-
-    # Reference CD bar (top, at the left end of the axis)
-    if cd > 0 and k >= 2:
-        cd_bar_x0 = axis_x0
-        cd_bar_x1 = axis_x0 + (cd / max(1, k - 1)) * axis_width
-        cd_y = top_pad - 20
-        parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x0:.1f}" y1="{cd_y}" '
-                     f'x2="{cd_bar_x1:.1f}" y2="{cd_y}"/>')
-        parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x0:.1f}" y1="{cd_y - 4}" '
-                     f'x2="{cd_bar_x0:.1f}" y2="{cd_y + 4}"/>')
-        parts.append(f'<line class="cd-cd-bar" x1="{cd_bar_x1:.1f}" y1="{cd_y - 4}" '
-                     f'x2="{cd_bar_x1:.1f}" y2="{cd_y + 4}"/>')
-        parts.append(f'<text class="cd-cd-txt" x="{(cd_bar_x0 + cd_bar_x1)/2:.1f}" y="{cd_y - 8}" '
-                     f'text-anchor="middle">CD = {cd:.3f}</text>')
-
-    # Main axis
-    parts.append(f'<line class="cd-axis" x1="{axis_x0}" y1="{axis_y}" '
-                 f'x2="{axis_x1}" y2="{axis_y}"/>')
-    # Integer ticks
-    for r in range(1, k + 1):
-        xt = x_for_rank(r)
-        parts.append(f'<line class="cd-tick" x1="{xt:.1f}" y1="{axis_y - 5}" '
-                     f'x2="{xt:.1f}" y2="{axis_y + 5}"/>')
-        parts.append(f'<text class="cd-label" x="{xt:.1f}" y="{axis_y - 9}" '
-                     f'text-anchor="middle">{r}</text>')
-
-    # Bars joining the indistinguishable groups
-    for i, group in enumerate(tied_groups):
-        if len(group) < 2:
-            continue
-        rs = [mean_ranks[n] for n in group]
-        x0 = x_for_rank(min(rs))
-        x1 = x_for_rank(max(rs))
-        y_bar = bars_start_y + i * 10
-        parts.append(f'<line class="cd-tie" x1="{x0 - 3:.1f}" y1="{y_bar}" '
-                     f'x2="{x1 + 3:.1f}" y2="{y_bar}"/>')
-
-    # Engine labels: the lower-ranked half on the left, the other half on the right
-    labels_y_base = bars_start_y + bars_count * 10 + 15
-    half = (len(engines_sorted) + 1) // 2
-    left_engines = engines_sorted[:half]
-    right_engines = engines_sorted[half:]
-
-    for idx, name in enumerate(left_engines):
-        r = mean_ranks[name]
-        x = x_for_rank(r)
-        y_label = labels_y_base + idx * row_height
-        # Connector from the engine to the axis
-        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{axis_y + 6}" '
-                     f'x2="{x:.1f}" y2="{y_label - 4}"/>')
-        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{y_label - 4}" '
-                     f'x2="{axis_x0 - 4:.1f}" y2="{y_label - 4}"/>')
-        parts.append(f'<text class="cd-name" x="{axis_x0 - 6:.1f}" y="{y_label}" '
-                     f'text-anchor="end">{_svg_escape(name)} '
-                     f'<tspan class="cd-rank">({r:.2f})</tspan></text>')
-
-    for idx, name in enumerate(right_engines):
-        r = mean_ranks[name]
-        x = x_for_rank(r)
-        y_label = labels_y_base + idx * row_height
-        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{axis_y + 6}" '
-                     f'x2="{x:.1f}" y2="{y_label - 4}"/>')
-        parts.append(f'<line class="cd-tick" x1="{x:.1f}" y1="{y_label - 4}" '
-                     f'x2="{axis_x1 + 4:.1f}" y2="{y_label - 4}"/>')
-        parts.append(f'<text class="cd-name" x="{axis_x1 + 6:.1f}" y="{y_label}" '
-                     f'text-anchor="start">{_svg_escape(name)} '
-                     f'<tspan class="cd-rank">({r:.2f})</tspan></text>')
-
-    parts.append('</svg>')
-    return "".join(parts)
-
-
-def _svg_escape(text: str) -> str:
-    """Escape a text for safe inclusion in an SVG/XML node."""
-    return (text.replace("&", "&amp;")
-                .replace("<", "&lt;")
-                .replace(">", "&gt;")
-                .replace('"', "&quot;")
-                .replace("'", "&#39;"))
-
-
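The same escaping is available in the standard library, which avoids maintaining the replacement chain by hand. The sketch below is an alternative, not what the module does: `xml.sax.saxutils.escape` handles `&`, `<` and `>`, and the extra mapping passed as second argument covers the two quote characters. The wrapper name `svg_escape` is an assumption.

```python
from xml.sax.saxutils import escape

# Extra entities applied after the three mandatory XML escapes
_QUOTES = {'"': "&quot;", "'": "&#39;"}


def svg_escape(text: str) -> str:
    """Escape text for inclusion in an SVG/XML text node or attribute."""
    return escape(text, _QUOTES)


escaped = svg_escape('tesseract <v5> & "kraken"')
```

The ampersand is escaped first, so entities produced by the later replacements are not escaped a second time.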
-# ---------------------------------------------------------------------------
-# Pareto front (Sprint 19)
-# ---------------------------------------------------------------------------
-
-def compute_pareto_front(
-    points: list[dict],
-    objectives: tuple[str, ...] = ("cer", "cost"),
-    name_key: str = "engine",
-    minimize: Optional[tuple[bool, ...]] = None,
-) -> list[str]:
-    """Compute the Pareto front over ``len(objectives)`` dimensions.
-
-    A point ``p`` is on the Pareto front (non-dominated) if no other point
-    is at least as good on ALL objectives AND strictly better on at least
-    one of them.
-
-    Parameters
-    ----------
-    points:
-        List of dicts. Each dict must contain ``name_key`` and all the
-        keys in ``objectives``. Points with an objective value of
-        ``None`` are skipped (no comparison possible).
-    objectives:
-        Keys of the objectives to minimise/maximise.
-    name_key:
-        Key identifying the point (default ``"engine"``).
-    minimize:
-        For each objective, ``True`` = minimise (e.g. CER, cost),
-        ``False`` = maximise (e.g. anchoring). Must have the same length
-        as ``objectives``.
-
-    Returns
-    -------
-    List of the ``name`` values of the points on the Pareto front, in the
-    same order as ``points``.
-    """
-    if minimize is None:
-        minimize = tuple(True for _ in objectives)
-    if len(minimize) != len(objectives):
-        raise ValueError("`minimize` doit avoir la même longueur que `objectives`")
-
-    valid = []
-    for p in points:
-        try:
-            vals = tuple(float(p[k]) for k in objectives)
-        except (KeyError, TypeError, ValueError):
-            continue
-        valid.append((p[name_key], vals))
-
-    front: list[str] = []
-    for name_a, vals_a in valid:
-        dominated = False
-        for name_b, vals_b in valid:
-            if name_a == name_b:
-                continue
-            # B dominates A if B is at least as good everywhere AND strictly better somewhere
-            better_or_equal_everywhere = True
-            strictly_better_somewhere = False
-            for va, vb, mini in zip(vals_a, vals_b, minimize):
-                if mini:
-                    if vb > va:
-                        better_or_equal_everywhere = False
-                        break
-                    if vb < va:
-                        strictly_better_somewhere = True
-                else:  # maximise
-                    if vb < va:
-                        better_or_equal_everywhere = False
-                        break
-                    if vb > va:
-                        strictly_better_somewhere = True
-            if better_or_equal_everywhere and strictly_better_somewhere:
-                dominated = True
-                break
-        if not dominated:
-            front.append(name_a)
-    return front
-
-
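The dominance rule can be condensed into a few lines when every objective is minimised. The sketch below is a simplified illustration of that special case; the function name `pareto_front`, the engine names and the (CER, cost) values are all made up for the example.

```python
def pareto_front(points):
    """points: list of (name, cer, cost) tuples; both objectives are minimised."""
    front = []
    for name_a, *vals_a in points:
        # A is dominated if some other point is no worse on every
        # objective and strictly better on at least one
        dominated = any(
            all(vb <= va for va, vb in zip(vals_a, vals_b))
            and any(vb < va for va, vb in zip(vals_a, vals_b))
            for name_b, *vals_b in points
            if name_b != name_a
        )
        if not dominated:
            front.append(name_a)
    return front


engines = [
    ("engine_a", 0.08, 0.0),  # free but least accurate
    ("engine_b", 0.03, 1.5),  # most accurate
    ("engine_c", 0.05, 2.0),  # worse than engine_b on both objectives
    ("engine_d", 0.04, 0.7),  # intermediate trade-off
]
front = pareto_front(engines)
```

Only `engine_c` is dropped: `engine_b` is both cheaper and more accurate. The three remaining engines each represent a different trade-off between accuracy and cost, so none of them can be ruled out without a preference between the two objectives.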
|
-# ---------------------------------------------------------------------------
-# Error-pattern clustering
-# ---------------------------------------------------------------------------
-
-# Frequent error patterns (OCR + HTR on heritage documents)
-_ERROR_PATTERNS = [
-    # (pattern_re, label)
-    (r"\brn\b.*\bm\b|\bm\b.*\brn\b|rn→m|m→rn", "confusion rn/m"),
-    (r"[lI]→1|1→[lI]|l→1|1→l|I→1|1→I", "confusion l/1/I"),
-    (r"u→n|n→u|v→u|u→v", "confusion u/n/v"),
-    (r"[oO]→0|0→[oO]", "confusion O/0"),
-    (r"ſ→[fs]|[fs]→ſ", "confusion ſ/f/s"),
-    (r"é→e|è→e|ê→e|e→[éèê]", "erreur diacritique é/e"),
-    (r"œ→oe|oe→œ|æ→ae|ae→æ", "ligature œ/æ"),
-    (r"[fF]i→fi|fi→[fF]i", "ligature fi"),
-    (r"[fF]l→fl|fl→[fF]l", "ligature fl"),
-    (r"\s+→''|''→\s+", "segmentation espace"),
-]
-
-def _extract_error_pairs(gt: str, hyp: str) -> list[tuple[str, str]]:
-    """Extract the (gt_char_seq, hyp_char_seq) pairs for substitution errors."""
-    from picarones.report.diff_utils import compute_word_diff
-    ops = compute_word_diff(gt, hyp)
-    pairs = []
-    for op in ops:
-        if op["op"] == "replace":
-            pairs.append((op["old"], op["new"]))
-        elif op["op"] == "delete":
-            pairs.append((op["text"], ""))
-        elif op["op"] == "insert":
-            pairs.append(("", op["text"]))
-    return pairs
-
-
-@dataclass
-class ErrorCluster:
-    """A cluster of similar errors."""
-    cluster_id: int
-    label: str
-    """Human-readable description of the pattern (e.g. 'confusion rn/m')."""
-    count: int
-    examples: list[dict]
-    """List of {engine, gt_fragment, ocr_fragment}."""
-
-    def as_dict(self) -> dict:
-        return {
-            "cluster_id": self.cluster_id,
-            "label": self.label,
-            "count": self.count,
-            "examples": self.examples[:5],  # at most 5 examples
-        }
-
-
-def cluster_errors(
-    error_data: list[dict],
-    max_clusters: int = 8,
-) -> list[ErrorCluster]:
-    """Group errors into clusters with human-readable labels.
-
-    Parameters
-    ----------
-    error_data : list of dicts {engine, gt, hypothesis}
-    max_clusters : maximum number of clusters to return
-
-    Returns
-    -------
-    List of ErrorCluster sorted by decreasing count.
-    """
-    # Collect every error pattern along with its context
-    # Key: error category → list of examples
-    bucket: dict[str, list[dict]] = defaultdict(list)
-    other_pairs: list[dict] = []
-
-    for item in error_data:
-        engine = item.get("engine", "")
-        gt = item.get("gt", "")
-        hyp = item.get("hypothesis", "")
-        pairs = _extract_error_pairs(gt, hyp)
-
-        for old, new in pairs:
-            if not old and not new:
-                continue
-            matched = False
-            # Try to match a known pattern
-            probe = f"{old}→{new}"
-            for _pat, label in _ERROR_PATTERNS:
-                try:
-                    if re.search(_pat, probe, re.IGNORECASE):
-                        bucket[label].append({
-                            "engine": engine,
-                            "gt_fragment": old,
-                            "ocr_fragment": new,
-                        })
-                        matched = True
-                        break
-                except re.error:
-                    pass
-
-            if not matched:
-                # Group the remaining substitutions by character pair
-                if len(old) <= 3 and len(new) <= 3:
-                    key = f"{old}→{new}" if (old and new) else (f"—→{new}" if new else f"{old}→—")
-                    bucket[key].append({
-                        "engine": engine,
-                        "gt_fragment": old,
-                        "ocr_fragment": new,
-                    })
-                else:
-                    other_pairs.append({
-                        "engine": engine,
-                        "gt_fragment": old,
-                        "ocr_fragment": new,
-                    })
-
-    # Build the clusters, sorted by frequency
-    clusters: list[ErrorCluster] = []
-    cluster_id = 1
-    sorted_buckets = sorted(bucket.items(), key=lambda x: -len(x[1]))
-
-    for label, examples in sorted_buckets[:max_clusters - 1]:
-        clusters.append(ErrorCluster(
-            cluster_id=cluster_id,
-            label=label,
-            count=len(examples),
-            examples=examples,
-        ))
-        cluster_id += 1
-
-    # Catch-all "other" cluster
-    if other_pairs:
-        clusters.append(ErrorCluster(
-            cluster_id=cluster_id,
-            label="autres substitutions",
-            count=len(other_pairs),
-            examples=other_pairs,
-        ))
-
-    # Sort by decreasing count and cap the list
-    clusters.sort(key=lambda c: -c.count)
-    return clusters[:max_clusters]
-
-
-# ---------------------------------------------------------------------------
-# Correlation matrix between metrics
-# ---------------------------------------------------------------------------
-
-def _pearson(x: list[float], y: list[float]) -> float:
-    """Pearson correlation coefficient."""
-    n = len(x)
-    if n < 2:
-        return 0.0
-    mx = sum(x) / n
-    my = sum(y) / n
-    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
-    den = math.sqrt(
-        sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)
-    )
-    return num / den if den > 0 else 0.0
-
-
-def compute_correlation_matrix(
-    metrics_per_doc: list[dict],
-    metric_keys: Optional[list[str]] = None,
-) -> dict:
-    """Compute the correlation matrix between all numeric metrics.
-
-    Parameters
-    ----------
-    metrics_per_doc : list of dicts, one per document, holding the metrics
-    metric_keys : keys to include (None → all numeric keys)
-
-    Returns
-    -------
-    {
-        "labels": [...],
-        "matrix": [[r_ij, ...], ...]  // Pearson coefficients
-    }
-    """
-    if not metrics_per_doc:
-        return {"labels": [], "matrix": []}
-
-    if metric_keys is None:
-        # Infer the numeric keys from the first document
-        sample = metrics_per_doc[0]
-        metric_keys = [k for k, v in sample.items() if isinstance(v, (int, float))]
-
-    # Build one vector per metric (missing values count as 0.0)
-    vectors: dict[str, list[float]] = {k: [] for k in metric_keys}
-    for doc in metrics_per_doc:
-        for k in metric_keys:
-            v = doc.get(k)
-            vectors[k].append(float(v) if v is not None else 0.0)
-
-    # Compute the matrix
-    labels = metric_keys
-    n = len(labels)
-    matrix = []
-    for i in range(n):
-        row = []
-        for j in range(n):
-            r = _pearson(vectors[labels[i]], vectors[labels[j]])
-            row.append(round(r, 4))
-        matrix.append(row)
-
-    return {"labels": labels, "matrix": matrix}
-
-
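One consequence of the zero-denominator guard in `_pearson` is worth seeing on a small input: a metric that is constant across documents correlates as 0.0 with everything, including itself. The sketch below re-implements the same formula under a hypothetical name (`pearson`) on made-up per-document metrics; the keys `cer`, `wer`, and `pages` are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(
        sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)
    )
    return num / den if den > 0 else 0.0


# Made-up metrics: wer is exactly twice cer, pages is constant.
docs = [
    {"cer": 0.10, "wer": 0.20, "pages": 1},
    {"cer": 0.20, "wer": 0.40, "pages": 1},
    {"cer": 0.30, "wer": 0.60, "pages": 1},
]
keys = ["cer", "wer", "pages"]
vectors = {k: [float(d[k]) for d in docs] for k in keys}
matrix = [
    [round(pearson(vectors[a], vectors[b]), 4) for b in keys]
    for a in keys
]
for key, row in zip(keys, matrix):
    print(key, row)
```

The `cer`/`wer` cell comes out as 1.0, while the whole `pages` row and column, diagonal included, is 0.0. A report that reads the diagonal as "always 1" would need to account for that case.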
-# ---------------------------------------------------------------------------
-# Reliability curve
-# ---------------------------------------------------------------------------
-
-def compute_reliability_curve(
-    cer_values: list[float],
-    steps: int = 20,
-) -> list[dict]:
-    """For the easiest X% of documents, what is the mean CER?
-
-    Returns
-    -------
-    List of {pct_docs: float, mean_cer: float}
-    """
-    if not cer_values:
-        return []
-    sorted_cer = sorted(cer_values)
-    n = len(sorted_cer)
-    points = []
-    for step in range(1, steps + 1):
-        pct = step / steps
-        cutoff = max(1, int(pct * n))
-        subset = sorted_cer[:cutoff]
-        mean_cer = sum(subset) / len(subset)
-        points.append({"pct_docs": round(pct * 100, 1), "mean_cer": round(mean_cer, 6)})
-    return points
-
-
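A small worked input makes the shape of the reliability curve concrete. The sketch below follows the same steps as `compute_reliability_curve` (sort ascending, take the easiest fraction, average it) under a hypothetical name, `reliability_curve`, with four made-up CER values and four steps instead of the default twenty.

```python
def reliability_curve(cer_values, steps=20):
    """Mean CER over the easiest step/steps fraction of documents, for each step."""
    if not cer_values:
        return []
    ordered = sorted(cer_values)  # easiest (lowest CER) documents first
    n = len(ordered)
    points = []
    for step in range(1, steps + 1):
        pct = step / steps
        cutoff = max(1, int(pct * n))  # always keep at least one document
        subset = ordered[:cutoff]
        points.append({
            "pct_docs": round(pct * 100, 1),
            "mean_cer": round(sum(subset) / len(subset), 6),
        })
    return points


curve = reliability_curve([0.40, 0.10, 0.20, 0.30], steps=4)
for point in curve:
    print(point)
```

Because the values are sorted before averaging, the mean can only stay flat or rise as more documents are included. When there are fewer documents than steps, `max(1, ...)` makes the first few points repeat the single easiest document.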
-# ---------------------------------------------------------------------------
-# Data for the Venn diagram (shared / exclusive errors)
-# ---------------------------------------------------------------------------
-
-def compute_venn_data(
-    engine_error_sets: dict[str, set[str]],
-) -> dict:
-    """Compute the cardinalities for a Venn diagram across 2 or 3 competitors.
-
-    Parameters
-    ----------
-    engine_error_sets : {engine_name → set of doc_id:error_token_pair strings}
-
-    Returns
-    -------
-    For 2 competitors:
-        {only_a, only_b, both, label_a, label_b}
-    For 3 competitors:
-        {only_a, only_b, only_c, ab, ac, bc, abc, label_a, label_b, label_c}
-    """
-    names = list(engine_error_sets.keys())[:3]  # at most 3 so the Venn stays readable
-    if len(names) < 2:
-        return {}
-
-    sets = {n: engine_error_sets[n] for n in names}
-
-    if len(names) == 2:
-        a, b = names
-        sa, sb = sets[a], sets[b]
-        return {
-            "type": "venn2",
-            "label_a": a,
-            "label_b": b,
-            "only_a": len(sa - sb),
-            "only_b": len(sb - sa),
-            "both": len(sa & sb),
-        }
-    else:
-        a, b, c = names
-        sa, sb, sc = sets[a], sets[b], sets[c]
-        return {
-            "type": "venn3",
-            "label_a": a,
-            "label_b": b,
-            "label_c": c,
-            "only_a": len(sa - sb - sc),
-            "only_b": len(sb - sa - sc),
-            "only_c": len(sc - sa - sb),
-            "ab": len((sa & sb) - sc),
-            "ac": len((sa & sc) - sb),
-            "bc": len((sb & sc) - sa),
-            "abc": len(sa & sb & sc),
-        }
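The seven region counts of the three-engine case can be sanity-checked on a toy input: they should partition the union of the three sets. The sketch below reuses the same set expressions under a hypothetical helper name, `venn3_regions`; the error tokens are made up and only mimic the `doc_id:error_token_pair` shape described in the docstring.

```python
def venn3_regions(sa: set[str], sb: set[str], sc: set[str]) -> dict[str, int]:
    """Cardinality of each of the seven regions of a three-set Venn diagram."""
    return {
        "only_a": len(sa - sb - sc),
        "only_b": len(sb - sa - sc),
        "only_c": len(sc - sa - sb),
        "ab": len((sa & sb) - sc),   # shared by A and B but not C
        "ac": len((sa & sc) - sb),
        "bc": len((sb & sc) - sa),
        "abc": len(sa & sb & sc),    # shared by all three
    }


# Made-up error tokens for three hypothetical engines.
errors_a = {"doc1:rn/m", "doc1:l/1", "doc2:u/n", "doc3:e-acute"}
errors_b = {"doc1:rn/m", "doc2:u/n", "doc4:long-s"}
errors_c = {"doc1:rn/m", "doc3:e-acute", "doc5:oe"}

regions = venn3_regions(errors_a, errors_b, errors_c)
print(regions)
# Every error falls in exactly one region, so the counts sum to the union size.
print(sum(regions.values()) == len(errors_a | errors_b | errors_c))
```

The same check works for the two-engine case with `only_a + only_b + both`.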
+"""Backward-compatibility alias: module moved to :mod:`picarones.measurements.statistics`.
+
+Phase E of the three-circle refactoring. This measurement (Circle 2)
+no longer lives in ``picarones.core/``; it now lives in
+``picarones.measurements/``. This alias lets legacy imports
+(``from picarones.core.statistics import ...``) keep
+working unchanged.
+
+See :doc:`docs/architecture-cercles.md` for the map of the
+three circles. The strict ``core/`` now contains only the domain
+abstractions and the orchestration (Circle 1).
+"""
+
+from picarones.measurements.statistics import *  # noqa: F401, F403
+
+import picarones.measurements.statistics as _module
+__all__ = getattr(_module, "__all__", [
+    nm for nm in dir(_module) if not nm.startswith("_")
+])
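The `__all__` fallback used by this alias module can be tried without the package installed. The sketch below uses a throwaway `types.ModuleType` as a stand-in for `picarones.measurements.statistics`; `public_names`, `new_home`, and the attributes set on it are hypothetical names chosen for illustration.

```python
import types


def public_names(module: types.ModuleType) -> list[str]:
    """Return module.__all__ if it is defined, else every name not starting with '_'."""
    return list(getattr(module, "__all__", [
        name for name in dir(module) if not name.startswith("_")
    ]))


# Stand-in for the relocated module (hypothetical contents).
new_home = types.ModuleType("new_home")
new_home.compute = lambda x: x * 2
new_home._private = object()

# No __all__ yet: the fallback lists the non-underscore names only.
names_without_all = public_names(new_home)
print(names_without_all)

# Once the target module declares __all__, that list wins as-is.
new_home.__all__ = ["compute"]
names_with_all = public_names(new_home)
print(names_with_all)
```

One design point to keep in mind: when the target module defines no `__all__`, `dir()` also returns whatever that module imported at top level (standard-library modules, helper classes), so those names would be re-exported by the alias as well. Declaring an explicit `__all__` in the new location would keep the legacy import surface narrow.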