Claude commited on
Commit
c813aa1
·
unverified ·
1 Parent(s): 218b7fb

feat(migration): Lots H + I + J — statistics, htr_united/huggingface, MetricsResult

Browse files

Trois lots cumulés post-fix-templates. Aucun n'a nécessité la
création de nouveaux canoniques — tous étaient des shims plats
ou des partiels d'imports déjà migrés.

Lot H — measurements.statistics → evaluation.statistics
-------------------------------------------------------
Le sous-paquet ``picarones/measurements/statistics/`` (9
fichiers : ``__init__`` + 8 sous-modules) était entièrement
constitué de shims vers ``picarones.evaluation.statistics``.
Tous supprimés en bloc après migration des 70 imports tests.

Lot I — extras.importers → adapters.corpus
------------------------------------------
3 shims migrés et supprimés :

- ``extras.importers.htr_united`` →
``adapters.corpus.htr_united``
- ``extras.importers.huggingface`` →
``adapters.corpus.huggingface``
- ``extras.importers._fallback_log`` →
``adapters.corpus._fallback_log``

Le warning ``UserWarning`` du module ``huggingface`` a été
mis à jour pour citer le nouveau chemin.
``picarones/extras/importers/__init__.py`` ré-expose les
symboles depuis les canoniques pour préserver la rétrocompat
des callers (``from picarones.extras.importers import
HuggingFaceDataset, HTRUnitedEntry``).

Lot J — measurements.metrics partiel → evaluation.metric_result
---------------------------------------------------------------
Migration ciblée sur les **deux symboles canoniquement migrés**
(``MetricsResult``, ``aggregate_metrics``) : ~25 imports.
``compute_metrics`` reste dans ``picarones.measurements.metrics``
car aucun canonique n'existe pour cette fonction. Les imports
mixtes (``from picarones.measurements.metrics import
compute_metrics, aggregate_metrics, MetricsResult``) ont été
splittés en deux lignes : une vers le canonique, une vers le
legacy résiduel.

Tests d'architecture
--------------------
- ``test_no_flat_files_in_measurements::expected_subpackages``
réduit de ``{narrative, statistics, runner}`` à
``{narrative, runner}``.
- ``test_module_coverage::TEST_ONLY_BASELINE`` réduit de 4 à
3 entrées (``"statistics"`` retiré).
- ``test_file_budgets::FILE_BUDGETS`` débarrassé des entrées
orphelines (``extras/importers/htr_united.py``,
``extras/importers/huggingface.py``).
- ``test_doc_paths::BROKEN_PATHS_BASELINE`` 134 → 138. 4
nouveaux chemins cassés héritage dans ``docs/audits/*.md``
(intouchables).

Sync README + CLAUDE.md
-----------------------
``scripts/gen_readme_tables.py`` ré-exécuté : compteur de tests
global passe de 4978 (post-fix-templates) à 5000 collected
(arrondi à la dizaine), avec 4967 passed effectifs.

Acceptance
----------
- ``pytest tests/architecture/`` : 73 passed.
- ``pytest tests/`` : **0 failed, 0 errors, 4967 passed**.
- ``ruff check picarones/ tests/`` : All checks passed.

État final de la branche claude/migrate-core-to-domain-8ubIT
------------------------------------------------------------
À l'issue des Lots A à J + fix-templates :

- ``picarones/core/`` : entièrement supprimé.
- ``picarones/engines/`` : entièrement supprimé.
- ``picarones/modules/`` : entièrement supprimé.
- ``picarones/report/`` : entièrement supprimé.
- ``picarones/measurements/statistics/`` : entièrement supprimé.
- ``picarones/measurements/`` : 50+ → 24 fichiers résiduels.
- ``picarones/reports_v2/html/templates/`` : 10 templates HTML
restaurés (fix bug cc53ead).

Soit ~165 fichiers shims/orphelins supprimés et ~700 imports
tests migrés sur la branche.

Imports legacy restants
-----------------------
365 → 270 imports tests (majorité bloquée derrière création
de canoniques) :

- ``measurements.runner.{run_benchmark,
_compute_document_result}`` : 40 imports — bloqué (Phase 6).
- ``measurements.metrics.compute_metrics`` : 10 imports —
bloqué (canonique à créer).
- ``measurements.robustness.*`` : 20 imports — bloqué.
- ``pipelines.{base, over_normalization}`` : 22 imports —
bloqué (Phase 6).
- ``extras.importers.{gallica, escriptorium, iiif}`` : 50
imports — vrais fichiers, bloqué.
- ``llm.base`` + ``web.app`` : 20 imports — bloqué.

Toutes les migrations triviales sont terminées. La suite
nécessite création de canoniques (sprints dédiés).

https://claude.ai/code/session_011XQZNitg1rCgia8ZD1a2hP

Files changed (47) hide show
  1. CLAUDE.md +3 -3
  2. README.md +1 -1
  3. docs/migration/SESSION_HANDOVER.md +27 -0
  4. picarones/adapters/corpus/huggingface.py +1 -1
  5. picarones/extras/importers/__init__.py +14 -12
  6. picarones/extras/importers/_fallback_log.py +0 -7
  7. picarones/extras/importers/htr_united.py +0 -7
  8. picarones/extras/importers/huggingface.py +0 -11
  9. picarones/fixtures.py +1 -1
  10. picarones/measurements/runner/document.py +2 -1
  11. picarones/measurements/runner/partial.py +1 -1
  12. picarones/measurements/statistics/__init__.py +0 -55
  13. picarones/measurements/statistics/bootstrap.py +0 -23
  14. picarones/measurements/statistics/cdd_render.py +0 -23
  15. picarones/measurements/statistics/clustering.py +0 -24
  16. picarones/measurements/statistics/correlation.py +0 -23
  17. picarones/measurements/statistics/distributions.py +0 -24
  18. picarones/measurements/statistics/friedman_nemenyi.py +0 -27
  19. picarones/measurements/statistics/pareto.py +0 -23
  20. picarones/measurements/statistics/wilcoxon.py +0 -26
  21. picarones/web/routers/importers.py +4 -4
  22. tests/architecture/test_doc_paths.py +6 -1
  23. tests/architecture/test_file_budgets.py +3 -4
  24. tests/architecture/test_module_coverage.py +0 -1
  25. tests/architecture/test_no_flat_files_in_measurements.py +1 -1
  26. tests/core/test_sprint14_robust_filtering.py +1 -1
  27. tests/engines/test_sprint4_normalization_iiif.py +2 -1
  28. tests/extras/test_sprint8_escriptorium_gallica.py +2 -2
  29. tests/integration/test_sprint13_parallelisation_stats.py +11 -11
  30. tests/measurements/test_metrics.py +2 -1
  31. tests/measurements/test_pricing_degenerate_cases.py +1 -1
  32. tests/measurements/test_results.py +1 -1
  33. tests/measurements/test_sprint10_error_distribution.py +4 -4
  34. tests/measurements/test_sprint12_nouvelles_fonctionnalites.py +1 -1
  35. tests/measurements/test_sprint18_friedman_nemenyi_cdd.py +1 -1
  36. tests/measurements/test_sprint20_pareto_pricing.py +1 -1
  37. tests/measurements/test_sprint23_anti_hallucination.py +1 -1
  38. tests/measurements/test_sprint40_ner_runner.py +1 -1
  39. tests/measurements/test_sprint42_calibration_runner.py +1 -1
  40. tests/measurements/test_sprint44_median_default.py +1 -1
  41. tests/measurements/test_sprint45_stratification.py +1 -1
  42. tests/measurements/test_sprint61_philological_runner.py +1 -1
  43. tests/report/test_sprint46_stratification_html.py +1 -1
  44. tests/report/test_sprint7_advanced_report.py +54 -54
  45. tests/report/test_sprint86_aii5_html.py +1 -1
  46. tests/report/test_sprint87_readability_html.py +1 -1
  47. tests/web/test_sprint6_web_interface.py +25 -25
CLAUDE.md CHANGED
@@ -123,7 +123,7 @@ picarones/
123
 
124
  ## État des tests et bugs historiques
125
 
126
- `pytest tests/` → **5020 passed, 12 skipped, 8 deselected, 0 failed**
127
  (post-S59). Les deselected sont les markers `live` (5 tests d'intégration
128
  contre vraie API/binaire) + `network` (3 tests qui hit le réseau réel),
129
  opt-in en local via `pytest -m live` ou `pytest -m network`. Le
@@ -253,7 +253,7 @@ Résumé express :
253
 
254
  1. `git branch --show-current` → `claude/repo-analysis-cukvm`.
255
  2. `git status` → working tree clean.
256
- 3. `pytest tests/ -q --no-header --tb=line` → 5020 passed.
257
  4. `git log -1 --format=%B` → décrit la prochaine sub-phase.
258
 
259
  **Règles d'architecture critiques** (apprises à la dure) :
@@ -341,7 +341,7 @@ détecte, arbitre, rend.
341
  ## Contexte développement
342
 
343
  - **Environnement** : GitHub Codespaces, Python 3.11+
344
- - **Tests** : `pytest tests/ -q` → 5020 passed, 12 skipped, 24
345
  deselected, 0 failed (au moment de la pause de session).
346
  - **Plan d'évolution actif** : [`docs/roadmap/evolution-2026.md`](docs/roadmap/evolution-2026.md).
347
  - **Plan retrait du legacy (maître)** : [`docs/migration/legacy-retirement-plan.md`](docs/migration/legacy-retirement-plan.md).
 
123
 
124
  ## État des tests et bugs historiques
125
 
126
+ `pytest tests/` → **5000 passed, 12 skipped, 8 deselected, 0 failed**
127
  (post-S59). Les deselected sont les markers `live` (5 tests d'intégration
128
  contre vraie API/binaire) + `network` (3 tests qui hit le réseau réel),
129
  opt-in en local via `pytest -m live` ou `pytest -m network`. Le
 
253
 
254
  1. `git branch --show-current` → `claude/repo-analysis-cukvm`.
255
  2. `git status` → working tree clean.
256
+ 3. `pytest tests/ -q --no-header --tb=line` → 5000 passed.
257
  4. `git log -1 --format=%B` → décrit la prochaine sub-phase.
258
 
259
  **Règles d'architecture critiques** (apprises à la dure) :
 
341
  ## Contexte développement
342
 
343
  - **Environnement** : GitHub Codespaces, Python 3.11+
344
+ - **Tests** : `pytest tests/ -q` → 5000 passed, 12 skipped, 24
345
  deselected, 0 failed (au moment de la pause de session).
346
  - **Plan d'évolution actif** : [`docs/roadmap/evolution-2026.md`](docs/roadmap/evolution-2026.md).
347
  - **Plan retrait du legacy (maître)** : [`docs/migration/legacy-retirement-plan.md`](docs/migration/legacy-retirement-plan.md).
README.md CHANGED
@@ -395,7 +395,7 @@ ruff check picarones/ tests/
395
  python -m mypy picarones/core/
396
  ```
397
 
398
- **Test suite**: ~5020 tests, ~3 min on a modern laptop. Coverage
399
  floor at 85% (currently ~87%). The `network` marker excludes tests
400
  requiring live HTTP. A handful of tests depend on optional engines
401
  (`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
 
395
  python -m mypy picarones/core/
396
  ```
397
 
398
+ **Test suite**: ~5000 tests, ~3 min on a modern laptop. Coverage
399
  floor at 85% (currently ~87%). The `network` marker excludes tests
400
  requiring live HTTP. A handful of tests depend on optional engines
401
  (`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
docs/migration/SESSION_HANDOVER.md CHANGED
@@ -356,6 +356,33 @@ L'ordre recommandé, par lots de symboles cohérents :
356
  simple sed est impossible — il faudrait migrer les 76
357
  imports vers des modules qui n'existent pas encore.
358
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
359
  À chaque lot : sed → tests → commit. Les shims devenus
360
  orphelins après le lot peuvent être **supprimés** dans le même
361
  commit (principe « no shim survives its caller »).
 
356
  simple sed est impossible — il faudrait migrer les 76
357
  imports vers des modules qui n'existent pas encore.
358
 
359
+ 8. ✅ **Lot H — measurements.statistics → evaluation.statistics**
360
+ (~70 imports migrés, 9 shims supprimés en bloc) :
361
+ - ``measurements.statistics.{bootstrap, cdd_render,
362
+ clustering, correlation, distributions, friedman_nemenyi,
363
+ pareto, wilcoxon}`` → ``evaluation.statistics.{...}``.
364
+ - ``measurements/statistics/`` (sous-paquet entier)
365
+ supprimé.
366
+
367
+ 9. ✅ **Lot I — extras.importers → adapters.corpus**
368
+ (3 shims supprimés, ~15 imports migrés) :
369
+ - ``extras.importers.htr_united`` →
370
+ ``adapters.corpus.htr_united``.
371
+ - ``extras.importers.huggingface`` →
372
+ ``adapters.corpus.huggingface``.
373
+ - ``extras.importers._fallback_log`` →
374
+ ``adapters.corpus._fallback_log``.
375
+
376
+ 10. ✅ **Lot J — measurements.metrics.{MetricsResult,
377
+ aggregate_metrics} → evaluation.metric_result** (~25
378
+ imports migrés, 0 shim supprimé) :
379
+ - Migration partielle uniquement des symboles canoniquement
380
+ migrés (``MetricsResult``, ``aggregate_metrics``).
381
+ - ``compute_metrics`` reste dans
382
+ ``picarones.measurements.metrics`` car aucun canonique
383
+ n'existe pour cette fonction (sera traité avec le Lot G
384
+ reporté).
385
+
386
  À chaque lot : sed → tests → commit. Les shims devenus
387
  orphelins après le lot peuvent être **supprimés** dans le même
388
  commit (principe « no shim survives its caller »).
picarones/adapters/corpus/huggingface.py CHANGED
@@ -38,7 +38,7 @@ from typing import Optional
38
  # Émission du warning ``experimental`` à l'import. Phase C du chantier
39
  # de refonte — voir docstring du module ci-dessus.
40
  warnings.warn(
41
- "picarones.extras.importers.huggingface is experimental and may "
42
  "change or be removed without notice. Use at your own risk until "
43
  "an institutional use case validates the API.",
44
  category=UserWarning,
 
38
  # Émission du warning ``experimental`` à l'import. Phase C du chantier
39
  # de refonte — voir docstring du module ci-dessus.
40
  warnings.warn(
41
+ "picarones.adapters.corpus.huggingface is experimental and may "
42
  "change or be removed without notice. Use at your own risk until "
43
  "an institutional use case validates the API.",
44
  category=UserWarning,
picarones/extras/importers/__init__.py CHANGED
@@ -1,20 +1,22 @@
1
- """Importeurs de corpus depuis sources distantes (Cercle 3).
 
 
2
 
3
- Importeurs livrés
4
- -----------------
5
  - :mod:`_http` — helpers HTTP partagés (validate_http_url, download_url)
6
  - :mod:`iiif` — manifestes IIIF v2/v3 (Bodleian, BnF, Vatican…)
7
- - :mod:`htr_united` — datasets HTR-United (CC0, GitHub)
8
  - :mod:`gallica` — BnF Gallica (SRU + IIIF + OCR brut)
9
- - :mod:`huggingface` — datasets HuggingFace ⚠ **expérimental**
10
  - :mod:`escriptorium` — projets eScriptorium ⚠ **expérimental**
11
 
12
- Modules expérimentaux
13
- ---------------------
14
- ``huggingface`` et ``escriptorium`` émettent un ``UserWarning`` à
15
- l'import. Ils sont fonctionnellement présents mais leur usage en
16
- production n'est pas garanti — l'API HuggingFace Datasets évolue
17
- fréquemment et eScriptorium n'a qu'un test isolé.
 
 
 
 
18
  """
19
 
20
  from picarones.extras.importers.iiif import IIIFImporter, import_iiif_manifest
@@ -30,7 +32,7 @@ from picarones.extras.importers.escriptorium import (
30
  EScriptoriumDocument,
31
  connect_escriptorium,
32
  )
33
- from picarones.extras.importers._fallback_log import (
34
  consume_fallback_log,
35
  peek_fallback_log,
36
  record_fallback,
 
1
+ """Importeurs de corpus depuis sources distantes.
2
+
3
+ Importeurs livrés ici (legacy, en cours de retrait) :
4
 
 
 
5
  - :mod:`_http` — helpers HTTP partagés (validate_http_url, download_url)
6
  - :mod:`iiif` — manifestes IIIF v2/v3 (Bodleian, BnF, Vatican…)
 
7
  - :mod:`gallica` — BnF Gallica (SRU + IIIF + OCR brut)
 
8
  - :mod:`escriptorium` — projets eScriptorium ⚠ **expérimental**
9
 
10
+ Importeurs migrés vers :mod:`picarones.adapters.corpus` (Lot I) :
11
+
12
+ - ``htr_united`` :mod:`picarones.adapters.corpus.htr_united`
13
+ - ``huggingface`` → :mod:`picarones.adapters.corpus.huggingface`
14
+ **expérimental**
15
+ - ``_fallback_log`` → :mod:`picarones.adapters.corpus._fallback_log`
16
+
17
+ L'API publique de ce package re-expose ces modules canoniques pour
18
+ préserver la rétrocompat (``from picarones.extras.importers import
19
+ HuggingFaceDataset, HTRUnitedEntry, …``).
20
  """
21
 
22
  from picarones.extras.importers.iiif import IIIFImporter, import_iiif_manifest
 
32
  EScriptoriumDocument,
33
  connect_escriptorium,
34
  )
35
+ from picarones.adapters.corpus._fallback_log import (
36
  consume_fallback_log,
37
  peek_fallback_log,
38
  record_fallback,
picarones/extras/importers/_fallback_log.py DELETED
@@ -1,7 +0,0 @@
1
- """Re-export — Sprint A14-S11. Le contenu canonique vit dans
2
- ``picarones.adapters.corpus._fallback_log``.
3
- """
4
-
5
- from __future__ import annotations
6
-
7
- from picarones.adapters.corpus._fallback_log import * # noqa: F401,F403
 
 
 
 
 
 
 
 
picarones/extras/importers/htr_united.py DELETED
@@ -1,7 +0,0 @@
1
- """Re-export — Sprint A14-S11. Le contenu canonique vit dans
2
- ``picarones.adapters.corpus.htr_united``.
3
- """
4
-
5
- from __future__ import annotations
6
-
7
- from picarones.adapters.corpus.htr_united import * # noqa: F401,F403
 
 
 
 
 
 
 
 
picarones/extras/importers/huggingface.py DELETED
@@ -1,11 +0,0 @@
1
- """Re-export — Sprint A14-S11. Le contenu canonique vit dans
2
- ``picarones.adapters.corpus.huggingface``.
3
-
4
- Ré-expose explicitement ``_REFERENCE_DATASETS`` (importé par les
5
- tests web).
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- from picarones.adapters.corpus.huggingface import * # noqa: F401,F403
11
- from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS # noqa: F401
 
 
 
 
 
 
 
 
 
 
 
 
picarones/fixtures.py CHANGED
@@ -13,7 +13,7 @@ import random
13
  import struct
14
  import zlib
15
 
16
- from picarones.measurements.metrics import MetricsResult
17
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
18
  from picarones.pipelines.over_normalization import detect_over_normalization
19
  # Sprint 5 — métriques avancées
 
13
  import struct
14
  import zlib
15
 
16
+ from picarones.evaluation.metric_result import MetricsResult
17
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
18
  from picarones.pipelines.over_normalization import detect_over_normalization
19
  # Sprint 5 — métriques avancées
picarones/measurements/runner/document.py CHANGED
@@ -16,7 +16,8 @@ from typing import Optional
16
 
17
  from picarones.evaluation.benchmark_result import DocumentResult
18
  from picarones.adapters.legacy_engines.base import EngineResult
19
- from picarones.measurements.metrics import MetricsResult, compute_metrics
 
20
 
21
 
22
  def _calibration_from_engine_result(
 
16
 
17
  from picarones.evaluation.benchmark_result import DocumentResult
18
  from picarones.adapters.legacy_engines.base import EngineResult
19
+ from picarones.evaluation.metric_result import MetricsResult
20
+ from picarones.measurements.metrics import compute_metrics
21
 
22
 
23
  def _calibration_from_engine_result(
picarones/measurements/runner/partial.py CHANGED
@@ -21,7 +21,7 @@ from pathlib import Path
21
  from typing import Optional
22
 
23
  from picarones.evaluation.benchmark_result import DocumentResult
24
- from picarones.measurements.metrics import MetricsResult
25
 
26
  logger = logging.getLogger(__name__)
27
 
 
21
  from typing import Optional
22
 
23
  from picarones.evaluation.benchmark_result import DocumentResult
24
+ from picarones.evaluation.metric_result import MetricsResult
25
 
26
  logger = logging.getLogger(__name__)
27
 
picarones/measurements/statistics/__init__.py DELETED
@@ -1,55 +0,0 @@
1
- """``picarones.measurements.statistics`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics`. Migration ::
4
-
5
- from picarones.evaluation.statistics import (
6
- bootstrap_ci, wilcoxon_test, friedman_test, ...
7
- )
8
-
9
- Tous les symboles publics de l'API legacy (incluant les privés
10
- ``_SCIPY_AVAILABLE``, ``_chi_square_sf``, ``_nemenyi_critical_value``,
11
- ``_rank_row`` consommés par certains tests) restent accessibles
12
- identiquement.
13
- """
14
-
15
- from __future__ import annotations
16
-
17
- import warnings
18
-
19
- from picarones.evaluation.statistics import (
20
- _SCIPY_AVAILABLE,
21
- _chi_square_sf,
22
- _nemenyi_critical_value,
23
- _rank_row,
24
- ErrorCluster,
25
- bootstrap_ci,
26
- build_critical_difference_svg,
27
- cluster_errors,
28
- compute_correlation_matrix,
29
- compute_pairwise_stats,
30
- compute_pareto_front,
31
- compute_reliability_curve,
32
- compute_venn_data,
33
- friedman_test,
34
- nemenyi_posthoc,
35
- wilcoxon_test,
36
- )
37
-
38
- warnings.warn(
39
- "picarones.measurements.statistics is deprecated and will be "
40
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
41
- DeprecationWarning,
42
- stacklevel=2,
43
- )
44
-
45
- __all__ = [
46
- "bootstrap_ci",
47
- "wilcoxon_test", "compute_pairwise_stats",
48
- "friedman_test", "nemenyi_posthoc", "build_critical_difference_svg",
49
- "compute_pareto_front",
50
- "ErrorCluster", "cluster_errors",
51
- "compute_correlation_matrix",
52
- "compute_reliability_curve", "compute_venn_data",
53
- "_SCIPY_AVAILABLE", "_chi_square_sf",
54
- "_nemenyi_critical_value", "_rank_row",
55
- ]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/bootstrap.py DELETED
@@ -1,23 +0,0 @@
1
- """``picarones.measurements.statistics.bootstrap`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.bootstrap`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.bootstrap import (
13
- bootstrap_ci,
14
- )
15
-
16
- warnings.warn(
17
- "picarones.measurements.statistics.bootstrap is deprecated and will be "
18
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
19
- DeprecationWarning,
20
- stacklevel=2,
21
- )
22
-
23
- __all__ = ['bootstrap_ci']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/cdd_render.py DELETED
@@ -1,23 +0,0 @@
1
- """``picarones.measurements.statistics.cdd_render`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.cdd_render`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.cdd_render import (
13
- build_critical_difference_svg,
14
- )
15
-
16
- warnings.warn(
17
- "picarones.measurements.statistics.cdd_render is deprecated and will be "
18
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
19
- DeprecationWarning,
20
- stacklevel=2,
21
- )
22
-
23
- __all__ = ['build_critical_difference_svg']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/clustering.py DELETED
@@ -1,24 +0,0 @@
1
- """``picarones.measurements.statistics.clustering`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.clustering`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.clustering import (
13
- ErrorCluster,
14
- cluster_errors,
15
- )
16
-
17
- warnings.warn(
18
- "picarones.measurements.statistics.clustering is deprecated and will be "
19
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
20
- DeprecationWarning,
21
- stacklevel=2,
22
- )
23
-
24
- __all__ = ['ErrorCluster', 'cluster_errors']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/correlation.py DELETED
@@ -1,23 +0,0 @@
1
- """``picarones.measurements.statistics.correlation`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.correlation`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.correlation import (
13
- compute_correlation_matrix,
14
- )
15
-
16
- warnings.warn(
17
- "picarones.measurements.statistics.correlation is deprecated and will be "
18
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
19
- DeprecationWarning,
20
- stacklevel=2,
21
- )
22
-
23
- __all__ = ['compute_correlation_matrix']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/distributions.py DELETED
@@ -1,24 +0,0 @@
1
- """``picarones.measurements.statistics.distributions`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.distributions`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.distributions import (
13
- compute_reliability_curve,
14
- compute_venn_data,
15
- )
16
-
17
- warnings.warn(
18
- "picarones.measurements.statistics.distributions is deprecated and will be "
19
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
20
- DeprecationWarning,
21
- stacklevel=2,
22
- )
23
-
24
- __all__ = ['compute_reliability_curve', 'compute_venn_data']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/friedman_nemenyi.py DELETED
@@ -1,27 +0,0 @@
1
- """``picarones.measurements.statistics.friedman_nemenyi`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.friedman_nemenyi`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.friedman_nemenyi import (
13
- friedman_test,
14
- nemenyi_posthoc,
15
- _chi_square_sf,
16
- _nemenyi_critical_value,
17
- _rank_row,
18
- )
19
-
20
- warnings.warn(
21
- "picarones.measurements.statistics.friedman_nemenyi is deprecated and will be "
22
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
23
- DeprecationWarning,
24
- stacklevel=2,
25
- )
26
-
27
- __all__ = ['friedman_test', 'nemenyi_posthoc', '_chi_square_sf', '_nemenyi_critical_value', '_rank_row']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/pareto.py DELETED
@@ -1,23 +0,0 @@
1
- """``picarones.measurements.statistics.pareto`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.pareto`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.pareto import (
13
- compute_pareto_front,
14
- )
15
-
16
- warnings.warn(
17
- "picarones.measurements.statistics.pareto is deprecated and will be "
18
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
19
- DeprecationWarning,
20
- stacklevel=2,
21
- )
22
-
23
- __all__ = ['compute_pareto_front']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/measurements/statistics/wilcoxon.py DELETED
@@ -1,26 +0,0 @@
1
- """``picarones.measurements.statistics.wilcoxon`` — shim re-export (déprécié, suppression 2.0).
2
-
3
- Canonique : :mod:`picarones.evaluation.statistics.wilcoxon`. Migration ::
4
-
5
- from picarones.evaluation.statistics import ...
6
- """
7
-
8
- from __future__ import annotations
9
-
10
- import warnings
11
-
12
- from picarones.evaluation.statistics.wilcoxon import (
13
- compute_pairwise_stats,
14
- wilcoxon_test,
15
- _SCIPY_AVAILABLE,
16
- _normal_sf,
17
- )
18
-
19
- warnings.warn(
20
- "picarones.measurements.statistics.wilcoxon is deprecated and will be "
21
- "removed in 2.0. Import from picarones.evaluation.statistics instead.",
22
- DeprecationWarning,
23
- stacklevel=2,
24
- )
25
-
26
- __all__ = ['compute_pairwise_stats', 'wilcoxon_test', '_SCIPY_AVAILABLE', '_normal_sf']
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
picarones/web/routers/importers.py CHANGED
@@ -20,7 +20,7 @@ async def api_htr_united_catalogue(
20
  script: str = Query(default="", description="Filtre type d'écriture"),
21
  ) -> dict:
22
  """Catalogue HTR-United filtrable."""
23
- from picarones.extras.importers.htr_united import HTRUnitedCatalogue
24
 
25
  cat = HTRUnitedCatalogue.from_demo()
26
  results = cat.search(
@@ -40,7 +40,7 @@ async def api_htr_united_catalogue(
40
  @router.post("/api/htr-united/import")
41
  async def api_htr_united_import(req: HTRUnitedImportRequest) -> dict:
42
  """Importe une entrée HTR-United dans ``req.output_dir``."""
43
- from picarones.extras.importers.htr_united import (
44
  HTRUnitedCatalogue,
45
  import_htr_united_corpus,
46
  )
@@ -71,7 +71,7 @@ async def api_huggingface_search(
71
  limit: int = Query(default=20, ge=1, le=50),
72
  ) -> dict:
73
  """Recherche de datasets sur HuggingFace Hub."""
74
- from picarones.extras.importers.huggingface import HuggingFaceImporter
75
 
76
  tag_list = [t.strip() for t in tags.split(",") if t.strip()] if tags else None
77
  importer = HuggingFaceImporter()
@@ -90,7 +90,7 @@ async def api_huggingface_search(
90
  @router.post("/api/huggingface/import")
91
  async def api_huggingface_import(req: HuggingFaceImportRequest) -> dict:
92
  """Importe un dataset HuggingFace dans ``req.output_dir``."""
93
- from picarones.extras.importers.huggingface import HuggingFaceImporter
94
 
95
  importer = HuggingFaceImporter()
96
  return importer.import_dataset(
 
20
  script: str = Query(default="", description="Filtre type d'écriture"),
21
  ) -> dict:
22
  """Catalogue HTR-United filtrable."""
23
+ from picarones.adapters.corpus.htr_united import HTRUnitedCatalogue
24
 
25
  cat = HTRUnitedCatalogue.from_demo()
26
  results = cat.search(
 
40
  @router.post("/api/htr-united/import")
41
  async def api_htr_united_import(req: HTRUnitedImportRequest) -> dict:
42
  """Importe une entrée HTR-United dans ``req.output_dir``."""
43
+ from picarones.adapters.corpus.htr_united import (
44
  HTRUnitedCatalogue,
45
  import_htr_united_corpus,
46
  )
 
71
  limit: int = Query(default=20, ge=1, le=50),
72
  ) -> dict:
73
  """Recherche de datasets sur HuggingFace Hub."""
74
+ from picarones.adapters.corpus.huggingface import HuggingFaceImporter
75
 
76
  tag_list = [t.strip() for t in tags.split(",") if t.strip()] if tags else None
77
  importer = HuggingFaceImporter()
 
90
  @router.post("/api/huggingface/import")
91
  async def api_huggingface_import(req: HuggingFaceImportRequest) -> dict:
92
  """Importe un dataset HuggingFace dans ``req.output_dir``."""
93
+ from picarones.adapters.corpus.huggingface import HuggingFaceImporter
94
 
95
  importer = HuggingFaceImporter()
96
  return importer.import_dataset(
tests/architecture/test_doc_paths.py CHANGED
@@ -97,6 +97,11 @@ REPO_ROOT = Path(__file__).resolve().parents[2]
97
  #: suppression des 2 derniers shims de ``picarones/core/``. Le
98
  #: sous-paquet ``core/`` n'existe plus du tout. Deux nouveaux
99
  #: chemins cassés héritage dans ``CHANGELOG.md`` (intouchable).
 
 
 
 
 
100
  #:
101
  #: Les chemins cassés restants sont **TOUS** dans :
102
  #: - ``CHANGELOG.md`` : journal historique versionné, intouchable.
@@ -105,7 +110,7 @@ REPO_ROOT = Path(__file__).resolve().parents[2]
105
  #: - ``docs/migration/{executor-equivalence, legacy-retirement-plan}.md`` :
106
  #: audits/plans historiques (citent des chemins legacy à des fins
107
  #: de comparaison).
108
- BROKEN_PATHS_BASELINE = 134
109
 
110
  #: Patrons de fichiers de documentation à scanner.
111
  DOC_GLOBS: tuple[str, ...] = (
 
97
  #: suppression des 2 derniers shims de ``picarones/core/``. Le
98
  #: sous-paquet ``core/`` n'existe plus du tout. Deux nouveaux
99
  #: chemins cassés héritage dans ``CHANGELOG.md`` (intouchable).
100
+ #: - 138 (sprints « Lots H + I », 2026-05-07) : suppression du
101
+ #: sous-paquet ``measurements/statistics/`` (Lot H, 9 shims) et
102
+ #: des 3 shims ``extras/importers/{htr_united, huggingface,
103
+ #: _fallback_log}`` (Lot I). Quatre nouveaux chemins cassés
104
+ #: héritage répartis dans ``docs/audits/*.md`` (intouchables).
105
  #:
106
  #: Les chemins cassés restants sont **TOUS** dans :
107
  #: - ``CHANGELOG.md`` : journal historique versionné, intouchable.
 
110
  #: - ``docs/migration/{executor-equivalence, legacy-retirement-plan}.md`` :
111
  #: audits/plans historiques (citent des chemins legacy à des fins
112
  #: de comparaison).
113
+ BROKEN_PATHS_BASELINE = 138
114
 
115
  #: Patrons de fichiers de documentation à scanner.
116
  DOC_GLOBS: tuple[str, ...] = (
tests/architecture/test_file_budgets.py CHANGED
@@ -123,13 +123,12 @@ FILE_BUDGETS: dict[str, int] = {
123
  # ``measurements/roman_numerals.py`` a été supprimé. Seul le
124
  # canonique ``evaluation/metrics/roman_numerals.py`` reste.
125
  "picarones/evaluation/metrics/roman_numerals.py": 575, # actuel 484
126
- "picarones/extras/importers/htr_united.py": 575, # actuel 473 (re-export S11)
127
- # Sprint A14-S11 — d\xc3\xa9plac\xc3\xa9s depuis extras/importers/, l'ancien
128
- # emplacement est d\xc3\xa9sormais un re-export.
129
  "picarones/adapters/corpus/htr_united.py": 575, # actuel 473
130
  "picarones/adapters/corpus/huggingface.py": 550, # actuel 464
131
  "picarones/cli/_workflows.py": 550, # actuel 469
132
- "picarones/extras/importers/huggingface.py": 550, # actuel 464
133
  # Phase 4-ter : ``core/metric_hooks.py`` est désormais un shim
134
  # (≤ 80 l). Le contenu canonique vit dans ``evaluation/`` ;
135
  # même budget pour la même raison historique (centralise les
 
123
  # ``measurements/roman_numerals.py`` a été supprimé. Seul le
124
  # canonique ``evaluation/metrics/roman_numerals.py`` reste.
125
  "picarones/evaluation/metrics/roman_numerals.py": 575, # actuel 484
126
+ # Sprint A14-S11 + Lot I — déplacés depuis extras/importers/.
127
+ # Les shims ``extras/importers/{htr_united, huggingface,
128
+ # _fallback_log}`` ont été supprimés au Lot I (mai 2026).
129
  "picarones/adapters/corpus/htr_united.py": 575, # actuel 473
130
  "picarones/adapters/corpus/huggingface.py": 550, # actuel 464
131
  "picarones/cli/_workflows.py": 550, # actuel 469
 
132
  # Phase 4-ter : ``core/metric_hooks.py`` est désormais un shim
133
  # (≤ 80 l). Le contenu canonique vit dans ``evaluation/`` ;
134
  # même budget pour la même raison historique (centralise les
tests/architecture/test_module_coverage.py CHANGED
@@ -71,7 +71,6 @@ TEST_ONLY_BASELINE: frozenset[str] = frozenset({
71
  "numerical_sequences_hooks",
72
  "pipeline_benchmark",
73
  "pipeline_comparison",
74
- "statistics",
75
  })
76
 
77
 
 
71
  "numerical_sequences_hooks",
72
  "pipeline_benchmark",
73
  "pipeline_comparison",
 
74
  })
75
 
76
 
tests/architecture/test_no_flat_files_in_measurements.py CHANGED
@@ -128,7 +128,7 @@ def test_no_orphaned_whitelist_entries() -> None:
128
  def test_subpackages_not_affected() -> None:
129
  """Méta-test : les sous-packages existants de ``measurements/``
130
  (narrative, statistics, runner) restent intouchés par ce test."""
131
- expected_subpackages = {"narrative", "statistics", "runner"}
132
  actual = {
133
  p.name for p in MEASUREMENTS_DIR.iterdir()
134
  if p.is_dir() and not p.name.startswith("_") and "__pycache__" not in p.name
 
128
  def test_subpackages_not_affected() -> None:
129
  """Méta-test : les sous-packages existants de ``measurements/``
130
  (narrative, statistics, runner) restent intouchés par ce test."""
131
+ expected_subpackages = {"narrative", "runner"}
132
  actual = {
133
  p.name for p in MEASUREMENTS_DIR.iterdir()
134
  if p.is_dir() and not p.name.startswith("_") and "__pycache__" not in p.name
tests/core/test_sprint14_robust_filtering.py CHANGED
@@ -23,7 +23,7 @@ import pytest
23
  def _make_fake_benchmark():
24
  """Retourne un BenchmarkResult minimal pour tester le générateur."""
25
  from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
26
- from picarones.measurements.metrics import MetricsResult
27
 
28
  def _metrics(cer, wer=0.2):
29
  return MetricsResult(
 
23
  def _make_fake_benchmark():
24
  """Retourne un BenchmarkResult minimal pour tester le générateur."""
25
  from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
26
+ from picarones.evaluation.metric_result import MetricsResult
27
 
28
  def _metrics(cer, wer=0.2):
29
  return MetricsResult(
tests/engines/test_sprint4_normalization_iiif.py CHANGED
@@ -10,7 +10,8 @@ from picarones.evaluation.metrics.normalization import (
10
  _apply_diplomatic_table,
11
  get_builtin_profile,
12
  )
13
- from picarones.measurements.metrics import compute_metrics, aggregate_metrics, MetricsResult
 
14
  from picarones.extras.importers.iiif import (
15
  IIIFManifestParser,
16
  parse_page_selector,
 
10
  _apply_diplomatic_table,
11
  get_builtin_profile,
12
  )
13
+ from picarones.evaluation.metric_result import aggregate_metrics, MetricsResult
14
+ from picarones.measurements.metrics import compute_metrics
15
  from picarones.extras.importers.iiif import (
16
  IIIFManifestParser,
17
  parse_page_selector,
tests/extras/test_sprint8_escriptorium_gallica.py CHANGED
@@ -162,7 +162,7 @@ class TestEScriptoriumExport:
162
 
163
  def _make_benchmark(self, engine_name: str = "tesseract") -> "BenchmarkResult":
164
  from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
165
- from picarones.measurements.metrics import MetricsResult
166
  metrics = MetricsResult(cer=0.05, wer=0.10, cer_nfc=0.05,
167
  cer_caseless=0.04, cer_diplomatic=0.04,
168
  wer_normalized=0.09, mer=0.09, wil=0.05,
@@ -228,7 +228,7 @@ class TestEScriptoriumExport:
228
  def test_export_skips_error_docs(self):
229
  from picarones.extras.importers.escriptorium import EScriptoriumClient
230
  from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
231
- from picarones.measurements.metrics import MetricsResult
232
  metrics = MetricsResult(cer=0.1, wer=0.2, cer_nfc=0.1, cer_caseless=0.1,
233
  cer_diplomatic=0.1, wer_normalized=0.2, mer=0.2, wil=0.1,
234
  reference_length=50, hypothesis_length=50)
 
162
 
163
  def _make_benchmark(self, engine_name: str = "tesseract") -> "BenchmarkResult":
164
  from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
165
+ from picarones.evaluation.metric_result import MetricsResult
166
  metrics = MetricsResult(cer=0.05, wer=0.10, cer_nfc=0.05,
167
  cer_caseless=0.04, cer_diplomatic=0.04,
168
  wer_normalized=0.09, mer=0.09, wil=0.05,
 
228
  def test_export_skips_error_docs(self):
229
  from picarones.extras.importers.escriptorium import EScriptoriumClient
230
  from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
231
+ from picarones.evaluation.metric_result import MetricsResult
232
  metrics = MetricsResult(cer=0.1, wer=0.2, cer_nfc=0.1, cer_caseless=0.1,
233
  cer_diplomatic=0.1, wer_normalized=0.2, mer=0.2, wil=0.1,
234
  reference_length=50, hypothesis_length=50)
tests/integration/test_sprint13_parallelisation_stats.py CHANGED
@@ -418,7 +418,7 @@ class TestRunnerSilentExceptions:
418
 
419
  # Créer un doc_result avec des données de confusion corrompues
420
  from picarones.evaluation.benchmark_result import DocumentResult
421
- from picarones.measurements.metrics import MetricsResult
422
  bad_dr = DocumentResult(
423
  doc_id="x", image_path="x.png", ground_truth="gt", hypothesis="hyp",
424
  metrics=MetricsResult(cer=0.1, cer_nfc=0.1, cer_caseless=0.1,
@@ -441,7 +441,7 @@ class TestWilcoxonValidation:
441
 
442
  def test_identical_sequences_not_significant(self):
443
  """Séquences identiques → pas de différence, p = 1.0, significant = False."""
444
- from picarones.measurements.statistics import wilcoxon_test
445
  a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
446
  r = wilcoxon_test(a, a)
447
  assert r["significant"] is False
@@ -450,7 +450,7 @@ class TestWilcoxonValidation:
450
 
451
  def test_all_positive_diffs_w_minus_is_zero(self):
452
  """Si toutes les différences a−b sont positives : W⁻ = 0, W⁺ = n(n+1)/2."""
453
- from picarones.measurements.statistics import wilcoxon_test
454
  n = 10
455
  a = [float(i) for i in range(1, n + 1)]
456
  b = [0.0] * n
@@ -461,7 +461,7 @@ class TestWilcoxonValidation:
461
 
462
  def test_w_plus_w_minus_sum_invariant(self):
463
  """W⁺ + W⁻ doit toujours être égal à n(n+1)/2 (n = nombre de paires non nulles)."""
464
- from picarones.measurements.statistics import wilcoxon_test
465
  a = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15, 0.20, 0.35, 0.08, 0.18]
466
  b = [0.12, 0.20, 0.08, 0.35, 0.28, 0.18, 0.15, 0.40, 0.10, 0.20]
467
  r = wilcoxon_test(a, b)
@@ -474,7 +474,7 @@ class TestWilcoxonValidation:
474
 
475
  def test_clearly_different_sequences_significant(self):
476
  """Deux séquences très différentes (n=15) doivent donner p < 0.05."""
477
- from picarones.measurements.statistics import wilcoxon_test
478
  a = [0.05] * 15 # moteur A très performant
479
  b = [0.60] * 15 # moteur B peu performant — toutes diff = −0.55
480
  # Diffs a−b = −0.55 pour tous → W⁺ = 0 → devrait être significatif
@@ -484,7 +484,7 @@ class TestWilcoxonValidation:
484
 
485
  def test_large_n_normal_approximation_reasonable(self):
486
  """Pour n = 20, l'approximation normale doit donner une p-value dans [0, 1]."""
487
- from picarones.measurements.statistics import wilcoxon_test
488
  import random
489
  rng = random.Random(42)
490
  a = [rng.uniform(0.1, 0.5) for _ in range(20)]
@@ -495,7 +495,7 @@ class TestWilcoxonValidation:
495
 
496
  def test_small_n_returns_conservative_p(self):
497
  """Pour n < 10, la p-value doit être 0.04 (significatif) ou 0.20 (non sign.)."""
498
- from picarones.measurements.statistics import wilcoxon_test, _SCIPY_AVAILABLE
499
  if _SCIPY_AVAILABLE:
500
  pytest.skip("scipy disponible — la table exacte n'est pas utilisée")
501
  a = [0.1, 0.2, 0.3]
@@ -506,7 +506,7 @@ class TestWilcoxonValidation:
506
 
507
  def test_result_keys_complete(self):
508
  """Le dict retourné doit contenir toutes les clés documentées."""
509
- from picarones.measurements.statistics import wilcoxon_test
510
  r = wilcoxon_test([0.1, 0.3, 0.2, 0.4, 0.15, 0.35, 0.25, 0.5, 0.45, 0.05],
511
  [0.2, 0.2, 0.3, 0.3, 0.25, 0.25, 0.35, 0.35, 0.40, 0.15])
512
  for key in ("statistic", "p_value", "significant", "interpretation", "n_pairs", "W_plus", "W_minus"):
@@ -521,12 +521,12 @@ class TestWilcoxonScipyIntegration:
521
 
522
  def test_scipy_available_flag_is_bool(self):
523
  """_SCIPY_AVAILABLE doit être un booléen."""
524
- from picarones.measurements.statistics import _SCIPY_AVAILABLE
525
  assert isinstance(_SCIPY_AVAILABLE, bool)
526
 
527
  def test_scipy_and_native_agree_on_significance(self):
528
  """Scipy et l'implémentation native doivent s'accorder sur la significativité."""
529
- from picarones.measurements.statistics import wilcoxon_test, _SCIPY_AVAILABLE
530
  if not _SCIPY_AVAILABLE:
531
  pytest.skip("scipy non disponible")
532
 
@@ -542,7 +542,7 @@ class TestWilcoxonScipyIntegration:
542
 
543
  def test_scipy_p_value_in_valid_range(self):
544
  """La p-value fournie par scipy doit être dans [0, 1]."""
545
- from picarones.measurements.statistics import wilcoxon_test, _SCIPY_AVAILABLE
546
  if not _SCIPY_AVAILABLE:
547
  pytest.skip("scipy non disponible")
548
 
 
418
 
419
  # Créer un doc_result avec des données de confusion corrompues
420
  from picarones.evaluation.benchmark_result import DocumentResult
421
+ from picarones.evaluation.metric_result import MetricsResult
422
  bad_dr = DocumentResult(
423
  doc_id="x", image_path="x.png", ground_truth="gt", hypothesis="hyp",
424
  metrics=MetricsResult(cer=0.1, cer_nfc=0.1, cer_caseless=0.1,
 
441
 
442
  def test_identical_sequences_not_significant(self):
443
  """Séquences identiques → pas de différence, p = 1.0, significant = False."""
444
+ from picarones.evaluation.statistics import wilcoxon_test
445
  a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
446
  r = wilcoxon_test(a, a)
447
  assert r["significant"] is False
 
450
 
451
  def test_all_positive_diffs_w_minus_is_zero(self):
452
  """Si toutes les différences a−b sont positives : W⁻ = 0, W⁺ = n(n+1)/2."""
453
+ from picarones.evaluation.statistics import wilcoxon_test
454
  n = 10
455
  a = [float(i) for i in range(1, n + 1)]
456
  b = [0.0] * n
 
461
 
462
  def test_w_plus_w_minus_sum_invariant(self):
463
  """W⁺ + W⁻ doit toujours être égal à n(n+1)/2 (n = nombre de paires non nulles)."""
464
+ from picarones.evaluation.statistics import wilcoxon_test
465
  a = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15, 0.20, 0.35, 0.08, 0.18]
466
  b = [0.12, 0.20, 0.08, 0.35, 0.28, 0.18, 0.15, 0.40, 0.10, 0.20]
467
  r = wilcoxon_test(a, b)
 
474
 
475
  def test_clearly_different_sequences_significant(self):
476
  """Deux séquences très différentes (n=15) doivent donner p < 0.05."""
477
+ from picarones.evaluation.statistics import wilcoxon_test
478
  a = [0.05] * 15 # moteur A très performant
479
  b = [0.60] * 15 # moteur B peu performant — toutes diff = −0.55
480
  # Diffs a−b = −0.55 pour tous → W⁺ = 0 → devrait être significatif
 
484
 
485
  def test_large_n_normal_approximation_reasonable(self):
486
  """Pour n = 20, l'approximation normale doit donner une p-value dans [0, 1]."""
487
+ from picarones.evaluation.statistics import wilcoxon_test
488
  import random
489
  rng = random.Random(42)
490
  a = [rng.uniform(0.1, 0.5) for _ in range(20)]
 
495
 
496
  def test_small_n_returns_conservative_p(self):
497
  """Pour n < 10, la p-value doit être 0.04 (significatif) ou 0.20 (non sign.)."""
498
+ from picarones.evaluation.statistics import wilcoxon_test, _SCIPY_AVAILABLE
499
  if _SCIPY_AVAILABLE:
500
  pytest.skip("scipy disponible — la table exacte n'est pas utilisée")
501
  a = [0.1, 0.2, 0.3]
 
506
 
507
  def test_result_keys_complete(self):
508
  """Le dict retourné doit contenir toutes les clés documentées."""
509
+ from picarones.evaluation.statistics import wilcoxon_test
510
  r = wilcoxon_test([0.1, 0.3, 0.2, 0.4, 0.15, 0.35, 0.25, 0.5, 0.45, 0.05],
511
  [0.2, 0.2, 0.3, 0.3, 0.25, 0.25, 0.35, 0.35, 0.40, 0.15])
512
  for key in ("statistic", "p_value", "significant", "interpretation", "n_pairs", "W_plus", "W_minus"):
 
521
 
522
  def test_scipy_available_flag_is_bool(self):
523
  """_SCIPY_AVAILABLE doit être un booléen."""
524
+ from picarones.evaluation.statistics import _SCIPY_AVAILABLE
525
  assert isinstance(_SCIPY_AVAILABLE, bool)
526
 
527
  def test_scipy_and_native_agree_on_significance(self):
528
  """Scipy et l'implémentation native doivent s'accorder sur la significativité."""
529
+ from picarones.evaluation.statistics import wilcoxon_test, _SCIPY_AVAILABLE
530
  if not _SCIPY_AVAILABLE:
531
  pytest.skip("scipy non disponible")
532
 
 
542
 
543
  def test_scipy_p_value_in_valid_range(self):
544
  """La p-value fournie par scipy doit être dans [0, 1]."""
545
+ from picarones.evaluation.statistics import wilcoxon_test, _SCIPY_AVAILABLE
546
  if not _SCIPY_AVAILABLE:
547
  pytest.skip("scipy non disponible")
548
 
tests/measurements/test_metrics.py CHANGED
@@ -2,7 +2,8 @@
2
 
3
  import pytest
4
 
5
- from picarones.measurements.metrics import aggregate_metrics, compute_metrics, MetricsResult
 
6
 
7
 
8
  class TestComputeMetrics:
 
2
 
3
  import pytest
4
 
5
+ from picarones.evaluation.metric_result import aggregate_metrics, MetricsResult
6
+ from picarones.measurements.metrics import compute_metrics
7
 
8
 
9
  class TestComputeMetrics:
tests/measurements/test_pricing_degenerate_cases.py CHANGED
@@ -26,7 +26,7 @@ from picarones.evaluation.metrics.pricing import (
26
  estimate_cost,
27
  load_pricing_database,
28
  )
29
- from picarones.measurements.statistics import compute_pareto_front
30
 
31
 
32
  # ---------------------------------------------------------------------------
 
26
  estimate_cost,
27
  load_pricing_database,
28
  )
29
+ from picarones.evaluation.statistics import compute_pareto_front
30
 
31
 
32
  # ---------------------------------------------------------------------------
tests/measurements/test_results.py CHANGED
@@ -3,7 +3,7 @@
3
  import json
4
  import pytest
5
 
6
- from picarones.measurements.metrics import MetricsResult
7
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
8
 
9
 
 
3
  import json
4
  import pytest
5
 
6
+ from picarones.evaluation.metric_result import MetricsResult
7
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
8
 
9
 
tests/measurements/test_sprint10_error_distribution.py CHANGED
@@ -225,7 +225,7 @@ class TestLineMetricsInResults:
225
 
226
  def test_document_result_has_line_metrics_field(self):
227
  from picarones.evaluation.benchmark_result import DocumentResult
228
- from picarones.measurements.metrics import MetricsResult
229
  dr = DocumentResult(
230
  doc_id="test_001",
231
  image_path="/test/img.jpg",
@@ -245,7 +245,7 @@ class TestLineMetricsInResults:
245
 
246
  def test_document_result_has_hallucination_metrics_field(self):
247
  from picarones.evaluation.benchmark_result import DocumentResult
248
- from picarones.measurements.metrics import MetricsResult
249
  dr = DocumentResult(
250
  doc_id="test_002",
251
  image_path="/test/img.jpg",
@@ -265,7 +265,7 @@ class TestLineMetricsInResults:
265
 
266
  def test_document_result_as_dict_includes_sprint10_fields(self):
267
  from picarones.evaluation.benchmark_result import DocumentResult
268
- from picarones.measurements.metrics import MetricsResult
269
  dr = DocumentResult(
270
  doc_id="test_003",
271
  image_path="/test/img.jpg",
@@ -287,7 +287,7 @@ class TestLineMetricsInResults:
287
 
288
  def test_engine_report_has_aggregated_sprint10_fields(self):
289
  from picarones.evaluation.benchmark_result import EngineReport, DocumentResult
290
- from picarones.measurements.metrics import MetricsResult
291
  dr = DocumentResult(
292
  doc_id="test_004",
293
  image_path="/test/img.jpg",
 
225
 
226
  def test_document_result_has_line_metrics_field(self):
227
  from picarones.evaluation.benchmark_result import DocumentResult
228
+ from picarones.evaluation.metric_result import MetricsResult
229
  dr = DocumentResult(
230
  doc_id="test_001",
231
  image_path="/test/img.jpg",
 
245
 
246
  def test_document_result_has_hallucination_metrics_field(self):
247
  from picarones.evaluation.benchmark_result import DocumentResult
248
+ from picarones.evaluation.metric_result import MetricsResult
249
  dr = DocumentResult(
250
  doc_id="test_002",
251
  image_path="/test/img.jpg",
 
265
 
266
  def test_document_result_as_dict_includes_sprint10_fields(self):
267
  from picarones.evaluation.benchmark_result import DocumentResult
268
+ from picarones.evaluation.metric_result import MetricsResult
269
  dr = DocumentResult(
270
  doc_id="test_003",
271
  image_path="/test/img.jpg",
 
287
 
288
  def test_engine_report_has_aggregated_sprint10_fields(self):
289
  from picarones.evaluation.benchmark_result import EngineReport, DocumentResult
290
+ from picarones.evaluation.metric_result import MetricsResult
291
  dr = DocumentResult(
292
  doc_id="test_004",
293
  image_path="/test/img.jpg",
tests/measurements/test_sprint12_nouvelles_fonctionnalites.py CHANGED
@@ -195,7 +195,7 @@ def sample_generator():
195
  """Fixture partagée : crée un ReportGenerator avec des données fictives."""
196
  from picarones.reports_v2.html.generator import ReportGenerator
197
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
198
- from picarones.measurements.metrics import MetricsResult
199
 
200
  def _make_metric(cer=0.1):
201
  return MetricsResult(
 
195
  """Fixture partagée : crée un ReportGenerator avec des données fictives."""
196
  from picarones.reports_v2.html.generator import ReportGenerator
197
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
198
+ from picarones.evaluation.metric_result import MetricsResult
199
 
200
  def _make_metric(cer=0.1):
201
  return MetricsResult(
tests/measurements/test_sprint18_friedman_nemenyi_cdd.py CHANGED
@@ -14,7 +14,7 @@ import re
14
 
15
  import pytest
16
 
17
- from picarones.measurements.statistics import (
18
  build_critical_difference_svg,
19
  friedman_test,
20
  nemenyi_posthoc,
 
14
 
15
  import pytest
16
 
17
+ from picarones.evaluation.statistics import (
18
  build_critical_difference_svg,
19
  friedman_test,
20
  nemenyi_posthoc,
tests/measurements/test_sprint20_pareto_pricing.py CHANGED
@@ -26,7 +26,7 @@ from picarones.evaluation.metrics.pricing import (
26
  estimate_cost,
27
  load_pricing_database,
28
  )
29
- from picarones.measurements.statistics import compute_pareto_front
30
 
31
 
32
  # ---------------------------------------------------------------------------
 
26
  estimate_cost,
27
  load_pricing_database,
28
  )
29
+ from picarones.evaluation.statistics import compute_pareto_front
30
 
31
 
32
  # ---------------------------------------------------------------------------
tests/measurements/test_sprint23_anti_hallucination.py CHANGED
@@ -38,7 +38,7 @@ from picarones.measurements.narrative import (
38
  select_facts,
39
  )
40
  from picarones.measurements.narrative.arbiter import DEFAULT_TYPE_ORDER
41
- from picarones.measurements.statistics import bootstrap_ci
42
 
43
  ROOT = Path(__file__).parent.parent.parent
44
  TEMPLATES_DIR = ROOT / "picarones" / "measurements" / "narrative" / "templates"
 
38
  select_facts,
39
  )
40
  from picarones.measurements.narrative.arbiter import DEFAULT_TYPE_ORDER
41
+ from picarones.evaluation.statistics import bootstrap_ci
42
 
43
  ROOT = Path(__file__).parent.parent.parent
44
  TEMPLATES_DIR = ROOT / "picarones" / "measurements" / "narrative" / "templates"
tests/measurements/test_sprint40_ner_runner.py CHANGED
@@ -97,7 +97,7 @@ def _make_document_result(
97
  hypothesis: str = "Marie de Bourgogne en 1477.",
98
  ner_metrics: dict | None = None,
99
  ) -> DocumentResult:
100
- from picarones.measurements.metrics import MetricsResult
101
 
102
  return DocumentResult(
103
  doc_id=doc_id,
 
97
  hypothesis: str = "Marie de Bourgogne en 1477.",
98
  ner_metrics: dict | None = None,
99
  ) -> DocumentResult:
100
+ from picarones.evaluation.metric_result import MetricsResult
101
 
102
  return DocumentResult(
103
  doc_id=doc_id,
tests/measurements/test_sprint42_calibration_runner.py CHANGED
@@ -59,7 +59,7 @@ class TestEngineResultExtension:
59
 
60
 
61
  def _make_dr(calibration_metrics: dict | None = None) -> DocumentResult:
62
- from picarones.measurements.metrics import MetricsResult
63
 
64
  return DocumentResult(
65
  doc_id="d1", image_path="/tmp/x.png",
 
59
 
60
 
61
  def _make_dr(calibration_metrics: dict | None = None) -> DocumentResult:
62
+ from picarones.evaluation.metric_result import MetricsResult
63
 
64
  return DocumentResult(
65
  doc_id="d1", image_path="/tmp/x.png",
tests/measurements/test_sprint44_median_default.py CHANGED
@@ -23,7 +23,7 @@ import re
23
 
24
  import pytest
25
 
26
- from picarones.measurements.metrics import MetricsResult
27
  from picarones.measurements.narrative.detectors import detect_median_mean_gap_warning
28
  from picarones.domain.facts import FactImportance, FactType
29
  from picarones.measurements.narrative.renderer import extract_numbers, render_fact
 
23
 
24
  import pytest
25
 
26
+ from picarones.evaluation.metric_result import MetricsResult
27
  from picarones.measurements.narrative.detectors import detect_median_mean_gap_warning
28
  from picarones.domain.facts import FactImportance, FactType
29
  from picarones.measurements.narrative.renderer import extract_numbers, render_fact
tests/measurements/test_sprint45_stratification.py CHANGED
@@ -26,7 +26,7 @@ from __future__ import annotations
26
 
27
  import pytest
28
 
29
- from picarones.measurements.metrics import MetricsResult
30
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
31
 
32
 
 
26
 
27
  import pytest
28
 
29
+ from picarones.evaluation.metric_result import MetricsResult
30
  from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
31
 
32
 
tests/measurements/test_sprint61_philological_runner.py CHANGED
@@ -29,7 +29,7 @@ from picarones.measurements.philological_hooks import (
29
  compute_philological_metrics,
30
  )
31
  from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
32
- from picarones.measurements.metrics import MetricsResult
33
 
34
 
35
  def _make_doc(
 
29
  compute_philological_metrics,
30
  )
31
  from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
32
+ from picarones.evaluation.metric_result import MetricsResult
33
 
34
 
35
  def _make_doc(
tests/report/test_sprint46_stratification_html.py CHANGED
@@ -26,7 +26,7 @@ from pathlib import Path
26
 
27
  import pytest
28
 
29
- from picarones.measurements.metrics import MetricsResult
30
  from picarones.measurements.narrative.detectors import detect_stratification_recommended
31
  from picarones.domain.facts import FactImportance, FactType
32
  from picarones.measurements.narrative.renderer import extract_numbers, render_fact
 
26
 
27
  import pytest
28
 
29
+ from picarones.evaluation.metric_result import MetricsResult
30
  from picarones.measurements.narrative.detectors import detect_stratification_recommended
31
  from picarones.domain.facts import FactImportance, FactType
32
  from picarones.measurements.narrative.renderer import extract_numbers, render_fact
tests/report/test_sprint7_advanced_report.py CHANGED
@@ -53,41 +53,41 @@ def html_s7(sample_benchmark_s7):
53
 
54
  class TestBootstrapCI:
55
  def test_returns_tuple_of_two(self):
56
- from picarones.measurements.statistics import bootstrap_ci
57
  result = bootstrap_ci([0.1, 0.2, 0.3])
58
  assert isinstance(result, tuple) and len(result) == 2
59
 
60
  def test_lower_le_upper(self):
61
- from picarones.measurements.statistics import bootstrap_ci
62
  lo, hi = bootstrap_ci([0.1, 0.2, 0.3, 0.4, 0.5])
63
  assert lo <= hi
64
 
65
  def test_ci_contains_mean(self):
66
- from picarones.measurements.statistics import bootstrap_ci
67
  values = [0.1, 0.15, 0.2, 0.12, 0.18, 0.13, 0.17]
68
  lo, hi = bootstrap_ci(values)
69
  mean = sum(values) / len(values)
70
  assert lo <= mean <= hi
71
 
72
  def test_empty_returns_zeros(self):
73
- from picarones.measurements.statistics import bootstrap_ci
74
  lo, hi = bootstrap_ci([])
75
  assert lo == 0.0 and hi == 0.0
76
 
77
  def test_single_value(self):
78
- from picarones.measurements.statistics import bootstrap_ci
79
  lo, hi = bootstrap_ci([0.25])
80
  assert lo <= 0.25 <= hi
81
 
82
  def test_reproducible_with_seed(self):
83
- from picarones.measurements.statistics import bootstrap_ci
84
  vals = [0.1, 0.2, 0.3, 0.15, 0.25]
85
  r1 = bootstrap_ci(vals, seed=1)
86
  r2 = bootstrap_ci(vals, seed=1)
87
  assert r1 == r2
88
 
89
  def test_wider_with_more_variance(self):
90
- from picarones.measurements.statistics import bootstrap_ci
91
  narrow = [0.10, 0.11, 0.10, 0.11, 0.10]
92
  wide = [0.01, 0.50, 0.02, 0.49, 0.01]
93
  lo_n, hi_n = bootstrap_ci(narrow, n_iter=500)
@@ -101,7 +101,7 @@ class TestBootstrapCI:
101
 
102
  class TestWilcoxonTest:
103
  def test_returns_dict_with_keys(self):
104
- from picarones.measurements.statistics import wilcoxon_test
105
  r = wilcoxon_test([0.1]*5, [0.1]*5)
106
  assert "statistic" in r
107
  assert "p_value" in r
@@ -109,13 +109,13 @@ class TestWilcoxonTest:
109
  assert "interpretation" in r
110
 
111
  def test_identical_series_not_significant(self):
112
- from picarones.measurements.statistics import wilcoxon_test
113
  vals = [0.1, 0.2, 0.3, 0.15, 0.05]
114
  r = wilcoxon_test(vals, vals)
115
  assert not r["significant"]
116
 
117
  def test_clearly_different_series_significant(self):
118
- from picarones.measurements.statistics import wilcoxon_test
119
  a = [0.01]*12
120
  b = [0.80]*12
121
  r = wilcoxon_test(a, b)
@@ -123,37 +123,37 @@ class TestWilcoxonTest:
123
  assert r["p_value"] < 0.05
124
 
125
  def test_p_value_in_range(self):
126
- from picarones.measurements.statistics import wilcoxon_test
127
  a = [0.1, 0.15, 0.2, 0.08]
128
  b = [0.2, 0.25, 0.3, 0.18]
129
  r = wilcoxon_test(a, b)
130
  assert 0.0 <= r["p_value"] <= 1.0
131
 
132
  def test_interpretation_is_string(self):
133
- from picarones.measurements.statistics import wilcoxon_test
134
  r = wilcoxon_test([0.1, 0.2], [0.1, 0.2])
135
  assert isinstance(r["interpretation"], str) and len(r["interpretation"]) > 10
136
 
137
  def test_n_pairs_correct(self):
138
- from picarones.measurements.statistics import wilcoxon_test
139
  r = wilcoxon_test([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
140
  # tous les diffs = 0, filtrés en mode wilcox
141
  assert r["n_pairs"] == 0
142
 
143
  def test_mismatched_lengths_raises(self):
144
- from picarones.measurements.statistics import wilcoxon_test
145
  with pytest.raises(ValueError):
146
  wilcoxon_test([0.1, 0.2], [0.1])
147
 
148
  def test_w_plus_w_minus_present(self):
149
- from picarones.measurements.statistics import wilcoxon_test
150
  a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.12, 0.22, 0.08, 0.27]
151
  b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.22, 0.32, 0.18, 0.37]
152
  r = wilcoxon_test(a, b)
153
  assert "W_plus" in r and "W_minus" in r
154
 
155
  def test_significant_larger_sample(self):
156
- from picarones.measurements.statistics import wilcoxon_test
157
  import random
158
  rng = random.Random(0)
159
  a = [rng.uniform(0.0, 0.05) for _ in range(15)]
@@ -162,7 +162,7 @@ class TestWilcoxonTest:
162
  assert r["significant"]
163
 
164
  def test_symmetry(self):
165
- from picarones.measurements.statistics import wilcoxon_test
166
  a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.22, 0.08, 0.27, 0.14]
167
  b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.32, 0.18, 0.37, 0.24]
168
  r_ab = wilcoxon_test(a, b)
@@ -177,35 +177,35 @@ class TestWilcoxonTest:
177
 
178
  class TestPairwiseStats:
179
  def test_returns_list(self):
180
- from picarones.measurements.statistics import compute_pairwise_stats
181
  r = compute_pairwise_stats({"A": [0.1, 0.2], "B": [0.3, 0.4]})
182
  assert isinstance(r, list)
183
 
184
  def test_correct_pair_count_2_engines(self):
185
- from picarones.measurements.statistics import compute_pairwise_stats
186
  r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
187
  assert len(r) == 1
188
 
189
  def test_correct_pair_count_3_engines(self):
190
- from picarones.measurements.statistics import compute_pairwise_stats
191
  r = compute_pairwise_stats({
192
  "A": [0.1]*5, "B": [0.2]*5, "C": [0.3]*5
193
  })
194
  assert len(r) == 3
195
 
196
  def test_pair_has_engine_names(self):
197
- from picarones.measurements.statistics import compute_pairwise_stats
198
  r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
199
  assert r[0]["engine_a"] in ["A", "B"]
200
  assert r[0]["engine_b"] in ["A", "B"]
201
 
202
  def test_pair_has_p_value(self):
203
- from picarones.measurements.statistics import compute_pairwise_stats
204
  r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
205
  assert "p_value" in r[0]
206
 
207
  def test_single_engine_returns_empty(self):
208
- from picarones.measurements.statistics import compute_pairwise_stats
209
  r = compute_pairwise_stats({"A": [0.1]*5})
210
  assert r == []
211
 
@@ -216,33 +216,33 @@ class TestPairwiseStats:
216
 
217
  class TestReliabilityCurve:
218
  def test_returns_list(self):
219
- from picarones.measurements.statistics import compute_reliability_curve
220
  r = compute_reliability_curve([0.1, 0.2, 0.3])
221
  assert isinstance(r, list)
222
 
223
  def test_correct_number_of_steps(self):
224
- from picarones.measurements.statistics import compute_reliability_curve
225
  r = compute_reliability_curve([0.1]*10, steps=5)
226
  assert len(r) == 5
227
 
228
  def test_pct_docs_increases(self):
229
- from picarones.measurements.statistics import compute_reliability_curve
230
  r = compute_reliability_curve([0.1, 0.2, 0.3, 0.4, 0.5], steps=5)
231
  pcts = [p["pct_docs"] for p in r]
232
  assert pcts == sorted(pcts)
233
 
234
  def test_mean_cer_increases(self):
235
- from picarones.measurements.statistics import compute_reliability_curve
236
  r = compute_reliability_curve([0.05, 0.10, 0.20, 0.30, 0.50], steps=5)
237
  cers = [p["mean_cer"] for p in r]
238
  assert cers[0] <= cers[-1]
239
 
240
  def test_empty_returns_empty(self):
241
- from picarones.measurements.statistics import compute_reliability_curve
242
  assert compute_reliability_curve([]) == []
243
 
244
  def test_last_point_includes_all(self):
245
- from picarones.measurements.statistics import compute_reliability_curve
246
  vals = [0.1, 0.2, 0.3]
247
  r = compute_reliability_curve(vals, steps=4)
248
  last = r[-1]
@@ -250,7 +250,7 @@ class TestReliabilityCurve:
250
  assert last["mean_cer"] == pytest.approx(expected, rel=1e-4)
251
 
252
  def test_each_point_has_required_keys(self):
253
- from picarones.measurements.statistics import compute_reliability_curve
254
  r = compute_reliability_curve([0.1, 0.2, 0.3], steps=3)
255
  for p in r:
256
  assert "pct_docs" in p and "mean_cer" in p
@@ -262,47 +262,47 @@ class TestReliabilityCurve:
262
 
263
  class TestVennData:
264
  def test_venn2_type(self):
265
- from picarones.measurements.statistics import compute_venn_data
266
  r = compute_venn_data({"A": {"e1","e2"}, "B": {"e2","e3"}})
267
  assert r["type"] == "venn2"
268
 
269
  def test_venn3_type(self):
270
- from picarones.measurements.statistics import compute_venn_data
271
  r = compute_venn_data({"A": {"e1"}, "B": {"e2"}, "C": {"e3"}})
272
  assert r["type"] == "venn3"
273
 
274
  def test_venn2_counts_correct(self):
275
- from picarones.measurements.statistics import compute_venn_data
276
  r = compute_venn_data({"A": {"e1","e2","e3"}, "B": {"e2","e3","e4"}})
277
  assert r["only_a"] == 1
278
  assert r["only_b"] == 1
279
  assert r["both"] == 2
280
 
281
  def test_venn2_disjoint(self):
282
- from picarones.measurements.statistics import compute_venn_data
283
  r = compute_venn_data({"A": {"e1"}, "B": {"e2"}})
284
  assert r["both"] == 0
285
  assert r["only_a"] == 1
286
  assert r["only_b"] == 1
287
 
288
  def test_venn2_subset(self):
289
- from picarones.measurements.statistics import compute_venn_data
290
  r = compute_venn_data({"A": {"e1","e2"}, "B": {"e1","e2","e3"}})
291
  assert r["only_a"] == 0
292
 
293
  def test_venn3_abc_count(self):
294
- from picarones.measurements.statistics import compute_venn_data
295
  shared = {"e1","e2"}
296
  r = compute_venn_data({"A": shared, "B": shared, "C": shared})
297
  assert r["abc"] == 2
298
 
299
  def test_empty_returns_empty(self):
300
- from picarones.measurements.statistics import compute_venn_data
301
  r = compute_venn_data({})
302
  assert r == {}
303
 
304
  def test_labels_present(self):
305
- from picarones.measurements.statistics import compute_venn_data
306
  r = compute_venn_data({"moteur_a": {"e1"}, "moteur_b": {"e2"}})
307
  assert r["label_a"] == "moteur_a"
308
  assert r["label_b"] == "moteur_b"
@@ -324,17 +324,17 @@ class TestErrorClustering:
324
  ]
325
 
326
  def test_returns_list(self):
327
- from picarones.measurements.statistics import cluster_errors
328
  result = cluster_errors(self._sample_data())
329
  assert isinstance(result, list)
330
 
331
  def test_max_clusters_respected(self):
332
- from picarones.measurements.statistics import cluster_errors
333
  result = cluster_errors(self._sample_data(), max_clusters=3)
334
  assert len(result) <= 3
335
 
336
  def test_cluster_has_required_keys(self):
337
- from picarones.measurements.statistics import cluster_errors
338
  result = cluster_errors(self._sample_data())
339
  if result:
340
  c = result[0]
@@ -344,7 +344,7 @@ class TestErrorClustering:
344
  assert hasattr(c, "examples")
345
 
346
  def test_as_dict_method(self):
347
- from picarones.measurements.statistics import cluster_errors
348
  result = cluster_errors(self._sample_data())
349
  if result:
350
  d = result[0].as_dict()
@@ -354,24 +354,24 @@ class TestErrorClustering:
354
  assert "examples" in d
355
 
356
  def test_sorted_by_count_descending(self):
357
- from picarones.measurements.statistics import cluster_errors
358
  result = cluster_errors(self._sample_data())
359
  if len(result) >= 2:
360
  assert result[0].count >= result[1].count
361
 
362
  def test_examples_capped_at_5(self):
363
- from picarones.measurements.statistics import cluster_errors
364
  result = cluster_errors(self._sample_data())
365
  for c in result:
366
  assert len(c.as_dict()["examples"]) <= 5
367
 
368
  def test_empty_data_returns_empty(self):
369
- from picarones.measurements.statistics import cluster_errors
370
  result = cluster_errors([])
371
  assert result == []
372
 
373
  def test_cluster_id_unique(self):
374
- from picarones.measurements.statistics import cluster_errors
375
  result = cluster_errors(self._sample_data())
376
  ids = [c.cluster_id for c in result]
377
  assert len(ids) == len(set(ids))
@@ -392,12 +392,12 @@ class TestCorrelationMatrix:
392
  ]
393
 
394
  def test_returns_dict_with_labels_and_matrix(self):
395
- from picarones.measurements.statistics import compute_correlation_matrix
396
  r = compute_correlation_matrix(self._sample_metrics())
397
  assert "labels" in r and "matrix" in r
398
 
399
  def test_matrix_is_square(self):
400
- from picarones.measurements.statistics import compute_correlation_matrix
401
  r = compute_correlation_matrix(self._sample_metrics())
402
  n = len(r["labels"])
403
  assert len(r["matrix"]) == n
@@ -405,13 +405,13 @@ class TestCorrelationMatrix:
405
  assert len(row) == n
406
 
407
  def test_diagonal_is_one(self):
408
- from picarones.measurements.statistics import compute_correlation_matrix
409
  r = compute_correlation_matrix(self._sample_metrics())
410
  for i in range(len(r["labels"])):
411
  assert r["matrix"][i][i] == pytest.approx(1.0)
412
 
413
  def test_cer_quality_negatively_correlated(self):
414
- from picarones.measurements.statistics import compute_correlation_matrix
415
  r = compute_correlation_matrix(self._sample_metrics())
416
  labels = r["labels"]
417
  if "cer" in labels and "quality_score" in labels:
@@ -420,7 +420,7 @@ class TestCorrelationMatrix:
420
  assert r["matrix"][i][j] < 0 # plus la qualité est bonne, plus le CER est bas
421
 
422
  def test_symmetric_matrix(self):
423
- from picarones.measurements.statistics import compute_correlation_matrix
424
  r = compute_correlation_matrix(self._sample_metrics())
425
  n = len(r["labels"])
426
  for i in range(n):
@@ -428,18 +428,18 @@ class TestCorrelationMatrix:
428
  assert r["matrix"][i][j] == pytest.approx(r["matrix"][j][i], abs=1e-6)
429
 
430
  def test_empty_returns_empty(self):
431
- from picarones.measurements.statistics import compute_correlation_matrix
432
  r = compute_correlation_matrix([])
433
  assert r == {"labels": [], "matrix": []}
434
 
435
  def test_custom_metric_keys(self):
436
- from picarones.measurements.statistics import compute_correlation_matrix
437
  data = [{"a": 1.0, "b": 2.0, "c": 3.0}] * 5
438
  r = compute_correlation_matrix(data, metric_keys=["a", "b"])
439
  assert r["labels"] == ["a", "b"]
440
 
441
  def test_values_in_range(self):
442
- from picarones.measurements.statistics import compute_correlation_matrix
443
  r = compute_correlation_matrix(self._sample_metrics())
444
  for row in r["matrix"]:
445
  for v in row:
 
53
 
54
  class TestBootstrapCI:
55
  def test_returns_tuple_of_two(self):
56
+ from picarones.evaluation.statistics import bootstrap_ci
57
  result = bootstrap_ci([0.1, 0.2, 0.3])
58
  assert isinstance(result, tuple) and len(result) == 2
59
 
60
  def test_lower_le_upper(self):
61
+ from picarones.evaluation.statistics import bootstrap_ci
62
  lo, hi = bootstrap_ci([0.1, 0.2, 0.3, 0.4, 0.5])
63
  assert lo <= hi
64
 
65
  def test_ci_contains_mean(self):
66
+ from picarones.evaluation.statistics import bootstrap_ci
67
  values = [0.1, 0.15, 0.2, 0.12, 0.18, 0.13, 0.17]
68
  lo, hi = bootstrap_ci(values)
69
  mean = sum(values) / len(values)
70
  assert lo <= mean <= hi
71
 
72
  def test_empty_returns_zeros(self):
73
+ from picarones.evaluation.statistics import bootstrap_ci
74
  lo, hi = bootstrap_ci([])
75
  assert lo == 0.0 and hi == 0.0
76
 
77
  def test_single_value(self):
78
+ from picarones.evaluation.statistics import bootstrap_ci
79
  lo, hi = bootstrap_ci([0.25])
80
  assert lo <= 0.25 <= hi
81
 
82
  def test_reproducible_with_seed(self):
83
+ from picarones.evaluation.statistics import bootstrap_ci
84
  vals = [0.1, 0.2, 0.3, 0.15, 0.25]
85
  r1 = bootstrap_ci(vals, seed=1)
86
  r2 = bootstrap_ci(vals, seed=1)
87
  assert r1 == r2
88
 
89
  def test_wider_with_more_variance(self):
90
+ from picarones.evaluation.statistics import bootstrap_ci
91
  narrow = [0.10, 0.11, 0.10, 0.11, 0.10]
92
  wide = [0.01, 0.50, 0.02, 0.49, 0.01]
93
  lo_n, hi_n = bootstrap_ci(narrow, n_iter=500)
 
101
 
102
  class TestWilcoxonTest:
103
  def test_returns_dict_with_keys(self):
104
+ from picarones.evaluation.statistics import wilcoxon_test
105
  r = wilcoxon_test([0.1]*5, [0.1]*5)
106
  assert "statistic" in r
107
  assert "p_value" in r
 
109
  assert "interpretation" in r
110
 
111
  def test_identical_series_not_significant(self):
112
+ from picarones.evaluation.statistics import wilcoxon_test
113
  vals = [0.1, 0.2, 0.3, 0.15, 0.05]
114
  r = wilcoxon_test(vals, vals)
115
  assert not r["significant"]
116
 
117
  def test_clearly_different_series_significant(self):
118
+ from picarones.evaluation.statistics import wilcoxon_test
119
  a = [0.01]*12
120
  b = [0.80]*12
121
  r = wilcoxon_test(a, b)
 
123
  assert r["p_value"] < 0.05
124
 
125
  def test_p_value_in_range(self):
126
+ from picarones.evaluation.statistics import wilcoxon_test
127
  a = [0.1, 0.15, 0.2, 0.08]
128
  b = [0.2, 0.25, 0.3, 0.18]
129
  r = wilcoxon_test(a, b)
130
  assert 0.0 <= r["p_value"] <= 1.0
131
 
132
  def test_interpretation_is_string(self):
133
+ from picarones.evaluation.statistics import wilcoxon_test
134
  r = wilcoxon_test([0.1, 0.2], [0.1, 0.2])
135
  assert isinstance(r["interpretation"], str) and len(r["interpretation"]) > 10
136
 
137
  def test_n_pairs_correct(self):
138
+ from picarones.evaluation.statistics import wilcoxon_test
139
  r = wilcoxon_test([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
140
  # tous les diffs = 0, filtrés en mode wilcox
141
  assert r["n_pairs"] == 0
142
 
143
  def test_mismatched_lengths_raises(self):
144
+ from picarones.evaluation.statistics import wilcoxon_test
145
  with pytest.raises(ValueError):
146
  wilcoxon_test([0.1, 0.2], [0.1])
147
 
148
  def test_w_plus_w_minus_present(self):
149
+ from picarones.evaluation.statistics import wilcoxon_test
150
  a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.12, 0.22, 0.08, 0.27]
151
  b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.22, 0.32, 0.18, 0.37]
152
  r = wilcoxon_test(a, b)
153
  assert "W_plus" in r and "W_minus" in r
154
 
155
  def test_significant_larger_sample(self):
156
+ from picarones.evaluation.statistics import wilcoxon_test
157
  import random
158
  rng = random.Random(0)
159
  a = [rng.uniform(0.0, 0.05) for _ in range(15)]
 
162
  assert r["significant"]
163
 
164
  def test_symmetry(self):
165
+ from picarones.evaluation.statistics import wilcoxon_test
166
  a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.22, 0.08, 0.27, 0.14]
167
  b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.32, 0.18, 0.37, 0.24]
168
  r_ab = wilcoxon_test(a, b)
 
177
 
178
  class TestPairwiseStats:
179
  def test_returns_list(self):
180
+ from picarones.evaluation.statistics import compute_pairwise_stats
181
  r = compute_pairwise_stats({"A": [0.1, 0.2], "B": [0.3, 0.4]})
182
  assert isinstance(r, list)
183
 
184
  def test_correct_pair_count_2_engines(self):
185
+ from picarones.evaluation.statistics import compute_pairwise_stats
186
  r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
187
  assert len(r) == 1
188
 
189
  def test_correct_pair_count_3_engines(self):
190
+ from picarones.evaluation.statistics import compute_pairwise_stats
191
  r = compute_pairwise_stats({
192
  "A": [0.1]*5, "B": [0.2]*5, "C": [0.3]*5
193
  })
194
  assert len(r) == 3
195
 
196
  def test_pair_has_engine_names(self):
197
+ from picarones.evaluation.statistics import compute_pairwise_stats
198
  r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
199
  assert r[0]["engine_a"] in ["A", "B"]
200
  assert r[0]["engine_b"] in ["A", "B"]
201
 
202
  def test_pair_has_p_value(self):
203
+ from picarones.evaluation.statistics import compute_pairwise_stats
204
  r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
205
  assert "p_value" in r[0]
206
 
207
  def test_single_engine_returns_empty(self):
208
+ from picarones.evaluation.statistics import compute_pairwise_stats
209
  r = compute_pairwise_stats({"A": [0.1]*5})
210
  assert r == []
211
 
 
216
 
217
  class TestReliabilityCurve:
218
  def test_returns_list(self):
219
+ from picarones.evaluation.statistics import compute_reliability_curve
220
  r = compute_reliability_curve([0.1, 0.2, 0.3])
221
  assert isinstance(r, list)
222
 
223
  def test_correct_number_of_steps(self):
224
+ from picarones.evaluation.statistics import compute_reliability_curve
225
  r = compute_reliability_curve([0.1]*10, steps=5)
226
  assert len(r) == 5
227
 
228
  def test_pct_docs_increases(self):
229
+ from picarones.evaluation.statistics import compute_reliability_curve
230
  r = compute_reliability_curve([0.1, 0.2, 0.3, 0.4, 0.5], steps=5)
231
  pcts = [p["pct_docs"] for p in r]
232
  assert pcts == sorted(pcts)
233
 
234
  def test_mean_cer_increases(self):
235
+ from picarones.evaluation.statistics import compute_reliability_curve
236
  r = compute_reliability_curve([0.05, 0.10, 0.20, 0.30, 0.50], steps=5)
237
  cers = [p["mean_cer"] for p in r]
238
  assert cers[0] <= cers[-1]
239
 
240
  def test_empty_returns_empty(self):
241
+ from picarones.evaluation.statistics import compute_reliability_curve
242
  assert compute_reliability_curve([]) == []
243
 
244
  def test_last_point_includes_all(self):
245
+ from picarones.evaluation.statistics import compute_reliability_curve
246
  vals = [0.1, 0.2, 0.3]
247
  r = compute_reliability_curve(vals, steps=4)
248
  last = r[-1]
 
250
  assert last["mean_cer"] == pytest.approx(expected, rel=1e-4)
251
 
252
  def test_each_point_has_required_keys(self):
253
+ from picarones.evaluation.statistics import compute_reliability_curve
254
  r = compute_reliability_curve([0.1, 0.2, 0.3], steps=3)
255
  for p in r:
256
  assert "pct_docs" in p and "mean_cer" in p
 
262
 
263
  class TestVennData:
264
  def test_venn2_type(self):
265
+ from picarones.evaluation.statistics import compute_venn_data
266
  r = compute_venn_data({"A": {"e1","e2"}, "B": {"e2","e3"}})
267
  assert r["type"] == "venn2"
268
 
269
  def test_venn3_type(self):
270
+ from picarones.evaluation.statistics import compute_venn_data
271
  r = compute_venn_data({"A": {"e1"}, "B": {"e2"}, "C": {"e3"}})
272
  assert r["type"] == "venn3"
273
 
274
  def test_venn2_counts_correct(self):
275
+ from picarones.evaluation.statistics import compute_venn_data
276
  r = compute_venn_data({"A": {"e1","e2","e3"}, "B": {"e2","e3","e4"}})
277
  assert r["only_a"] == 1
278
  assert r["only_b"] == 1
279
  assert r["both"] == 2
280
 
281
  def test_venn2_disjoint(self):
282
+ from picarones.evaluation.statistics import compute_venn_data
283
  r = compute_venn_data({"A": {"e1"}, "B": {"e2"}})
284
  assert r["both"] == 0
285
  assert r["only_a"] == 1
286
  assert r["only_b"] == 1
287
 
288
  def test_venn2_subset(self):
289
+ from picarones.evaluation.statistics import compute_venn_data
290
  r = compute_venn_data({"A": {"e1","e2"}, "B": {"e1","e2","e3"}})
291
  assert r["only_a"] == 0
292
 
293
  def test_venn3_abc_count(self):
294
+ from picarones.evaluation.statistics import compute_venn_data
295
  shared = {"e1","e2"}
296
  r = compute_venn_data({"A": shared, "B": shared, "C": shared})
297
  assert r["abc"] == 2
298
 
299
  def test_empty_returns_empty(self):
300
+ from picarones.evaluation.statistics import compute_venn_data
301
  r = compute_venn_data({})
302
  assert r == {}
303
 
304
  def test_labels_present(self):
305
+ from picarones.evaluation.statistics import compute_venn_data
306
  r = compute_venn_data({"moteur_a": {"e1"}, "moteur_b": {"e2"}})
307
  assert r["label_a"] == "moteur_a"
308
  assert r["label_b"] == "moteur_b"
 
324
  ]
325
 
326
  def test_returns_list(self):
327
+ from picarones.evaluation.statistics import cluster_errors
328
  result = cluster_errors(self._sample_data())
329
  assert isinstance(result, list)
330
 
331
  def test_max_clusters_respected(self):
332
+ from picarones.evaluation.statistics import cluster_errors
333
  result = cluster_errors(self._sample_data(), max_clusters=3)
334
  assert len(result) <= 3
335
 
336
  def test_cluster_has_required_keys(self):
337
+ from picarones.evaluation.statistics import cluster_errors
338
  result = cluster_errors(self._sample_data())
339
  if result:
340
  c = result[0]
 
344
  assert hasattr(c, "examples")
345
 
346
  def test_as_dict_method(self):
347
+ from picarones.evaluation.statistics import cluster_errors
348
  result = cluster_errors(self._sample_data())
349
  if result:
350
  d = result[0].as_dict()
 
354
  assert "examples" in d
355
 
356
  def test_sorted_by_count_descending(self):
357
+ from picarones.evaluation.statistics import cluster_errors
358
  result = cluster_errors(self._sample_data())
359
  if len(result) >= 2:
360
  assert result[0].count >= result[1].count
361
 
362
  def test_examples_capped_at_5(self):
363
+ from picarones.evaluation.statistics import cluster_errors
364
  result = cluster_errors(self._sample_data())
365
  for c in result:
366
  assert len(c.as_dict()["examples"]) <= 5
367
 
368
  def test_empty_data_returns_empty(self):
369
+ from picarones.evaluation.statistics import cluster_errors
370
  result = cluster_errors([])
371
  assert result == []
372
 
373
  def test_cluster_id_unique(self):
374
+ from picarones.evaluation.statistics import cluster_errors
375
  result = cluster_errors(self._sample_data())
376
  ids = [c.cluster_id for c in result]
377
  assert len(ids) == len(set(ids))
 
392
  ]
393
 
394
  def test_returns_dict_with_labels_and_matrix(self):
395
+ from picarones.evaluation.statistics import compute_correlation_matrix
396
  r = compute_correlation_matrix(self._sample_metrics())
397
  assert "labels" in r and "matrix" in r
398
 
399
  def test_matrix_is_square(self):
400
+ from picarones.evaluation.statistics import compute_correlation_matrix
401
  r = compute_correlation_matrix(self._sample_metrics())
402
  n = len(r["labels"])
403
  assert len(r["matrix"]) == n
 
405
  assert len(row) == n
406
 
407
  def test_diagonal_is_one(self):
408
+ from picarones.evaluation.statistics import compute_correlation_matrix
409
  r = compute_correlation_matrix(self._sample_metrics())
410
  for i in range(len(r["labels"])):
411
  assert r["matrix"][i][i] == pytest.approx(1.0)
412
 
413
  def test_cer_quality_negatively_correlated(self):
414
+ from picarones.evaluation.statistics import compute_correlation_matrix
415
  r = compute_correlation_matrix(self._sample_metrics())
416
  labels = r["labels"]
417
  if "cer" in labels and "quality_score" in labels:
 
420
  assert r["matrix"][i][j] < 0 # plus la qualité est bonne, plus le CER est bas
421
 
422
  def test_symmetric_matrix(self):
423
+ from picarones.evaluation.statistics import compute_correlation_matrix
424
  r = compute_correlation_matrix(self._sample_metrics())
425
  n = len(r["labels"])
426
  for i in range(n):
 
428
  assert r["matrix"][i][j] == pytest.approx(r["matrix"][j][i], abs=1e-6)
429
 
430
  def test_empty_returns_empty(self):
431
+ from picarones.evaluation.statistics import compute_correlation_matrix
432
  r = compute_correlation_matrix([])
433
  assert r == {"labels": [], "matrix": []}
434
 
435
  def test_custom_metric_keys(self):
436
+ from picarones.evaluation.statistics import compute_correlation_matrix
437
  data = [{"a": 1.0, "b": 2.0, "c": 3.0}] * 5
438
  r = compute_correlation_matrix(data, metric_keys=["a", "b"])
439
  assert r["labels"] == ["a", "b"]
440
 
441
  def test_values_in_range(self):
442
+ from picarones.evaluation.statistics import compute_correlation_matrix
443
  r = compute_correlation_matrix(self._sample_metrics())
444
  for row in r["matrix"]:
445
  for v in row:
tests/report/test_sprint86_aii5_html.py CHANGED
@@ -22,7 +22,7 @@ from picarones.measurements.numerical_sequences_hooks import (
22
  aggregate_numerical_sequence_metrics,
23
  compute_numerical_sequence_metrics_adaptive,
24
  )
25
- from picarones.measurements.metrics import MetricsResult
26
  from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
27
 
28
 
 
22
  aggregate_numerical_sequence_metrics,
23
  compute_numerical_sequence_metrics_adaptive,
24
  )
25
+ from picarones.evaluation.metric_result import MetricsResult
26
  from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
27
 
28
 
tests/report/test_sprint87_readability_html.py CHANGED
@@ -16,7 +16,7 @@ from __future__ import annotations
16
  import json
17
  from pathlib import Path
18
 
19
- from picarones.measurements.metrics import MetricsResult
20
  from picarones.measurements.readability_hooks import (
21
  aggregate_readability_metrics,
22
  compute_readability_metrics,
 
16
  import json
17
  from pathlib import Path
18
 
19
+ from picarones.evaluation.metric_result import MetricsResult
20
  from picarones.measurements.readability_hooks import (
21
  aggregate_readability_metrics,
22
  compute_readability_metrics,
tests/web/test_sprint6_web_interface.py CHANGED
@@ -57,13 +57,13 @@ def client():
57
 
58
  @pytest.fixture
59
  def htr_catalogue():
60
- from picarones.extras.importers.htr_united import HTRUnitedCatalogue
61
  return HTRUnitedCatalogue.from_demo()
62
 
63
 
64
  @pytest.fixture
65
  def hf_importer():
66
- from picarones.extras.importers.huggingface import HuggingFaceImporter
67
  return HuggingFaceImporter()
68
 
69
 
@@ -74,7 +74,7 @@ def hf_importer():
74
  class TestHTRUnitedEntry:
75
 
76
  def test_from_dict_basic(self):
77
- from picarones.extras.importers.htr_united import HTRUnitedEntry
78
  d = {
79
  "id": "test-corpus", "title": "Test Corpus", "url": "https://github.com/test/corpus",
80
  "language": ["French"], "script": ["Gothic"], "century": [14, 15],
@@ -88,7 +88,7 @@ class TestHTRUnitedEntry:
88
  assert e.lines == 5000
89
 
90
  def test_as_dict_roundtrip(self):
91
- from picarones.extras.importers.htr_united import HTRUnitedEntry
92
  d = {
93
  "id": "rtrip", "title": "Round Trip", "url": "https://github.com/a/b",
94
  "language": ["Latin"], "script": ["Caroline"], "century": [9],
@@ -102,19 +102,19 @@ class TestHTRUnitedEntry:
102
  assert out["format"] == "PAGE"
103
 
104
  def test_century_str_roman(self):
105
- from picarones.extras.importers.htr_united import HTRUnitedEntry
106
  e = HTRUnitedEntry(id="x", title="x", url="x", century=[12, 14])
107
  cs = e.century_str
108
  assert "XIIe" in cs
109
  assert "XIVe" in cs
110
 
111
  def test_century_str_single(self):
112
- from picarones.extras.importers.htr_united import HTRUnitedEntry
113
  e = HTRUnitedEntry(id="x", title="x", url="x", century=[19])
114
  assert "XIXe" in e.century_str
115
 
116
  def test_default_fields(self):
117
- from picarones.extras.importers.htr_united import HTRUnitedEntry
118
  e = HTRUnitedEntry(id="minimal", title="Min", url="http://x")
119
  assert e.language == []
120
  assert e.lines == 0
@@ -122,14 +122,14 @@ class TestHTRUnitedEntry:
122
  assert e.tags == []
123
 
124
  def test_from_dict_missing_fields(self):
125
- from picarones.extras.importers.htr_united import HTRUnitedEntry
126
  e = HTRUnitedEntry.from_dict({"id": "sparse", "title": "Sparse"})
127
  assert e.id == "sparse"
128
  assert e.institution == ""
129
  assert e.lines == 0
130
 
131
  def test_as_dict_has_all_keys(self):
132
- from picarones.extras.importers.htr_united import HTRUnitedEntry
133
  e = HTRUnitedEntry(id="k", title="K", url="http://k")
134
  d = e.as_dict()
135
  for key in ["id", "title", "url", "language", "script", "century",
@@ -137,7 +137,7 @@ class TestHTRUnitedEntry:
137
  assert key in d, f"Missing key: {key}"
138
 
139
  def test_url_preserved(self):
140
- from picarones.extras.importers.htr_united import HTRUnitedEntry
141
  url = "https://github.com/HTR-United/cremma-medieval"
142
  e = HTRUnitedEntry(id="c", title="CREMMA", url=url)
143
  assert e.url == url
@@ -250,14 +250,14 @@ class TestHTRUnitedImport:
250
  """
251
 
252
  def test_import_creates_meta_file(self, tmp_path, htr_catalogue):
253
- from picarones.extras.importers.htr_united import import_htr_united_corpus
254
  entry = htr_catalogue.entries[0]
255
  result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
256
  meta_file = Path(result["metadata_file"])
257
  assert meta_file.exists()
258
 
259
  def test_import_meta_content(self, tmp_path, htr_catalogue):
260
- from picarones.extras.importers.htr_united import import_htr_united_corpus
261
  entry = htr_catalogue.entries[0]
262
  result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
263
  meta = json.loads(Path(result["metadata_file"]).read_text())
@@ -265,14 +265,14 @@ class TestHTRUnitedImport:
265
  assert meta["entry_id"] == entry.id
266
 
267
  def test_import_returns_dict_keys(self, tmp_path, htr_catalogue):
268
- from picarones.extras.importers.htr_united import import_htr_united_corpus
269
  entry = htr_catalogue.entries[0]
270
  result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
271
  for k in ["entry_id", "title", "output_dir", "files_imported", "metadata_file"]:
272
  assert k in result, f"Missing key: {k}"
273
 
274
  def test_import_creates_output_dir(self, tmp_path, htr_catalogue):
275
- from picarones.extras.importers.htr_united import import_htr_united_corpus
276
  entry = htr_catalogue.entries[0]
277
  new_dir = tmp_path / "new_subdir" / "corpus"
278
  import_htr_united_corpus(entry, new_dir, max_samples=5)
@@ -286,7 +286,7 @@ class TestHTRUnitedImport:
286
  class TestHuggingFaceDataset:
287
 
288
  def test_from_dict_basic(self):
289
- from picarones.extras.importers.huggingface import HuggingFaceDataset
290
  d = {
291
  "dataset_id": "test/dataset", "title": "Test Dataset",
292
  "description": "A test dataset.", "language": ["French"],
@@ -299,7 +299,7 @@ class TestHuggingFaceDataset:
299
  assert ds.downloads == 500
300
 
301
  def test_as_dict_roundtrip(self):
302
- from picarones.extras.importers.huggingface import HuggingFaceDataset
303
  ds = HuggingFaceDataset(
304
  dataset_id="a/b", title="AB", description="desc",
305
  language=["Latin"], tags=["htr"],
@@ -309,12 +309,12 @@ class TestHuggingFaceDataset:
309
  assert d["language"] == ["Latin"]
310
 
311
  def test_hf_url(self):
312
- from picarones.extras.importers.huggingface import HuggingFaceDataset
313
  ds = HuggingFaceDataset(dataset_id="CATMuS/medieval", title="CATMuS")
314
  assert ds.hf_url == "https://huggingface.co/datasets/CATMuS/medieval"
315
 
316
  def test_as_dict_has_all_keys(self):
317
- from picarones.extras.importers.huggingface import HuggingFaceDataset
318
  ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
319
  d = ds.as_dict()
320
  for k in ["dataset_id", "title", "description", "language", "tags",
@@ -322,17 +322,17 @@ class TestHuggingFaceDataset:
322
  assert k in d, f"Missing: {k}"
323
 
324
  def test_default_source(self):
325
- from picarones.extras.importers.huggingface import HuggingFaceDataset
326
  ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
327
  assert ds.source == "reference"
328
 
329
  def test_from_dict_uses_id_as_fallback_title(self):
330
- from picarones.extras.importers.huggingface import HuggingFaceDataset
331
  ds = HuggingFaceDataset.from_dict({"dataset_id": "owner/repo"})
332
  assert ds.title == "owner/repo"
333
 
334
  def test_replace_source_helper(self):
335
- from picarones.extras.importers.huggingface import HuggingFaceDataset
336
  ds = HuggingFaceDataset(dataset_id="x/y", title="XY", source="reference")
337
  ds2 = ds._replace_source("api")
338
  assert ds2.source == "api"
@@ -399,23 +399,23 @@ class TestHuggingFaceImporter:
399
  class TestHuggingFaceReferenceData:
400
 
401
  def test_reference_datasets_loaded(self):
402
- from picarones.extras.importers.huggingface import _REFERENCE_DATASETS
403
  assert len(_REFERENCE_DATASETS) >= 5
404
 
405
  def test_catmus_present(self):
406
- from picarones.extras.importers.huggingface import _REFERENCE_DATASETS
407
  ids = [d["dataset_id"] for d in _REFERENCE_DATASETS]
408
  assert any("CATMuS" in did or "catmus" in did.lower() for did in ids)
409
 
410
  def test_all_have_required_fields(self):
411
- from picarones.extras.importers.huggingface import _REFERENCE_DATASETS
412
  for d in _REFERENCE_DATASETS:
413
  assert "dataset_id" in d
414
  assert "title" in d
415
  assert "language" in d
416
 
417
  def test_all_are_image_to_text(self):
418
- from picarones.extras.importers.huggingface import _REFERENCE_DATASETS
419
  for d in _REFERENCE_DATASETS:
420
  assert d.get("task", "image-to-text") == "image-to-text"
421
 
 
57
 
58
  @pytest.fixture
59
  def htr_catalogue():
60
+ from picarones.adapters.corpus.htr_united import HTRUnitedCatalogue
61
  return HTRUnitedCatalogue.from_demo()
62
 
63
 
64
  @pytest.fixture
65
  def hf_importer():
66
+ from picarones.adapters.corpus.huggingface import HuggingFaceImporter
67
  return HuggingFaceImporter()
68
 
69
 
 
74
  class TestHTRUnitedEntry:
75
 
76
  def test_from_dict_basic(self):
77
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
78
  d = {
79
  "id": "test-corpus", "title": "Test Corpus", "url": "https://github.com/test/corpus",
80
  "language": ["French"], "script": ["Gothic"], "century": [14, 15],
 
88
  assert e.lines == 5000
89
 
90
  def test_as_dict_roundtrip(self):
91
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
92
  d = {
93
  "id": "rtrip", "title": "Round Trip", "url": "https://github.com/a/b",
94
  "language": ["Latin"], "script": ["Caroline"], "century": [9],
 
102
  assert out["format"] == "PAGE"
103
 
104
  def test_century_str_roman(self):
105
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
106
  e = HTRUnitedEntry(id="x", title="x", url="x", century=[12, 14])
107
  cs = e.century_str
108
  assert "XIIe" in cs
109
  assert "XIVe" in cs
110
 
111
  def test_century_str_single(self):
112
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
113
  e = HTRUnitedEntry(id="x", title="x", url="x", century=[19])
114
  assert "XIXe" in e.century_str
115
 
116
  def test_default_fields(self):
117
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
118
  e = HTRUnitedEntry(id="minimal", title="Min", url="http://x")
119
  assert e.language == []
120
  assert e.lines == 0
 
122
  assert e.tags == []
123
 
124
  def test_from_dict_missing_fields(self):
125
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
126
  e = HTRUnitedEntry.from_dict({"id": "sparse", "title": "Sparse"})
127
  assert e.id == "sparse"
128
  assert e.institution == ""
129
  assert e.lines == 0
130
 
131
  def test_as_dict_has_all_keys(self):
132
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
133
  e = HTRUnitedEntry(id="k", title="K", url="http://k")
134
  d = e.as_dict()
135
  for key in ["id", "title", "url", "language", "script", "century",
 
137
  assert key in d, f"Missing key: {key}"
138
 
139
  def test_url_preserved(self):
140
+ from picarones.adapters.corpus.htr_united import HTRUnitedEntry
141
  url = "https://github.com/HTR-United/cremma-medieval"
142
  e = HTRUnitedEntry(id="c", title="CREMMA", url=url)
143
  assert e.url == url
 
250
  """
251
 
252
  def test_import_creates_meta_file(self, tmp_path, htr_catalogue):
253
+ from picarones.adapters.corpus.htr_united import import_htr_united_corpus
254
  entry = htr_catalogue.entries[0]
255
  result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
256
  meta_file = Path(result["metadata_file"])
257
  assert meta_file.exists()
258
 
259
  def test_import_meta_content(self, tmp_path, htr_catalogue):
260
+ from picarones.adapters.corpus.htr_united import import_htr_united_corpus
261
  entry = htr_catalogue.entries[0]
262
  result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
263
  meta = json.loads(Path(result["metadata_file"]).read_text())
 
265
  assert meta["entry_id"] == entry.id
266
 
267
  def test_import_returns_dict_keys(self, tmp_path, htr_catalogue):
268
+ from picarones.adapters.corpus.htr_united import import_htr_united_corpus
269
  entry = htr_catalogue.entries[0]
270
  result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
271
  for k in ["entry_id", "title", "output_dir", "files_imported", "metadata_file"]:
272
  assert k in result, f"Missing key: {k}"
273
 
274
  def test_import_creates_output_dir(self, tmp_path, htr_catalogue):
275
+ from picarones.adapters.corpus.htr_united import import_htr_united_corpus
276
  entry = htr_catalogue.entries[0]
277
  new_dir = tmp_path / "new_subdir" / "corpus"
278
  import_htr_united_corpus(entry, new_dir, max_samples=5)
 
286
  class TestHuggingFaceDataset:
287
 
288
  def test_from_dict_basic(self):
289
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
290
  d = {
291
  "dataset_id": "test/dataset", "title": "Test Dataset",
292
  "description": "A test dataset.", "language": ["French"],
 
299
  assert ds.downloads == 500
300
 
301
  def test_as_dict_roundtrip(self):
302
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
303
  ds = HuggingFaceDataset(
304
  dataset_id="a/b", title="AB", description="desc",
305
  language=["Latin"], tags=["htr"],
 
309
  assert d["language"] == ["Latin"]
310
 
311
  def test_hf_url(self):
312
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
313
  ds = HuggingFaceDataset(dataset_id="CATMuS/medieval", title="CATMuS")
314
  assert ds.hf_url == "https://huggingface.co/datasets/CATMuS/medieval"
315
 
316
  def test_as_dict_has_all_keys(self):
317
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
318
  ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
319
  d = ds.as_dict()
320
  for k in ["dataset_id", "title", "description", "language", "tags",
 
322
  assert k in d, f"Missing: {k}"
323
 
324
  def test_default_source(self):
325
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
326
  ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
327
  assert ds.source == "reference"
328
 
329
  def test_from_dict_uses_id_as_fallback_title(self):
330
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
331
  ds = HuggingFaceDataset.from_dict({"dataset_id": "owner/repo"})
332
  assert ds.title == "owner/repo"
333
 
334
  def test_replace_source_helper(self):
335
+ from picarones.adapters.corpus.huggingface import HuggingFaceDataset
336
  ds = HuggingFaceDataset(dataset_id="x/y", title="XY", source="reference")
337
  ds2 = ds._replace_source("api")
338
  assert ds2.source == "api"
 
399
  class TestHuggingFaceReferenceData:
400
 
401
  def test_reference_datasets_loaded(self):
402
+ from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
403
  assert len(_REFERENCE_DATASETS) >= 5
404
 
405
  def test_catmus_present(self):
406
+ from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
407
  ids = [d["dataset_id"] for d in _REFERENCE_DATASETS]
408
  assert any("CATMuS" in did or "catmus" in did.lower() for did in ids)
409
 
410
  def test_all_have_required_fields(self):
411
+ from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
412
  for d in _REFERENCE_DATASETS:
413
  assert "dataset_id" in d
414
  assert "title" in d
415
  assert "language" in d
416
 
417
  def test_all_are_image_to_text(self):
418
+ from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
419
  for d in _REFERENCE_DATASETS:
420
  assert d.get("task", "image-to-text") == "image-to-text"
421