Spaces:
Running
feat(migration): Lots H + I + J — statistics, htr_united/huggingface, MetricsResult
Browse filesTrois lots cumulés post-fix-templates. Aucun n'a nécessité la
création de nouveaux canoniques — tous étaient des shims plats
ou des partiels d'imports déjà migrés.
Lot H — measurements.statistics → evaluation.statistics
-------------------------------------------------------
Le sous-paquet ``picarones/measurements/statistics/`` (9
fichiers : ``__init__`` + 8 sous-modules) était entièrement
constitué de shims vers ``picarones.evaluation.statistics``.
Tous supprimés en bloc après migration des 70 imports tests.
Lot I — extras.importers → adapters.corpus
------------------------------------------
3 shims migrés et supprimés :
- ``extras.importers.htr_united`` →
``adapters.corpus.htr_united``
- ``extras.importers.huggingface`` →
``adapters.corpus.huggingface``
- ``extras.importers._fallback_log`` →
``adapters.corpus._fallback_log``
Le warning ``UserWarning`` du module ``huggingface`` a été
mis à jour pour citer le nouveau chemin.
``picarones/extras/importers/__init__.py`` ré-expose les
symboles depuis les canoniques pour préserver la rétrocompat
des callers (``from picarones.extras.importers import
HuggingFaceDataset, HTRUnitedEntry``).
Lot J — measurements.metrics partiel → evaluation.metric_result
---------------------------------------------------------------
Migration ciblée sur les **deux symboles canoniquement migrés**
(``MetricsResult``, ``aggregate_metrics``) : ~25 imports.
``compute_metrics`` reste dans ``picarones.measurements.metrics``
car aucun canonique n'existe pour cette fonction. Les imports
mixtes (``from picarones.measurements.metrics import
compute_metrics, aggregate_metrics, MetricsResult``) ont été
splittés en deux lignes : une vers le canonique, une vers le
legacy résiduel.
Tests d'architecture
--------------------
- ``test_no_flat_files_in_measurements::expected_subpackages``
réduit de ``{narrative, statistics, runner}`` à
``{narrative, runner}``.
- ``test_module_coverage::TEST_ONLY_BASELINE`` réduit de 4 à
3 entrées (``"statistics"`` retiré).
- ``test_file_budgets::FILE_BUDGETS`` débarrassé des entrées
orphelines (``extras/importers/htr_united.py``,
``extras/importers/huggingface.py``).
- ``test_doc_paths::BROKEN_PATHS_BASELINE`` 134 → 138. 4
nouveaux chemins cassés héritage dans ``docs/audits/*.md``
(intouchables).
Sync README + CLAUDE.md
-----------------------
``scripts/gen_readme_tables.py`` ré-exécuté : compteur de tests
global passe de 4978 (post-fix-templates) à 5000 collected
(arrondi à la dizaine), avec 4967 passed effectifs.
Acceptance
----------
- ``pytest tests/architecture/`` : 73 passed.
- ``pytest tests/`` : **0 failed, 0 errors, 4967 passed**.
- ``ruff check picarones/ tests/`` : All checks passed.
État final de la branche claude/migrate-core-to-domain-8ubIT
------------------------------------------------------------
À l'issue des Lots A à J + fix-templates :
- ``picarones/core/`` : entièrement supprimé.
- ``picarones/engines/`` : entièrement supprimé.
- ``picarones/modules/`` : entièrement supprimé.
- ``picarones/report/`` : entièrement supprimé.
- ``picarones/measurements/statistics/`` : entièrement supprimé.
- ``picarones/measurements/`` : 50+ → 24 fichiers résiduels.
- ``picarones/reports_v2/html/templates/`` : 10 templates HTML
restaurés (fix bug cc53ead).
Soit ~165 fichiers shims/orphelins supprimés et ~700 imports
tests migrés sur la branche.
Imports legacy restants
-----------------------
365 → 270 imports tests (majorité bloquée derrière création
de canoniques) :
- ``measurements.runner.{run_benchmark,
_compute_document_result}`` : 40 imports — bloqué (Phase 6).
- ``measurements.metrics.compute_metrics`` : 10 imports —
bloqué (canonique à créer).
- ``measurements.robustness.*`` : 20 imports — bloqué.
- ``pipelines.{base, over_normalization}`` : 22 imports —
bloqué (Phase 6).
- ``extras.importers.{gallica, escriptorium, iiif}`` : 50
imports — vrais fichiers, bloqué.
- ``llm.base`` + ``web.app`` : 20 imports — bloqué.
Toutes les migrations triviales sont terminées. La suite
nécessite création de canoniques (sprints dédiés).
https://claude.ai/code/session_011XQZNitg1rCgia8ZD1a2hP
- CLAUDE.md +3 -3
- README.md +1 -1
- docs/migration/SESSION_HANDOVER.md +27 -0
- picarones/adapters/corpus/huggingface.py +1 -1
- picarones/extras/importers/__init__.py +14 -12
- picarones/extras/importers/_fallback_log.py +0 -7
- picarones/extras/importers/htr_united.py +0 -7
- picarones/extras/importers/huggingface.py +0 -11
- picarones/fixtures.py +1 -1
- picarones/measurements/runner/document.py +2 -1
- picarones/measurements/runner/partial.py +1 -1
- picarones/measurements/statistics/__init__.py +0 -55
- picarones/measurements/statistics/bootstrap.py +0 -23
- picarones/measurements/statistics/cdd_render.py +0 -23
- picarones/measurements/statistics/clustering.py +0 -24
- picarones/measurements/statistics/correlation.py +0 -23
- picarones/measurements/statistics/distributions.py +0 -24
- picarones/measurements/statistics/friedman_nemenyi.py +0 -27
- picarones/measurements/statistics/pareto.py +0 -23
- picarones/measurements/statistics/wilcoxon.py +0 -26
- picarones/web/routers/importers.py +4 -4
- tests/architecture/test_doc_paths.py +6 -1
- tests/architecture/test_file_budgets.py +3 -4
- tests/architecture/test_module_coverage.py +0 -1
- tests/architecture/test_no_flat_files_in_measurements.py +1 -1
- tests/core/test_sprint14_robust_filtering.py +1 -1
- tests/engines/test_sprint4_normalization_iiif.py +2 -1
- tests/extras/test_sprint8_escriptorium_gallica.py +2 -2
- tests/integration/test_sprint13_parallelisation_stats.py +11 -11
- tests/measurements/test_metrics.py +2 -1
- tests/measurements/test_pricing_degenerate_cases.py +1 -1
- tests/measurements/test_results.py +1 -1
- tests/measurements/test_sprint10_error_distribution.py +4 -4
- tests/measurements/test_sprint12_nouvelles_fonctionnalites.py +1 -1
- tests/measurements/test_sprint18_friedman_nemenyi_cdd.py +1 -1
- tests/measurements/test_sprint20_pareto_pricing.py +1 -1
- tests/measurements/test_sprint23_anti_hallucination.py +1 -1
- tests/measurements/test_sprint40_ner_runner.py +1 -1
- tests/measurements/test_sprint42_calibration_runner.py +1 -1
- tests/measurements/test_sprint44_median_default.py +1 -1
- tests/measurements/test_sprint45_stratification.py +1 -1
- tests/measurements/test_sprint61_philological_runner.py +1 -1
- tests/report/test_sprint46_stratification_html.py +1 -1
- tests/report/test_sprint7_advanced_report.py +54 -54
- tests/report/test_sprint86_aii5_html.py +1 -1
- tests/report/test_sprint87_readability_html.py +1 -1
- tests/web/test_sprint6_web_interface.py +25 -25
|
@@ -123,7 +123,7 @@ picarones/
|
|
| 123 |
|
| 124 |
## État des tests et bugs historiques
|
| 125 |
|
| 126 |
-
`pytest tests/` → **
|
| 127 |
(post-S59). Les deselected sont les markers `live` (5 tests d'intégration
|
| 128 |
contre vraie API/binaire) + `network` (3 tests qui hit le réseau réel),
|
| 129 |
opt-in en local via `pytest -m live` ou `pytest -m network`. Le
|
|
@@ -253,7 +253,7 @@ Résumé express :
|
|
| 253 |
|
| 254 |
1. `git branch --show-current` → `claude/repo-analysis-cukvm`.
|
| 255 |
2. `git status` → working tree clean.
|
| 256 |
-
3. `pytest tests/ -q --no-header --tb=line` →
|
| 257 |
4. `git log -1 --format=%B` → décrit la prochaine sub-phase.
|
| 258 |
|
| 259 |
**Règles d'architecture critiques** (apprises à la dure) :
|
|
@@ -341,7 +341,7 @@ détecte, arbitre, rend.
|
|
| 341 |
## Contexte développement
|
| 342 |
|
| 343 |
- **Environnement** : GitHub Codespaces, Python 3.11+
|
| 344 |
-
- **Tests** : `pytest tests/ -q` →
|
| 345 |
deselected, 0 failed (au moment de la pause de session).
|
| 346 |
- **Plan d'évolution actif** : [`docs/roadmap/evolution-2026.md`](docs/roadmap/evolution-2026.md).
|
| 347 |
- **Plan retrait du legacy (maître)** : [`docs/migration/legacy-retirement-plan.md`](docs/migration/legacy-retirement-plan.md).
|
|
|
|
| 123 |
|
| 124 |
## État des tests et bugs historiques
|
| 125 |
|
| 126 |
+
`pytest tests/` → **5000 passed, 12 skipped, 8 deselected, 0 failed**
|
| 127 |
(post-S59). Les deselected sont les markers `live` (5 tests d'intégration
|
| 128 |
contre vraie API/binaire) + `network` (3 tests qui hit le réseau réel),
|
| 129 |
opt-in en local via `pytest -m live` ou `pytest -m network`. Le
|
|
|
|
| 253 |
|
| 254 |
1. `git branch --show-current` → `claude/repo-analysis-cukvm`.
|
| 255 |
2. `git status` → working tree clean.
|
| 256 |
+
3. `pytest tests/ -q --no-header --tb=line` → 5000 passed.
|
| 257 |
4. `git log -1 --format=%B` → décrit la prochaine sub-phase.
|
| 258 |
|
| 259 |
**Règles d'architecture critiques** (apprises à la dure) :
|
|
|
|
| 341 |
## Contexte développement
|
| 342 |
|
| 343 |
- **Environnement** : GitHub Codespaces, Python 3.11+
|
| 344 |
+
- **Tests** : `pytest tests/ -q` → 5000 passed, 12 skipped, 24
|
| 345 |
deselected, 0 failed (au moment de la pause de session).
|
| 346 |
- **Plan d'évolution actif** : [`docs/roadmap/evolution-2026.md`](docs/roadmap/evolution-2026.md).
|
| 347 |
- **Plan retrait du legacy (maître)** : [`docs/migration/legacy-retirement-plan.md`](docs/migration/legacy-retirement-plan.md).
|
|
@@ -395,7 +395,7 @@ ruff check picarones/ tests/
|
|
| 395 |
python -m mypy picarones/core/
|
| 396 |
```
|
| 397 |
|
| 398 |
-
**Test suite**: ~
|
| 399 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 400 |
requiring live HTTP. A handful of tests depend on optional engines
|
| 401 |
(`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
|
|
|
|
| 395 |
python -m mypy picarones/core/
|
| 396 |
```
|
| 397 |
|
| 398 |
+
**Test suite**: ~5000 tests, ~3 min on a modern laptop. Coverage
|
| 399 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 400 |
requiring live HTTP. A handful of tests depend on optional engines
|
| 401 |
(`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
|
|
@@ -356,6 +356,33 @@ L'ordre recommandé, par lots de symboles cohérents :
|
|
| 356 |
simple sed est impossible — il faudrait migrer les 76
|
| 357 |
imports vers des modules qui n'existent pas encore.
|
| 358 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 359 |
À chaque lot : sed → tests → commit. Les shims devenus
|
| 360 |
orphelins après le lot peuvent être **supprimés** dans le même
|
| 361 |
commit (principe « no shim survives its caller »).
|
|
|
|
| 356 |
simple sed est impossible — il faudrait migrer les 76
|
| 357 |
imports vers des modules qui n'existent pas encore.
|
| 358 |
|
| 359 |
+
8. ✅ **Lot H — measurements.statistics → evaluation.statistics**
|
| 360 |
+
(~70 imports migrés, 9 shims supprimés en bloc) :
|
| 361 |
+
- ``measurements.statistics.{bootstrap, cdd_render,
|
| 362 |
+
clustering, correlation, distributions, friedman_nemenyi,
|
| 363 |
+
pareto, wilcoxon}`` → ``evaluation.statistics.{...}``.
|
| 364 |
+
- ``measurements/statistics/`` (sous-paquet entier)
|
| 365 |
+
supprimé.
|
| 366 |
+
|
| 367 |
+
9. ✅ **Lot I — extras.importers → adapters.corpus**
|
| 368 |
+
(3 shims supprimés, ~15 imports migrés) :
|
| 369 |
+
- ``extras.importers.htr_united`` →
|
| 370 |
+
``adapters.corpus.htr_united``.
|
| 371 |
+
- ``extras.importers.huggingface`` →
|
| 372 |
+
``adapters.corpus.huggingface``.
|
| 373 |
+
- ``extras.importers._fallback_log`` →
|
| 374 |
+
``adapters.corpus._fallback_log``.
|
| 375 |
+
|
| 376 |
+
10. ✅ **Lot J — measurements.metrics.{MetricsResult,
|
| 377 |
+
aggregate_metrics} → evaluation.metric_result** (~25
|
| 378 |
+
imports migrés, 0 shim supprimé) :
|
| 379 |
+
- Migration partielle uniquement des symboles canoniquement
|
| 380 |
+
migrés (``MetricsResult``, ``aggregate_metrics``).
|
| 381 |
+
- ``compute_metrics`` reste dans
|
| 382 |
+
``picarones.measurements.metrics`` car aucun canonique
|
| 383 |
+
n'existe pour cette fonction (sera traité avec le Lot G
|
| 384 |
+
reporté).
|
| 385 |
+
|
| 386 |
À chaque lot : sed → tests → commit. Les shims devenus
|
| 387 |
orphelins après le lot peuvent être **supprimés** dans le même
|
| 388 |
commit (principe « no shim survives its caller »).
|
|
@@ -38,7 +38,7 @@ from typing import Optional
|
|
| 38 |
# Émission du warning ``experimental`` à l'import. Phase C du chantier
|
| 39 |
# de refonte — voir docstring du module ci-dessus.
|
| 40 |
warnings.warn(
|
| 41 |
-
"picarones.
|
| 42 |
"change or be removed without notice. Use at your own risk until "
|
| 43 |
"an institutional use case validates the API.",
|
| 44 |
category=UserWarning,
|
|
|
|
| 38 |
# Émission du warning ``experimental`` à l'import. Phase C du chantier
|
| 39 |
# de refonte — voir docstring du module ci-dessus.
|
| 40 |
warnings.warn(
|
| 41 |
+
"picarones.adapters.corpus.huggingface is experimental and may "
|
| 42 |
"change or be removed without notice. Use at your own risk until "
|
| 43 |
"an institutional use case validates the API.",
|
| 44 |
category=UserWarning,
|
|
@@ -1,20 +1,22 @@
|
|
| 1 |
-
"""Importeurs de corpus depuis sources distantes
|
|
|
|
|
|
|
| 2 |
|
| 3 |
-
Importeurs livrés
|
| 4 |
-
-----------------
|
| 5 |
- :mod:`_http` — helpers HTTP partagés (validate_http_url, download_url)
|
| 6 |
- :mod:`iiif` — manifestes IIIF v2/v3 (Bodleian, BnF, Vatican…)
|
| 7 |
-
- :mod:`htr_united` — datasets HTR-United (CC0, GitHub)
|
| 8 |
- :mod:`gallica` — BnF Gallica (SRU + IIIF + OCR brut)
|
| 9 |
-
- :mod:`huggingface` — datasets HuggingFace ⚠ **expérimental**
|
| 10 |
- :mod:`escriptorium` — projets eScriptorium ⚠ **expérimental**
|
| 11 |
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
"""
|
| 19 |
|
| 20 |
from picarones.extras.importers.iiif import IIIFImporter, import_iiif_manifest
|
|
@@ -30,7 +32,7 @@ from picarones.extras.importers.escriptorium import (
|
|
| 30 |
EScriptoriumDocument,
|
| 31 |
connect_escriptorium,
|
| 32 |
)
|
| 33 |
-
from picarones.
|
| 34 |
consume_fallback_log,
|
| 35 |
peek_fallback_log,
|
| 36 |
record_fallback,
|
|
|
|
| 1 |
+
"""Importeurs de corpus depuis sources distantes.
|
| 2 |
+
|
| 3 |
+
Importeurs livrés ici (legacy, en cours de retrait) :
|
| 4 |
|
|
|
|
|
|
|
| 5 |
- :mod:`_http` — helpers HTTP partagés (validate_http_url, download_url)
|
| 6 |
- :mod:`iiif` — manifestes IIIF v2/v3 (Bodleian, BnF, Vatican…)
|
|
|
|
| 7 |
- :mod:`gallica` — BnF Gallica (SRU + IIIF + OCR brut)
|
|
|
|
| 8 |
- :mod:`escriptorium` — projets eScriptorium ⚠ **expérimental**
|
| 9 |
|
| 10 |
+
Importeurs migrés vers :mod:`picarones.adapters.corpus` (Lot I) :
|
| 11 |
+
|
| 12 |
+
- ``htr_united`` → :mod:`picarones.adapters.corpus.htr_united`
|
| 13 |
+
- ``huggingface`` → :mod:`picarones.adapters.corpus.huggingface`
|
| 14 |
+
⚠ **expérimental**
|
| 15 |
+
- ``_fallback_log`` → :mod:`picarones.adapters.corpus._fallback_log`
|
| 16 |
+
|
| 17 |
+
L'API publique de ce package re-expose ces modules canoniques pour
|
| 18 |
+
préserver la rétrocompat (``from picarones.extras.importers import
|
| 19 |
+
HuggingFaceDataset, HTRUnitedEntry, …``).
|
| 20 |
"""
|
| 21 |
|
| 22 |
from picarones.extras.importers.iiif import IIIFImporter, import_iiif_manifest
|
|
|
|
| 32 |
EScriptoriumDocument,
|
| 33 |
connect_escriptorium,
|
| 34 |
)
|
| 35 |
+
from picarones.adapters.corpus._fallback_log import (
|
| 36 |
consume_fallback_log,
|
| 37 |
peek_fallback_log,
|
| 38 |
record_fallback,
|
|
@@ -1,7 +0,0 @@
|
|
| 1 |
-
"""Re-export — Sprint A14-S11. Le contenu canonique vit dans
|
| 2 |
-
``picarones.adapters.corpus._fallback_log``.
|
| 3 |
-
"""
|
| 4 |
-
|
| 5 |
-
from __future__ import annotations
|
| 6 |
-
|
| 7 |
-
from picarones.adapters.corpus._fallback_log import * # noqa: F401,F403
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,7 +0,0 @@
|
|
| 1 |
-
"""Re-export — Sprint A14-S11. Le contenu canonique vit dans
|
| 2 |
-
``picarones.adapters.corpus.htr_united``.
|
| 3 |
-
"""
|
| 4 |
-
|
| 5 |
-
from __future__ import annotations
|
| 6 |
-
|
| 7 |
-
from picarones.adapters.corpus.htr_united import * # noqa: F401,F403
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,11 +0,0 @@
|
|
| 1 |
-
"""Re-export — Sprint A14-S11. Le contenu canonique vit dans
|
| 2 |
-
``picarones.adapters.corpus.huggingface``.
|
| 3 |
-
|
| 4 |
-
Ré-expose explicitement ``_REFERENCE_DATASETS`` (importé par les
|
| 5 |
-
tests web).
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
from picarones.adapters.corpus.huggingface import * # noqa: F401,F403
|
| 11 |
-
from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS # noqa: F401
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -13,7 +13,7 @@ import random
|
|
| 13 |
import struct
|
| 14 |
import zlib
|
| 15 |
|
| 16 |
-
from picarones.
|
| 17 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 18 |
from picarones.pipelines.over_normalization import detect_over_normalization
|
| 19 |
# Sprint 5 — métriques avancées
|
|
|
|
| 13 |
import struct
|
| 14 |
import zlib
|
| 15 |
|
| 16 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 17 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 18 |
from picarones.pipelines.over_normalization import detect_over_normalization
|
| 19 |
# Sprint 5 — métriques avancées
|
|
@@ -16,7 +16,8 @@ from typing import Optional
|
|
| 16 |
|
| 17 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 18 |
from picarones.adapters.legacy_engines.base import EngineResult
|
| 19 |
-
from picarones.
|
|
|
|
| 20 |
|
| 21 |
|
| 22 |
def _calibration_from_engine_result(
|
|
|
|
| 16 |
|
| 17 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 18 |
from picarones.adapters.legacy_engines.base import EngineResult
|
| 19 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 20 |
+
from picarones.measurements.metrics import compute_metrics
|
| 21 |
|
| 22 |
|
| 23 |
def _calibration_from_engine_result(
|
|
@@ -21,7 +21,7 @@ from pathlib import Path
|
|
| 21 |
from typing import Optional
|
| 22 |
|
| 23 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 24 |
-
from picarones.
|
| 25 |
|
| 26 |
logger = logging.getLogger(__name__)
|
| 27 |
|
|
|
|
| 21 |
from typing import Optional
|
| 22 |
|
| 23 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 24 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 25 |
|
| 26 |
logger = logging.getLogger(__name__)
|
| 27 |
|
|
@@ -1,55 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import (
|
| 6 |
-
bootstrap_ci, wilcoxon_test, friedman_test, ...
|
| 7 |
-
)
|
| 8 |
-
|
| 9 |
-
Tous les symboles publics de l'API legacy (incluant les privés
|
| 10 |
-
``_SCIPY_AVAILABLE``, ``_chi_square_sf``, ``_nemenyi_critical_value``,
|
| 11 |
-
``_rank_row`` consommés par certains tests) restent accessibles
|
| 12 |
-
identiquement.
|
| 13 |
-
"""
|
| 14 |
-
|
| 15 |
-
from __future__ import annotations
|
| 16 |
-
|
| 17 |
-
import warnings
|
| 18 |
-
|
| 19 |
-
from picarones.evaluation.statistics import (
|
| 20 |
-
_SCIPY_AVAILABLE,
|
| 21 |
-
_chi_square_sf,
|
| 22 |
-
_nemenyi_critical_value,
|
| 23 |
-
_rank_row,
|
| 24 |
-
ErrorCluster,
|
| 25 |
-
bootstrap_ci,
|
| 26 |
-
build_critical_difference_svg,
|
| 27 |
-
cluster_errors,
|
| 28 |
-
compute_correlation_matrix,
|
| 29 |
-
compute_pairwise_stats,
|
| 30 |
-
compute_pareto_front,
|
| 31 |
-
compute_reliability_curve,
|
| 32 |
-
compute_venn_data,
|
| 33 |
-
friedman_test,
|
| 34 |
-
nemenyi_posthoc,
|
| 35 |
-
wilcoxon_test,
|
| 36 |
-
)
|
| 37 |
-
|
| 38 |
-
warnings.warn(
|
| 39 |
-
"picarones.measurements.statistics is deprecated and will be "
|
| 40 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 41 |
-
DeprecationWarning,
|
| 42 |
-
stacklevel=2,
|
| 43 |
-
)
|
| 44 |
-
|
| 45 |
-
__all__ = [
|
| 46 |
-
"bootstrap_ci",
|
| 47 |
-
"wilcoxon_test", "compute_pairwise_stats",
|
| 48 |
-
"friedman_test", "nemenyi_posthoc", "build_critical_difference_svg",
|
| 49 |
-
"compute_pareto_front",
|
| 50 |
-
"ErrorCluster", "cluster_errors",
|
| 51 |
-
"compute_correlation_matrix",
|
| 52 |
-
"compute_reliability_curve", "compute_venn_data",
|
| 53 |
-
"_SCIPY_AVAILABLE", "_chi_square_sf",
|
| 54 |
-
"_nemenyi_critical_value", "_rank_row",
|
| 55 |
-
]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,23 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.bootstrap`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.bootstrap`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.bootstrap import (
|
| 13 |
-
bootstrap_ci,
|
| 14 |
-
)
|
| 15 |
-
|
| 16 |
-
warnings.warn(
|
| 17 |
-
"picarones.measurements.statistics.bootstrap is deprecated and will be "
|
| 18 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 19 |
-
DeprecationWarning,
|
| 20 |
-
stacklevel=2,
|
| 21 |
-
)
|
| 22 |
-
|
| 23 |
-
__all__ = ['bootstrap_ci']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,23 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.cdd_render`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.cdd_render`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.cdd_render import (
|
| 13 |
-
build_critical_difference_svg,
|
| 14 |
-
)
|
| 15 |
-
|
| 16 |
-
warnings.warn(
|
| 17 |
-
"picarones.measurements.statistics.cdd_render is deprecated and will be "
|
| 18 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 19 |
-
DeprecationWarning,
|
| 20 |
-
stacklevel=2,
|
| 21 |
-
)
|
| 22 |
-
|
| 23 |
-
__all__ = ['build_critical_difference_svg']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,24 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.clustering`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.clustering`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.clustering import (
|
| 13 |
-
ErrorCluster,
|
| 14 |
-
cluster_errors,
|
| 15 |
-
)
|
| 16 |
-
|
| 17 |
-
warnings.warn(
|
| 18 |
-
"picarones.measurements.statistics.clustering is deprecated and will be "
|
| 19 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 20 |
-
DeprecationWarning,
|
| 21 |
-
stacklevel=2,
|
| 22 |
-
)
|
| 23 |
-
|
| 24 |
-
__all__ = ['ErrorCluster', 'cluster_errors']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,23 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.correlation`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.correlation`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.correlation import (
|
| 13 |
-
compute_correlation_matrix,
|
| 14 |
-
)
|
| 15 |
-
|
| 16 |
-
warnings.warn(
|
| 17 |
-
"picarones.measurements.statistics.correlation is deprecated and will be "
|
| 18 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 19 |
-
DeprecationWarning,
|
| 20 |
-
stacklevel=2,
|
| 21 |
-
)
|
| 22 |
-
|
| 23 |
-
__all__ = ['compute_correlation_matrix']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,24 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.distributions`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.distributions`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.distributions import (
|
| 13 |
-
compute_reliability_curve,
|
| 14 |
-
compute_venn_data,
|
| 15 |
-
)
|
| 16 |
-
|
| 17 |
-
warnings.warn(
|
| 18 |
-
"picarones.measurements.statistics.distributions is deprecated and will be "
|
| 19 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 20 |
-
DeprecationWarning,
|
| 21 |
-
stacklevel=2,
|
| 22 |
-
)
|
| 23 |
-
|
| 24 |
-
__all__ = ['compute_reliability_curve', 'compute_venn_data']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,27 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.friedman_nemenyi`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.friedman_nemenyi`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.friedman_nemenyi import (
|
| 13 |
-
friedman_test,
|
| 14 |
-
nemenyi_posthoc,
|
| 15 |
-
_chi_square_sf,
|
| 16 |
-
_nemenyi_critical_value,
|
| 17 |
-
_rank_row,
|
| 18 |
-
)
|
| 19 |
-
|
| 20 |
-
warnings.warn(
|
| 21 |
-
"picarones.measurements.statistics.friedman_nemenyi is deprecated and will be "
|
| 22 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 23 |
-
DeprecationWarning,
|
| 24 |
-
stacklevel=2,
|
| 25 |
-
)
|
| 26 |
-
|
| 27 |
-
__all__ = ['friedman_test', 'nemenyi_posthoc', '_chi_square_sf', '_nemenyi_critical_value', '_rank_row']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,23 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.pareto`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.pareto`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.pareto import (
|
| 13 |
-
compute_pareto_front,
|
| 14 |
-
)
|
| 15 |
-
|
| 16 |
-
warnings.warn(
|
| 17 |
-
"picarones.measurements.statistics.pareto is deprecated and will be "
|
| 18 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 19 |
-
DeprecationWarning,
|
| 20 |
-
stacklevel=2,
|
| 21 |
-
)
|
| 22 |
-
|
| 23 |
-
__all__ = ['compute_pareto_front']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -1,26 +0,0 @@
|
|
| 1 |
-
"""``picarones.measurements.statistics.wilcoxon`` — shim re-export (déprécié, suppression 2.0).
|
| 2 |
-
|
| 3 |
-
Canonique : :mod:`picarones.evaluation.statistics.wilcoxon`. Migration ::
|
| 4 |
-
|
| 5 |
-
from picarones.evaluation.statistics import ...
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
from __future__ import annotations
|
| 9 |
-
|
| 10 |
-
import warnings
|
| 11 |
-
|
| 12 |
-
from picarones.evaluation.statistics.wilcoxon import (
|
| 13 |
-
compute_pairwise_stats,
|
| 14 |
-
wilcoxon_test,
|
| 15 |
-
_SCIPY_AVAILABLE,
|
| 16 |
-
_normal_sf,
|
| 17 |
-
)
|
| 18 |
-
|
| 19 |
-
warnings.warn(
|
| 20 |
-
"picarones.measurements.statistics.wilcoxon is deprecated and will be "
|
| 21 |
-
"removed in 2.0. Import from picarones.evaluation.statistics instead.",
|
| 22 |
-
DeprecationWarning,
|
| 23 |
-
stacklevel=2,
|
| 24 |
-
)
|
| 25 |
-
|
| 26 |
-
__all__ = ['compute_pairwise_stats', 'wilcoxon_test', '_SCIPY_AVAILABLE', '_normal_sf']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -20,7 +20,7 @@ async def api_htr_united_catalogue(
|
|
| 20 |
script: str = Query(default="", description="Filtre type d'écriture"),
|
| 21 |
) -> dict:
|
| 22 |
"""Catalogue HTR-United filtrable."""
|
| 23 |
-
from picarones.
|
| 24 |
|
| 25 |
cat = HTRUnitedCatalogue.from_demo()
|
| 26 |
results = cat.search(
|
|
@@ -40,7 +40,7 @@ async def api_htr_united_catalogue(
|
|
| 40 |
@router.post("/api/htr-united/import")
|
| 41 |
async def api_htr_united_import(req: HTRUnitedImportRequest) -> dict:
|
| 42 |
"""Importe une entrée HTR-United dans ``req.output_dir``."""
|
| 43 |
-
from picarones.
|
| 44 |
HTRUnitedCatalogue,
|
| 45 |
import_htr_united_corpus,
|
| 46 |
)
|
|
@@ -71,7 +71,7 @@ async def api_huggingface_search(
|
|
| 71 |
limit: int = Query(default=20, ge=1, le=50),
|
| 72 |
) -> dict:
|
| 73 |
"""Recherche de datasets sur HuggingFace Hub."""
|
| 74 |
-
from picarones.
|
| 75 |
|
| 76 |
tag_list = [t.strip() for t in tags.split(",") if t.strip()] if tags else None
|
| 77 |
importer = HuggingFaceImporter()
|
|
@@ -90,7 +90,7 @@ async def api_huggingface_search(
|
|
| 90 |
@router.post("/api/huggingface/import")
|
| 91 |
async def api_huggingface_import(req: HuggingFaceImportRequest) -> dict:
|
| 92 |
"""Importe un dataset HuggingFace dans ``req.output_dir``."""
|
| 93 |
-
from picarones.
|
| 94 |
|
| 95 |
importer = HuggingFaceImporter()
|
| 96 |
return importer.import_dataset(
|
|
|
|
| 20 |
script: str = Query(default="", description="Filtre type d'écriture"),
|
| 21 |
) -> dict:
|
| 22 |
"""Catalogue HTR-United filtrable."""
|
| 23 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedCatalogue
|
| 24 |
|
| 25 |
cat = HTRUnitedCatalogue.from_demo()
|
| 26 |
results = cat.search(
|
|
|
|
| 40 |
@router.post("/api/htr-united/import")
|
| 41 |
async def api_htr_united_import(req: HTRUnitedImportRequest) -> dict:
|
| 42 |
"""Importe une entrée HTR-United dans ``req.output_dir``."""
|
| 43 |
+
from picarones.adapters.corpus.htr_united import (
|
| 44 |
HTRUnitedCatalogue,
|
| 45 |
import_htr_united_corpus,
|
| 46 |
)
|
|
|
|
| 71 |
limit: int = Query(default=20, ge=1, le=50),
|
| 72 |
) -> dict:
|
| 73 |
"""Recherche de datasets sur HuggingFace Hub."""
|
| 74 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceImporter
|
| 75 |
|
| 76 |
tag_list = [t.strip() for t in tags.split(",") if t.strip()] if tags else None
|
| 77 |
importer = HuggingFaceImporter()
|
|
|
|
| 90 |
@router.post("/api/huggingface/import")
|
| 91 |
async def api_huggingface_import(req: HuggingFaceImportRequest) -> dict:
|
| 92 |
"""Importe un dataset HuggingFace dans ``req.output_dir``."""
|
| 93 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceImporter
|
| 94 |
|
| 95 |
importer = HuggingFaceImporter()
|
| 96 |
return importer.import_dataset(
|
|
@@ -97,6 +97,11 @@ REPO_ROOT = Path(__file__).resolve().parents[2]
|
|
| 97 |
#: suppression des 2 derniers shims de ``picarones/core/``. Le
|
| 98 |
#: sous-paquet ``core/`` n'existe plus du tout. Deux nouveaux
|
| 99 |
#: chemins cassés héritage dans ``CHANGELOG.md`` (intouchable).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 100 |
#:
|
| 101 |
#: Les chemins cassés restants sont **TOUS** dans :
|
| 102 |
#: - ``CHANGELOG.md`` : journal historique versionné, intouchable.
|
|
@@ -105,7 +110,7 @@ REPO_ROOT = Path(__file__).resolve().parents[2]
|
|
| 105 |
#: - ``docs/migration/{executor-equivalence, legacy-retirement-plan}.md`` :
|
| 106 |
#: audits/plans historiques (citent des chemins legacy à des fins
|
| 107 |
#: de comparaison).
|
| 108 |
-
BROKEN_PATHS_BASELINE =
|
| 109 |
|
| 110 |
#: Patrons de fichiers de documentation à scanner.
|
| 111 |
DOC_GLOBS: tuple[str, ...] = (
|
|
|
|
| 97 |
#: suppression des 2 derniers shims de ``picarones/core/``. Le
|
| 98 |
#: sous-paquet ``core/`` n'existe plus du tout. Deux nouveaux
|
| 99 |
#: chemins cassés héritage dans ``CHANGELOG.md`` (intouchable).
|
| 100 |
+
#: - 138 (sprints « Lots H + I », 2026-05-07) : suppression du
|
| 101 |
+
#: sous-paquet ``measurements/statistics/`` (Lot H, 9 shims) et
|
| 102 |
+
#: des 3 shims ``extras/importers/{htr_united, huggingface,
|
| 103 |
+
#: _fallback_log}`` (Lot I). Quatre nouveaux chemins cassés
|
| 104 |
+
#: héritage répartis dans ``docs/audits/*.md`` (intouchables).
|
| 105 |
#:
|
| 106 |
#: Les chemins cassés restants sont **TOUS** dans :
|
| 107 |
#: - ``CHANGELOG.md`` : journal historique versionné, intouchable.
|
|
|
|
| 110 |
#: - ``docs/migration/{executor-equivalence, legacy-retirement-plan}.md`` :
|
| 111 |
#: audits/plans historiques (citent des chemins legacy à des fins
|
| 112 |
#: de comparaison).
|
| 113 |
+
BROKEN_PATHS_BASELINE = 138
|
| 114 |
|
| 115 |
#: Patrons de fichiers de documentation à scanner.
|
| 116 |
DOC_GLOBS: tuple[str, ...] = (
|
|
@@ -123,13 +123,12 @@ FILE_BUDGETS: dict[str, int] = {
|
|
| 123 |
# ``measurements/roman_numerals.py`` a été supprimé. Seul le
|
| 124 |
# canonique ``evaluation/metrics/roman_numerals.py`` reste.
|
| 125 |
"picarones/evaluation/metrics/roman_numerals.py": 575, # actuel 484
|
| 126 |
-
|
| 127 |
-
#
|
| 128 |
-
#
|
| 129 |
"picarones/adapters/corpus/htr_united.py": 575, # actuel 473
|
| 130 |
"picarones/adapters/corpus/huggingface.py": 550, # actuel 464
|
| 131 |
"picarones/cli/_workflows.py": 550, # actuel 469
|
| 132 |
-
"picarones/extras/importers/huggingface.py": 550, # actuel 464
|
| 133 |
# Phase 4-ter : ``core/metric_hooks.py`` est désormais un shim
|
| 134 |
# (≤ 80 l). Le contenu canonique vit dans ``evaluation/`` ;
|
| 135 |
# même budget pour la même raison historique (centralise les
|
|
|
|
| 123 |
# ``measurements/roman_numerals.py`` a été supprimé. Seul le
|
| 124 |
# canonique ``evaluation/metrics/roman_numerals.py`` reste.
|
| 125 |
"picarones/evaluation/metrics/roman_numerals.py": 575, # actuel 484
|
| 126 |
+
# Sprint A14-S11 + Lot I — déplacés depuis extras/importers/.
|
| 127 |
+
# Les shims ``extras/importers/{htr_united, huggingface,
|
| 128 |
+
# _fallback_log}`` ont été supprimés au Lot I (mai 2026).
|
| 129 |
"picarones/adapters/corpus/htr_united.py": 575, # actuel 473
|
| 130 |
"picarones/adapters/corpus/huggingface.py": 550, # actuel 464
|
| 131 |
"picarones/cli/_workflows.py": 550, # actuel 469
|
|
|
|
| 132 |
# Phase 4-ter : ``core/metric_hooks.py`` est désormais un shim
|
| 133 |
# (≤ 80 l). Le contenu canonique vit dans ``evaluation/`` ;
|
| 134 |
# même budget pour la même raison historique (centralise les
|
|
@@ -71,7 +71,6 @@ TEST_ONLY_BASELINE: frozenset[str] = frozenset({
|
|
| 71 |
"numerical_sequences_hooks",
|
| 72 |
"pipeline_benchmark",
|
| 73 |
"pipeline_comparison",
|
| 74 |
-
"statistics",
|
| 75 |
})
|
| 76 |
|
| 77 |
|
|
|
|
| 71 |
"numerical_sequences_hooks",
|
| 72 |
"pipeline_benchmark",
|
| 73 |
"pipeline_comparison",
|
|
|
|
| 74 |
})
|
| 75 |
|
| 76 |
|
|
@@ -128,7 +128,7 @@ def test_no_orphaned_whitelist_entries() -> None:
|
|
| 128 |
def test_subpackages_not_affected() -> None:
|
| 129 |
"""Méta-test : les sous-packages existants de ``measurements/``
|
| 130 |
(narrative, statistics, runner) restent intouchés par ce test."""
|
| 131 |
-
expected_subpackages = {"narrative", "
|
| 132 |
actual = {
|
| 133 |
p.name for p in MEASUREMENTS_DIR.iterdir()
|
| 134 |
if p.is_dir() and not p.name.startswith("_") and "__pycache__" not in p.name
|
|
|
|
| 128 |
def test_subpackages_not_affected() -> None:
|
| 129 |
"""Méta-test : les sous-packages existants de ``measurements/``
|
| 130 |
(narrative, statistics, runner) restent intouchés par ce test."""
|
| 131 |
+
expected_subpackages = {"narrative", "runner"}
|
| 132 |
actual = {
|
| 133 |
p.name for p in MEASUREMENTS_DIR.iterdir()
|
| 134 |
if p.is_dir() and not p.name.startswith("_") and "__pycache__" not in p.name
|
|
@@ -23,7 +23,7 @@ import pytest
|
|
| 23 |
def _make_fake_benchmark():
|
| 24 |
"""Retourne un BenchmarkResult minimal pour tester le générateur."""
|
| 25 |
from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
|
| 26 |
-
from picarones.
|
| 27 |
|
| 28 |
def _metrics(cer, wer=0.2):
|
| 29 |
return MetricsResult(
|
|
|
|
| 23 |
def _make_fake_benchmark():
|
| 24 |
"""Retourne un BenchmarkResult minimal pour tester le générateur."""
|
| 25 |
from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
|
| 26 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 27 |
|
| 28 |
def _metrics(cer, wer=0.2):
|
| 29 |
return MetricsResult(
|
|
@@ -10,7 +10,8 @@ from picarones.evaluation.metrics.normalization import (
|
|
| 10 |
_apply_diplomatic_table,
|
| 11 |
get_builtin_profile,
|
| 12 |
)
|
| 13 |
-
from picarones.
|
|
|
|
| 14 |
from picarones.extras.importers.iiif import (
|
| 15 |
IIIFManifestParser,
|
| 16 |
parse_page_selector,
|
|
|
|
| 10 |
_apply_diplomatic_table,
|
| 11 |
get_builtin_profile,
|
| 12 |
)
|
| 13 |
+
from picarones.evaluation.metric_result import aggregate_metrics, MetricsResult
|
| 14 |
+
from picarones.measurements.metrics import compute_metrics
|
| 15 |
from picarones.extras.importers.iiif import (
|
| 16 |
IIIFManifestParser,
|
| 17 |
parse_page_selector,
|
|
@@ -162,7 +162,7 @@ class TestEScriptoriumExport:
|
|
| 162 |
|
| 163 |
def _make_benchmark(self, engine_name: str = "tesseract") -> "BenchmarkResult":
|
| 164 |
from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
|
| 165 |
-
from picarones.
|
| 166 |
metrics = MetricsResult(cer=0.05, wer=0.10, cer_nfc=0.05,
|
| 167 |
cer_caseless=0.04, cer_diplomatic=0.04,
|
| 168 |
wer_normalized=0.09, mer=0.09, wil=0.05,
|
|
@@ -228,7 +228,7 @@ class TestEScriptoriumExport:
|
|
| 228 |
def test_export_skips_error_docs(self):
|
| 229 |
from picarones.extras.importers.escriptorium import EScriptoriumClient
|
| 230 |
from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
|
| 231 |
-
from picarones.
|
| 232 |
metrics = MetricsResult(cer=0.1, wer=0.2, cer_nfc=0.1, cer_caseless=0.1,
|
| 233 |
cer_diplomatic=0.1, wer_normalized=0.2, mer=0.2, wil=0.1,
|
| 234 |
reference_length=50, hypothesis_length=50)
|
|
|
|
| 162 |
|
| 163 |
def _make_benchmark(self, engine_name: str = "tesseract") -> "BenchmarkResult":
|
| 164 |
from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
|
| 165 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 166 |
metrics = MetricsResult(cer=0.05, wer=0.10, cer_nfc=0.05,
|
| 167 |
cer_caseless=0.04, cer_diplomatic=0.04,
|
| 168 |
wer_normalized=0.09, mer=0.09, wil=0.05,
|
|
|
|
| 228 |
def test_export_skips_error_docs(self):
|
| 229 |
from picarones.extras.importers.escriptorium import EScriptoriumClient
|
| 230 |
from picarones.evaluation.benchmark_result import BenchmarkResult, EngineReport, DocumentResult
|
| 231 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 232 |
metrics = MetricsResult(cer=0.1, wer=0.2, cer_nfc=0.1, cer_caseless=0.1,
|
| 233 |
cer_diplomatic=0.1, wer_normalized=0.2, mer=0.2, wil=0.1,
|
| 234 |
reference_length=50, hypothesis_length=50)
|
|
@@ -418,7 +418,7 @@ class TestRunnerSilentExceptions:
|
|
| 418 |
|
| 419 |
# Créer un doc_result avec des données de confusion corrompues
|
| 420 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 421 |
-
from picarones.
|
| 422 |
bad_dr = DocumentResult(
|
| 423 |
doc_id="x", image_path="x.png", ground_truth="gt", hypothesis="hyp",
|
| 424 |
metrics=MetricsResult(cer=0.1, cer_nfc=0.1, cer_caseless=0.1,
|
|
@@ -441,7 +441,7 @@ class TestWilcoxonValidation:
|
|
| 441 |
|
| 442 |
def test_identical_sequences_not_significant(self):
|
| 443 |
"""Séquences identiques → pas de différence, p = 1.0, significant = False."""
|
| 444 |
-
from picarones.
|
| 445 |
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
|
| 446 |
r = wilcoxon_test(a, a)
|
| 447 |
assert r["significant"] is False
|
|
@@ -450,7 +450,7 @@ class TestWilcoxonValidation:
|
|
| 450 |
|
| 451 |
def test_all_positive_diffs_w_minus_is_zero(self):
|
| 452 |
"""Si toutes les différences a−b sont positives : W⁻ = 0, W⁺ = n(n+1)/2."""
|
| 453 |
-
from picarones.
|
| 454 |
n = 10
|
| 455 |
a = [float(i) for i in range(1, n + 1)]
|
| 456 |
b = [0.0] * n
|
|
@@ -461,7 +461,7 @@ class TestWilcoxonValidation:
|
|
| 461 |
|
| 462 |
def test_w_plus_w_minus_sum_invariant(self):
|
| 463 |
"""W⁺ + W⁻ doit toujours être égal à n(n+1)/2 (n = nombre de paires non nulles)."""
|
| 464 |
-
from picarones.
|
| 465 |
a = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15, 0.20, 0.35, 0.08, 0.18]
|
| 466 |
b = [0.12, 0.20, 0.08, 0.35, 0.28, 0.18, 0.15, 0.40, 0.10, 0.20]
|
| 467 |
r = wilcoxon_test(a, b)
|
|
@@ -474,7 +474,7 @@ class TestWilcoxonValidation:
|
|
| 474 |
|
| 475 |
def test_clearly_different_sequences_significant(self):
|
| 476 |
"""Deux séquences très différentes (n=15) doivent donner p < 0.05."""
|
| 477 |
-
from picarones.
|
| 478 |
a = [0.05] * 15 # moteur A très performant
|
| 479 |
b = [0.60] * 15 # moteur B peu performant — toutes diff = −0.55
|
| 480 |
# Diffs a−b = −0.55 pour tous → W⁺ = 0 → devrait être significatif
|
|
@@ -484,7 +484,7 @@ class TestWilcoxonValidation:
|
|
| 484 |
|
| 485 |
def test_large_n_normal_approximation_reasonable(self):
|
| 486 |
"""Pour n = 20, l'approximation normale doit donner une p-value dans [0, 1]."""
|
| 487 |
-
from picarones.
|
| 488 |
import random
|
| 489 |
rng = random.Random(42)
|
| 490 |
a = [rng.uniform(0.1, 0.5) for _ in range(20)]
|
|
@@ -495,7 +495,7 @@ class TestWilcoxonValidation:
|
|
| 495 |
|
| 496 |
def test_small_n_returns_conservative_p(self):
|
| 497 |
"""Pour n < 10, la p-value doit être 0.04 (significatif) ou 0.20 (non sign.)."""
|
| 498 |
-
from picarones.
|
| 499 |
if _SCIPY_AVAILABLE:
|
| 500 |
pytest.skip("scipy disponible — la table exacte n'est pas utilisée")
|
| 501 |
a = [0.1, 0.2, 0.3]
|
|
@@ -506,7 +506,7 @@ class TestWilcoxonValidation:
|
|
| 506 |
|
| 507 |
def test_result_keys_complete(self):
|
| 508 |
"""Le dict retourné doit contenir toutes les clés documentées."""
|
| 509 |
-
from picarones.
|
| 510 |
r = wilcoxon_test([0.1, 0.3, 0.2, 0.4, 0.15, 0.35, 0.25, 0.5, 0.45, 0.05],
|
| 511 |
[0.2, 0.2, 0.3, 0.3, 0.25, 0.25, 0.35, 0.35, 0.40, 0.15])
|
| 512 |
for key in ("statistic", "p_value", "significant", "interpretation", "n_pairs", "W_plus", "W_minus"):
|
|
@@ -521,12 +521,12 @@ class TestWilcoxonScipyIntegration:
|
|
| 521 |
|
| 522 |
def test_scipy_available_flag_is_bool(self):
|
| 523 |
"""_SCIPY_AVAILABLE doit être un booléen."""
|
| 524 |
-
from picarones.
|
| 525 |
assert isinstance(_SCIPY_AVAILABLE, bool)
|
| 526 |
|
| 527 |
def test_scipy_and_native_agree_on_significance(self):
|
| 528 |
"""Scipy et l'implémentation native doivent s'accorder sur la significativité."""
|
| 529 |
-
from picarones.
|
| 530 |
if not _SCIPY_AVAILABLE:
|
| 531 |
pytest.skip("scipy non disponible")
|
| 532 |
|
|
@@ -542,7 +542,7 @@ class TestWilcoxonScipyIntegration:
|
|
| 542 |
|
| 543 |
def test_scipy_p_value_in_valid_range(self):
|
| 544 |
"""La p-value fournie par scipy doit être dans [0, 1]."""
|
| 545 |
-
from picarones.
|
| 546 |
if not _SCIPY_AVAILABLE:
|
| 547 |
pytest.skip("scipy non disponible")
|
| 548 |
|
|
|
|
| 418 |
|
| 419 |
# Créer un doc_result avec des données de confusion corrompues
|
| 420 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 421 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 422 |
bad_dr = DocumentResult(
|
| 423 |
doc_id="x", image_path="x.png", ground_truth="gt", hypothesis="hyp",
|
| 424 |
metrics=MetricsResult(cer=0.1, cer_nfc=0.1, cer_caseless=0.1,
|
|
|
|
| 441 |
|
| 442 |
def test_identical_sequences_not_significant(self):
|
| 443 |
"""Séquences identiques → pas de différence, p = 1.0, significant = False."""
|
| 444 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 445 |
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
|
| 446 |
r = wilcoxon_test(a, a)
|
| 447 |
assert r["significant"] is False
|
|
|
|
| 450 |
|
| 451 |
def test_all_positive_diffs_w_minus_is_zero(self):
|
| 452 |
"""Si toutes les différences a−b sont positives : W⁻ = 0, W⁺ = n(n+1)/2."""
|
| 453 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 454 |
n = 10
|
| 455 |
a = [float(i) for i in range(1, n + 1)]
|
| 456 |
b = [0.0] * n
|
|
|
|
| 461 |
|
| 462 |
def test_w_plus_w_minus_sum_invariant(self):
|
| 463 |
"""W⁺ + W⁻ doit toujours être égal à n(n+1)/2 (n = nombre de paires non nulles)."""
|
| 464 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 465 |
a = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15, 0.20, 0.35, 0.08, 0.18]
|
| 466 |
b = [0.12, 0.20, 0.08, 0.35, 0.28, 0.18, 0.15, 0.40, 0.10, 0.20]
|
| 467 |
r = wilcoxon_test(a, b)
|
|
|
|
| 474 |
|
| 475 |
def test_clearly_different_sequences_significant(self):
|
| 476 |
"""Deux séquences très différentes (n=15) doivent donner p < 0.05."""
|
| 477 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 478 |
a = [0.05] * 15 # moteur A très performant
|
| 479 |
b = [0.60] * 15 # moteur B peu performant — toutes diff = −0.55
|
| 480 |
# Diffs a−b = −0.55 pour tous → W⁺ = 0 → devrait être significatif
|
|
|
|
| 484 |
|
| 485 |
def test_large_n_normal_approximation_reasonable(self):
|
| 486 |
"""Pour n = 20, l'approximation normale doit donner une p-value dans [0, 1]."""
|
| 487 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 488 |
import random
|
| 489 |
rng = random.Random(42)
|
| 490 |
a = [rng.uniform(0.1, 0.5) for _ in range(20)]
|
|
|
|
| 495 |
|
| 496 |
def test_small_n_returns_conservative_p(self):
|
| 497 |
"""Pour n < 10, la p-value doit être 0.04 (significatif) ou 0.20 (non sign.)."""
|
| 498 |
+
from picarones.evaluation.statistics import wilcoxon_test, _SCIPY_AVAILABLE
|
| 499 |
if _SCIPY_AVAILABLE:
|
| 500 |
pytest.skip("scipy disponible — la table exacte n'est pas utilisée")
|
| 501 |
a = [0.1, 0.2, 0.3]
|
|
|
|
| 506 |
|
| 507 |
def test_result_keys_complete(self):
|
| 508 |
"""Le dict retourné doit contenir toutes les clés documentées."""
|
| 509 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 510 |
r = wilcoxon_test([0.1, 0.3, 0.2, 0.4, 0.15, 0.35, 0.25, 0.5, 0.45, 0.05],
|
| 511 |
[0.2, 0.2, 0.3, 0.3, 0.25, 0.25, 0.35, 0.35, 0.40, 0.15])
|
| 512 |
for key in ("statistic", "p_value", "significant", "interpretation", "n_pairs", "W_plus", "W_minus"):
|
|
|
|
| 521 |
|
| 522 |
def test_scipy_available_flag_is_bool(self):
|
| 523 |
"""_SCIPY_AVAILABLE doit être un booléen."""
|
| 524 |
+
from picarones.evaluation.statistics import _SCIPY_AVAILABLE
|
| 525 |
assert isinstance(_SCIPY_AVAILABLE, bool)
|
| 526 |
|
| 527 |
def test_scipy_and_native_agree_on_significance(self):
|
| 528 |
"""Scipy et l'implémentation native doivent s'accorder sur la significativité."""
|
| 529 |
+
from picarones.evaluation.statistics import wilcoxon_test, _SCIPY_AVAILABLE
|
| 530 |
if not _SCIPY_AVAILABLE:
|
| 531 |
pytest.skip("scipy non disponible")
|
| 532 |
|
|
|
|
| 542 |
|
| 543 |
def test_scipy_p_value_in_valid_range(self):
|
| 544 |
"""La p-value fournie par scipy doit être dans [0, 1]."""
|
| 545 |
+
from picarones.evaluation.statistics import wilcoxon_test, _SCIPY_AVAILABLE
|
| 546 |
if not _SCIPY_AVAILABLE:
|
| 547 |
pytest.skip("scipy non disponible")
|
| 548 |
|
|
@@ -2,7 +2,8 @@
|
|
| 2 |
|
| 3 |
import pytest
|
| 4 |
|
| 5 |
-
from picarones.
|
|
|
|
| 6 |
|
| 7 |
|
| 8 |
class TestComputeMetrics:
|
|
|
|
| 2 |
|
| 3 |
import pytest
|
| 4 |
|
| 5 |
+
from picarones.evaluation.metric_result import aggregate_metrics, MetricsResult
|
| 6 |
+
from picarones.measurements.metrics import compute_metrics
|
| 7 |
|
| 8 |
|
| 9 |
class TestComputeMetrics:
|
|
@@ -26,7 +26,7 @@ from picarones.evaluation.metrics.pricing import (
|
|
| 26 |
estimate_cost,
|
| 27 |
load_pricing_database,
|
| 28 |
)
|
| 29 |
-
from picarones.
|
| 30 |
|
| 31 |
|
| 32 |
# ---------------------------------------------------------------------------
|
|
|
|
| 26 |
estimate_cost,
|
| 27 |
load_pricing_database,
|
| 28 |
)
|
| 29 |
+
from picarones.evaluation.statistics import compute_pareto_front
|
| 30 |
|
| 31 |
|
| 32 |
# ---------------------------------------------------------------------------
|
|
@@ -3,7 +3,7 @@
|
|
| 3 |
import json
|
| 4 |
import pytest
|
| 5 |
|
| 6 |
-
from picarones.
|
| 7 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 8 |
|
| 9 |
|
|
|
|
| 3 |
import json
|
| 4 |
import pytest
|
| 5 |
|
| 6 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 7 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 8 |
|
| 9 |
|
|
@@ -225,7 +225,7 @@ class TestLineMetricsInResults:
|
|
| 225 |
|
| 226 |
def test_document_result_has_line_metrics_field(self):
|
| 227 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 228 |
-
from picarones.
|
| 229 |
dr = DocumentResult(
|
| 230 |
doc_id="test_001",
|
| 231 |
image_path="/test/img.jpg",
|
|
@@ -245,7 +245,7 @@ class TestLineMetricsInResults:
|
|
| 245 |
|
| 246 |
def test_document_result_has_hallucination_metrics_field(self):
|
| 247 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 248 |
-
from picarones.
|
| 249 |
dr = DocumentResult(
|
| 250 |
doc_id="test_002",
|
| 251 |
image_path="/test/img.jpg",
|
|
@@ -265,7 +265,7 @@ class TestLineMetricsInResults:
|
|
| 265 |
|
| 266 |
def test_document_result_as_dict_includes_sprint10_fields(self):
|
| 267 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 268 |
-
from picarones.
|
| 269 |
dr = DocumentResult(
|
| 270 |
doc_id="test_003",
|
| 271 |
image_path="/test/img.jpg",
|
|
@@ -287,7 +287,7 @@ class TestLineMetricsInResults:
|
|
| 287 |
|
| 288 |
def test_engine_report_has_aggregated_sprint10_fields(self):
|
| 289 |
from picarones.evaluation.benchmark_result import EngineReport, DocumentResult
|
| 290 |
-
from picarones.
|
| 291 |
dr = DocumentResult(
|
| 292 |
doc_id="test_004",
|
| 293 |
image_path="/test/img.jpg",
|
|
|
|
| 225 |
|
| 226 |
def test_document_result_has_line_metrics_field(self):
|
| 227 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 228 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 229 |
dr = DocumentResult(
|
| 230 |
doc_id="test_001",
|
| 231 |
image_path="/test/img.jpg",
|
|
|
|
| 245 |
|
| 246 |
def test_document_result_has_hallucination_metrics_field(self):
|
| 247 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 248 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 249 |
dr = DocumentResult(
|
| 250 |
doc_id="test_002",
|
| 251 |
image_path="/test/img.jpg",
|
|
|
|
| 265 |
|
| 266 |
def test_document_result_as_dict_includes_sprint10_fields(self):
|
| 267 |
from picarones.evaluation.benchmark_result import DocumentResult
|
| 268 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 269 |
dr = DocumentResult(
|
| 270 |
doc_id="test_003",
|
| 271 |
image_path="/test/img.jpg",
|
|
|
|
| 287 |
|
| 288 |
def test_engine_report_has_aggregated_sprint10_fields(self):
|
| 289 |
from picarones.evaluation.benchmark_result import EngineReport, DocumentResult
|
| 290 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 291 |
dr = DocumentResult(
|
| 292 |
doc_id="test_004",
|
| 293 |
image_path="/test/img.jpg",
|
|
@@ -195,7 +195,7 @@ def sample_generator():
|
|
| 195 |
"""Fixture partagée : crée un ReportGenerator avec des données fictives."""
|
| 196 |
from picarones.reports_v2.html.generator import ReportGenerator
|
| 197 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 198 |
-
from picarones.
|
| 199 |
|
| 200 |
def _make_metric(cer=0.1):
|
| 201 |
return MetricsResult(
|
|
|
|
| 195 |
"""Fixture partagée : crée un ReportGenerator avec des données fictives."""
|
| 196 |
from picarones.reports_v2.html.generator import ReportGenerator
|
| 197 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 198 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 199 |
|
| 200 |
def _make_metric(cer=0.1):
|
| 201 |
return MetricsResult(
|
|
@@ -14,7 +14,7 @@ import re
|
|
| 14 |
|
| 15 |
import pytest
|
| 16 |
|
| 17 |
-
from picarones.
|
| 18 |
build_critical_difference_svg,
|
| 19 |
friedman_test,
|
| 20 |
nemenyi_posthoc,
|
|
|
|
| 14 |
|
| 15 |
import pytest
|
| 16 |
|
| 17 |
+
from picarones.evaluation.statistics import (
|
| 18 |
build_critical_difference_svg,
|
| 19 |
friedman_test,
|
| 20 |
nemenyi_posthoc,
|
|
@@ -26,7 +26,7 @@ from picarones.evaluation.metrics.pricing import (
|
|
| 26 |
estimate_cost,
|
| 27 |
load_pricing_database,
|
| 28 |
)
|
| 29 |
-
from picarones.
|
| 30 |
|
| 31 |
|
| 32 |
# ---------------------------------------------------------------------------
|
|
|
|
| 26 |
estimate_cost,
|
| 27 |
load_pricing_database,
|
| 28 |
)
|
| 29 |
+
from picarones.evaluation.statistics import compute_pareto_front
|
| 30 |
|
| 31 |
|
| 32 |
# ---------------------------------------------------------------------------
|
|
@@ -38,7 +38,7 @@ from picarones.measurements.narrative import (
|
|
| 38 |
select_facts,
|
| 39 |
)
|
| 40 |
from picarones.measurements.narrative.arbiter import DEFAULT_TYPE_ORDER
|
| 41 |
-
from picarones.
|
| 42 |
|
| 43 |
ROOT = Path(__file__).parent.parent.parent
|
| 44 |
TEMPLATES_DIR = ROOT / "picarones" / "measurements" / "narrative" / "templates"
|
|
|
|
| 38 |
select_facts,
|
| 39 |
)
|
| 40 |
from picarones.measurements.narrative.arbiter import DEFAULT_TYPE_ORDER
|
| 41 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 42 |
|
| 43 |
ROOT = Path(__file__).parent.parent.parent
|
| 44 |
TEMPLATES_DIR = ROOT / "picarones" / "measurements" / "narrative" / "templates"
|
|
@@ -97,7 +97,7 @@ def _make_document_result(
|
|
| 97 |
hypothesis: str = "Marie de Bourgogne en 1477.",
|
| 98 |
ner_metrics: dict | None = None,
|
| 99 |
) -> DocumentResult:
|
| 100 |
-
from picarones.
|
| 101 |
|
| 102 |
return DocumentResult(
|
| 103 |
doc_id=doc_id,
|
|
|
|
| 97 |
hypothesis: str = "Marie de Bourgogne en 1477.",
|
| 98 |
ner_metrics: dict | None = None,
|
| 99 |
) -> DocumentResult:
|
| 100 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 101 |
|
| 102 |
return DocumentResult(
|
| 103 |
doc_id=doc_id,
|
|
@@ -59,7 +59,7 @@ class TestEngineResultExtension:
|
|
| 59 |
|
| 60 |
|
| 61 |
def _make_dr(calibration_metrics: dict | None = None) -> DocumentResult:
|
| 62 |
-
from picarones.
|
| 63 |
|
| 64 |
return DocumentResult(
|
| 65 |
doc_id="d1", image_path="/tmp/x.png",
|
|
|
|
| 59 |
|
| 60 |
|
| 61 |
def _make_dr(calibration_metrics: dict | None = None) -> DocumentResult:
|
| 62 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 63 |
|
| 64 |
return DocumentResult(
|
| 65 |
doc_id="d1", image_path="/tmp/x.png",
|
|
@@ -23,7 +23,7 @@ import re
|
|
| 23 |
|
| 24 |
import pytest
|
| 25 |
|
| 26 |
-
from picarones.
|
| 27 |
from picarones.measurements.narrative.detectors import detect_median_mean_gap_warning
|
| 28 |
from picarones.domain.facts import FactImportance, FactType
|
| 29 |
from picarones.measurements.narrative.renderer import extract_numbers, render_fact
|
|
|
|
| 23 |
|
| 24 |
import pytest
|
| 25 |
|
| 26 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 27 |
from picarones.measurements.narrative.detectors import detect_median_mean_gap_warning
|
| 28 |
from picarones.domain.facts import FactImportance, FactType
|
| 29 |
from picarones.measurements.narrative.renderer import extract_numbers, render_fact
|
|
@@ -26,7 +26,7 @@ from __future__ import annotations
|
|
| 26 |
|
| 27 |
import pytest
|
| 28 |
|
| 29 |
-
from picarones.
|
| 30 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 31 |
|
| 32 |
|
|
|
|
| 26 |
|
| 27 |
import pytest
|
| 28 |
|
| 29 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 30 |
from picarones.evaluation.benchmark_result import BenchmarkResult, DocumentResult, EngineReport
|
| 31 |
|
| 32 |
|
|
@@ -29,7 +29,7 @@ from picarones.measurements.philological_hooks import (
|
|
| 29 |
compute_philological_metrics,
|
| 30 |
)
|
| 31 |
from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
|
| 32 |
-
from picarones.
|
| 33 |
|
| 34 |
|
| 35 |
def _make_doc(
|
|
|
|
| 29 |
compute_philological_metrics,
|
| 30 |
)
|
| 31 |
from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
|
| 32 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 33 |
|
| 34 |
|
| 35 |
def _make_doc(
|
|
@@ -26,7 +26,7 @@ from pathlib import Path
|
|
| 26 |
|
| 27 |
import pytest
|
| 28 |
|
| 29 |
-
from picarones.
|
| 30 |
from picarones.measurements.narrative.detectors import detect_stratification_recommended
|
| 31 |
from picarones.domain.facts import FactImportance, FactType
|
| 32 |
from picarones.measurements.narrative.renderer import extract_numbers, render_fact
|
|
|
|
| 26 |
|
| 27 |
import pytest
|
| 28 |
|
| 29 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 30 |
from picarones.measurements.narrative.detectors import detect_stratification_recommended
|
| 31 |
from picarones.domain.facts import FactImportance, FactType
|
| 32 |
from picarones.measurements.narrative.renderer import extract_numbers, render_fact
|
|
@@ -53,41 +53,41 @@ def html_s7(sample_benchmark_s7):
|
|
| 53 |
|
| 54 |
class TestBootstrapCI:
|
| 55 |
def test_returns_tuple_of_two(self):
|
| 56 |
-
from picarones.
|
| 57 |
result = bootstrap_ci([0.1, 0.2, 0.3])
|
| 58 |
assert isinstance(result, tuple) and len(result) == 2
|
| 59 |
|
| 60 |
def test_lower_le_upper(self):
|
| 61 |
-
from picarones.
|
| 62 |
lo, hi = bootstrap_ci([0.1, 0.2, 0.3, 0.4, 0.5])
|
| 63 |
assert lo <= hi
|
| 64 |
|
| 65 |
def test_ci_contains_mean(self):
|
| 66 |
-
from picarones.
|
| 67 |
values = [0.1, 0.15, 0.2, 0.12, 0.18, 0.13, 0.17]
|
| 68 |
lo, hi = bootstrap_ci(values)
|
| 69 |
mean = sum(values) / len(values)
|
| 70 |
assert lo <= mean <= hi
|
| 71 |
|
| 72 |
def test_empty_returns_zeros(self):
|
| 73 |
-
from picarones.
|
| 74 |
lo, hi = bootstrap_ci([])
|
| 75 |
assert lo == 0.0 and hi == 0.0
|
| 76 |
|
| 77 |
def test_single_value(self):
|
| 78 |
-
from picarones.
|
| 79 |
lo, hi = bootstrap_ci([0.25])
|
| 80 |
assert lo <= 0.25 <= hi
|
| 81 |
|
| 82 |
def test_reproducible_with_seed(self):
|
| 83 |
-
from picarones.
|
| 84 |
vals = [0.1, 0.2, 0.3, 0.15, 0.25]
|
| 85 |
r1 = bootstrap_ci(vals, seed=1)
|
| 86 |
r2 = bootstrap_ci(vals, seed=1)
|
| 87 |
assert r1 == r2
|
| 88 |
|
| 89 |
def test_wider_with_more_variance(self):
|
| 90 |
-
from picarones.
|
| 91 |
narrow = [0.10, 0.11, 0.10, 0.11, 0.10]
|
| 92 |
wide = [0.01, 0.50, 0.02, 0.49, 0.01]
|
| 93 |
lo_n, hi_n = bootstrap_ci(narrow, n_iter=500)
|
|
@@ -101,7 +101,7 @@ class TestBootstrapCI:
|
|
| 101 |
|
| 102 |
class TestWilcoxonTest:
|
| 103 |
def test_returns_dict_with_keys(self):
|
| 104 |
-
from picarones.
|
| 105 |
r = wilcoxon_test([0.1]*5, [0.1]*5)
|
| 106 |
assert "statistic" in r
|
| 107 |
assert "p_value" in r
|
|
@@ -109,13 +109,13 @@ class TestWilcoxonTest:
|
|
| 109 |
assert "interpretation" in r
|
| 110 |
|
| 111 |
def test_identical_series_not_significant(self):
|
| 112 |
-
from picarones.
|
| 113 |
vals = [0.1, 0.2, 0.3, 0.15, 0.05]
|
| 114 |
r = wilcoxon_test(vals, vals)
|
| 115 |
assert not r["significant"]
|
| 116 |
|
| 117 |
def test_clearly_different_series_significant(self):
|
| 118 |
-
from picarones.
|
| 119 |
a = [0.01]*12
|
| 120 |
b = [0.80]*12
|
| 121 |
r = wilcoxon_test(a, b)
|
|
@@ -123,37 +123,37 @@ class TestWilcoxonTest:
|
|
| 123 |
assert r["p_value"] < 0.05
|
| 124 |
|
| 125 |
def test_p_value_in_range(self):
|
| 126 |
-
from picarones.
|
| 127 |
a = [0.1, 0.15, 0.2, 0.08]
|
| 128 |
b = [0.2, 0.25, 0.3, 0.18]
|
| 129 |
r = wilcoxon_test(a, b)
|
| 130 |
assert 0.0 <= r["p_value"] <= 1.0
|
| 131 |
|
| 132 |
def test_interpretation_is_string(self):
|
| 133 |
-
from picarones.
|
| 134 |
r = wilcoxon_test([0.1, 0.2], [0.1, 0.2])
|
| 135 |
assert isinstance(r["interpretation"], str) and len(r["interpretation"]) > 10
|
| 136 |
|
| 137 |
def test_n_pairs_correct(self):
|
| 138 |
-
from picarones.
|
| 139 |
r = wilcoxon_test([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
|
| 140 |
# tous les diffs = 0, filtrés en mode wilcox
|
| 141 |
assert r["n_pairs"] == 0
|
| 142 |
|
| 143 |
def test_mismatched_lengths_raises(self):
|
| 144 |
-
from picarones.
|
| 145 |
with pytest.raises(ValueError):
|
| 146 |
wilcoxon_test([0.1, 0.2], [0.1])
|
| 147 |
|
| 148 |
def test_w_plus_w_minus_present(self):
|
| 149 |
-
from picarones.
|
| 150 |
a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.12, 0.22, 0.08, 0.27]
|
| 151 |
b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.22, 0.32, 0.18, 0.37]
|
| 152 |
r = wilcoxon_test(a, b)
|
| 153 |
assert "W_plus" in r and "W_minus" in r
|
| 154 |
|
| 155 |
def test_significant_larger_sample(self):
|
| 156 |
-
from picarones.
|
| 157 |
import random
|
| 158 |
rng = random.Random(0)
|
| 159 |
a = [rng.uniform(0.0, 0.05) for _ in range(15)]
|
|
@@ -162,7 +162,7 @@ class TestWilcoxonTest:
|
|
| 162 |
assert r["significant"]
|
| 163 |
|
| 164 |
def test_symmetry(self):
|
| 165 |
-
from picarones.
|
| 166 |
a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.22, 0.08, 0.27, 0.14]
|
| 167 |
b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.32, 0.18, 0.37, 0.24]
|
| 168 |
r_ab = wilcoxon_test(a, b)
|
|
@@ -177,35 +177,35 @@ class TestWilcoxonTest:
|
|
| 177 |
|
| 178 |
class TestPairwiseStats:
|
| 179 |
def test_returns_list(self):
|
| 180 |
-
from picarones.
|
| 181 |
r = compute_pairwise_stats({"A": [0.1, 0.2], "B": [0.3, 0.4]})
|
| 182 |
assert isinstance(r, list)
|
| 183 |
|
| 184 |
def test_correct_pair_count_2_engines(self):
|
| 185 |
-
from picarones.
|
| 186 |
r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
|
| 187 |
assert len(r) == 1
|
| 188 |
|
| 189 |
def test_correct_pair_count_3_engines(self):
|
| 190 |
-
from picarones.
|
| 191 |
r = compute_pairwise_stats({
|
| 192 |
"A": [0.1]*5, "B": [0.2]*5, "C": [0.3]*5
|
| 193 |
})
|
| 194 |
assert len(r) == 3
|
| 195 |
|
| 196 |
def test_pair_has_engine_names(self):
|
| 197 |
-
from picarones.
|
| 198 |
r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
|
| 199 |
assert r[0]["engine_a"] in ["A", "B"]
|
| 200 |
assert r[0]["engine_b"] in ["A", "B"]
|
| 201 |
|
| 202 |
def test_pair_has_p_value(self):
|
| 203 |
-
from picarones.
|
| 204 |
r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
|
| 205 |
assert "p_value" in r[0]
|
| 206 |
|
| 207 |
def test_single_engine_returns_empty(self):
|
| 208 |
-
from picarones.
|
| 209 |
r = compute_pairwise_stats({"A": [0.1]*5})
|
| 210 |
assert r == []
|
| 211 |
|
|
@@ -216,33 +216,33 @@ class TestPairwiseStats:
|
|
| 216 |
|
| 217 |
class TestReliabilityCurve:
|
| 218 |
def test_returns_list(self):
|
| 219 |
-
from picarones.
|
| 220 |
r = compute_reliability_curve([0.1, 0.2, 0.3])
|
| 221 |
assert isinstance(r, list)
|
| 222 |
|
| 223 |
def test_correct_number_of_steps(self):
|
| 224 |
-
from picarones.
|
| 225 |
r = compute_reliability_curve([0.1]*10, steps=5)
|
| 226 |
assert len(r) == 5
|
| 227 |
|
| 228 |
def test_pct_docs_increases(self):
|
| 229 |
-
from picarones.
|
| 230 |
r = compute_reliability_curve([0.1, 0.2, 0.3, 0.4, 0.5], steps=5)
|
| 231 |
pcts = [p["pct_docs"] for p in r]
|
| 232 |
assert pcts == sorted(pcts)
|
| 233 |
|
| 234 |
def test_mean_cer_increases(self):
|
| 235 |
-
from picarones.
|
| 236 |
r = compute_reliability_curve([0.05, 0.10, 0.20, 0.30, 0.50], steps=5)
|
| 237 |
cers = [p["mean_cer"] for p in r]
|
| 238 |
assert cers[0] <= cers[-1]
|
| 239 |
|
| 240 |
def test_empty_returns_empty(self):
|
| 241 |
-
from picarones.
|
| 242 |
assert compute_reliability_curve([]) == []
|
| 243 |
|
| 244 |
def test_last_point_includes_all(self):
|
| 245 |
-
from picarones.
|
| 246 |
vals = [0.1, 0.2, 0.3]
|
| 247 |
r = compute_reliability_curve(vals, steps=4)
|
| 248 |
last = r[-1]
|
|
@@ -250,7 +250,7 @@ class TestReliabilityCurve:
|
|
| 250 |
assert last["mean_cer"] == pytest.approx(expected, rel=1e-4)
|
| 251 |
|
| 252 |
def test_each_point_has_required_keys(self):
|
| 253 |
-
from picarones.
|
| 254 |
r = compute_reliability_curve([0.1, 0.2, 0.3], steps=3)
|
| 255 |
for p in r:
|
| 256 |
assert "pct_docs" in p and "mean_cer" in p
|
|
@@ -262,47 +262,47 @@ class TestReliabilityCurve:
|
|
| 262 |
|
| 263 |
class TestVennData:
|
| 264 |
def test_venn2_type(self):
|
| 265 |
-
from picarones.
|
| 266 |
r = compute_venn_data({"A": {"e1","e2"}, "B": {"e2","e3"}})
|
| 267 |
assert r["type"] == "venn2"
|
| 268 |
|
| 269 |
def test_venn3_type(self):
|
| 270 |
-
from picarones.
|
| 271 |
r = compute_venn_data({"A": {"e1"}, "B": {"e2"}, "C": {"e3"}})
|
| 272 |
assert r["type"] == "venn3"
|
| 273 |
|
| 274 |
def test_venn2_counts_correct(self):
|
| 275 |
-
from picarones.
|
| 276 |
r = compute_venn_data({"A": {"e1","e2","e3"}, "B": {"e2","e3","e4"}})
|
| 277 |
assert r["only_a"] == 1
|
| 278 |
assert r["only_b"] == 1
|
| 279 |
assert r["both"] == 2
|
| 280 |
|
| 281 |
def test_venn2_disjoint(self):
|
| 282 |
-
from picarones.
|
| 283 |
r = compute_venn_data({"A": {"e1"}, "B": {"e2"}})
|
| 284 |
assert r["both"] == 0
|
| 285 |
assert r["only_a"] == 1
|
| 286 |
assert r["only_b"] == 1
|
| 287 |
|
| 288 |
def test_venn2_subset(self):
|
| 289 |
-
from picarones.
|
| 290 |
r = compute_venn_data({"A": {"e1","e2"}, "B": {"e1","e2","e3"}})
|
| 291 |
assert r["only_a"] == 0
|
| 292 |
|
| 293 |
def test_venn3_abc_count(self):
|
| 294 |
-
from picarones.
|
| 295 |
shared = {"e1","e2"}
|
| 296 |
r = compute_venn_data({"A": shared, "B": shared, "C": shared})
|
| 297 |
assert r["abc"] == 2
|
| 298 |
|
| 299 |
def test_empty_returns_empty(self):
|
| 300 |
-
from picarones.
|
| 301 |
r = compute_venn_data({})
|
| 302 |
assert r == {}
|
| 303 |
|
| 304 |
def test_labels_present(self):
|
| 305 |
-
from picarones.
|
| 306 |
r = compute_venn_data({"moteur_a": {"e1"}, "moteur_b": {"e2"}})
|
| 307 |
assert r["label_a"] == "moteur_a"
|
| 308 |
assert r["label_b"] == "moteur_b"
|
|
@@ -324,17 +324,17 @@ class TestErrorClustering:
|
|
| 324 |
]
|
| 325 |
|
| 326 |
def test_returns_list(self):
|
| 327 |
-
from picarones.
|
| 328 |
result = cluster_errors(self._sample_data())
|
| 329 |
assert isinstance(result, list)
|
| 330 |
|
| 331 |
def test_max_clusters_respected(self):
|
| 332 |
-
from picarones.
|
| 333 |
result = cluster_errors(self._sample_data(), max_clusters=3)
|
| 334 |
assert len(result) <= 3
|
| 335 |
|
| 336 |
def test_cluster_has_required_keys(self):
|
| 337 |
-
from picarones.
|
| 338 |
result = cluster_errors(self._sample_data())
|
| 339 |
if result:
|
| 340 |
c = result[0]
|
|
@@ -344,7 +344,7 @@ class TestErrorClustering:
|
|
| 344 |
assert hasattr(c, "examples")
|
| 345 |
|
| 346 |
def test_as_dict_method(self):
|
| 347 |
-
from picarones.
|
| 348 |
result = cluster_errors(self._sample_data())
|
| 349 |
if result:
|
| 350 |
d = result[0].as_dict()
|
|
@@ -354,24 +354,24 @@ class TestErrorClustering:
|
|
| 354 |
assert "examples" in d
|
| 355 |
|
| 356 |
def test_sorted_by_count_descending(self):
|
| 357 |
-
from picarones.
|
| 358 |
result = cluster_errors(self._sample_data())
|
| 359 |
if len(result) >= 2:
|
| 360 |
assert result[0].count >= result[1].count
|
| 361 |
|
| 362 |
def test_examples_capped_at_5(self):
|
| 363 |
-
from picarones.
|
| 364 |
result = cluster_errors(self._sample_data())
|
| 365 |
for c in result:
|
| 366 |
assert len(c.as_dict()["examples"]) <= 5
|
| 367 |
|
| 368 |
def test_empty_data_returns_empty(self):
|
| 369 |
-
from picarones.
|
| 370 |
result = cluster_errors([])
|
| 371 |
assert result == []
|
| 372 |
|
| 373 |
def test_cluster_id_unique(self):
|
| 374 |
-
from picarones.
|
| 375 |
result = cluster_errors(self._sample_data())
|
| 376 |
ids = [c.cluster_id for c in result]
|
| 377 |
assert len(ids) == len(set(ids))
|
|
@@ -392,12 +392,12 @@ class TestCorrelationMatrix:
|
|
| 392 |
]
|
| 393 |
|
| 394 |
def test_returns_dict_with_labels_and_matrix(self):
|
| 395 |
-
from picarones.
|
| 396 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 397 |
assert "labels" in r and "matrix" in r
|
| 398 |
|
| 399 |
def test_matrix_is_square(self):
|
| 400 |
-
from picarones.
|
| 401 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 402 |
n = len(r["labels"])
|
| 403 |
assert len(r["matrix"]) == n
|
|
@@ -405,13 +405,13 @@ class TestCorrelationMatrix:
|
|
| 405 |
assert len(row) == n
|
| 406 |
|
| 407 |
def test_diagonal_is_one(self):
|
| 408 |
-
from picarones.
|
| 409 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 410 |
for i in range(len(r["labels"])):
|
| 411 |
assert r["matrix"][i][i] == pytest.approx(1.0)
|
| 412 |
|
| 413 |
def test_cer_quality_negatively_correlated(self):
|
| 414 |
-
from picarones.
|
| 415 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 416 |
labels = r["labels"]
|
| 417 |
if "cer" in labels and "quality_score" in labels:
|
|
@@ -420,7 +420,7 @@ class TestCorrelationMatrix:
|
|
| 420 |
assert r["matrix"][i][j] < 0 # plus la qualité est bonne, plus le CER est bas
|
| 421 |
|
| 422 |
def test_symmetric_matrix(self):
|
| 423 |
-
from picarones.
|
| 424 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 425 |
n = len(r["labels"])
|
| 426 |
for i in range(n):
|
|
@@ -428,18 +428,18 @@ class TestCorrelationMatrix:
|
|
| 428 |
assert r["matrix"][i][j] == pytest.approx(r["matrix"][j][i], abs=1e-6)
|
| 429 |
|
| 430 |
def test_empty_returns_empty(self):
|
| 431 |
-
from picarones.
|
| 432 |
r = compute_correlation_matrix([])
|
| 433 |
assert r == {"labels": [], "matrix": []}
|
| 434 |
|
| 435 |
def test_custom_metric_keys(self):
|
| 436 |
-
from picarones.
|
| 437 |
data = [{"a": 1.0, "b": 2.0, "c": 3.0}] * 5
|
| 438 |
r = compute_correlation_matrix(data, metric_keys=["a", "b"])
|
| 439 |
assert r["labels"] == ["a", "b"]
|
| 440 |
|
| 441 |
def test_values_in_range(self):
|
| 442 |
-
from picarones.
|
| 443 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 444 |
for row in r["matrix"]:
|
| 445 |
for v in row:
|
|
|
|
| 53 |
|
| 54 |
class TestBootstrapCI:
|
| 55 |
def test_returns_tuple_of_two(self):
|
| 56 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 57 |
result = bootstrap_ci([0.1, 0.2, 0.3])
|
| 58 |
assert isinstance(result, tuple) and len(result) == 2
|
| 59 |
|
| 60 |
def test_lower_le_upper(self):
|
| 61 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 62 |
lo, hi = bootstrap_ci([0.1, 0.2, 0.3, 0.4, 0.5])
|
| 63 |
assert lo <= hi
|
| 64 |
|
| 65 |
def test_ci_contains_mean(self):
|
| 66 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 67 |
values = [0.1, 0.15, 0.2, 0.12, 0.18, 0.13, 0.17]
|
| 68 |
lo, hi = bootstrap_ci(values)
|
| 69 |
mean = sum(values) / len(values)
|
| 70 |
assert lo <= mean <= hi
|
| 71 |
|
| 72 |
def test_empty_returns_zeros(self):
|
| 73 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 74 |
lo, hi = bootstrap_ci([])
|
| 75 |
assert lo == 0.0 and hi == 0.0
|
| 76 |
|
| 77 |
def test_single_value(self):
|
| 78 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 79 |
lo, hi = bootstrap_ci([0.25])
|
| 80 |
assert lo <= 0.25 <= hi
|
| 81 |
|
| 82 |
def test_reproducible_with_seed(self):
|
| 83 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 84 |
vals = [0.1, 0.2, 0.3, 0.15, 0.25]
|
| 85 |
r1 = bootstrap_ci(vals, seed=1)
|
| 86 |
r2 = bootstrap_ci(vals, seed=1)
|
| 87 |
assert r1 == r2
|
| 88 |
|
| 89 |
def test_wider_with_more_variance(self):
|
| 90 |
+
from picarones.evaluation.statistics import bootstrap_ci
|
| 91 |
narrow = [0.10, 0.11, 0.10, 0.11, 0.10]
|
| 92 |
wide = [0.01, 0.50, 0.02, 0.49, 0.01]
|
| 93 |
lo_n, hi_n = bootstrap_ci(narrow, n_iter=500)
|
|
|
|
| 101 |
|
| 102 |
class TestWilcoxonTest:
|
| 103 |
def test_returns_dict_with_keys(self):
|
| 104 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 105 |
r = wilcoxon_test([0.1]*5, [0.1]*5)
|
| 106 |
assert "statistic" in r
|
| 107 |
assert "p_value" in r
|
|
|
|
| 109 |
assert "interpretation" in r
|
| 110 |
|
| 111 |
def test_identical_series_not_significant(self):
|
| 112 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 113 |
vals = [0.1, 0.2, 0.3, 0.15, 0.05]
|
| 114 |
r = wilcoxon_test(vals, vals)
|
| 115 |
assert not r["significant"]
|
| 116 |
|
| 117 |
def test_clearly_different_series_significant(self):
|
| 118 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 119 |
a = [0.01]*12
|
| 120 |
b = [0.80]*12
|
| 121 |
r = wilcoxon_test(a, b)
|
|
|
|
| 123 |
assert r["p_value"] < 0.05
|
| 124 |
|
| 125 |
def test_p_value_in_range(self):
|
| 126 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 127 |
a = [0.1, 0.15, 0.2, 0.08]
|
| 128 |
b = [0.2, 0.25, 0.3, 0.18]
|
| 129 |
r = wilcoxon_test(a, b)
|
| 130 |
assert 0.0 <= r["p_value"] <= 1.0
|
| 131 |
|
| 132 |
def test_interpretation_is_string(self):
|
| 133 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 134 |
r = wilcoxon_test([0.1, 0.2], [0.1, 0.2])
|
| 135 |
assert isinstance(r["interpretation"], str) and len(r["interpretation"]) > 10
|
| 136 |
|
| 137 |
def test_n_pairs_correct(self):
|
| 138 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 139 |
r = wilcoxon_test([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
|
| 140 |
# tous les diffs = 0, filtrés en mode wilcox
|
| 141 |
assert r["n_pairs"] == 0
|
| 142 |
|
| 143 |
def test_mismatched_lengths_raises(self):
|
| 144 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 145 |
with pytest.raises(ValueError):
|
| 146 |
wilcoxon_test([0.1, 0.2], [0.1])
|
| 147 |
|
| 148 |
def test_w_plus_w_minus_present(self):
|
| 149 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 150 |
a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.12, 0.22, 0.08, 0.27]
|
| 151 |
b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.22, 0.32, 0.18, 0.37]
|
| 152 |
r = wilcoxon_test(a, b)
|
| 153 |
assert "W_plus" in r and "W_minus" in r
|
| 154 |
|
| 155 |
def test_significant_larger_sample(self):
|
| 156 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 157 |
import random
|
| 158 |
rng = random.Random(0)
|
| 159 |
a = [rng.uniform(0.0, 0.05) for _ in range(15)]
|
|
|
|
| 162 |
assert r["significant"]
|
| 163 |
|
| 164 |
def test_symmetry(self):
|
| 165 |
+
from picarones.evaluation.statistics import wilcoxon_test
|
| 166 |
a = [0.1, 0.2, 0.3, 0.15, 0.25, 0.18, 0.22, 0.08, 0.27, 0.14]
|
| 167 |
b = [0.2, 0.3, 0.4, 0.25, 0.35, 0.28, 0.32, 0.18, 0.37, 0.24]
|
| 168 |
r_ab = wilcoxon_test(a, b)
|
|
|
|
| 177 |
|
| 178 |
class TestPairwiseStats:
|
| 179 |
def test_returns_list(self):
|
| 180 |
+
from picarones.evaluation.statistics import compute_pairwise_stats
|
| 181 |
r = compute_pairwise_stats({"A": [0.1, 0.2], "B": [0.3, 0.4]})
|
| 182 |
assert isinstance(r, list)
|
| 183 |
|
| 184 |
def test_correct_pair_count_2_engines(self):
|
| 185 |
+
from picarones.evaluation.statistics import compute_pairwise_stats
|
| 186 |
r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
|
| 187 |
assert len(r) == 1
|
| 188 |
|
| 189 |
def test_correct_pair_count_3_engines(self):
|
| 190 |
+
from picarones.evaluation.statistics import compute_pairwise_stats
|
| 191 |
r = compute_pairwise_stats({
|
| 192 |
"A": [0.1]*5, "B": [0.2]*5, "C": [0.3]*5
|
| 193 |
})
|
| 194 |
assert len(r) == 3
|
| 195 |
|
| 196 |
def test_pair_has_engine_names(self):
|
| 197 |
+
from picarones.evaluation.statistics import compute_pairwise_stats
|
| 198 |
r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
|
| 199 |
assert r[0]["engine_a"] in ["A", "B"]
|
| 200 |
assert r[0]["engine_b"] in ["A", "B"]
|
| 201 |
|
| 202 |
def test_pair_has_p_value(self):
|
| 203 |
+
from picarones.evaluation.statistics import compute_pairwise_stats
|
| 204 |
r = compute_pairwise_stats({"A": [0.1]*5, "B": [0.2]*5})
|
| 205 |
assert "p_value" in r[0]
|
| 206 |
|
| 207 |
def test_single_engine_returns_empty(self):
|
| 208 |
+
from picarones.evaluation.statistics import compute_pairwise_stats
|
| 209 |
r = compute_pairwise_stats({"A": [0.1]*5})
|
| 210 |
assert r == []
|
| 211 |
|
|
|
|
| 216 |
|
| 217 |
class TestReliabilityCurve:
|
| 218 |
def test_returns_list(self):
|
| 219 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 220 |
r = compute_reliability_curve([0.1, 0.2, 0.3])
|
| 221 |
assert isinstance(r, list)
|
| 222 |
|
| 223 |
def test_correct_number_of_steps(self):
|
| 224 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 225 |
r = compute_reliability_curve([0.1]*10, steps=5)
|
| 226 |
assert len(r) == 5
|
| 227 |
|
| 228 |
def test_pct_docs_increases(self):
|
| 229 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 230 |
r = compute_reliability_curve([0.1, 0.2, 0.3, 0.4, 0.5], steps=5)
|
| 231 |
pcts = [p["pct_docs"] for p in r]
|
| 232 |
assert pcts == sorted(pcts)
|
| 233 |
|
| 234 |
def test_mean_cer_increases(self):
|
| 235 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 236 |
r = compute_reliability_curve([0.05, 0.10, 0.20, 0.30, 0.50], steps=5)
|
| 237 |
cers = [p["mean_cer"] for p in r]
|
| 238 |
assert cers[0] <= cers[-1]
|
| 239 |
|
| 240 |
def test_empty_returns_empty(self):
|
| 241 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 242 |
assert compute_reliability_curve([]) == []
|
| 243 |
|
| 244 |
def test_last_point_includes_all(self):
|
| 245 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 246 |
vals = [0.1, 0.2, 0.3]
|
| 247 |
r = compute_reliability_curve(vals, steps=4)
|
| 248 |
last = r[-1]
|
|
|
|
| 250 |
assert last["mean_cer"] == pytest.approx(expected, rel=1e-4)
|
| 251 |
|
| 252 |
def test_each_point_has_required_keys(self):
|
| 253 |
+
from picarones.evaluation.statistics import compute_reliability_curve
|
| 254 |
r = compute_reliability_curve([0.1, 0.2, 0.3], steps=3)
|
| 255 |
for p in r:
|
| 256 |
assert "pct_docs" in p and "mean_cer" in p
|
|
|
|
| 262 |
|
| 263 |
class TestVennData:
|
| 264 |
def test_venn2_type(self):
|
| 265 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 266 |
r = compute_venn_data({"A": {"e1","e2"}, "B": {"e2","e3"}})
|
| 267 |
assert r["type"] == "venn2"
|
| 268 |
|
| 269 |
def test_venn3_type(self):
|
| 270 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 271 |
r = compute_venn_data({"A": {"e1"}, "B": {"e2"}, "C": {"e3"}})
|
| 272 |
assert r["type"] == "venn3"
|
| 273 |
|
| 274 |
def test_venn2_counts_correct(self):
|
| 275 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 276 |
r = compute_venn_data({"A": {"e1","e2","e3"}, "B": {"e2","e3","e4"}})
|
| 277 |
assert r["only_a"] == 1
|
| 278 |
assert r["only_b"] == 1
|
| 279 |
assert r["both"] == 2
|
| 280 |
|
| 281 |
def test_venn2_disjoint(self):
|
| 282 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 283 |
r = compute_venn_data({"A": {"e1"}, "B": {"e2"}})
|
| 284 |
assert r["both"] == 0
|
| 285 |
assert r["only_a"] == 1
|
| 286 |
assert r["only_b"] == 1
|
| 287 |
|
| 288 |
def test_venn2_subset(self):
|
| 289 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 290 |
r = compute_venn_data({"A": {"e1","e2"}, "B": {"e1","e2","e3"}})
|
| 291 |
assert r["only_a"] == 0
|
| 292 |
|
| 293 |
def test_venn3_abc_count(self):
|
| 294 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 295 |
shared = {"e1","e2"}
|
| 296 |
r = compute_venn_data({"A": shared, "B": shared, "C": shared})
|
| 297 |
assert r["abc"] == 2
|
| 298 |
|
| 299 |
def test_empty_returns_empty(self):
|
| 300 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 301 |
r = compute_venn_data({})
|
| 302 |
assert r == {}
|
| 303 |
|
| 304 |
def test_labels_present(self):
|
| 305 |
+
from picarones.evaluation.statistics import compute_venn_data
|
| 306 |
r = compute_venn_data({"moteur_a": {"e1"}, "moteur_b": {"e2"}})
|
| 307 |
assert r["label_a"] == "moteur_a"
|
| 308 |
assert r["label_b"] == "moteur_b"
|
|
|
|
| 324 |
]
|
| 325 |
|
| 326 |
def test_returns_list(self):
|
| 327 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 328 |
result = cluster_errors(self._sample_data())
|
| 329 |
assert isinstance(result, list)
|
| 330 |
|
| 331 |
def test_max_clusters_respected(self):
|
| 332 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 333 |
result = cluster_errors(self._sample_data(), max_clusters=3)
|
| 334 |
assert len(result) <= 3
|
| 335 |
|
| 336 |
def test_cluster_has_required_keys(self):
|
| 337 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 338 |
result = cluster_errors(self._sample_data())
|
| 339 |
if result:
|
| 340 |
c = result[0]
|
|
|
|
| 344 |
assert hasattr(c, "examples")
|
| 345 |
|
| 346 |
def test_as_dict_method(self):
|
| 347 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 348 |
result = cluster_errors(self._sample_data())
|
| 349 |
if result:
|
| 350 |
d = result[0].as_dict()
|
|
|
|
| 354 |
assert "examples" in d
|
| 355 |
|
| 356 |
def test_sorted_by_count_descending(self):
|
| 357 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 358 |
result = cluster_errors(self._sample_data())
|
| 359 |
if len(result) >= 2:
|
| 360 |
assert result[0].count >= result[1].count
|
| 361 |
|
| 362 |
def test_examples_capped_at_5(self):
|
| 363 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 364 |
result = cluster_errors(self._sample_data())
|
| 365 |
for c in result:
|
| 366 |
assert len(c.as_dict()["examples"]) <= 5
|
| 367 |
|
| 368 |
def test_empty_data_returns_empty(self):
|
| 369 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 370 |
result = cluster_errors([])
|
| 371 |
assert result == []
|
| 372 |
|
| 373 |
def test_cluster_id_unique(self):
|
| 374 |
+
from picarones.evaluation.statistics import cluster_errors
|
| 375 |
result = cluster_errors(self._sample_data())
|
| 376 |
ids = [c.cluster_id for c in result]
|
| 377 |
assert len(ids) == len(set(ids))
|
|
|
|
| 392 |
]
|
| 393 |
|
| 394 |
def test_returns_dict_with_labels_and_matrix(self):
|
| 395 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 396 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 397 |
assert "labels" in r and "matrix" in r
|
| 398 |
|
| 399 |
def test_matrix_is_square(self):
|
| 400 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 401 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 402 |
n = len(r["labels"])
|
| 403 |
assert len(r["matrix"]) == n
|
|
|
|
| 405 |
assert len(row) == n
|
| 406 |
|
| 407 |
def test_diagonal_is_one(self):
|
| 408 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 409 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 410 |
for i in range(len(r["labels"])):
|
| 411 |
assert r["matrix"][i][i] == pytest.approx(1.0)
|
| 412 |
|
| 413 |
def test_cer_quality_negatively_correlated(self):
|
| 414 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 415 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 416 |
labels = r["labels"]
|
| 417 |
if "cer" in labels and "quality_score" in labels:
|
|
|
|
| 420 |
assert r["matrix"][i][j] < 0 # plus la qualité est bonne, plus le CER est bas
|
| 421 |
|
| 422 |
def test_symmetric_matrix(self):
|
| 423 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 424 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 425 |
n = len(r["labels"])
|
| 426 |
for i in range(n):
|
|
|
|
| 428 |
assert r["matrix"][i][j] == pytest.approx(r["matrix"][j][i], abs=1e-6)
|
| 429 |
|
| 430 |
def test_empty_returns_empty(self):
|
| 431 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 432 |
r = compute_correlation_matrix([])
|
| 433 |
assert r == {"labels": [], "matrix": []}
|
| 434 |
|
| 435 |
def test_custom_metric_keys(self):
|
| 436 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 437 |
data = [{"a": 1.0, "b": 2.0, "c": 3.0}] * 5
|
| 438 |
r = compute_correlation_matrix(data, metric_keys=["a", "b"])
|
| 439 |
assert r["labels"] == ["a", "b"]
|
| 440 |
|
| 441 |
def test_values_in_range(self):
|
| 442 |
+
from picarones.evaluation.statistics import compute_correlation_matrix
|
| 443 |
r = compute_correlation_matrix(self._sample_metrics())
|
| 444 |
for row in r["matrix"]:
|
| 445 |
for v in row:
|
|
@@ -22,7 +22,7 @@ from picarones.measurements.numerical_sequences_hooks import (
|
|
| 22 |
aggregate_numerical_sequence_metrics,
|
| 23 |
compute_numerical_sequence_metrics_adaptive,
|
| 24 |
)
|
| 25 |
-
from picarones.
|
| 26 |
from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
|
| 27 |
|
| 28 |
|
|
|
|
| 22 |
aggregate_numerical_sequence_metrics,
|
| 23 |
compute_numerical_sequence_metrics_adaptive,
|
| 24 |
)
|
| 25 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 26 |
from picarones.evaluation.benchmark_result import DocumentResult, EngineReport
|
| 27 |
|
| 28 |
|
|
@@ -16,7 +16,7 @@ from __future__ import annotations
|
|
| 16 |
import json
|
| 17 |
from pathlib import Path
|
| 18 |
|
| 19 |
-
from picarones.
|
| 20 |
from picarones.measurements.readability_hooks import (
|
| 21 |
aggregate_readability_metrics,
|
| 22 |
compute_readability_metrics,
|
|
|
|
| 16 |
import json
|
| 17 |
from pathlib import Path
|
| 18 |
|
| 19 |
+
from picarones.evaluation.metric_result import MetricsResult
|
| 20 |
from picarones.measurements.readability_hooks import (
|
| 21 |
aggregate_readability_metrics,
|
| 22 |
compute_readability_metrics,
|
|
@@ -57,13 +57,13 @@ def client():
|
|
| 57 |
|
| 58 |
@pytest.fixture
|
| 59 |
def htr_catalogue():
|
| 60 |
-
from picarones.
|
| 61 |
return HTRUnitedCatalogue.from_demo()
|
| 62 |
|
| 63 |
|
| 64 |
@pytest.fixture
|
| 65 |
def hf_importer():
|
| 66 |
-
from picarones.
|
| 67 |
return HuggingFaceImporter()
|
| 68 |
|
| 69 |
|
|
@@ -74,7 +74,7 @@ def hf_importer():
|
|
| 74 |
class TestHTRUnitedEntry:
|
| 75 |
|
| 76 |
def test_from_dict_basic(self):
|
| 77 |
-
from picarones.
|
| 78 |
d = {
|
| 79 |
"id": "test-corpus", "title": "Test Corpus", "url": "https://github.com/test/corpus",
|
| 80 |
"language": ["French"], "script": ["Gothic"], "century": [14, 15],
|
|
@@ -88,7 +88,7 @@ class TestHTRUnitedEntry:
|
|
| 88 |
assert e.lines == 5000
|
| 89 |
|
| 90 |
def test_as_dict_roundtrip(self):
|
| 91 |
-
from picarones.
|
| 92 |
d = {
|
| 93 |
"id": "rtrip", "title": "Round Trip", "url": "https://github.com/a/b",
|
| 94 |
"language": ["Latin"], "script": ["Caroline"], "century": [9],
|
|
@@ -102,19 +102,19 @@ class TestHTRUnitedEntry:
|
|
| 102 |
assert out["format"] == "PAGE"
|
| 103 |
|
| 104 |
def test_century_str_roman(self):
|
| 105 |
-
from picarones.
|
| 106 |
e = HTRUnitedEntry(id="x", title="x", url="x", century=[12, 14])
|
| 107 |
cs = e.century_str
|
| 108 |
assert "XIIe" in cs
|
| 109 |
assert "XIVe" in cs
|
| 110 |
|
| 111 |
def test_century_str_single(self):
|
| 112 |
-
from picarones.
|
| 113 |
e = HTRUnitedEntry(id="x", title="x", url="x", century=[19])
|
| 114 |
assert "XIXe" in e.century_str
|
| 115 |
|
| 116 |
def test_default_fields(self):
|
| 117 |
-
from picarones.
|
| 118 |
e = HTRUnitedEntry(id="minimal", title="Min", url="http://x")
|
| 119 |
assert e.language == []
|
| 120 |
assert e.lines == 0
|
|
@@ -122,14 +122,14 @@ class TestHTRUnitedEntry:
|
|
| 122 |
assert e.tags == []
|
| 123 |
|
| 124 |
def test_from_dict_missing_fields(self):
|
| 125 |
-
from picarones.
|
| 126 |
e = HTRUnitedEntry.from_dict({"id": "sparse", "title": "Sparse"})
|
| 127 |
assert e.id == "sparse"
|
| 128 |
assert e.institution == ""
|
| 129 |
assert e.lines == 0
|
| 130 |
|
| 131 |
def test_as_dict_has_all_keys(self):
|
| 132 |
-
from picarones.
|
| 133 |
e = HTRUnitedEntry(id="k", title="K", url="http://k")
|
| 134 |
d = e.as_dict()
|
| 135 |
for key in ["id", "title", "url", "language", "script", "century",
|
|
@@ -137,7 +137,7 @@ class TestHTRUnitedEntry:
|
|
| 137 |
assert key in d, f"Missing key: {key}"
|
| 138 |
|
| 139 |
def test_url_preserved(self):
|
| 140 |
-
from picarones.
|
| 141 |
url = "https://github.com/HTR-United/cremma-medieval"
|
| 142 |
e = HTRUnitedEntry(id="c", title="CREMMA", url=url)
|
| 143 |
assert e.url == url
|
|
@@ -250,14 +250,14 @@ class TestHTRUnitedImport:
|
|
| 250 |
"""
|
| 251 |
|
| 252 |
def test_import_creates_meta_file(self, tmp_path, htr_catalogue):
|
| 253 |
-
from picarones.
|
| 254 |
entry = htr_catalogue.entries[0]
|
| 255 |
result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
|
| 256 |
meta_file = Path(result["metadata_file"])
|
| 257 |
assert meta_file.exists()
|
| 258 |
|
| 259 |
def test_import_meta_content(self, tmp_path, htr_catalogue):
|
| 260 |
-
from picarones.
|
| 261 |
entry = htr_catalogue.entries[0]
|
| 262 |
result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
|
| 263 |
meta = json.loads(Path(result["metadata_file"]).read_text())
|
|
@@ -265,14 +265,14 @@ class TestHTRUnitedImport:
|
|
| 265 |
assert meta["entry_id"] == entry.id
|
| 266 |
|
| 267 |
def test_import_returns_dict_keys(self, tmp_path, htr_catalogue):
|
| 268 |
-
from picarones.
|
| 269 |
entry = htr_catalogue.entries[0]
|
| 270 |
result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
|
| 271 |
for k in ["entry_id", "title", "output_dir", "files_imported", "metadata_file"]:
|
| 272 |
assert k in result, f"Missing key: {k}"
|
| 273 |
|
| 274 |
def test_import_creates_output_dir(self, tmp_path, htr_catalogue):
|
| 275 |
-
from picarones.
|
| 276 |
entry = htr_catalogue.entries[0]
|
| 277 |
new_dir = tmp_path / "new_subdir" / "corpus"
|
| 278 |
import_htr_united_corpus(entry, new_dir, max_samples=5)
|
|
@@ -286,7 +286,7 @@ class TestHTRUnitedImport:
|
|
| 286 |
class TestHuggingFaceDataset:
|
| 287 |
|
| 288 |
def test_from_dict_basic(self):
|
| 289 |
-
from picarones.
|
| 290 |
d = {
|
| 291 |
"dataset_id": "test/dataset", "title": "Test Dataset",
|
| 292 |
"description": "A test dataset.", "language": ["French"],
|
|
@@ -299,7 +299,7 @@ class TestHuggingFaceDataset:
|
|
| 299 |
assert ds.downloads == 500
|
| 300 |
|
| 301 |
def test_as_dict_roundtrip(self):
|
| 302 |
-
from picarones.
|
| 303 |
ds = HuggingFaceDataset(
|
| 304 |
dataset_id="a/b", title="AB", description="desc",
|
| 305 |
language=["Latin"], tags=["htr"],
|
|
@@ -309,12 +309,12 @@ class TestHuggingFaceDataset:
|
|
| 309 |
assert d["language"] == ["Latin"]
|
| 310 |
|
| 311 |
def test_hf_url(self):
|
| 312 |
-
from picarones.
|
| 313 |
ds = HuggingFaceDataset(dataset_id="CATMuS/medieval", title="CATMuS")
|
| 314 |
assert ds.hf_url == "https://huggingface.co/datasets/CATMuS/medieval"
|
| 315 |
|
| 316 |
def test_as_dict_has_all_keys(self):
|
| 317 |
-
from picarones.
|
| 318 |
ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
|
| 319 |
d = ds.as_dict()
|
| 320 |
for k in ["dataset_id", "title", "description", "language", "tags",
|
|
@@ -322,17 +322,17 @@ class TestHuggingFaceDataset:
|
|
| 322 |
assert k in d, f"Missing: {k}"
|
| 323 |
|
| 324 |
def test_default_source(self):
|
| 325 |
-
from picarones.
|
| 326 |
ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
|
| 327 |
assert ds.source == "reference"
|
| 328 |
|
| 329 |
def test_from_dict_uses_id_as_fallback_title(self):
|
| 330 |
-
from picarones.
|
| 331 |
ds = HuggingFaceDataset.from_dict({"dataset_id": "owner/repo"})
|
| 332 |
assert ds.title == "owner/repo"
|
| 333 |
|
| 334 |
def test_replace_source_helper(self):
|
| 335 |
-
from picarones.
|
| 336 |
ds = HuggingFaceDataset(dataset_id="x/y", title="XY", source="reference")
|
| 337 |
ds2 = ds._replace_source("api")
|
| 338 |
assert ds2.source == "api"
|
|
@@ -399,23 +399,23 @@ class TestHuggingFaceImporter:
|
|
| 399 |
class TestHuggingFaceReferenceData:
|
| 400 |
|
| 401 |
def test_reference_datasets_loaded(self):
|
| 402 |
-
from picarones.
|
| 403 |
assert len(_REFERENCE_DATASETS) >= 5
|
| 404 |
|
| 405 |
def test_catmus_present(self):
|
| 406 |
-
from picarones.
|
| 407 |
ids = [d["dataset_id"] for d in _REFERENCE_DATASETS]
|
| 408 |
assert any("CATMuS" in did or "catmus" in did.lower() for did in ids)
|
| 409 |
|
| 410 |
def test_all_have_required_fields(self):
|
| 411 |
-
from picarones.
|
| 412 |
for d in _REFERENCE_DATASETS:
|
| 413 |
assert "dataset_id" in d
|
| 414 |
assert "title" in d
|
| 415 |
assert "language" in d
|
| 416 |
|
| 417 |
def test_all_are_image_to_text(self):
|
| 418 |
-
from picarones.
|
| 419 |
for d in _REFERENCE_DATASETS:
|
| 420 |
assert d.get("task", "image-to-text") == "image-to-text"
|
| 421 |
|
|
|
|
| 57 |
|
| 58 |
@pytest.fixture
|
| 59 |
def htr_catalogue():
|
| 60 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedCatalogue
|
| 61 |
return HTRUnitedCatalogue.from_demo()
|
| 62 |
|
| 63 |
|
| 64 |
@pytest.fixture
|
| 65 |
def hf_importer():
|
| 66 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceImporter
|
| 67 |
return HuggingFaceImporter()
|
| 68 |
|
| 69 |
|
|
|
|
| 74 |
class TestHTRUnitedEntry:
|
| 75 |
|
| 76 |
def test_from_dict_basic(self):
|
| 77 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 78 |
d = {
|
| 79 |
"id": "test-corpus", "title": "Test Corpus", "url": "https://github.com/test/corpus",
|
| 80 |
"language": ["French"], "script": ["Gothic"], "century": [14, 15],
|
|
|
|
| 88 |
assert e.lines == 5000
|
| 89 |
|
| 90 |
def test_as_dict_roundtrip(self):
|
| 91 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 92 |
d = {
|
| 93 |
"id": "rtrip", "title": "Round Trip", "url": "https://github.com/a/b",
|
| 94 |
"language": ["Latin"], "script": ["Caroline"], "century": [9],
|
|
|
|
| 102 |
assert out["format"] == "PAGE"
|
| 103 |
|
| 104 |
def test_century_str_roman(self):
|
| 105 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 106 |
e = HTRUnitedEntry(id="x", title="x", url="x", century=[12, 14])
|
| 107 |
cs = e.century_str
|
| 108 |
assert "XIIe" in cs
|
| 109 |
assert "XIVe" in cs
|
| 110 |
|
| 111 |
def test_century_str_single(self):
|
| 112 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 113 |
e = HTRUnitedEntry(id="x", title="x", url="x", century=[19])
|
| 114 |
assert "XIXe" in e.century_str
|
| 115 |
|
| 116 |
def test_default_fields(self):
|
| 117 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 118 |
e = HTRUnitedEntry(id="minimal", title="Min", url="http://x")
|
| 119 |
assert e.language == []
|
| 120 |
assert e.lines == 0
|
|
|
|
| 122 |
assert e.tags == []
|
| 123 |
|
| 124 |
def test_from_dict_missing_fields(self):
|
| 125 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 126 |
e = HTRUnitedEntry.from_dict({"id": "sparse", "title": "Sparse"})
|
| 127 |
assert e.id == "sparse"
|
| 128 |
assert e.institution == ""
|
| 129 |
assert e.lines == 0
|
| 130 |
|
| 131 |
def test_as_dict_has_all_keys(self):
|
| 132 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 133 |
e = HTRUnitedEntry(id="k", title="K", url="http://k")
|
| 134 |
d = e.as_dict()
|
| 135 |
for key in ["id", "title", "url", "language", "script", "century",
|
|
|
|
| 137 |
assert key in d, f"Missing key: {key}"
|
| 138 |
|
| 139 |
def test_url_preserved(self):
|
| 140 |
+
from picarones.adapters.corpus.htr_united import HTRUnitedEntry
|
| 141 |
url = "https://github.com/HTR-United/cremma-medieval"
|
| 142 |
e = HTRUnitedEntry(id="c", title="CREMMA", url=url)
|
| 143 |
assert e.url == url
|
|
|
|
| 250 |
"""
|
| 251 |
|
| 252 |
def test_import_creates_meta_file(self, tmp_path, htr_catalogue):
|
| 253 |
+
from picarones.adapters.corpus.htr_united import import_htr_united_corpus
|
| 254 |
entry = htr_catalogue.entries[0]
|
| 255 |
result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
|
| 256 |
meta_file = Path(result["metadata_file"])
|
| 257 |
assert meta_file.exists()
|
| 258 |
|
| 259 |
def test_import_meta_content(self, tmp_path, htr_catalogue):
|
| 260 |
+
from picarones.adapters.corpus.htr_united import import_htr_united_corpus
|
| 261 |
entry = htr_catalogue.entries[0]
|
| 262 |
result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
|
| 263 |
meta = json.loads(Path(result["metadata_file"]).read_text())
|
|
|
|
| 265 |
assert meta["entry_id"] == entry.id
|
| 266 |
|
| 267 |
def test_import_returns_dict_keys(self, tmp_path, htr_catalogue):
|
| 268 |
+
from picarones.adapters.corpus.htr_united import import_htr_united_corpus
|
| 269 |
entry = htr_catalogue.entries[0]
|
| 270 |
result = import_htr_united_corpus(entry, tmp_path, max_samples=5)
|
| 271 |
for k in ["entry_id", "title", "output_dir", "files_imported", "metadata_file"]:
|
| 272 |
assert k in result, f"Missing key: {k}"
|
| 273 |
|
| 274 |
def test_import_creates_output_dir(self, tmp_path, htr_catalogue):
|
| 275 |
+
from picarones.adapters.corpus.htr_united import import_htr_united_corpus
|
| 276 |
entry = htr_catalogue.entries[0]
|
| 277 |
new_dir = tmp_path / "new_subdir" / "corpus"
|
| 278 |
import_htr_united_corpus(entry, new_dir, max_samples=5)
|
|
|
|
| 286 |
class TestHuggingFaceDataset:
|
| 287 |
|
| 288 |
def test_from_dict_basic(self):
|
| 289 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 290 |
d = {
|
| 291 |
"dataset_id": "test/dataset", "title": "Test Dataset",
|
| 292 |
"description": "A test dataset.", "language": ["French"],
|
|
|
|
| 299 |
assert ds.downloads == 500
|
| 300 |
|
| 301 |
def test_as_dict_roundtrip(self):
|
| 302 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 303 |
ds = HuggingFaceDataset(
|
| 304 |
dataset_id="a/b", title="AB", description="desc",
|
| 305 |
language=["Latin"], tags=["htr"],
|
|
|
|
| 309 |
assert d["language"] == ["Latin"]
|
| 310 |
|
| 311 |
def test_hf_url(self):
|
| 312 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 313 |
ds = HuggingFaceDataset(dataset_id="CATMuS/medieval", title="CATMuS")
|
| 314 |
assert ds.hf_url == "https://huggingface.co/datasets/CATMuS/medieval"
|
| 315 |
|
| 316 |
def test_as_dict_has_all_keys(self):
|
| 317 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 318 |
ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
|
| 319 |
d = ds.as_dict()
|
| 320 |
for k in ["dataset_id", "title", "description", "language", "tags",
|
|
|
|
| 322 |
assert k in d, f"Missing: {k}"
|
| 323 |
|
| 324 |
def test_default_source(self):
|
| 325 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 326 |
ds = HuggingFaceDataset(dataset_id="x/y", title="XY")
|
| 327 |
assert ds.source == "reference"
|
| 328 |
|
| 329 |
def test_from_dict_uses_id_as_fallback_title(self):
|
| 330 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 331 |
ds = HuggingFaceDataset.from_dict({"dataset_id": "owner/repo"})
|
| 332 |
assert ds.title == "owner/repo"
|
| 333 |
|
| 334 |
def test_replace_source_helper(self):
|
| 335 |
+
from picarones.adapters.corpus.huggingface import HuggingFaceDataset
|
| 336 |
ds = HuggingFaceDataset(dataset_id="x/y", title="XY", source="reference")
|
| 337 |
ds2 = ds._replace_source("api")
|
| 338 |
assert ds2.source == "api"
|
|
|
|
| 399 |
class TestHuggingFaceReferenceData:
|
| 400 |
|
| 401 |
def test_reference_datasets_loaded(self):
|
| 402 |
+
from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
|
| 403 |
assert len(_REFERENCE_DATASETS) >= 5
|
| 404 |
|
| 405 |
def test_catmus_present(self):
|
| 406 |
+
from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
|
| 407 |
ids = [d["dataset_id"] for d in _REFERENCE_DATASETS]
|
| 408 |
assert any("CATMuS" in did or "catmus" in did.lower() for did in ids)
|
| 409 |
|
| 410 |
def test_all_have_required_fields(self):
|
| 411 |
+
from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
|
| 412 |
for d in _REFERENCE_DATASETS:
|
| 413 |
assert "dataset_id" in d
|
| 414 |
assert "title" in d
|
| 415 |
assert "language" in d
|
| 416 |
|
| 417 |
def test_all_are_image_to_text(self):
|
| 418 |
+
from picarones.adapters.corpus.huggingface import _REFERENCE_DATASETS
|
| 419 |
for d in _REFERENCE_DATASETS:
|
| 420 |
assert d.get("task", "image-to-text") == "image-to-text"
|
| 421 |
|