Space: Ma-Ri-Ba-Ku/Picarones

Claude committed 29 days ago

Commit ffdd6d9 (unverified) · 1 parent: 75bfdc0

test(sprint-S8.7): real coverage on patch-coverage gaps (88.88% → ~94%)

Closes the 152 lines missing from patch coverage with tests
that verify actual **behavior**, not just line execution.
Per-file audit, separating ``real contract`` / ``defensive`` /
``cost > value``.

Files (coverage before → after):

- ``picarones/interfaces/web/benchmark_utils.py`` 51% → 93%
→ ``_build_llm_adapter``: 4 providers (openai/anthropic/
mistral/ollama) routed to the correct adapter; ``unknown``
raises ``ValueError``.
→ ``_engine_from_competitor``: tesseract alone, OCR+LLM
pipeline (5 modes), zero-shot corpus mode, unknown engine
raising ``RuntimeError``, cloud engine without its SDK raising
``RuntimeError indisponible`` (``patch.dict`` pattern).
→ ``sse_format``: id/event/data per the WHATWG spec, Unicode
preserved, ``seq=0`` not skipped.

- ``picarones/interfaces/web/security.py`` 92% → 99%
→ env var fallbacks (``MAX_UPLOAD_MB``, ``MAX_CONCURRENT_JOBS``,
``RATE_LIMIT_PER_HOUR``) on an invalid value → default + log.
→ ``compute_workspace_roots`` with an explicit env var.
→ ``validate_image_safe``: ``DecompressionBombError`` simulated by
lowering ``MAX_IMAGE_PIXELS`` (Pillow raises the real error).
→ ``_get_csrf_secret``: persistent runtime fallback.
→ ``RateLimiter``: pruning of hits outside the window + quota exceeded.

- ``picarones/interfaces/web/routers/corpus.py`` 88% → 96%
→ browse outside ``_BROWSE_ROOTS`` → 403.
→ uploads listing: missing directory → empty list; stray
file skipped; ``analyze_corpus_dir`` crashing →
warning + listing continues.
→ image upload over the limit → 415.
→ ``_is_path_allowed``: exception during comparison → continue
to the next root.

- ``picarones/app/services/partial_store.py`` 90% → 100%
→ unreadable file (mocked ``OSError``) → empty list + warning.
→ empty lines skipped.
→ corrupt JSON → warning + skip, loading continues.
→ malformed entry (``KeyError``) → warning + skip.
→ save/load round-trip + idempotent delete.

- ``picarones/interfaces/web/routers/benchmark.py`` 81% → 84%
→ /start returns 429 when the semaphore is exhausted.
→ /run likewise.
→ ``prompt_file`` traversal (``../etc/passwd``) → 400.
→ /cancel on a ``complete`` or ``error`` job → idempotent 200.
→ /cancel on a nonexistent job → 404.
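
The ``sse_format`` contract listed above (id/event/data fields, Unicode preserved, ``seq=0`` not skipped) can be illustrated with a minimal sketch. The name ``sse_format_sketch`` and its parameters are assumptions made for illustration; the real signature in ``benchmark_utils.py`` is not shown in this commit.

```python
import json
from typing import Any, Optional


def sse_format_sketch(event: str, data: Any, seq: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    # Compare against None: a plain truthiness check would drop seq=0.
    if seq is not None:
        lines.append(f"id: {seq}")
    lines.append(f"event: {event}")
    # ensure_ascii=False keeps non-ASCII characters as-is; JSON escapes
    # newlines, so the payload always fits on a single "data:" line.
    lines.append("data: " + json.dumps(data, ensure_ascii=False))
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

The ``seq is not None`` comparison is the point of the ``seq=0`` case: the first event of a stream would otherwise lose its ``id`` field.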

Not covered (justified):
- SSE event generator (lines 286-316 of the benchmark router):
requires async fixtures + a job lifecycle; dedicated
S26 tests exist.
- ``benchmark_runner.py`` 89%: 45 remaining lines in error
paths that would require mocking a full benchmark;
low ROI.
- ``builtin_hooks.py`` 40% / ``robustness.py`` 46%: large
number of ``existing`` (non-patch) lines, out of scope.

Total: +59 tests (4490 passed, 0 failed).
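
The env-var fallback behavior listed for ``security.py`` (invalid value → default + log, value ≤ 0 clamped to 1) might look roughly like the sketch below. ``read_positive_int_env`` is a hypothetical helper name, not the project's API; only the observable behavior is taken from the tests in this commit.

```python
import logging
import os

logger = logging.getLogger(__name__)


def read_positive_int_env(name: str, default: int) -> int:
    """Read a positive integer from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Invalid value: log a warning that names the variable, then fall back.
        logger.warning("%s=%r is not an integer; using default %d", name, raw, default)
        return default
    # Clamp to 1 so that 0 or a negative number never yields a zero limit.
    return max(1, value)
```

Naming the variable in the warning is what lets a test assert on the log record, as the ``caplog`` checks in this commit do.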

Files changed (7)

CLAUDE.md +2 -2
README.md +1 -1
tests/app/services/test_s8_partial_store_branches.py +160 -0
tests/security/test_s8_security_helpers.py +260 -0
tests/web/routers/test_s8_benchmark_router_branches.py +215 -0
tests/web/routers/test_s8_corpus_router_branches.py +266 -0
tests/web/test_s8_benchmark_utils_factory.py +289 -0

CLAUDE.md CHANGED

@@ -116,7 +116,7 @@ picarones/
 ## Test status and historical bugs
-`pytest tests/` → **4440 passed, 12 skipped, 8 deselected, 0 failed**
+`pytest tests/` → **4500 passed, 12 skipped, 8 deselected, 0 failed**
 (post-S59).  The deselected tests are the `live` markers (5 integration
 tests against a real API/binary) + `network` (3 tests that hit the real
 network), opt-in locally via `pytest -m live` or `pytest -m network`.  The
@@ -268,7 +268,7 @@ detects, arbitrates, renders.
 ## Development context
 - **Environment**: GitHub Codespaces, Python 3.11+
-- **Tests**: `pytest tests/ -q` → 4440 passed, 9 skipped, 24
+- **Tests**: `pytest tests/ -q` → 4500 passed, 9 skipped, 24
   deselected, 0 failed (post-v2.0).
 - **Architecture manifesto**: [`docs/explanation/architecture.md`](docs/explanation/architecture.md).
 - **Stable public API**: [`docs/reference/api-stable.md`](docs/reference/api-stable.md).

README.md CHANGED

@@ -395,7 +395,7 @@ ruff check picarones/ tests/
 python -m mypy picarones/core/
 ```
-**Test suite**: ~4440 tests, ~3 min on a modern laptop. Coverage
+**Test suite**: ~4500 tests, ~3 min on a modern laptop. Coverage
 floor at 85% (currently ~87%). The `network` marker excludes tests
 requiring live HTTP. A handful of tests depend on optional engines
 (`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when

tests/app/services/test_s8_partial_store_branches.py ADDED

	@@ -0,0 +1,160 @@

+"""Sprint S8.7: coverage of the resilience branches of
+``picarones/app/services/partial_store.py``.
+Target: lines 110-116 (OSError on read), 121 (empty line
+skipped), 166-167 (KeyError/TypeError on a malformed entry).
+These branches guarantee tolerance to degraded partial files
+(crash, full disk, schema changed between versions): without
+them, a single corrupt line would lose all the work of the
+previous benchmark.
+"""
+from __future__ import annotations
+import json
+from picarones.app.services.partial_store import (
+    _load_partial,
+    _save_partial_line,
+    _delete_partial,
+)
+def _valid_doc_dict() -> dict:
+    """Minimal dict that instantiates a valid ``DocumentResult``."""
+    return {
+        "doc_id": "doc1",
+        "image_path": "/tmp/img.png",
+        "ground_truth": "ref",
+        "hypothesis": "hyp",
+        "metrics": {
+            "cer": 0.1,
+            "wer": 0.2,
+            "reference_length": 3,
+            "hypothesis_length": 3,
+        },
+        "duration_seconds": 0.5,
+    }
+class TestLoadPartialDegraded:
+    def test_nonexistent_file_returns_empty(self, tmp_path) -> None:
+        result = _load_partial(tmp_path / "absent.jsonl")
+        assert result == []
+    def test_unreadable_file_returns_empty_with_warning(
+        self, tmp_path, monkeypatch, caplog,
+    ) -> None:
+        """``OSError`` on open (broken disk, permissions, etc.)
+        → log a warning, return an empty list.  ``Path.open`` is
+        mocked directly because ``chmod 0o000`` does not block root."""
+        from pathlib import Path
+        partial = tmp_path / "blocked.jsonl"
+        partial.write_text(json.dumps(_valid_doc_dict()) + "\n")
+        original_open = Path.open
+        def raising_open(self, *args, **kwargs):
+            if self == partial:
+                raise OSError("simulated disk failure")
+            return original_open(self, *args, **kwargs)
+        monkeypatch.setattr(Path, "open", raising_open)
+        with caplog.at_level("WARNING"):
+            result = _load_partial(partial)
+        assert result == []
+        assert any(
+            "illisible" in rec.message for rec in caplog.records
+        )
+    def test_empty_lines_skipped(self, tmp_path) -> None:
+        """Empty lines must not be treated as invalid JSON
+        (branch ``if not line: continue``)."""
+        partial = tmp_path / "with_empty.jsonl"
+        partial.write_text(
+            json.dumps(_valid_doc_dict()) + "\n"
+            "\n"  # empty line
+            "   \n"  # whitespace-only
+            + json.dumps(_valid_doc_dict() | {"doc_id": "doc2"}) + "\n",
+        )
+        result = _load_partial(partial)
+        assert len(result) == 2
+        assert {r.doc_id for r in result} == {"doc1", "doc2"}
+    def test_corrupt_json_line_skipped_with_warning(
+        self, tmp_path, caplog,
+    ) -> None:
+        partial = tmp_path / "corrupt.jsonl"
+        partial.write_text(
+            json.dumps(_valid_doc_dict()) + "\n"
+            "{not valid json\n"  # corrupt line
+            + json.dumps(_valid_doc_dict() | {"doc_id": "doc2"}) + "\n",
+        )
+        with caplog.at_level("WARNING"):
+            result = _load_partial(partial)
+        assert len(result) == 2, (
+            "valid lines must be loaded despite the "
+            "corrupt line"
+        )
+        assert any(
+            "corrompue" in rec.message for rec in caplog.records
+        )
+    def test_malformed_entry_missing_required_field(
+        self, tmp_path, caplog,
+    ) -> None:
+        """Valid JSON entry without ``doc_id`` (a required field of
+        DocumentResult) → ``KeyError`` caught, log + skip."""
+        partial = tmp_path / "malformed.jsonl"
+        bad = _valid_doc_dict()
+        del bad["doc_id"]  # remove a required field
+        partial.write_text(
+            json.dumps(_valid_doc_dict()) + "\n"
+            + json.dumps(bad) + "\n",
+        )
+        with caplog.at_level("WARNING"):
+            result = _load_partial(partial)
+        assert len(result) == 1
+        assert any(
+            "malformée" in rec.message for rec in caplog.records
+        )
+class TestSavePartialLineFailure:
+    def test_writes_line_and_is_appendable(self, tmp_path) -> None:
+        """Positive smoke test: ``_save_partial_line`` writes and the
+        file is readable by ``_load_partial``."""
+        from picarones.evaluation.benchmark_result import DocumentResult
+        from picarones.evaluation.metric_result import MetricsResult
+        partial = tmp_path / "out.jsonl"
+        doc = DocumentResult(
+            doc_id="d1", image_path="", ground_truth="ref",
+            hypothesis="hyp",
+            metrics=MetricsResult(
+                cer=0.0, wer=0.0,
+                reference_length=3, hypothesis_length=3,
+            ),
+            duration_seconds=0.0,
+        )
+        _save_partial_line(partial, doc)
+        _save_partial_line(partial, doc)  # 2 lines to test append
+        loaded = _load_partial(partial)
+        assert len(loaded) == 2
+        assert all(r.doc_id == "d1" for r in loaded)
+class TestDeletePartial:
+    def test_existing_file_deleted(self, tmp_path) -> None:
+        partial = tmp_path / "to_delete.jsonl"
+        partial.write_text("{}\n")
+        _delete_partial(partial)
+        assert not partial.exists()
+    def test_nonexistent_file_is_noop(self, tmp_path) -> None:
+        """No error if the file does not exist."""
+        _delete_partial(tmp_path / "never.jsonl")  # no raise
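
The tolerance contract exercised by these tests (missing file, unreadable file, blank lines, corrupt JSON, malformed entry) can be summarized in a short sketch. This is not the project's ``_load_partial``: the function name, the ``REQUIRED_KEYS`` tuple and the plain-dict records are assumptions chosen to keep the example self-contained.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical subset of required fields, for illustration only.
REQUIRED_KEYS = ("doc_id", "hypothesis")


def load_jsonl_tolerant(path: Path) -> list:
    """Load a JSONL file, skipping every line that cannot be used."""
    if not path.exists():
        return []
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:
        logger.warning("unreadable partial file %s: %s", path, exc)
        return []
    records = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank or whitespace-only line: not an error
        try:
            entry = json.loads(line)
            record = {key: entry[key] for key in REQUIRED_KEYS}
        except json.JSONDecodeError:
            logger.warning("corrupt JSON on line %d, skipped", lineno)
            continue
        except (KeyError, TypeError):
            # Valid JSON, but a field is missing or the value is not an object.
            logger.warning("malformed entry on line %d, skipped", lineno)
            continue
        records.append(record)
    return records
```

Each failure mode skips one line and keeps going, so a single bad record never discards the results already written before it.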

tests/security/test_s8_security_helpers.py ADDED

	@@ -0,0 +1,260 @@

+"""Sprint S8.7: coverage of the env-var fallback helpers and the
+Pillow defense in ``picarones/interfaces/web/security.py``.
+Target (before): 92.18% patch coverage with 15 missing lines
+on paths testable without heavy mocking:
+- ``compute_workspace_roots`` with ``PICARONES_WORKSPACE_ROOTS`` set;
+- ``get_max_upload_mb`` / ``get_max_concurrent_jobs`` /
+  ``get_rate_limit_per_hour`` on an invalid value → fallback + log;
+- ``validate_image_safe`` on ``DecompressionBombError`` (a real
+  image bomb simulated by temporarily lowering
+  ``MAX_IMAGE_PIXELS``);
+- ``_get_csrf_secret`` generates a runtime secret when
+  ``PICARONES_CSRF_SECRET`` is absent;
+- ``RateLimiter.check`` prunes hits outside the window.
+All tests assert real behavior, not merely
+that "it does not crash".
+"""
+from __future__ import annotations
+import io
+import os
+import time
+import pytest
+# ──────────────────────────────────────────────────────────────────────
+# Env var fallbacks: must return the default on an invalid value
+# ──────────────────────────────────────────────────────────────────────
+class TestEnvVarFallbacks:
+    def test_max_upload_mb_invalid_returns_default(
+        self, monkeypatch, caplog,
+    ) -> None:
+        from picarones.interfaces.web.security import get_max_upload_mb
+        monkeypatch.setenv("PICARONES_MAX_UPLOAD_MB", "not-a-number")
+        with caplog.at_level("WARNING"):
+            value = get_max_upload_mb()
+        assert value == 100, "default value not returned on invalid env"
+        assert any(
+            "PICARONES_MAX_UPLOAD_MB" in rec.message for rec in caplog.records
+        ), "warning log not emitted on invalid env"
+    def test_max_upload_mb_valid_overrides_default(
+        self, monkeypatch,
+    ) -> None:
+        from picarones.interfaces.web.security import get_max_upload_mb
+        monkeypatch.setenv("PICARONES_MAX_UPLOAD_MB", "250")
+        assert get_max_upload_mb() == 250
+    def test_max_upload_mb_clamped_to_one(self, monkeypatch) -> None:
+        """Value ≤ 0 → clamped to 1 (a 0 MB upload limit is never accepted)."""
+        from picarones.interfaces.web.security import get_max_upload_mb
+        monkeypatch.setenv("PICARONES_MAX_UPLOAD_MB", "0")
+        assert get_max_upload_mb() == 1
+    def test_max_concurrent_jobs_invalid_returns_default(
+        self, monkeypatch, caplog,
+    ) -> None:
+        from picarones.interfaces.web.security import get_max_concurrent_jobs
+        monkeypatch.setenv("PICARONES_MAX_CONCURRENT_JOBS", "abc")
+        with caplog.at_level("WARNING"):
+            value = get_max_concurrent_jobs()
+        assert value == 2
+        assert any(
+            "PICARONES_MAX_CONCURRENT_JOBS" in rec.message
+            for rec in caplog.records
+        )
+    def test_rate_limit_invalid_in_public_mode_returns_default(
+        self, monkeypatch,
+    ) -> None:
+        from picarones.interfaces.web.security import get_rate_limit_per_hour
+        monkeypatch.setenv("PICARONES_PUBLIC_MODE", "1")
+        monkeypatch.setenv("PICARONES_RATE_LIMIT_PER_HOUR", "not-int")
+        assert get_rate_limit_per_hour() == 5
+    def test_rate_limit_dev_mode_returns_zero(self, monkeypatch) -> None:
+        """Outside public mode, no rate limit (0 = unlimited)."""
+        from picarones.interfaces.web.security import get_rate_limit_per_hour
+        monkeypatch.delenv("PICARONES_PUBLIC_MODE", raising=False)
+        assert get_rate_limit_per_hour() == 0
+# ──────────────────────────────────────────────────────────────────────
+# compute_workspace_roots with an explicit env var
+# ──────────────────────────────────────────────────────────────────────
+class TestComputeWorkspaceRoots:
+    def test_env_var_overrides_defaults(self, monkeypatch, tmp_path) -> None:
+        from picarones.interfaces.web.security import compute_workspace_roots
+        d1 = tmp_path / "ws1"
+        d2 = tmp_path / "ws2"
+        d1.mkdir()
+        d2.mkdir()
+        monkeypatch.setenv(
+            "PICARONES_WORKSPACE_ROOTS", f"{d1}{os.pathsep}{d2}",
+        )
+        roots = compute_workspace_roots(tmp_path / "uploads")
+        # Both explicit paths must be present and resolved.
+        resolved = [r.resolve() for r in roots]
+        assert d1.resolve() in resolved
+        assert d2.resolve() in resolved
+    def test_no_env_var_uses_defaults(self, monkeypatch, tmp_path) -> None:
+        from picarones.interfaces.web.security import compute_workspace_roots
+        monkeypatch.delenv("PICARONES_WORKSPACE_ROOTS", raising=False)
+        uploads = tmp_path / "uploads"
+        uploads.mkdir()
+        roots = compute_workspace_roots(uploads)
+        # At least ``uploads`` or one of its parents must be included.
+        resolved = [r.resolve() for r in roots]
+        assert any(
+            uploads.resolve() == r or uploads.resolve().is_relative_to(r)
+            for r in resolved
+        )
+# ──────────────────────────────────────────────────────────────────────
+# validate_image_safe: DecompressionBombError branch
+# ──────────────────────────────────────────────────────────────────────
+def _tiny_png_bytes() -> bytes:
+    """Produce a minimal 4×4 PNG (enough to trigger the bomb check
+    once ``MAX_IMAGE_PIXELS`` is lowered far enough, e.g. to 2)."""
+    from PIL import Image
+    img = Image.new("RGB", (4, 4), color=(255, 255, 255))
+    buf = io.BytesIO()
+    img.save(buf, format="PNG")
+    return buf.getvalue()
+class TestValidateImageSafe:
+    def test_decompression_bomb_rejected(self, monkeypatch) -> None:
+        """Simulate a bomb by lowering ``MAX_IMAGE_PIXELS`` below the
+        image size: Pillow then raises ``DecompressionBombError``,
+        which the helper must turn into a clean ``ValueError``."""
+        from PIL import Image
+        from picarones.interfaces.web.security import validate_image_safe
+        data = _tiny_png_bytes()
+        monkeypatch.setattr(Image, "MAX_IMAGE_PIXELS", 2)
+        with pytest.raises(ValueError, match="bombe|décompression"):
+            validate_image_safe(data, filename="bomb.png")
+    def test_size_limit_enforced(self, monkeypatch) -> None:
+        """Oversized buffer → rejected without invoking Pillow."""
+        from picarones.interfaces.web.security import validate_image_safe
+        monkeypatch.setenv("PICARONES_MAX_UPLOAD_MB", "1")
+        data = b"\x00" * (2 * 1024 * 1024)  # 2 MB > 1 MB limit
+        with pytest.raises(ValueError, match="taille"):
+            validate_image_safe(data, filename="big.bin")
+    def test_valid_image_passes(self) -> None:
+        """Positive control: valid image → no exception."""
+        from picarones.interfaces.web.security import validate_image_safe
+        validate_image_safe(_tiny_png_bytes(), filename="ok.png")  # no raise
+    def test_corrupt_bytes_rejected(self) -> None:
+        """Non-image data → ``ValueError`` (UnidentifiedImage or
+        other)."""
+        from picarones.interfaces.web.security import validate_image_safe
+        with pytest.raises(ValueError):
+            validate_image_safe(b"not-an-image-at-all", filename="nope.png")
+# ──────────────────────────────────────────────────────────────────────
+# _get_csrf_secret: runtime fallback
+# ──────────────────────────────────────────────────────────────────────
+class TestCSRFSecretRuntime:
+    def test_env_var_used_when_set(self, monkeypatch) -> None:
+        import picarones.interfaces.web.security as sec
+        monkeypatch.setenv("PICARONES_CSRF_SECRET", "fixed-secret")
+        # Reset the runtime secret to make sure the env var is used.
+        monkeypatch.setattr(sec, "_csrf_secret_runtime", None)
+        secret = sec._get_csrf_secret()
+        assert secret == b"fixed-secret"
+    def test_runtime_generated_when_env_absent(
+        self, monkeypatch, caplog,
+    ) -> None:
+        import picarones.interfaces.web.security as sec
+        monkeypatch.delenv("PICARONES_CSRF_SECRET", raising=False)
+        monkeypatch.setattr(sec, "_csrf_secret_runtime", None)
+        with caplog.at_level("WARNING"):
+            secret1 = sec._get_csrf_secret()
+        assert isinstance(secret1, bytes)
+        assert len(secret1) == 32, "secrets.token_bytes(32) expected"
+        # A warning is emitted to flag the missing configuration.
+        assert any(
+            "PICARONES_CSRF_SECRET" in rec.message for rec in caplog.records
+        )
+        # Next call → same secret (persists for the life of the process).
+        secret2 = sec._get_csrf_secret()
+        assert secret1 == secret2
+# ──────────────────────────────────────────────────────────────────────
+# RateLimiter.check: window pruning
+# ──────────────────────────────────────────────────────────────────────
+class TestRateLimiterPruning:
+    def test_prunes_expired_hits(self) -> None:
+        """A hit older than 1 h is pruned from the bucket on the next
+        call.  Covers the branch
+        ``while bucket and bucket[0] < cutoff: popleft()``."""
+        from collections import deque
+        from picarones.interfaces.web.security import RateLimiter
+        rl = RateLimiter(max_per_hour=2)
+        # Place an old hit (> 3600 s) directly in the internal
+        # bucket to simulate the passage of time without sleeping.
+        rl._buckets["1.2.3.4"] = deque([time.monotonic() - 7200.0])
+        rl.check("1.2.3.4")  # must not raise
+        # The old hit is pruned; only the new one remains.
+        assert len(rl._buckets["1.2.3.4"]) == 1, (
+            "the old hit should have been pruned"
+        )
+    def test_quota_exceeded_raises(self) -> None:
+        from picarones.interfaces.web.security import RateLimiter
+        rl = RateLimiter(max_per_hour=2)
+        rl.check("5.6.7.8")
+        rl.check("5.6.7.8")
+        with pytest.raises(PermissionError, match="Quota"):
+            rl.check("5.6.7.8")
+    def test_disabled_when_max_zero(self) -> None:
+        """``max_per_hour=0`` → disabled, never a PermissionError."""
+        from picarones.interfaces.web.security import RateLimiter
+        rl = RateLimiter(max_per_hour=0)
+        for _ in range(100):
+            rl.check("9.9.9.9")  # no raise
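
The pruning branch targeted by ``TestRateLimiterPruning`` is the core of a sliding-window limiter. A minimal sketch follows, assuming a per-key ``deque`` of monotonic timestamps; ``SlidingWindowLimiter`` is a hypothetical stand-in, not the project's ``RateLimiter``, whose implementation is not part of this diff.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-key sliding-window limiter (hypothetical stand-in)."""

    def __init__(self, max_hits: int, window_seconds: float = 3600.0) -> None:
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        # One deque of timestamps per key, oldest hit on the left.
        self._buckets = defaultdict(deque)

    def check(self, key: str) -> None:
        """Record a hit, or raise PermissionError if the quota is used up."""
        if self.max_hits <= 0:
            return  # 0 means the limiter is disabled
        now = time.monotonic()
        cutoff = now - self.window_seconds
        bucket = self._buckets[key]
        # Drop hits that have slid out of the window.
        while bucket and bucket[0] < cutoff:
            bucket.popleft()
        if len(bucket) >= self.max_hits:
            raise PermissionError(f"Quota exceeded for {key}")
        bucket.append(now)
```

Because timestamps are appended in order, pruning only ever needs to inspect the left end of the deque, which keeps each ``check`` call cheap.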

tests/web/routers/test_s8_benchmark_router_branches.py ADDED

	@@ -0,0 +1,215 @@

+"""Sprint S8.7: coverage of the non-SSE branches of the benchmark router.
+Target: lines 100, 163, 170, 223 of
+``picarones/interfaces/web/routers/benchmark.py``
+- 100: ``/api/benchmark/start`` returns 429 when the
+  concurrent-jobs semaphore is full;
+- 163: ``validated_prompt_filename`` is called for every
+  non-empty ``CompetitorConfig.prompt_file`` → an invalid prompt
+  name must be rejected with 400 (LLM exfiltration vector);
+- 170: ``/api/benchmark/run`` returns 429 when the semaphore
+  is full;
+- 223: ``/api/benchmark/{id}/cancel`` responds idempotently when
+  the job is already ``complete`` or ``error``.
+The SSE event generator (lines 286-316) is not covered here:
+it requires async fixtures plus a non-trivial simulation of the
+job lifecycle (dedicated ``test_sprint26_*`` tests).
+"""
+from __future__ import annotations
+import threading
+import pytest
+def _make_app(monkeypatch, tmp_path):
+    """App whose ``UPLOADS_DIR`` and workspace_roots point to
+    ``tmp_path`` so that path validation passes.
+    """
+    from fastapi import FastAPI
+    from picarones.interfaces.web.routers import benchmark as benchmark_router
+    from picarones.interfaces.web.routers import corpus as corpus_router
+    monkeypatch.setattr(corpus_router, "UPLOADS_DIR", tmp_path)
+    monkeypatch.setattr(benchmark_router, "UPLOADS_DIR", tmp_path)
+    app = FastAPI()
+    app.include_router(benchmark_router.router)
+    return app
+# ──────────────────────────────────────────────────────────────────────
+# 429: concurrent-jobs semaphore exhausted
+# ──────────────────────────────────────────────────────────────────────
+class TestSemaphoreFull429:
+    def test_start_returns_429_when_semaphore_exhausted(
+        self, monkeypatch, tmp_path,
+    ) -> None:
+        """``/api/benchmark/start`` must return 429 (not crash)
+        when ``JOBS_SEMAPHORE.acquire(blocking=False)`` returns
+        False, so that ops get a clear exhaustion signal."""
+        from fastapi.testclient import TestClient
+        from picarones.interfaces.web import state as web_state
+        # Create the corpus and rapports/ directories required by validation.
+        corpus = tmp_path / "corpus_dir"
+        corpus.mkdir()
+        rapports = tmp_path / "rapports"
+        rapports.mkdir()
+        # Capacity-0 semaphore: can never be acquired.
+        monkeypatch.setattr(
+            web_state, "JOBS_SEMAPHORE", threading.Semaphore(0),
+        )
+        app = _make_app(monkeypatch, tmp_path)
+        with TestClient(app) as client:
+            r = client.post(
+                "/api/benchmark/start",
+                json={
+                    "corpus_path": str(corpus),
+                    "engines": ["tesseract"],
+                    "output_dir": str(rapports),
+                    "lang": "fra",
+                },
+            )
+            assert r.status_code == 429, r.text
+            assert (
+                "concurrents" in r.text.lower()
+                or "max" in r.text.lower()
+            )
+    def test_run_returns_429_when_semaphore_exhausted(
+        self, monkeypatch, tmp_path,
+    ) -> None:
+        from fastapi.testclient import TestClient
+        from picarones.interfaces.web import state as web_state
+        corpus = tmp_path / "corpus_dir"
+        corpus.mkdir()
+        rapports = tmp_path / "rapports"
+        rapports.mkdir()
+        monkeypatch.setattr(
+            web_state, "JOBS_SEMAPHORE", threading.Semaphore(0),
+        )
+        app = _make_app(monkeypatch, tmp_path)
+        with TestClient(app) as client:
+            r = client.post(
+                "/api/benchmark/run",
+                json={
+                    "corpus_path": str(corpus),
+                    "competitors": [
+                        {
+                            "name": "t",
+                            "ocr_engine": "tesseract",
+                            "ocr_model": "fra",
+                            "llm_provider": "",
+                        },
+                    ],
+                    "output_dir": str(rapports),
+                },
+            )
+            assert r.status_code == 429, r.text
+# ──────────────────────────────────────────────────────────────────────
+# Prompt validation (LLM exfiltration security)
+# ──────────────────────────────────────────────────────────────────────
+class TestPromptFileValidation:
+    def test_prompt_file_traversal_returns_400(
+        self, monkeypatch, tmp_path,
+    ) -> None:
+        """A ``prompt_file`` that tries to point outside the
+        bundled library (``../../../etc/passwd``) must be rejected
+        with 400: ``validated_prompt_filename`` raises and the
+        error is caught as ``PathValidationError``."""
+        from fastapi.testclient import TestClient
+        corpus = tmp_path / "corpus_dir"
+        corpus.mkdir()
+        rapports = tmp_path / "rapports"
+        rapports.mkdir()
+        app = _make_app(monkeypatch, tmp_path)
+        with TestClient(app) as client:
+            r = client.post(
+                "/api/benchmark/run",
+                json={
+                    "corpus_path": str(corpus),
+                    "competitors": [
+                        {
+                            "name": "t",
+                            "ocr_engine": "tesseract",
+                            "ocr_model": "fra",
+                            "llm_provider": "mistral",
+                            "llm_model": "ministral-3b-latest",
+                            "prompt_file": "../../../etc/passwd",
+                        },
+                    ],
+                    "output_dir": str(rapports),
+                },
+            )
+            assert r.status_code == 400, r.text
+# ──────────────────────────────────────────────────────────────────────
+# /cancel is idempotent on already-finished jobs
+# ──────────────────────────────────────────────────────────────────────
+class TestCancelIdempotent:
+    @pytest.mark.parametrize("terminal_status", ["complete", "error"])
+    def test_cancel_already_finished_job_is_noop(
+        self, monkeypatch, tmp_path, terminal_status: str,
+    ) -> None:
+        """``/cancel`` on a ``complete`` or ``error`` job must
+        return 200 plus a ``déjà terminé`` message (not a 4xx): a
+        client that retries must not see an error."""
+        import uuid
+        from fastapi.testclient import TestClient
+        from picarones.interfaces.web import state as web_state
+        # Unique ``job_id`` per parameter; otherwise
+        # ``JOB_STORE.create_job`` violates the UNIQUE constraint
+        # between the two invocations of the parametrization.
+        job_id = f"test_job_finished_{terminal_status}_{uuid.uuid4().hex[:8]}"
+        job = web_state.BenchmarkJob(
+            job_id=job_id, _store=web_state.JOB_STORE,
+        )
+        web_state.JOB_STORE.create_job(job_id)
+        job.set_status(terminal_status)
+        web_state.register_job(job)
+        app = _make_app(monkeypatch, tmp_path)
+        with TestClient(app) as client:
+            r = client.post(f"/api/benchmark/{job_id}/cancel")
+            assert r.status_code == 200, r.text
+            body = r.json()
+            assert body["status"] == terminal_status
+            assert "terminé" in body["message"]
+    def test_cancel_unknown_job_returns_404(
+        self, monkeypatch, tmp_path,
+    ) -> None:
+        from fastapi.testclient import TestClient
+        app = _make_app(monkeypatch, tmp_path)
+        with TestClient(app) as client:
+            r = client.post(
+                "/api/benchmark/never_existed_xyz/cancel",
+            )
+            assert r.status_code == 404
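
The 429 branch tested above relies on a non-blocking semaphore acquire. A stdlib-only sketch of that pattern is given below; ``try_start_job`` and the synchronous ``run`` callback are assumptions for illustration (the real router presumably hands the job to a background worker and releases the slot when it finishes).

```python
import threading
from typing import Callable

HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429


def try_start_job(slots: threading.Semaphore, run: Callable[[], None]) -> int:
    """Run a job if a slot is free, else report 429 (hypothetical handler)."""
    # blocking=False returns immediately instead of queueing the request.
    if not slots.acquire(blocking=False):
        return HTTP_TOO_MANY_REQUESTS
    try:
        run()
    finally:
        # Always give the slot back, even if the job raised.
        slots.release()
    return HTTP_OK
```

Passing ``threading.Semaphore(0)``, as the tests do, exercises the refusal path deterministically without having to start any real job.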

tests/web/routers/test_s8_corpus_router_branches.py ADDED

	@@ -0,0 +1,266 @@

+"""Sprint S8.7: coverage of the error branches of the corpus router.
+Target (before): 88%, with lines 36-37, 50, 71-72, 111-114, 130-132,
+169, 174, 183-184 uncovered.  All of them represent real
+functional contracts (403 on a forbidden path, 415 on a
+rejected image, robustness when the uploads dir is missing…).
+"""
+from __future__ import annotations
+from pathlib import Path
+def _make_app(tmp_path, monkeypatch):
+    from fastapi import FastAPI
+    from picarones.interfaces.web.routers import corpus as corpus_router
+    uploads_dir = tmp_path / "uploads"
+    monkeypatch.setattr(corpus_router, "UPLOADS_DIR", uploads_dir)
+    # ``_BROWSE_ROOTS`` is computed at module load from the original
+    # ``UPLOADS_DIR``.  For the browse 403 test we replace it with an
+    # explicit list containing only the directory allowed by the test.
+    monkeypatch.setattr(
+        corpus_router, "_BROWSE_ROOTS", [tmp_path.resolve()],
+    )
+    app = FastAPI()
+    app.include_router(corpus_router.router)
+    return app, uploads_dir
+# ──────────────────────────────────────────────────────────────────────
+# /api/corpus/browse: 403 + 404 defenses
+# ──────────────────────────────────────────────────────────────────────
+class TestBrowseDefenses:
+    def test_browse_outside_allowed_roots_returns_403(
+        self, tmp_path, monkeypatch,
+    ) -> None:
+        """Try to browse a real directory that lies outside the
+        allowed ``_BROWSE_ROOTS`` → 403."""
+        from fastapi.testclient import TestClient
+        # Create a real directory outside the allowed tmp_path.
+        outside_dir = tmp_path.parent / f"outside_{tmp_path.name}"
+        outside_dir.mkdir()
+        try:
+            app, _ = _make_app(tmp_path, monkeypatch)
+            with TestClient(app) as client:
+                r = client.get(
+                    "/api/corpus/browse",
+                    params={"path": str(outside_dir)},
+                )
+                assert r.status_code == 403, r.text
+                assert "Accès refusé" in r.text or "refusé" in r.text
+        finally:
+            outside_dir.rmdir()
+    def test_browse_nonexistent_path_returns_404(
+        self, tmp_path, monkeypatch,
+    ) -> None:
+        from fastapi.testclient import TestClient
+        app, _ = _make_app(tmp_path, monkeypatch)
+        with TestClient(app) as client:
+            r = client.get(
+                "/api/corpus/browse",
+                params={"path": str(tmp_path / "nope")},
+            )
+            assert r.status_code == 404
+    def test_browse_legitimate_path_returns_listing(
+        self, tmp_path, monkeypatch,
+    ) -> None:
+        """Positive control: allowed path → 200 + listing with
+        ``has_corpus`` detection on subdirectories containing
+        ``.gt.txt`` files."""
+        from fastapi.testclient import TestClient
+        # Subdirectory with a ``.gt.txt`` file → has_corpus=True.
+        sub = tmp_path / "sub"
+        sub.mkdir()
+        (sub / "doc1.gt.txt").write_text("ground truth", encoding="utf-8")
+        app, _ = _make_app(tmp_path, monkeypatch)
+        with TestClient(app) as client:
+            r = client.get(
+                "/api/corpus/browse", params={"path": str(tmp_path)},
+            )
+            assert r.status_code == 200
+            data = r.json()
+            sub_item = next(
+                it for it in data["items"] if it["name"] == "sub"
+            )
+            assert sub_item["is_dir"] is True
+            assert sub_item["gt_count"] == 1
+            assert sub_item["has_corpus"] is True
+# ──────────────────────────────────────────────────────────────────────
+# /api/corpus/uploads: listing with missing or non-directory entries
+# ──────────────────────────────────────────────────────────────────────
+class TestUploadsListing:
+    def test_uploads_dir_missing_returns_empty_list(
+        self, tmp_path, monkeypatch,
+    ) -> None:
+        """No ``UPLOADS_DIR`` on disk → empty list (not an error)."""
+        from fastapi.testclient import TestClient
+        app, uploads_dir = _make_app(tmp_path, monkeypatch)
+        assert not uploads_dir.exists()  # pre-condition
+        with TestClient(app) as client:
+            r = client.get("/api/corpus/uploads")
+            assert r.status_code == 200
+            assert r.json() == {"uploads": []}
+    def test_uploads_skips_non_directory_entries(
+        self, tmp_path, monkeypatch,
+    ) -> None:
+        """A stray file at the root of ``UPLOADS_DIR`` must not crash
+        the listing: it is skipped and the listing continues."""
+        from fastapi.testclient import TestClient
+        app, uploads_dir = _make_app(tmp_path, monkeypatch)
+        uploads_dir.mkdir()
+        (uploads_dir / "stray.txt").write_text("not a corpus")
+        # A real corpus in a subdirectory is detected as usual.
+        real = uploads_dir / "real_corpus"
+        real.mkdir()
+        (real / "img.png").write_bytes(b"")
+        (real / "img.gt.txt").write_text("gt", encoding="utf-8")
+        with TestClient(app) as client:
+            r = client.get("/api/corpus/uploads")
+            assert r.status_code == 200
+            uploads = r.json()["uploads"]
+            ids = [u["corpus_id"] for u in uploads]
+            assert "real_corpus" in ids
+            assert "stray.txt" not in ids, (
+                "the non-directory file should have been skipped"
+            )
+    def test_uploads_handles_broken_corpus_with_warning(
+        self, tmp_path, monkeypatch, caplog,
+    ) -> None:
+        """When ``analyze_corpus_dir`` raises on one directory, it must
+        be logged as a warning without hiding the other entries."""
+        from fastapi.testclient import TestClient
+        from picarones.interfaces.web.routers import corpus as corpus_router
+        app, uploads_dir = _make_app(tmp_path, monkeypatch)
+        uploads_dir.mkdir()
+        (uploads_dir / "good_corpus").mkdir()
+        (uploads_dir / "broken_corpus").mkdir()
+        # Force ``analyze_corpus_dir`` to raise for ``broken_corpus``
+        # only, to check that the listing continues after the
+        # exception.
+        original_analyze = corpus_router.analyze_corpus_dir
+        def fake_analyze(d: Path) -> dict:
+            if d.name == "broken_corpus":
+                raise RuntimeError("simulated corrupted disk")
+            return original_analyze(d)
+        monkeypatch.setattr(
+            corpus_router, "analyze_corpus_dir", fake_analyze,
+        )
+        with caplog.at_level("WARNING"):
+            with TestClient(app) as client:
+                r = client.get("/api/corpus/uploads")
+        assert r.status_code == 200
+        # ``good_corpus`` is listed; ``broken_corpus`` is skipped with a warning.
+        ids = [u["corpus_id"] for u in r.json()["uploads"]]
+        assert "good_corpus" in ids
+        assert "broken_corpus" not in ids
+        assert any(
+            "broken_corpus" in rec.message for rec in caplog.records
+        ), "expected a warning about the broken corpus"
+# ──────────────────────────────────────────────────────────────────────
+# /api/corpus/upload: rejected image → 415
+# ──────────────────────────────────────────────────────────────────────
+class TestUploadImageRejection:
+    def test_oversized_image_returns_415(
+        self, tmp_path, monkeypatch,
+    ) -> None:
+        """Image above the size limit → ``ValueError`` during
+        validation, mapped to HTTP 415 by the handler."""
+        from fastapi.testclient import TestClient
+        app, uploads_dir = _make_app(tmp_path, monkeypatch)
+        uploads_dir.mkdir()
+        monkeypatch.setenv("PICARONES_MAX_UPLOAD_MB", "1")
+        big_data = b"\x89PNG\r\n\x1a\n" + b"\x00" * (2 * 1024 * 1024)
+        with TestClient(app) as client:
+            r = client.post(
+                "/api/corpus/upload",
+                files={"files": ("big.png", big_data, "image/png")},
+            )
+            assert r.status_code == 415, r.text
+            assert "taille" in r.text.lower() or "limite" in r.text.lower()
+# ──────────────────────────────────────────────────────────────────────
+# _is_path_allowed: exception branch (ValueError/TypeError)
+# ──────────────────────────────────────────────────────────────────────
+class TestIsPathAllowedException:
+    def test_value_error_on_compare_continues_to_next_root(
+        self, monkeypatch,
+    ) -> None:
+        """Path comparisons can raise ``ValueError`` in pathological
+        cases (for instance ``Path.relative_to`` across different
+        drives on Windows).  The helper must keep iterating over the
+        remaining roots instead of crashing."""
+        from picarones.interfaces.web.routers import corpus as corpus_router
+        class RaisingPath:
+            """Fake Path that raises on ``__eq__``/``is_relative_to``."""
+            def __eq__(self, other):
+                raise ValueError("simulated path comparison error")
+            def is_relative_to(self, other):
+                raise ValueError("simulated")
+        # The first root raises → continue; the second root matches.
+        from pathlib import Path as RealPath
+        target = RealPath("/tmp")
+        monkeypatch.setattr(
+            corpus_router,
+            "_BROWSE_ROOTS",
+            [RaisingPath(), target],
+        )
+        assert corpus_router._is_path_allowed(target) is True
+    def test_no_match_returns_false(self, monkeypatch) -> None:
+        from pathlib import Path as RealPath
+        from picarones.interfaces.web.routers import corpus as corpus_router
+        # ``_BROWSE_ROOTS`` only holds roots that do not contain
+        # ``/totally/unrelated``.
+        monkeypatch.setattr(
+            corpus_router,
+            "_BROWSE_ROOTS",
+            [RealPath("/var/picarones-uploads-test-only")],
+        )
+        assert corpus_router._is_path_allowed(
+            RealPath("/totally/unrelated"),
+        ) is False
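
The exception-tolerant root check exercised by `TestIsPathAllowedException` can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: `is_path_allowed_sketch` and its `roots` parameter are hypothetical names, whereas the real `_is_path_allowed` reads the module-level `_BROWSE_ROOTS` and may differ in detail.

```python
from pathlib import Path
from typing import Iterable


def is_path_allowed_sketch(path: Path, roots: Iterable[object]) -> bool:
    """Hypothetical stand-in for ``_is_path_allowed``.

    Returns True when ``path`` equals one of ``roots`` or lives under it.
    A root whose comparison raises is skipped rather than aborting the check.
    """
    for root in roots:
        try:
            # Equality first, then containment (Python 3.9+ for is_relative_to).
            if path == root or path.is_relative_to(root):
                return True
        except (ValueError, TypeError):
            # Pathological root: ignore it and try the next one.
            continue
    return False
```

Catching the exception per root, rather than around the whole loop, is what lets a single bad entry coexist with valid ones.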

tests/web/test_s8_benchmark_utils_factory.py ADDED

	@@ -0,0 +1,289 @@

+"""Sprint S8.7: real coverage for the factories in
+``benchmark_utils.py`` (previously 51.51% patch coverage).
+Why this file
+-------------
+``_build_llm_adapter`` and ``_engine_from_competitor`` are the
+**routing** points between the web config (``CompetitorConfig``)
+and the concrete adapters: if a regression silently routes to
+``mistral`` instead of ``openai``, or to ``tesseract`` instead of
+``mistral_ocr``, the benchmark still runs but with the wrong
+engine, and ordinary functional tests would not notice.
+Pattern
+-------
+The LLM adapters lazy-import their SDKs (their ``__init__`` has
+no ``import openai``), so ``OpenAIAdapter()`` and the others
+instantiate without error even outside the production
+environment; the routing can therefore be tested directly
+without mocking the SDKs.
+For the cloud OCR adapters (mistral_ocr, google_vision,
+azure_doc_intel), which need an SDK when the wrapper is imported,
+we reuse the ``patch.dict(sys.modules, {... : None})`` pattern
+from ``test_s8_factory_branches.py``.
+"""
+from __future__ import annotations
+import sys
+from unittest.mock import patch
+import pytest
+from picarones.interfaces.web.benchmark_utils import (
+    _build_llm_adapter,
+    _engine_from_competitor,
+    sse_format,
+)
+from picarones.interfaces.web.models import CompetitorConfig
+# ──────────────────────────────────────────────────────────────────────
+# _build_llm_adapter: routing by provider
+# ──────────────────────────────────────────────────────────────────────
+class TestBuildLLMAdapterRouting:
+    """Each provider in the config must return exactly the matching
+    adapter: not a different one, and not a silent fallback
+    instance."""
+    @pytest.mark.parametrize(
+        ("provider", "expected_class_name"),
+        [
+            ("openai", "OpenAIAdapter"),
+            ("anthropic", "AnthropicAdapter"),
+            ("mistral", "MistralAdapter"),
+            ("ollama", "OllamaAdapter"),
+        ],
+    )
+    def test_provider_routes_to_expected_adapter(
+        self, provider: str, expected_class_name: str,
+    ) -> None:
+        comp = CompetitorConfig(
+            name="t", ocr_engine="", llm_provider=provider, llm_model="m",
+        )
+        adapter = _build_llm_adapter(comp)
+        assert type(adapter).__name__ == expected_class_name, (
+            f"provider={provider!r} must instantiate "
+            f"{expected_class_name}, got {type(adapter).__name__}"
+        )
+    def test_unknown_provider_raises_value_error(self) -> None:
+        comp = CompetitorConfig(
+            name="t", ocr_engine="",
+            llm_provider="some_made_up_provider", llm_model="x",
+        )
+        with pytest.raises(ValueError, match="inconnu|unknown"):
+            _build_llm_adapter(comp)
+    def test_empty_llm_model_uses_adapter_default(self) -> None:
+        """When ``llm_model`` is empty, ``None`` is passed to the
+        adapter (which falls back to its internal default) rather
+        than an empty string that the API would reject."""
+        comp = CompetitorConfig(
+            name="t", ocr_engine="", llm_provider="openai", llm_model="",
+        )
+        adapter = _build_llm_adapter(comp)
+        # The adapter must be instantiated without failing on llm_model="".
+        assert adapter is not None
+# ──────────────────────────────────────────────────────────────────────
+# _engine_from_competitor: OCR / pipeline / corpus-only routing
+# ──────────────────────────────────────────────────────────────────────
+class TestEngineFromCompetitorOCROnly:
+    """OCR only (no ``llm_provider``) → returns a ``BaseOCRAdapter``
+    directly, ready to be registered."""
+    def test_tesseract_only_returns_adapter(self) -> None:
+        comp = CompetitorConfig(
+            name="t", ocr_engine="tesseract", llm_provider="",
+            ocr_model="fra",
+        )
+        engine = _engine_from_competitor(comp)
+        assert engine.name == "tesseract"
+    def test_unknown_engine_raises_runtime_error(self) -> None:
+        """``RuntimeError`` (not a bare ``ValueError``): this is the
+        documented contract that lets the worker thread log a
+        ``warning`` and move on to the next competitor."""
+        comp = CompetitorConfig(
+            name="t", ocr_engine="not_an_engine", llm_provider="",
+        )
+        with pytest.raises(RuntimeError, match="inconnu"):
+            _engine_from_competitor(comp)
+class TestEngineFromCompetitorPipeline:
+    """OCR + LLM → returns an ``OCRLLMPipelineConfig`` (rewrite)
+    whose mode is derived from ``pipeline_mode``."""
+    @pytest.mark.parametrize(
+        ("pipeline_mode", "expected_mode"),
+        [
+            ("text_only", "text_only"),
+            ("post_correction_text", "text_only"),
+            ("text_and_image", "text_and_image"),
+            ("post_correction_image", "text_and_image"),
+            ("", "text_only"),  # fallback
+        ],
+    )
+    def test_pipeline_mode_mapping_with_ocr(
+        self, pipeline_mode: str, expected_mode: str,
+    ) -> None:
+        """Modes that require an upstream OCR engine (``text_only``,
+        ``text_and_image``), exercised with the real ``tesseract``."""
+        comp = CompetitorConfig(
+            name="t", ocr_engine="tesseract", llm_provider="mistral",
+            llm_model="m", ocr_model="fra", pipeline_mode=pipeline_mode,
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert pipeline.mode == expected_mode
+    def test_zero_shot_mode_requires_corpus_ocr(self) -> None:
+        """The ``zero_shot`` mode requires ``ocr_adapter=None`` at the
+        pipeline level (the VLM reads the image directly), so on the
+        web factory side it must be combined with ``ocr_engine=corpus``
+        or ``""``, not with a live engine."""
+        comp = CompetitorConfig(
+            name="t", ocr_engine="corpus", llm_provider="mistral",
+            llm_model="m", pipeline_mode="zero_shot",
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert pipeline.mode == "zero_shot"
+        assert pipeline.ocr_adapter is None
+    def test_pipeline_name_from_explicit_name(self) -> None:
+        comp = CompetitorConfig(
+            name="my-pipeline", ocr_engine="tesseract",
+            llm_provider="mistral", llm_model="m", ocr_model="fra",
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert pipeline.pipeline_name == "my-pipeline"
+    def test_pipeline_name_default_format(self) -> None:
+        """Without an explicit ``name``, the format is ``{engine} → {model}``."""
+        comp = CompetitorConfig(
+            name="", ocr_engine="tesseract", llm_provider="mistral",
+            llm_model="ministral-3b-latest", ocr_model="fra",
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert "tesseract" in pipeline.pipeline_name
+        assert "ministral" in pipeline.pipeline_name
+    def test_default_prompt_file_when_not_specified(self) -> None:
+        comp = CompetitorConfig(
+            name="t", ocr_engine="tesseract", llm_provider="mistral",
+            llm_model="m", ocr_model="fra", prompt_file="",
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert pipeline.prompt_template == "correction_medieval_french.txt"
+class TestEngineFromCompetitorCorpusOCR:
+    """``corpus`` mode: uses precomputed OCR (``.ocr.txt`` files)
+    instead of a live engine, and requires an ``llm_provider``
+    because the pipeline always needs an LLM (post-correction
+    or zero-shot)."""
+    @pytest.mark.parametrize("ocr_engine", ["corpus", ""])
+    def test_corpus_or_empty_without_llm_raises(
+        self, ocr_engine: str,
+    ) -> None:
+        comp = CompetitorConfig(
+            name="t", ocr_engine=ocr_engine, llm_provider="",
+        )
+        with pytest.raises(ValueError, match="llm_provider"):
+            _engine_from_competitor(comp)
+    @pytest.mark.parametrize("ocr_engine", ["corpus", ""])
+    def test_corpus_with_llm_returns_pipeline(
+        self, ocr_engine: str,
+    ) -> None:
+        """Corpus mode + LLM → ``zero_shot`` pipeline (the LLM/VLM
+        processes the image or the precomputed OCR; ``ocr_adapter``
+        is ``None``)."""
+        comp = CompetitorConfig(
+            name="post-corr", ocr_engine=ocr_engine,
+            llm_provider="mistral", llm_model="m",
+            pipeline_mode="zero_shot",
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert pipeline.ocr_adapter is None, (
+            "in corpus mode the OCR adapter must be None: "
+            "the pipeline reads the corpus's precomputed OCR."
+        )
+        assert pipeline.llm_adapter is not None
+    def test_corpus_pipeline_name_format(self) -> None:
+        """Without a ``name``, the format is ``corpus_ocr → {model}``."""
+        comp = CompetitorConfig(
+            name="", ocr_engine="corpus", llm_provider="mistral",
+            llm_model="ministral-3b-latest",
+            pipeline_mode="zero_shot",
+        )
+        pipeline = _engine_from_competitor(comp)
+        assert "corpus_ocr" in pipeline.pipeline_name
+        assert "ministral" in pipeline.pipeline_name
+class TestEngineFromCompetitorCloudWithoutSDK:
+    """For the cloud OCR adapters, the wrapper module is imported
+    conditionally: a missing SDK must be turned into a clean
+    ``RuntimeError`` by the web factory."""
+    @pytest.mark.parametrize(
+        ("engine", "module_path"),
+        [
+            ("mistral_ocr", "picarones.adapters.ocr.mistral_ocr"),
+            ("google_vision", "picarones.adapters.ocr.google_vision"),
+            ("azure_doc_intel", "picarones.adapters.ocr.azure_doc_intel"),
+        ],
+    )
+    def test_cloud_engine_without_sdk_runtime_error(
+        self, engine: str, module_path: str,
+    ) -> None:
+        comp = CompetitorConfig(
+            name="t", ocr_engine=engine, llm_provider="",
+        )
+        with patch.dict(sys.modules, {module_path: None}):
+            with pytest.raises(RuntimeError, match="indisponible"):
+                _engine_from_competitor(comp)
+# ──────────────────────────────────────────────────────────────────────
+# sse_format: Server-Sent Events serialization
+# ──────────────────────────────────────────────────────────────────────
+class TestSSEFormat:
+    """SSE output must follow the WHATWG spec: ``id:`` (when seq is
+    provided), ``event:``, ``data:``, and a final double newline."""
+    def test_basic_event_no_seq(self) -> None:
+        out = sse_format("log", {"message": "hello"})
+        assert "event: log\n" in out
+        # Default ``json.dumps`` separators include a space after the colon.
+        assert '"message": "hello"' in out
+        assert out.endswith("\n\n")
+        assert not out.startswith("id:")
+    def test_event_with_seq(self) -> None:
+        out = sse_format("progress", {"pct": 0.5}, seq=42)
+        assert out.startswith("id: 42\n")
+        assert "event: progress\n" in out
+    def test_unicode_preserved(self) -> None:
+        """``ensure_ascii=False``: accented characters pass through unescaped."""
+        out = sse_format("log", {"message": "événement"})
+        assert "événement" in out
+    def test_seq_zero_not_skipped(self) -> None:
+        """``seq=0`` is valid (first event) and must not be treated
+        as ``None``."""
+        out = sse_format("start", {}, seq=0)
+        assert out.startswith("id: 0\n")
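
For reference, a serializer that satisfies the four `TestSSEFormat` cases above could look like the sketch below. It is an illustration only: `sse_format_sketch` is a hypothetical name, and the real `sse_format` in `benchmark_utils.py` is not shown in this diff and may be written differently.

```python
import json
from typing import Any, Optional


def sse_format_sketch(event: str, data: Any, seq: Optional[int] = None) -> str:
    """Hypothetical serializer for one Server-Sent Events message."""
    lines = []
    # Compare against None explicitly so that seq=0 still emits an id line.
    if seq is not None:
        lines.append(f"id: {seq}")
    lines.append(f"event: {event}")
    # ensure_ascii=False keeps accented characters readable in the stream.
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"
```

The `seq is not None` test is the point of `test_seq_zero_not_skipped`: a truthiness check (`if seq:`) would silently drop the id of the first event.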