Ma-Ri-Ba-Ku / Picarones

Commit a98013e (unverified) · 1 parent: ebddecf
Claude committed 16 days ago

feat(web): S1 - expose 6 BenchmarkRunRequest toggles that were frozen in the UI

Sprint S1: unlocks Pydantic options that already existed on the API
side but were never sent from the web interface.

Exposed fields:

1. ``report_lang`` (FR / EN): toggle in Advanced options. Allows
generating the report in English; previously hard-coded to "fr".
2. ``views``: ``alto_documentary`` and ``searchability`` checkboxes
(``text_final`` is always active). Enables the canonical
AltoView and SearchView views of the HTML report.
3. ``expose_alto`` (per competitor): checkbox in the Competitors
section, visible only when the selected engine is Tesseract.
Unlocks native ALTO XML output.
4. ``entity_extractor``: text input with a datalist of presets
(``spacy.fr_core_news_sm``, ``spacy.en_core_web_sm``).
Unlocks the NER summary and per-category renderers of the report.
5. ``output_json``: toggle; when ON, derives an automatic path from
``report_name`` (relative to the workspace, validated by Pydantic).
6. ``partial_dir``: toggle; when ON, derives an automatic checkpoint
path with a timestamp. Allows resuming after an interruption.
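
The derivation of the two automatic paths and of the ``views`` list can be sketched in Python as below. This is an illustrative port of the JavaScript helper shown later in the diff, not code from the repository: the function names and the ``form`` dict keys are hypothetical, and the whitespace-only fallback to "rapport" is a choice made for this sketch.

```python
from datetime import datetime, timezone


def derive_partial_dir(now: datetime) -> str:
    """Return a relative checkpoint path with a filesystem-safe UTC timestamp."""
    # Same shape as the JS expression: ISO timestamp with ':' replaced by '-'
    # and truncated to whole seconds.
    ts = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    return f"partial/run-{ts}"


def derive_output_json(report_name: str) -> str:
    """Return '<report_name>.json', falling back to 'rapport' when the name is blank."""
    stem = report_name.strip() or "rapport"
    return f"{stem}.json"


def gather_advanced_options(form: dict, now: datetime) -> dict:
    """Build the five global advanced options from a (hypothetical) form dict."""
    views = ["text_final"]  # always included
    if form.get("view_alto_documentary"):
        views.append("alto_documentary")
    if form.get("view_searchability"):
        views.append("searchability")
    return {
        "report_lang": form.get("report_lang") or "fr",
        "views": views,
        # Empty string means "toggle OFF" in the UI payload.
        "partial_dir": derive_partial_dir(now) if form.get("enable_partial_resume") else "",
        "entity_extractor": (form.get("entity_extractor") or "").strip(),
        "output_json": (
            derive_output_json(form.get("report_name") or "")
            if form.get("enable_output_json")
            else ""
        ),
    }
```

``expose_alto`` is deliberately absent from this dict, since it is set per competitor rather than globally.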

Changes:

- ``_view_benchmark.html``: added a collapsible ``<details>``
section "Advanced options" in Section 03, plus a hideable
``compose-expose-alto`` checkbox in Section 02.
- ``web-app.js``: new helper ``_gatherAdvancedOptions()`` that collects
the 5 global fields; ``startBenchmark()``,
``_gatherCurrentConfig()`` and ``_applyConfig()`` extended to include them.
The ``onComposeOCRChange()`` handler shows or hides the
expose_alto checkbox depending on the engine. ``addCompetitor()``
now reads the checkbox value for Tesseract competitors.
- i18n: 16 new keys (FR + EN) in the inline T table.
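
The per-competitor gating of ``expose_alto`` can be sketched in Python as follows. This mirrors the logic added to ``addCompetitor()`` in the diff; the function name ``build_competitor`` is hypothetical and only the OCR-only branch is modelled.

```python
def build_competitor(engine: str, model: str, expose_alto_checked: bool) -> dict:
    """Build an OCR-only competitor entry, honouring expose_alto for Tesseract only."""
    # The checkbox value is ignored for every engine other than Tesseract,
    # which keeps the payload clean for cloud OCR engines.
    expose_alto = engine == "tesseract" and expose_alto_checked
    model_part = f" ({model})" if model else ""
    alto_suffix = " · ALTO" if expose_alto else ""
    return {
        "name": f"{engine}{model_part}{alto_suffix}",
        "engine_name": engine,
        "ocr_model": model,
        "expose_alto": expose_alto,
    }
```

For example, ``build_competitor("tesseract", "fra", True)`` yields the name ``tesseract (fra) · ALTO``, which is the name used in the full-payload test of this commit.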

Tests (tests/web/test_benchmark_request_options.py):
- 7 tests for the presence of the DOM IDs in the rendered template.
- 2 tests for full vs. minimal UI-style payloads accepted by the API.
- 2 tests for per-competitor ``expose_alto`` (mix of Tesseract and
cloud OCR).
- 10 tests for FR ↔ EN i18n coverage of the new labels.
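
The i18n coverage check amounts to counting declarations of each key in the JS source. A minimal sketch, assuming both language tables live in the same file and declare keys as ``key:`` (the helper name ``missing_translations`` is hypothetical):

```python
def missing_translations(js_source: str, keys: list[str]) -> list[str]:
    """Return the keys declared fewer than twice (once for FR, once for EN)."""
    # Plain substring counting: a key that is a suffix of another key
    # (e.g. 'label' vs 'x_label') would be over-counted, so this is a
    # smoke check rather than a real parser.
    return [key for key in keys if js_source.count(f"{key}:") < 2]
```

An empty result means every listed key appears at least once per language; any key returned would be rendered as a raw identifier in one of the two interfaces.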

Verification:
- 5182 tests passed (+21 vs S0-ter), 0 failed, 20 skipped
- make lint: All checks passed
- The Pydantic API ↔ payload contract is unchanged (already covered
by tests/web/test_benchmark_run_b3_final_fields.py, 22 tests).
- No refactor of ``web-app.js`` (1762 → 1885 LOC): no LOC budget is
enforced on this file; the ESM split proposed in the S1 plan is
deferred to the report-overhaul sprint (S5) for consistency (one
large JS refactor instead of two).

DoD:
- The 6 UI fields reach the ``run_benchmark_thread_v2`` worker
(verified by reading benchmark_utils.py:469-525; the propagation
has been wired since the Phase D3 audit, May 2026).
- End-to-end test: UI-style payload accepted by /api/benchmark/run.
- The NER block of the HTML report will be populated as soon as the
user provides an entity_extractor on a corpus annotated with ENTITIES.
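
The path rule that the auto-generated ``partial_dir`` and ``output_json`` values must satisfy (relative, no ``..``) can be illustrated with a standard-library sketch. This is not the project's Pydantic validator, whose exact behaviour is not shown in this commit; the function name is hypothetical, and treating an empty string as valid is an assumption based on the UI sending "" when a toggle is OFF.

```python
from pathlib import PurePosixPath


def is_safe_relative_path(value: str) -> bool:
    """Accept empty values and relative POSIX paths without '..' components."""
    if not value:
        # Assumption: "" means the option is disabled and is accepted as such.
        return True
    path = PurePosixPath(value)
    # Pure paths never touch the filesystem and do not collapse '..',
    # so a traversal component stays visible in .parts.
    return not path.is_absolute() and ".." not in path.parts
```

Both ``partial/run-2026-05-23`` and ``test_s1.json`` pass this check, while ``/tmp/out.json`` and ``a/../b`` do not. Windows-style absolute paths are not detected by this POSIX-only sketch.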

https://claude.ai/code/session_01WYDbfkhKPeBZ15BTP4e9Ye

Files changed (3)

picarones/interfaces/web/static/web-app.js +123 -4
picarones/interfaces/web/templates/_view_benchmark.html +88 -0
tests/web/test_benchmark_request_options.py +261 -0

picarones/interfaces/web/static/web-app.js CHANGED

@@ -120,6 +120,22 @@ const T = {
     compose_prompt: "Prompt",
     compose_max_image_dim: "Image max (px)",
     compose_max_image_dim_hint: "0 = pleine résolution (défaut, méthodo inchangée). > 0 réduit l'image envoyée au VLM (modes image) pour éviter les 429 — change la méthodo, run fingerprinté à part.",
     compose_add: "+ Ajouter",
     compose_empty: "Aucun concurrent ajouté.",
     mode_text_only: "Post-correction texte",
@@ -269,6 +285,22 @@ const T = {
     compose_prompt: "Prompt",
     compose_max_image_dim: "Max image (px)",
     compose_max_image_dim_hint: "0 = full resolution (default, methodology unchanged). > 0 shrinks the image sent to the VLM (image modes) to avoid 429s — changes methodology, fingerprinted as a separate run.",
     compose_add: "+ Add",
     compose_empty: "No competitors added.",
     mode_text_only: "Text post-correction",
@@ -544,6 +576,15 @@ async function onComposeOCRChange() {
   const engine = document.getElementById("compose-ocr-engine").value;
   _pendingOCREngine = engine;   // marquer la requête courante
   const sp = document.getElementById("sp-ocr-model");
   // Google Vision et Azure ont des listes statiques — pas d'appel API nécessaire
   if (engine === "google_vision") {
     sp.style.display = "none";
@@ -658,9 +699,14 @@ function addCompetitor() {
   const comp = { name: "", engine_name: "", ocr_model: "",
                   llm_provider: "", llm_model: "", pipeline_mode: "", prompt_file: "",
-                  max_image_dimension: 0 };
   const _maxImgDim = parseInt((document.getElementById("compose-max-image-dim") || {}).value, 10);
   const maxImgDim = Number.isFinite(_maxImgDim) && _maxImgDim > 0 ? _maxImgDim : 0;
   if (mode === "postcorrection") {
     // Post-correction : OCR vient du corpus (.ocr.txt)
@@ -691,6 +737,7 @@ function addCompetitor() {
     comp.pipeline_mode = document.getElementById("compose-pipeline-mode").value;
     comp.prompt_file = document.getElementById("compose-prompt").value;
     comp.max_image_dimension = maxImgDim;
     if (!comp.llm_provider) {
       errEl.textContent = lang === "fr" ? "Sélectionnez un provider LLM." : "Select an LLM provider.";
       return;
@@ -708,7 +755,9 @@ function addCompetitor() {
     }
     comp.engine_name = ocrEngine;
     comp.ocr_model = ocrModel;
-    comp.name = `${ocrEngine}${ocrModel ? " ("+ocrModel+")" : ""}`;
   }
   errEl.textContent = "";
@@ -870,6 +919,7 @@ async function startBenchmark() {
     output_dir: document.getElementById("output-dir").value,
     report_name: document.getElementById("report-name").value,
     profile: (document.getElementById("run-profile") || {}).value || "standard",
   };
   document.getElementById("start-btn").disabled = true;
@@ -1603,11 +1653,57 @@ function _updateCorpusOCRNotice(corpusData) {
 // côté serveur (avec tests dédiés) mais aucun bouton ne les appelait —
 // code zombie typique post-rewrite.
 function _gatherCurrentConfig() {
   /** Sérialise l'état UI courant en dict compatible
    * ``/api/config/save``.  Inclut les compétiteurs composés
-   * (_competitors), les options de normalisation et le profil de
-   * langue rapport. */
   return {
     label: document.getElementById("report-name").value || "picarones-config",
     corpus_path: document.getElementById("corpus-path").value,
@@ -1617,6 +1713,7 @@ function _gatherCurrentConfig() {
     output_dir: document.getElementById("output-dir").value,
     report_name: document.getElementById("report-name").value,
     profile: (document.getElementById("run-profile") || {}).value || "standard",
   };
 }
@@ -1729,6 +1826,28 @@ function _applyConfig(cfg) {
     _competitors = cfg.competitors;
     renderCompetitors();
   }
 }
 // ─── Init ────────────────────────────────────────────────────────────────────

     compose_prompt: "Prompt",
     compose_max_image_dim: "Image max (px)",
     compose_max_image_dim_hint: "0 = pleine résolution (défaut, méthodo inchangée). > 0 réduit l'image envoyée au VLM (modes image) pour éviter les 429 — change la méthodo, run fingerprinté à part.",
+    compose_expose_alto_label: "Produire l'ALTO XML natif",
+    compose_expose_alto_hint: "Active la vue « alto_documentary » du rapport. Tesseract uniquement.",
+    bench_advanced_options: "Options avancées",
+    bench_advanced_options_hint: "vues du rapport, langue, NER, reprise, export JSON",
+    bench_report_lang_label: "Langue du rapport",
+    bench_entity_extractor_label: "Extracteur NER (optionnel)",
+    bench_entity_extractor_hint: "Format : module.submodule:Symbol. Débloque le bloc NER du rapport HTML.",
+    bench_views_label: "Vues d'évaluation à activer",
+    bench_view_text_final: "text_final (toujours actif)",
+    bench_view_alto_documentary: "alto_documentary",
+    bench_view_searchability: "searchability",
+    bench_views_hint: "alto_documentary nécessite qu'au moins un concurrent Tesseract ait expose_alto activé.",
+    bench_partial_resume_label: "Permettre la reprise sur interruption",
+    bench_partial_resume_hint: "Crée un répertoire de checkpoint pour reprendre un run interrompu.",
+    bench_output_json_label: "Exporter aussi en JSON",
+    bench_output_json_hint: "Génère un fichier JSON additionnel à côté du rapport HTML.",
     compose_add: "+ Ajouter",
     compose_empty: "Aucun concurrent ajouté.",
     mode_text_only: "Post-correction texte",
     compose_prompt: "Prompt",
     compose_max_image_dim: "Max image (px)",
     compose_max_image_dim_hint: "0 = full resolution (default, methodology unchanged). > 0 shrinks the image sent to the VLM (image modes) to avoid 429s — changes methodology, fingerprinted as a separate run.",
+    compose_expose_alto_label: "Produce native ALTO XML",
+    compose_expose_alto_hint: "Enables the \"alto_documentary\" report view. Tesseract only.",
+    bench_advanced_options: "Advanced options",
+    bench_advanced_options_hint: "report views, language, NER, resume, JSON export",
+    bench_report_lang_label: "Report language",
+    bench_entity_extractor_label: "NER extractor (optional)",
+    bench_entity_extractor_hint: "Format: module.submodule:Symbol. Unlocks the NER block of the HTML report.",
+    bench_views_label: "Evaluation views to enable",
+    bench_view_text_final: "text_final (always on)",
+    bench_view_alto_documentary: "alto_documentary",
+    bench_view_searchability: "searchability",
+    bench_views_hint: "alto_documentary requires at least one Tesseract competitor with expose_alto enabled.",
+    bench_partial_resume_label: "Enable resume after interruption",
+    bench_partial_resume_hint: "Creates a checkpoint directory to resume an interrupted run.",
+    bench_output_json_label: "Also export as JSON",
+    bench_output_json_hint: "Generates an additional JSON file alongside the HTML report.",
     compose_add: "+ Add",
     compose_empty: "No competitors added.",
     mode_text_only: "Text post-correction",
   const engine = document.getElementById("compose-ocr-engine").value;
   _pendingOCREngine = engine;   // marquer la requête courante
   const sp = document.getElementById("sp-ocr-model");
+  // expose_alto est spécifique à Tesseract.  Visible uniquement pour ce moteur.
+  const altoWrap = document.getElementById("compose-expose-alto-wrap");
+  if (altoWrap) {
+    altoWrap.style.display = engine === "tesseract" ? "block" : "none";
+    if (engine !== "tesseract") {
+      const cb = document.getElementById("compose-expose-alto");
+      if (cb) cb.checked = false;
+    }
+  }
   // Google Vision et Azure ont des listes statiques — pas d'appel API nécessaire
   if (engine === "google_vision") {
     sp.style.display = "none";
   const comp = { name: "", engine_name: "", ocr_model: "",
                   llm_provider: "", llm_model: "", pipeline_mode: "", prompt_file: "",
+                  max_image_dimension: 0, expose_alto: false };
   const _maxImgDim = parseInt((document.getElementById("compose-max-image-dim") || {}).value, 10);
   const maxImgDim = Number.isFinite(_maxImgDim) && _maxImgDim > 0 ? _maxImgDim : 0;
+  // expose_alto n'est lu que pour Tesseract (les autres moteurs ne le
+  // propagent pas — ignoré côté adapter, mais on ne l'envoie pas pour
+  // garder le payload propre).
+  const exposeAltoCb = document.getElementById("compose-expose-alto");
+  const composeExposeAlto = !!(exposeAltoCb && exposeAltoCb.checked);
   if (mode === "postcorrection") {
     // Post-correction : OCR vient du corpus (.ocr.txt)
     comp.pipeline_mode = document.getElementById("compose-pipeline-mode").value;
     comp.prompt_file = document.getElementById("compose-prompt").value;
     comp.max_image_dimension = maxImgDim;
+    if (ocrEngine === "tesseract") comp.expose_alto = composeExposeAlto;
     if (!comp.llm_provider) {
       errEl.textContent = lang === "fr" ? "Sélectionnez un provider LLM." : "Select an LLM provider.";
       return;
     }
     comp.engine_name = ocrEngine;
     comp.ocr_model = ocrModel;
+    if (ocrEngine === "tesseract") comp.expose_alto = composeExposeAlto;
+    const altoSuffix = (ocrEngine === "tesseract" && composeExposeAlto) ? " · ALTO" : "";
+    comp.name = `${ocrEngine}${ocrModel ? " ("+ocrModel+")" : ""}${altoSuffix}`;
   }
   errEl.textContent = "";
     output_dir: document.getElementById("output-dir").value,
     report_name: document.getElementById("report-name").value,
     profile: (document.getElementById("run-profile") || {}).value || "standard",
+    ..._gatherAdvancedOptions(),
   };
   document.getElementById("start-btn").disabled = true;
 // côté serveur (avec tests dédiés) mais aucun bouton ne les appelait —
 // code zombie typique post-rewrite.
+function _gatherAdvancedOptions() {
+  /** Collects the 5 fields of the "Advanced options" section as a
+   * dict ready to be spread into the
+   * ``POST /api/benchmark/run`` payload or the saved config.
+   *
+   * Collected fields:
+   *   - ``report_lang``   ("fr" | "en")
+   *   - ``views``         list of enabled views (text_final always included)
+   *   - ``partial_dir``   auto-generated path if toggle ON, "" otherwise
+   *   - ``entity_extractor``  NER dotted path, "" if empty
+   *   - ``output_json``   auto path if toggle ON, "" otherwise
+   *
+   * ``expose_alto`` is NOT in this dict: it is per competitor, stored
+   * in ``_competitors[i].expose_alto`` (set by addCompetitor).
+   */
+  const views = ["text_final"];
+  if ((document.getElementById("view-alto-documentary") || {}).checked) {
+    views.push("alto_documentary");
+  }
+  if ((document.getElementById("view-searchability") || {}).checked) {
+    views.push("searchability");
+  }
+  // Reprise : si toggle ON, générer un partial_dir relatif avec
+  // timestamp lisible.  Le validator Pydantic refuse les chemins
+  // absolus et ``..``, ce format est sûr.
+  let partialDir = "";
+  if ((document.getElementById("enable-partial-resume") || {}).checked) {
+    const ts = new Date().toISOString().replace(/[:.]/g, "-").slice(0, 19);
+    partialDir = `partial/run-${ts}`;
+  }
+  // Export JSON : si toggle ON, dériver le chemin depuis report_name
+  // (ou un nom par défaut).  Relatif au output_dir côté serveur.
+  let outputJson = "";
+  if ((document.getElementById("enable-output-json") || {}).checked) {
+    const stem = (document.getElementById("report-name").value || "").trim() || "rapport";
+    outputJson = `${stem}.json`;
+  }
+  return {
+    report_lang: (document.getElementById("report-lang") || {}).value || "fr",
+    views: views,
+    partial_dir: partialDir,
+    entity_extractor: ((document.getElementById("entity-extractor") || {}).value || "").trim(),
+    output_json: outputJson,
+  };
+}
 function _gatherCurrentConfig() {
   /** Sérialise l'état UI courant en dict compatible
    * ``/api/config/save``.  Inclut les compétiteurs composés
+   * (_competitors), les options de normalisation, le profil de
+   * langue rapport et les options avancées. */
   return {
     label: document.getElementById("report-name").value || "picarones-config",
     corpus_path: document.getElementById("corpus-path").value,
     output_dir: document.getElementById("output-dir").value,
     report_name: document.getElementById("report-name").value,
     profile: (document.getElementById("run-profile") || {}).value || "standard",
+    ..._gatherAdvancedOptions(),
   };
 }
     _competitors = cfg.competitors;
     renderCompetitors();
   }
+  // Options avancées
+  if (typeof cfg.report_lang === "string") {
+    const sel = document.getElementById("report-lang");
+    if (sel) sel.value = cfg.report_lang;
+  }
+  if (Array.isArray(cfg.views)) {
+    const altoCb = document.getElementById("view-alto-documentary");
+    if (altoCb) altoCb.checked = cfg.views.includes("alto_documentary");
+    const searchCb = document.getElementById("view-searchability");
+    if (searchCb) searchCb.checked = cfg.views.includes("searchability");
+  }
+  if (typeof cfg.entity_extractor === "string") {
+    const inp = document.getElementById("entity-extractor");
+    if (inp) inp.value = cfg.entity_extractor;
+  }
+  // ``partial_dir`` / ``output_json`` non vides → toggle ON (l'UI
+  // régénère le chemin auto à la prochaine soumission, donc la
+  // valeur exacte n'est pas restaurée mais le toggle l'est).
+  const enablePartial = document.getElementById("enable-partial-resume");
+  if (enablePartial) enablePartial.checked = !!(cfg.partial_dir && cfg.partial_dir !== "");
+  const enableJson = document.getElementById("enable-output-json");
+  if (enableJson) enableJson.checked = !!(cfg.output_json && cfg.output_json !== "");
 }
 // ─── Init ────────────────────────────────────────────────────────────────────

picarones/interfaces/web/templates/_view_benchmark.html CHANGED

@@ -128,6 +128,19 @@
         </div>
       </div>
       <div id="compose-pipeline-section" style="display:none; margin-top:14px;">
         <div class="grid-2" style="gap:14px;">
           <div class="field">
@@ -243,6 +256,81 @@
           <input type="text" id="report-name" placeholder="rapport_2026_05_20" class="mono-input" />
         </div>
       </div>
     </div>
   </div>

         </div>
       </div>
+      {# expose_alto : visible uniquement pour Tesseract.  Active la
+         production native d'ALTO XML, débloquant la vue
+         ``alto_documentary`` du rapport HTML. #}
+      <div id="compose-expose-alto-wrap" class="field" style="margin-top:12px;">
+        <label class="row" style="gap:8px; align-items:flex-start; cursor:pointer;">
+          <input type="checkbox" id="compose-expose-alto" />
+          <span>
+            <span class="field-label" style="display:inline; margin:0;" data-i18n="compose_expose_alto_label">Produire l'ALTO XML natif</span>
+            <small class="help" style="display:block; margin-top:4px; padding-left:0; border-left:0;" data-i18n="compose_expose_alto_hint">Active la vue « alto_documentary » du rapport.  Tesseract uniquement.</small>
+          </span>
+        </label>
+      </div>
       <div id="compose-pipeline-section" style="display:none; margin-top:14px;">
         <div class="grid-2" style="gap:14px;">
           <div class="field">
           <input type="text" id="report-name" placeholder="rapport_2026_05_20" class="mono-input" />
         </div>
       </div>
+      {# Options avancées (collapsible) — débloquent les vues
+         alto/searchability, le rapport en/fr, l'extracteur NER, la
+         reprise sur interruption et l'export JSON additionnel. #}
+      <details id="bench-advanced-options" style="margin-top:18px;">
+        <summary style="cursor:pointer; font-weight:500; padding:8px 0; color:var(--g-700);">
+          <span data-i18n="bench_advanced_options">Options avancées</span>
+          <small class="help" style="display:inline; margin-left:8px; padding-left:0; border-left:0;" data-i18n="bench_advanced_options_hint">vues du rapport, langue, NER, reprise, export JSON</small>
+        </summary>
+        <div class="grid-2" style="gap:14px; margin-top:14px;">
+          <div class="field">
+            <div class="field-label"><span data-i18n="bench_report_lang_label">Langue du rapport</span></div>
+            <select id="report-lang">
+              <option value="fr">Français</option>
+              <option value="en">English</option>
+            </select>
+          </div>
+          <div class="field">
+            <div class="field-label">
+              <span data-i18n="bench_entity_extractor_label">Extracteur NER (optionnel)</span>
+            </div>
+            <input type="text" id="entity-extractor" class="mono-input"
+                   list="entity-extractor-presets"
+                   placeholder="ex : spacy.fr_core_news_sm" />
+            <datalist id="entity-extractor-presets">
+              <option value="spacy.fr_core_news_sm">Spacy — français standard</option>
+              <option value="spacy.en_core_web_sm">Spacy — English standard</option>
+            </datalist>
+            <small class="help" style="margin-top:6px; padding-left:0; border-left:0;" data-i18n="bench_entity_extractor_hint">Format : <code>module.submodule:Symbol</code>.  Débloque le bloc NER du rapport HTML.</small>
+          </div>
+        </div>
+        <div class="field" style="margin-top:14px;">
+          <div class="field-label"><span data-i18n="bench_views_label">Vues d'évaluation à activer</span></div>
+          <div class="row" style="gap:18px; flex-wrap:wrap; margin-top:4px;">
+            <label class="row" style="gap:6px; cursor:pointer;">
+              <input type="checkbox" id="view-text-final" checked disabled />
+              <span data-i18n="bench_view_text_final">text_final (toujours actif)</span>
+            </label>
+            <label class="row" style="gap:6px; cursor:pointer;">
+              <input type="checkbox" id="view-alto-documentary" />
+              <span data-i18n="bench_view_alto_documentary">alto_documentary</span>
+            </label>
+            <label class="row" style="gap:6px; cursor:pointer;">
+              <input type="checkbox" id="view-searchability" />
+              <span data-i18n="bench_view_searchability">searchability</span>
+            </label>
+          </div>
+          <small class="help" style="margin-top:6px; padding-left:0; border-left:0;" data-i18n="bench_views_hint">
+            <code>alto_documentary</code> nécessite qu'au moins un concurrent Tesseract ait <code>expose_alto</code> activé.
+          </small>
+        </div>
+        <div class="grid-2" style="gap:14px; margin-top:14px;">
+          <div class="field">
+            <label class="row" style="gap:8px; align-items:flex-start; cursor:pointer;">
+              <input type="checkbox" id="enable-partial-resume" />
+              <span>
+                <span class="field-label" style="display:inline; margin:0;" data-i18n="bench_partial_resume_label">Permettre la reprise sur interruption</span>
+                <small class="help" style="display:block; margin-top:4px; padding-left:0; border-left:0;" data-i18n="bench_partial_resume_hint">Crée un répertoire de checkpoint pour reprendre un run interrompu.</small>
+              </span>
+            </label>
+          </div>
+          <div class="field">
+            <label class="row" style="gap:8px; align-items:flex-start; cursor:pointer;">
+              <input type="checkbox" id="enable-output-json" />
+              <span>
+                <span class="field-label" style="display:inline; margin:0;" data-i18n="bench_output_json_label">Exporter aussi en JSON</span>
+                <small class="help" style="display:block; margin-top:4px; padding-left:0; border-left:0;" data-i18n="bench_output_json_hint">Génère un fichier JSON additionnel à côté du rapport HTML.</small>
+              </span>
+            </label>
+          </div>
+        </div>
+      </details>
     </div>
   </div>

tests/web/test_benchmark_request_options.py ADDED

@@ -0,0 +1,261 @@

+"""S1 tests: UI toggles exposed for the `BenchmarkRunRequest` options.
+Sprint S1 exposes in the web template 6 Pydantic fields that were
+frozen on the UI side:
+- ``report_lang`` (FR / EN)
+- ``views`` (text_final + alto_documentary + searchability)
+- ``expose_alto`` (per competitor, Tesseract only)
+- ``entity_extractor`` (NER dotted path)
+- ``output_json`` (toggle that derives an automatic path)
+- ``partial_dir`` (toggle that derives an automatic path)
+These tests check that:
+1. The DOM elements are present in the rendered HTML template
+   (the UI can actually collect these values).
+2. The full UI-style payload is accepted by
+   ``POST /api/benchmark/run`` (backward compatibility preserved).
+3. A payload without the new fields keeps working (the Pydantic
+   defaults remain valid).
+The **API ↔ Pydantic** contract itself (positive/negative validation,
+path traversal) is tested exhaustively by
+``tests/web/test_benchmark_run_b3_final_fields.py``.  This module
+targets the **UI → JSON payload** layer.
+"""
+from __future__ import annotations
+from pathlib import Path
+import pytest
+from fastapi.testclient import TestClient
+@pytest.fixture
+def client():
+    from picarones.interfaces.web.app import app
+    return TestClient(app)
+@pytest.fixture
+def workspace_corpus(tmp_path: Path) -> str:
+    """Crée un corpus minimal sous un workspace autorisé pour les
+    validators d'``output_dir`` / ``corpus_path``."""
+    from PIL import Image
+    img_path = tmp_path / "doc01.png"
+    Image.new("RGB", (40, 40), color=(255, 255, 255)).save(img_path)
+    (tmp_path / "doc01.gt.txt").write_text("hello", encoding="utf-8")
+    return str(tmp_path)
+# ─────────────────────────────────────────────────────────────────────────────
+# 1. Présence des éléments DOM dans le template
+# ─────────────────────────────────────────────────────────────────────────────
+class TestAdvancedOptionsUIElements:
+    """Le template ``_view_benchmark.html`` doit contenir les IDs DOM
+    consommés par ``_gatherAdvancedOptions()`` côté JS."""
+    @pytest.fixture
+    def html(self, client) -> str:
+        response = client.get("/")
+        assert response.status_code == 200
+        return response.text
+    def test_report_lang_select_present(self, html: str) -> None:
+        assert 'id="report-lang"' in html, (
+            "Sélecteur de langue du rapport manquant — "
+            "report_lang ne sera jamais transmis depuis l'UI."
+        )
+        # Options FR + EN visibles
+        assert 'value="fr"' in html
+        assert 'value="en"' in html
+    def test_views_checkboxes_present(self, html: str) -> None:
+        """Les checkboxes pour alto_documentary et searchability."""
+        assert 'id="view-alto-documentary"' in html
+        assert 'id="view-searchability"' in html
+        # text_final reste activé en permanence (checkbox disabled)
+        assert 'id="view-text-final"' in html
+    def test_entity_extractor_input_present(self, html: str) -> None:
+        assert 'id="entity-extractor"' in html
+        # Le datalist propose des presets utiles
+        assert "spacy.fr_core_news_sm" in html
+        assert "spacy.en_core_web_sm" in html
+    def test_partial_resume_toggle_present(self, html: str) -> None:
+        assert 'id="enable-partial-resume"' in html
+    def test_output_json_toggle_present(self, html: str) -> None:
+        assert 'id="enable-output-json"' in html
+    def test_expose_alto_checkbox_present(self, html: str) -> None:
+        """``expose_alto`` est dans la section compose (par-compétiteur)."""
+        assert 'id="compose-expose-alto"' in html
+        # Wrap dont la visibilité dépend du moteur (Tesseract uniquement).
+        assert 'id="compose-expose-alto-wrap"' in html
+    def test_advanced_options_collapsible_present(self, html: str) -> None:
+        """Section ``<details>`` qui regroupe les options avancées."""
+        assert 'id="bench-advanced-options"' in html
+# ─────────────────────────────────────────────────────────────────────────────
+# 2. Payload UI-style complet accepté par l'API
+# ─────────────────────────────────────────────────────────────────────────────
+class TestFullUIStylePayloadAccepted:
+    """Un payload qui reflète la sérialisation produite par
+    ``_gatherAdvancedOptions()`` doit être accepté tel quel."""
+    def test_payload_with_all_advanced_options(
+        self, client, workspace_corpus: str,
+    ) -> None:
+        """Smoke test : payload UI-style avec les 6 toggles activés.
+        Note : le validator de chemins est strict — on utilise
+        ``partial/run-2026-05-23`` (relatif, sans ``..``) et
+        ``rapport.json`` (basename).
+        """
+        payload = {
+            "corpus_path": workspace_corpus,
+            "competitors": [
+                {
+                    "name": "tesseract (fra) · ALTO",
+                    "engine_name": "tesseract",
+                    "ocr_model": "fra",
+                    "expose_alto": True,
+                    "max_image_dimension": 0,
+                },
+            ],
+            "normalization_profile": "nfc",
+            "char_exclude": "",
+            "output_dir": workspace_corpus,  # même workspace
+            "report_name": "test_s1",
+            "profile": "standard",
+            # Options avancées :
+            "report_lang": "en",
+            "views": ["text_final", "alto_documentary", "searchability"],
+            "partial_dir": "partial/run-2026-05-23",
+            "entity_extractor": "spacy.fr_core_news_sm",
+            "output_json": "test_s1.json",
+        }
+        # On ne lance pas vraiment le benchmark (lourd) — on vérifie
+        # juste que Pydantic accepte le payload et que le runner
+        # démarre un job.  Le retour est ``{"job_id": ..., "status": "pending"}``.
+        response = client.post("/api/benchmark/run", json=payload)
+        # Accepte 200 (job démarré) ou 429 (rate-limited en CI parallèle)
+        # mais surtout pas 422 (validation).
+        assert response.status_code != 422, (
+            f"Payload rejeté par Pydantic : {response.json()}"
+        )
+    def test_payload_with_minimal_advanced_options(
+        self, client, workspace_corpus: str,
+    ) -> None:
+        """Payload sans les nouveaux champs : compat ascendante.
+        Un client qui n'envoie aucun champ avancé doit continuer à
+        marcher.  Les défauts Pydantic prennent le relais.
+        """
+        payload = {
+            "corpus_path": workspace_corpus,
+            "competitors": [
+                {"name": "tesseract", "engine_name": "tesseract"},
+            ],
+            "output_dir": workspace_corpus,
+        }
+        response = client.post("/api/benchmark/run", json=payload)
+        assert response.status_code != 422, (
+            f"Payload minimal rejeté : {response.json()}"
+        )
+# ─────────────────────────────────────────────────────────────────────────────
+# 3. expose_alto par-compétiteur : transmission propre
+# ─────────────────────────────────────────────────────────────────────────────
+class TestExposeAltoPerCompetitor:
+    """``expose_alto`` est un champ de ``PipelineConfig``, pas de
+    ``BenchmarkRunRequest`` : il s'applique par-compétiteur."""
+    def test_expose_alto_only_on_tesseract_competitor(
+        self, client, workspace_corpus: str,
+    ) -> None:
+        """Mixer un compétiteur Tesseract+expose_alto et un autre OCR
+        cloud sans expose_alto doit fonctionner."""
+        payload = {
+            "corpus_path": workspace_corpus,
+            "competitors": [
+                {
+                    "name": "tesseract (ALTO)",
+                    "engine_name": "tesseract",
+                    "ocr_model": "fra",
+                    "expose_alto": True,
+                },
+                {
+                    "name": "mistral_ocr",
+                    "engine_name": "mistral_ocr",
+                    "expose_alto": False,  # No-op pour Mistral
+                },
+            ],
+            "output_dir": workspace_corpus,
+            "views": ["text_final", "alto_documentary"],
+        }
+        response = client.post("/api/benchmark/run", json=payload)
+        assert response.status_code != 422
+    def test_expose_alto_default_false(self, client, workspace_corpus: str) -> None:
+        """Compat ascendante : un client qui n'envoie pas expose_alto
+        reçoit le défaut ``false``."""
+        from picarones.interfaces.web.models import PipelineConfig
+        config = PipelineConfig(name="t", engine_name="tesseract")
+        assert config.expose_alto is False
+# ─────────────────────────────────────────────────────────────────────────────
+# 4. Synchronisation i18n des nouvelles clés
+# ─────────────────────────────────────────────────────────────────────────────
+class TestI18nNewKeysCovered:
+    """Les nouveaux libellés doivent avoir une traduction FR ET EN
+    pour ne pas dégrader silencieusement en clé brute (``bench_views_label``)
+    quand la langue d'interface est EN."""
+    @pytest.fixture
+    def js_source(self) -> str:
+        path = (
+            Path(__file__).resolve().parents[2]
+            / "picarones" / "interfaces" / "web" / "static" / "web-app.js"
+        )
+        return path.read_text(encoding="utf-8")
+    @pytest.mark.parametrize("key", [
+        "compose_expose_alto_label",
+        "compose_expose_alto_hint",
+        "bench_advanced_options",
+        "bench_report_lang_label",
+        "bench_entity_extractor_label",
+        "bench_views_label",
+        "bench_view_alto_documentary",
+        "bench_view_searchability",
+        "bench_partial_resume_label",
+        "bench_output_json_label",
+    ])
+    def test_key_present_in_both_languages(self, js_source: str, key: str) -> None:
+        """Chaque clé doit apparaître au moins **deux fois** dans la
+        source — une fois dans ``T.fr`` et une fois dans ``T.en``."""
+        count = js_source.count(f"{key}:")
+        assert count >= 2, (
+            f"Clé i18n {key!r} déclarée seulement {count} fois ; "
+            "attendu ≥ 2 (FR + EN).  Risque : clé brute affichée en EN."
+        )