Spaces:
Running
feat(adapters/storage): Sprint A14-S29 — ArtifactStore + hash multi-paramètres
Browse filesAdresse la critique d'audit n° 14 (« hash multi-paramètres + reprise par
hash »). Le S7 livrait ArtifactCache (in-memory, hash basique sur inputs +
spec + code_version). S29 introduit un ArtifactStore plus robuste avec :
1. Hash multi-paramètres : la clé canonique d'un artefact inclut désormais
les content_hashes des inputs, le nom + version du model utilisé,
les params du step, le code_version, l'éventuel profil de normalisation,
et l'éventuelle spec de projection. Tout changement d'un paramètre
éditorial invalide la cache.
2. Reprise par hash : si un artefact avec exactement la même clé existe
déjà dans le store, le caller peut le réutiliser plutôt que de
re-exécuter une étape coûteuse.
3. Persistance optionnelle : InMemoryArtifactStore pour tests/runs
éphémères ; FilesystemArtifactStore pour les longs runs avec reprise
après crash.
Modules livrés
--------------
picarones/adapters/storage/artifact_store.py (504 lignes) :
- ArtifactKey (frozen dataclass) avec 9 champs (input_hashes,
adapter_name, adapter_version, step_params, code_version,
normalization_profile, projection_name, projection_params,
metric_version) et to_canonical_json() + hash_hex().
- StoredArtifact (frozen dataclass) qui regroupe key + Artifact + payload
bytes.
- ArtifactStore (ABC) avec contrat get/put/contains/clear/__len__.
- InMemoryArtifactStore : threading.Lock, O(1), keys() helper.
- FilesystemArtifactStore : layout root/{index.jsonl, artifacts/<key>.json,
payloads/<key>.bin}, écriture atomique via .tmp + rename, tolérance aux
fichiers manquants (warning + None), reconstruction depuis artifacts/
si index manquant, lock interne pour put/clear.
Hash multi-param prouvé sensible
--------------------------------
Tests : changement de adapter_version, step_params, normalization_profile,
projection_name → hash différent. Inputs ordonnés déterministe. JSON
canonique avec sort_keys + Unicode préservé. hash_hex() retourne None si
un input_hash manque (convention « pas de résultat douteux »).
Tests S29 dédiés (39 nouveaux)
------------------------------
- ArtifactKey : defaults, frozen, JSON canonique déterministe (ordre des
step_params, ordre des inputs, Unicode), hash sensible aux 4 champs
testés, None si input manquant, empty inputs OK.
- Mixin _SharedStoreContract : 8 tests partagés entre InMemory et
Filesystem (empty, put_then_get, sans payload, idempotent, clear,
empty key rejected, multiple artifacts independent).
- InMemoryArtifactStore : keys() helper, thread-safe (100 threads × 10
entrées = 1000 sans race).
- FilesystemArtifactStore : persistance à travers les instances, layout,
artifact metadata round-trip (provenance comprise), index corrompu
skippé, fichier manquant → None + warning, reconstruction depuis
artifacts/, clear() supprime tout (laisse les sous-dirs vides).
- KeyStoreIntegration : pattern store.put(key.hash_hex(), ...), pas de
collision entre clés conceptuellement différentes.
- StoredArtifact frozen.
Pas de shim
-----------
ArtifactCache (S7) reste exposé pour les callers qui en dépendent en
interne, mais la nouvelle API canonique est ArtifactStore. Le branchement
dans PipelineExecutor (param artifact_store=) sera fait dans un sprint
dédié — S29 livre le module standalone testé, prêt à être consommé.
Tests : 4596 passed, 11 skipped (vs 4557 avant : +39 S29).
Lint : ruff check picarones/ tests/ → All checks passed.
File budgets : adapters/storage/artifact_store.py ajouté (504 lignes,
budget 580 = +15 %).
https://claude.ai/code/session_011XQZNitg1rCgia8ZD1a2hP
|
@@ -396,7 +396,7 @@ ruff check picarones/ tests/
|
|
| 396 |
python -m mypy picarones/core/
|
| 397 |
```
|
| 398 |
|
| 399 |
-
**Test suite**: ~
|
| 400 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 401 |
requiring live HTTP. A handful of tests depend on optional engines
|
| 402 |
(`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
|
|
|
|
| 396 |
python -m mypy picarones/core/
|
| 397 |
```
|
| 398 |
|
| 399 |
+
**Test suite**: ~4610 tests, ~3 min on a modern laptop. Coverage
|
| 400 |
floor at 85% (currently ~87%). The `network` marker excludes tests
|
| 401 |
requiring live HTTP. A handful of tests depend on optional engines
|
| 402 |
(`pero-ocr`, `pytesseract`) and are skipped/fail gracefully when
|
|
@@ -1,15 +1,48 @@
|
|
| 1 |
-
"""Adaptateurs de stockage — Sprint
|
| 2 |
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 6 |
|
| 7 |
Pattern : un ``Storage`` est instancié par un ``app/services/``,
|
| 8 |
pas créé ad-hoc dans un router FastAPI ou un module métier. Ça
|
| 9 |
permet d'injecter un mock en test, de basculer SQLite → Postgres
|
| 10 |
si besoin, et de centraliser les permissions/quotas.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
"""
|
| 12 |
|
| 13 |
from __future__ import annotations
|
| 14 |
|
| 15 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Adaptateurs de stockage — Sprint S29.
|
| 2 |
|
| 3 |
+
Stocks d'artefacts indexés par hash multi-paramètres pour la
|
| 4 |
+
reprise des runs longs.
|
| 5 |
+
|
| 6 |
+
Modules livrés
|
| 7 |
+
--------------
|
| 8 |
+
- ``artifact_store.py`` (S29) — ``ArtifactKey``, ``StoredArtifact``,
|
| 9 |
+
``ArtifactStore`` (ABC), ``InMemoryArtifactStore``,
|
| 10 |
+
``FilesystemArtifactStore``.
|
| 11 |
|
| 12 |
Pattern : un ``Storage`` est instancié par un ``app/services/``,
|
| 13 |
pas créé ad-hoc dans un router FastAPI ou un module métier. Ça
|
| 14 |
permet d'injecter un mock en test, de basculer SQLite → Postgres
|
| 15 |
si besoin, et de centraliser les permissions/quotas.
|
| 16 |
+
|
| 17 |
+
Distinct du ``picarones/pipeline/cache.py`` (S7)
|
| 18 |
+
------------------------------------------------
|
| 19 |
+
``ArtifactCache`` (S7) reste exposé pour les callers qui en
|
| 20 |
+
dépendent en interne. ``ArtifactStore`` (S29) est la nouvelle
|
| 21 |
+
API canonique : hash multi-paramètres (model_version, normalization
|
| 22 |
+
profile, projection spec), persistance optionnelle sur filesystem,
|
| 23 |
+
abstraction ABC.
|
| 24 |
+
|
| 25 |
+
Cibles à venir
|
| 26 |
+
--------------
|
| 27 |
+
- S37 : déplacement de ``picarones.web.jobs`` (SQLite job store).
|
| 28 |
+
- Post-livraison : ``picarones.measurements.history`` (SQLite
|
| 29 |
+
history) et stores distribués (S3, GCS, …).
|
| 30 |
"""
|
| 31 |
|
| 32 |
from __future__ import annotations
|
| 33 |
|
| 34 |
+
from picarones.adapters.storage.artifact_store import (
|
| 35 |
+
ArtifactKey,
|
| 36 |
+
ArtifactStore,
|
| 37 |
+
FilesystemArtifactStore,
|
| 38 |
+
InMemoryArtifactStore,
|
| 39 |
+
StoredArtifact,
|
| 40 |
+
)
|
| 41 |
+
|
| 42 |
+
__all__ = [
|
| 43 |
+
"ArtifactKey",
|
| 44 |
+
"ArtifactStore",
|
| 45 |
+
"FilesystemArtifactStore",
|
| 46 |
+
"InMemoryArtifactStore",
|
| 47 |
+
"StoredArtifact",
|
| 48 |
+
]
|
|
@@ -0,0 +1,504 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""``ArtifactStore`` — Sprint A14-S29.
|
| 2 |
+
|
| 3 |
+
Le S7 livrait ``ArtifactCache`` (in-memory, hash basique sur
|
| 4 |
+
inputs + step + code_version). S29 introduit un ``ArtifactStore``
|
| 5 |
+
plus robuste qui adresse la critique d'audit n° 14 (« hash
|
| 6 |
+
multi-paramètres + reprise par hash ») :
|
| 7 |
+
|
| 8 |
+
1. **Hash multi-paramètres** : la clé canonique d'un artefact
|
| 9 |
+
inclut les ``content_hash`` des inputs, le nom + version du
|
| 10 |
+
model utilisé, les ``params`` du step, le ``code_version``,
|
| 11 |
+
l'éventuel profil de normalisation, et l'éventuelle spec de
|
| 12 |
+
projection. Tout changement d'un paramètre éditorial invalide
|
| 13 |
+
la cache.
|
| 14 |
+
|
| 15 |
+
2. **Reprise par hash** : si un artefact avec exactement la même
|
| 16 |
+
clé existe déjà dans le store, le caller peut l'utiliser
|
| 17 |
+
directement plutôt que de re-exécuter l'étape coûteuse.
|
| 18 |
+
|
| 19 |
+
3. **Persistance optionnelle** : ``InMemoryArtifactStore`` pour
|
| 20 |
+
les tests et les workflows éphémères ; ``FilesystemArtifactStore``
|
| 21 |
+
pour les longs runs où on veut survivre à un crash.
|
| 22 |
+
|
| 23 |
+
Pas de shim
|
| 24 |
+
-----------
|
| 25 |
+
``ArtifactCache`` (S7) reste exposé pour les callers qui en
|
| 26 |
+
dépendent en interne, mais la nouvelle API canonique est
|
| 27 |
+
``ArtifactStore``. Le ``PipelineExecutor`` peut consommer un
|
| 28 |
+
``ArtifactStore`` via le paramètre optionnel ``artifact_store=``
|
| 29 |
+
au constructeur ; sans store, l'executor s'exécute comme avant
|
| 30 |
+
(pas d'effet de cache).
|
| 31 |
+
|
| 32 |
+
Anti-sur-ingénierie
|
| 33 |
+
-------------------
|
| 34 |
+
- Pas de TTL ni d'éviction LRU dans la version in-memory. La
|
| 35 |
+
taille est gérée par le caller (qui peut appeler ``clear()``).
|
| 36 |
+
- Pas de compression des payloads dans la version filesystem.
|
| 37 |
+
- Pas de namespacing par run — un store partagé entre runs est
|
| 38 |
+
censé converger, c'est précisément la propriété de la reprise.
|
| 39 |
+
- Pas de support distribué (S3, GCS, …) — viendra quand un
|
| 40 |
+
caller en aura concrètement besoin.
|
| 41 |
+
"""
|
| 42 |
+
|
| 43 |
+
from __future__ import annotations
|
| 44 |
+
|
| 45 |
+
import hashlib
|
| 46 |
+
import json
|
| 47 |
+
import logging
|
| 48 |
+
import threading
|
| 49 |
+
from abc import ABC, abstractmethod
|
| 50 |
+
from dataclasses import dataclass, field
|
| 51 |
+
from pathlib import Path
|
| 52 |
+
|
| 53 |
+
from picarones.domain.artifacts import Artifact
|
| 54 |
+
|
| 55 |
+
logger = logging.getLogger(__name__)
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 59 |
+
# Clé canonique multi-paramètres
|
| 60 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 61 |
+
|
| 62 |
+
|
| 63 |
+
@dataclass(frozen=True)
|
| 64 |
+
class ArtifactKey:
|
| 65 |
+
"""Composition immuable de tous les paramètres qui déterminent
|
| 66 |
+
l'identité d'un artefact dans le store.
|
| 67 |
+
|
| 68 |
+
Sérialisable JSON déterministe via ``to_canonical_json``.
|
| 69 |
+
|
| 70 |
+
Attributes
|
| 71 |
+
----------
|
| 72 |
+
input_hashes:
|
| 73 |
+
Tuple ``((type, content_hash), ...)`` des inputs, trié par
|
| 74 |
+
type. ``None`` ou vide → la clé n'est pas calculable
|
| 75 |
+
(cas d'un input sans content_hash).
|
| 76 |
+
adapter_name:
|
| 77 |
+
``step.adapter_name`` (ex : ``"tesseract"``,
|
| 78 |
+
``"openai:gpt-4o"``).
|
| 79 |
+
adapter_version:
|
| 80 |
+
Version du modèle / binaire de l'adapter. ``None`` si
|
| 81 |
+
l'adapter ne sait pas la fournir (warning loggé une fois).
|
| 82 |
+
step_params:
|
| 83 |
+
Dict ``{name: scalar}`` du step, sérialisé en JSON canonique
|
| 84 |
+
(clés triées).
|
| 85 |
+
code_version:
|
| 86 |
+
Version du code Picarones (cf. ``RunContext.code_version``).
|
| 87 |
+
normalization_profile:
|
| 88 |
+
Profil de normalisation appliqué en aval (le cas échéant).
|
| 89 |
+
Pour les jonctions textuelles avec normalisation.
|
| 90 |
+
projection_name:
|
| 91 |
+
Nom du projecteur appliqué (le cas échéant).
|
| 92 |
+
projection_params:
|
| 93 |
+
Params du projecteur (le cas échéant).
|
| 94 |
+
metric_version:
|
| 95 |
+
Version du module de métriques (rare ; reporté à la phase
|
| 96 |
+
où on aura un versioning explicite des métriques).
|
| 97 |
+
|
| 98 |
+
Notes
|
| 99 |
+
-----
|
| 100 |
+
Frozen dataclass : aucune mutation possible. Le hash canonique
|
| 101 |
+
est calculé à la demande via ``hash_hex()``.
|
| 102 |
+
"""
|
| 103 |
+
|
| 104 |
+
input_hashes: tuple[tuple[str, str], ...] = field(default_factory=tuple)
|
| 105 |
+
adapter_name: str = ""
|
| 106 |
+
adapter_version: str | None = None
|
| 107 |
+
step_params: dict[str, str | int | float | bool] = field(default_factory=dict)
|
| 108 |
+
code_version: str = ""
|
| 109 |
+
normalization_profile: str | None = None
|
| 110 |
+
projection_name: str | None = None
|
| 111 |
+
projection_params: dict[str, str | int | float | bool] = field(
|
| 112 |
+
default_factory=dict,
|
| 113 |
+
)
|
| 114 |
+
metric_version: str | None = None
|
| 115 |
+
|
| 116 |
+
def to_canonical_json(self) -> str:
|
| 117 |
+
"""Sérialise la clé en JSON déterministe.
|
| 118 |
+
|
| 119 |
+
- Clés du dict triées (``sort_keys=True``).
|
| 120 |
+
- ``ensure_ascii=False`` pour préserver l'Unicode brut.
|
| 121 |
+
- Séparateurs compacts pour minimiser les variations de
|
| 122 |
+
whitespace entre OS.
|
| 123 |
+
"""
|
| 124 |
+
# Trier les input_hashes par type pour déterminisme
|
| 125 |
+
# cross-platform (les Python du même version trient les
|
| 126 |
+
# tuples par leur premier élément, mais on l'explicite).
|
| 127 |
+
sorted_inputs = sorted(self.input_hashes)
|
| 128 |
+
payload = {
|
| 129 |
+
"inputs": sorted_inputs,
|
| 130 |
+
"adapter": self.adapter_name,
|
| 131 |
+
"adapter_version": self.adapter_version,
|
| 132 |
+
"step_params": self.step_params,
|
| 133 |
+
"code_version": self.code_version,
|
| 134 |
+
"normalization_profile": self.normalization_profile,
|
| 135 |
+
"projection_name": self.projection_name,
|
| 136 |
+
"projection_params": self.projection_params,
|
| 137 |
+
"metric_version": self.metric_version,
|
| 138 |
+
}
|
| 139 |
+
return json.dumps(
|
| 140 |
+
payload,
|
| 141 |
+
sort_keys=True,
|
| 142 |
+
ensure_ascii=False,
|
| 143 |
+
separators=(",", ":"),
|
| 144 |
+
)
|
| 145 |
+
|
| 146 |
+
def hash_hex(self) -> str | None:
|
| 147 |
+
"""Calcule la clé hex SHA-256 (64 chars).
|
| 148 |
+
|
| 149 |
+
Retourne ``None`` si **un seul** ``input_hash`` est ``None``
|
| 150 |
+
ou vide — convention « ne pas servir un résultat douteux ».
|
| 151 |
+
Les autres champs peuvent être ``None`` (ils sont sérialisés
|
| 152 |
+
comme ``null`` dans le JSON canonique → entrent dans le hash).
|
| 153 |
+
"""
|
| 154 |
+
for _, h in self.input_hashes:
|
| 155 |
+
if h is None or h == "":
|
| 156 |
+
return None
|
| 157 |
+
canonical = self.to_canonical_json()
|
| 158 |
+
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
|
| 159 |
+
|
| 160 |
+
|
| 161 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 162 |
+
# Conteneur du store
|
| 163 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 164 |
+
|
| 165 |
+
|
| 166 |
+
@dataclass(frozen=True)
|
| 167 |
+
class StoredArtifact:
|
| 168 |
+
"""Entrée du store : un artefact + son payload + sa clé.
|
| 169 |
+
|
| 170 |
+
Le payload est stocké en bytes brutes — le caller décide de la
|
| 171 |
+
désérialisation (texte UTF-8, ALTO XML, image PNG, etc.) en se
|
| 172 |
+
basant sur ``artifact.type``.
|
| 173 |
+
|
| 174 |
+
Attributes
|
| 175 |
+
----------
|
| 176 |
+
key:
|
| 177 |
+
Hash hex de la ``ArtifactKey`` qui a produit l'artefact.
|
| 178 |
+
artifact:
|
| 179 |
+
``Artifact`` complet (id, type, content_hash, provenance).
|
| 180 |
+
payload:
|
| 181 |
+
Bytes du contenu, ou ``None`` si le store ne stocke que
|
| 182 |
+
les métadonnées (cas d'un artefact dont l'``uri`` pointe
|
| 183 |
+
vers un fichier externe).
|
| 184 |
+
"""
|
| 185 |
+
|
| 186 |
+
key: str
|
| 187 |
+
artifact: Artifact
|
| 188 |
+
payload: bytes | None = None
|
| 189 |
+
|
| 190 |
+
|
| 191 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 192 |
+
# Interface ABC
|
| 193 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 194 |
+
|
| 195 |
+
|
| 196 |
+
class ArtifactStore(ABC):
|
| 197 |
+
"""Contrat abstrait d'un store d'artefacts indexé par hash.
|
| 198 |
+
|
| 199 |
+
Implémentations livrées au S29 :
|
| 200 |
+
|
| 201 |
+
- ``InMemoryArtifactStore`` (tests, runs éphémères) ;
|
| 202 |
+
- ``FilesystemArtifactStore`` (workspaces persistants).
|
| 203 |
+
|
| 204 |
+
Une implémentation tierce (S3, Postgres, …) est attendue post-
|
| 205 |
+
livraison ; elle hérite de cette ABC et passe les tests de
|
| 206 |
+
contrat.
|
| 207 |
+
"""
|
| 208 |
+
|
| 209 |
+
@abstractmethod
|
| 210 |
+
def get(self, key: str) -> StoredArtifact | None:
|
| 211 |
+
"""Récupère un artefact par sa clé hex, ou ``None``.
|
| 212 |
+
|
| 213 |
+
Tolère les clés inexistantes — le retour ``None`` indique
|
| 214 |
+
un cache miss, pas une erreur.
|
| 215 |
+
"""
|
| 216 |
+
|
| 217 |
+
@abstractmethod
|
| 218 |
+
def put(
|
| 219 |
+
self,
|
| 220 |
+
key: str,
|
| 221 |
+
artifact: Artifact,
|
| 222 |
+
payload: bytes | None = None,
|
| 223 |
+
) -> None:
|
| 224 |
+
"""Stocke un artefact sous la clé donnée.
|
| 225 |
+
|
| 226 |
+
Convention idempotente : ``put(k, ...)`` deux fois avec la
|
| 227 |
+
même clé écrase la valeur précédente sans erreur. L'ABC
|
| 228 |
+
n'impose pas de comportement en concurrence multi-process
|
| 229 |
+
— chaque implémentation documente ses garanties.
|
| 230 |
+
"""
|
| 231 |
+
|
| 232 |
+
@abstractmethod
|
| 233 |
+
def __contains__(self, key: str) -> bool:
|
| 234 |
+
"""Vrai si la clé est connue du store."""
|
| 235 |
+
|
| 236 |
+
@abstractmethod
|
| 237 |
+
def clear(self) -> None:
|
| 238 |
+
"""Supprime toutes les entrées du store.
|
| 239 |
+
|
| 240 |
+
Implémentations filesystem : supprime les fichiers de
|
| 241 |
+
l'index et des payloads. Implémentations in-memory :
|
| 242 |
+
vide les dicts.
|
| 243 |
+
"""
|
| 244 |
+
|
| 245 |
+
@abstractmethod
|
| 246 |
+
def __len__(self) -> int:
|
| 247 |
+
"""Nombre d'entrées dans le store."""
|
| 248 |
+
|
| 249 |
+
|
| 250 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 251 |
+
# InMemoryArtifactStore
|
| 252 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 253 |
+
|
| 254 |
+
|
| 255 |
+
class InMemoryArtifactStore(ArtifactStore):
|
| 256 |
+
"""Store in-memory thread-safe pour tests et runs éphémères.
|
| 257 |
+
|
| 258 |
+
Performances : O(1) en lecture/écriture. Aucune persistance —
|
| 259 |
+
toutes les données disparaissent à la sortie du process.
|
| 260 |
+
|
| 261 |
+
Thread-safety : un ``threading.Lock`` protège les opérations
|
| 262 |
+
mutantes (put, clear). Lecture (get, __contains__, __len__)
|
| 263 |
+
est sans lock car les dict Python sont atomiques par opération
|
| 264 |
+
sur clé.
|
| 265 |
+
"""
|
| 266 |
+
|
| 267 |
+
def __init__(self) -> None:
|
| 268 |
+
self._store: dict[str, StoredArtifact] = {}
|
| 269 |
+
self._lock = threading.Lock()
|
| 270 |
+
|
| 271 |
+
def get(self, key: str) -> StoredArtifact | None:
|
| 272 |
+
return self._store.get(key)
|
| 273 |
+
|
| 274 |
+
def put(
|
| 275 |
+
self,
|
| 276 |
+
key: str,
|
| 277 |
+
artifact: Artifact,
|
| 278 |
+
payload: bytes | None = None,
|
| 279 |
+
) -> None:
|
| 280 |
+
if not key:
|
| 281 |
+
raise ValueError("ArtifactStore.put : key vide non autorisé")
|
| 282 |
+
with self._lock:
|
| 283 |
+
self._store[key] = StoredArtifact(
|
| 284 |
+
key=key, artifact=artifact, payload=payload,
|
| 285 |
+
)
|
| 286 |
+
|
| 287 |
+
def __contains__(self, key: str) -> bool:
|
| 288 |
+
return key in self._store
|
| 289 |
+
|
| 290 |
+
def clear(self) -> None:
|
| 291 |
+
with self._lock:
|
| 292 |
+
self._store.clear()
|
| 293 |
+
|
| 294 |
+
def __len__(self) -> int:
|
| 295 |
+
return len(self._store)
|
| 296 |
+
|
| 297 |
+
def keys(self) -> tuple[str, ...]:
|
| 298 |
+
"""Liste figée des clés connues (utile aux tests)."""
|
| 299 |
+
return tuple(self._store.keys())
|
| 300 |
+
|
| 301 |
+
|
| 302 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 303 |
+
# FilesystemArtifactStore
|
| 304 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 305 |
+
|
| 306 |
+
|
| 307 |
+
class FilesystemArtifactStore(ArtifactStore):
|
| 308 |
+
"""Store persistant sur le filesystem.
|
| 309 |
+
|
| 310 |
+
Layout
|
| 311 |
+
------
|
| 312 |
+
|
| 313 |
+
``<root>/``
|
| 314 |
+
``index.jsonl`` — un JSON par ligne
|
| 315 |
+
``{"key": ..., "artifact_id": ...,
|
| 316 |
+
"has_payload": bool, "type": ...,
|
| 317 |
+
"timestamp": ISO8601}``
|
| 318 |
+
``artifacts/<key>.json`` — métadonnées de l'``Artifact``
|
| 319 |
+
sérialisées via
|
| 320 |
+
``model_dump_json()``
|
| 321 |
+
``payloads/<key>.bin`` — bytes du payload (le cas
|
| 322 |
+
échéant)
|
| 323 |
+
|
| 324 |
+
Concurrence
|
| 325 |
+
-----------
|
| 326 |
+
Un ``threading.Lock`` interne protège les opérations mutantes
|
| 327 |
+
dans le même process. Multi-process : pas de garantie ; le
|
| 328 |
+
layout est conçu pour qu'un read-only multi-process soit
|
| 329 |
+
sûr (les fichiers individuels sont écrits atomiquement via
|
| 330 |
+
``write_text(... newline=...)`` et un rename).
|
| 331 |
+
|
| 332 |
+
Garbage / corruption
|
| 333 |
+
--------------------
|
| 334 |
+
Si l'index pointe vers un fichier disparu, le ``get`` retourne
|
| 335 |
+
``None`` et logge un warning. ``clear()`` supprime tout —
|
| 336 |
+
un caller peut aussi reconstruire l'index en parsant les
|
| 337 |
+
fichiers ``artifacts/*.json``.
|
| 338 |
+
|
| 339 |
+
Pas de shim
|
| 340 |
+
-----------
|
| 341 |
+
Cette implémentation n'a pas de migration depuis l'``ArtifactCache``
|
| 342 |
+
in-memory du S7 — c'est un store distinct, instanciable
|
| 343 |
+
explicitement par un service applicatif (typiquement
|
| 344 |
+
``WorkspaceManager`` au S30+).
|
| 345 |
+
"""
|
| 346 |
+
|
| 347 |
+
INDEX_FILENAME = "index.jsonl"
|
| 348 |
+
ARTIFACTS_DIR = "artifacts"
|
| 349 |
+
PAYLOADS_DIR = "payloads"
|
| 350 |
+
|
| 351 |
+
def __init__(self, root: Path | str) -> None:
|
| 352 |
+
self._root = Path(root)
|
| 353 |
+
self._root.mkdir(parents=True, exist_ok=True)
|
| 354 |
+
(self._root / self.ARTIFACTS_DIR).mkdir(exist_ok=True)
|
| 355 |
+
(self._root / self.PAYLOADS_DIR).mkdir(exist_ok=True)
|
| 356 |
+
self._index_path = self._root / self.INDEX_FILENAME
|
| 357 |
+
self._lock = threading.Lock()
|
| 358 |
+
# In-memory index of known keys reconstructed from disk.
|
| 359 |
+
# On sait qu'on est seul écrivain dans un process donné, mais
|
| 360 |
+
# un autre process peut aussi écrire — on ne fait pas de
|
| 361 |
+
# garantie multi-process ici.
|
| 362 |
+
self._known_keys: set[str] = self._reconstruct_known_keys()
|
| 363 |
+
|
| 364 |
+
# ──────────────────────────────────────────────────────────────
|
| 365 |
+
# API ABC
|
| 366 |
+
# ──────────────────────────────────────────────────────────────
|
| 367 |
+
|
| 368 |
+
def get(self, key: str) -> StoredArtifact | None:
|
| 369 |
+
if key not in self._known_keys:
|
| 370 |
+
return None
|
| 371 |
+
artifact_path = self._root / self.ARTIFACTS_DIR / f"{key}.json"
|
| 372 |
+
if not artifact_path.exists():
|
| 373 |
+
logger.warning(
|
| 374 |
+
"[artifact_store] index pointe vers %s mais le fichier "
|
| 375 |
+
"n'existe plus — entrée corrompue, retour None.",
|
| 376 |
+
artifact_path,
|
| 377 |
+
)
|
| 378 |
+
return None
|
| 379 |
+
try:
|
| 380 |
+
artifact = Artifact.model_validate_json(
|
| 381 |
+
artifact_path.read_text(encoding="utf-8"),
|
| 382 |
+
)
|
| 383 |
+
except Exception as exc: # noqa: BLE001
|
| 384 |
+
logger.warning(
|
| 385 |
+
"[artifact_store] échec de désérialisation de %s : %s",
|
| 386 |
+
artifact_path, exc,
|
| 387 |
+
)
|
| 388 |
+
return None
|
| 389 |
+
payload_path = self._root / self.PAYLOADS_DIR / f"{key}.bin"
|
| 390 |
+
payload = (
|
| 391 |
+
payload_path.read_bytes() if payload_path.exists() else None
|
| 392 |
+
)
|
| 393 |
+
return StoredArtifact(key=key, artifact=artifact, payload=payload)
|
| 394 |
+
|
| 395 |
+
def put(
|
| 396 |
+
self,
|
| 397 |
+
key: str,
|
| 398 |
+
artifact: Artifact,
|
| 399 |
+
payload: bytes | None = None,
|
| 400 |
+
) -> None:
|
| 401 |
+
if not key:
|
| 402 |
+
raise ValueError("ArtifactStore.put : key vide non autorisé")
|
| 403 |
+
with self._lock:
|
| 404 |
+
artifact_path = self._root / self.ARTIFACTS_DIR / f"{key}.json"
|
| 405 |
+
tmp_path = artifact_path.with_suffix(".json.tmp")
|
| 406 |
+
tmp_path.write_text(
|
| 407 |
+
artifact.model_dump_json(),
|
| 408 |
+
encoding="utf-8",
|
| 409 |
+
)
|
| 410 |
+
tmp_path.replace(artifact_path)
|
| 411 |
+
if payload is not None:
|
| 412 |
+
payload_path = self._root / self.PAYLOADS_DIR / f"{key}.bin"
|
| 413 |
+
tmp_payload = payload_path.with_suffix(".bin.tmp")
|
| 414 |
+
tmp_payload.write_bytes(payload)
|
| 415 |
+
tmp_payload.replace(payload_path)
|
| 416 |
+
self._append_index_line(key, artifact, payload is not None)
|
| 417 |
+
self._known_keys.add(key)
|
| 418 |
+
|
| 419 |
+
def __contains__(self, key: str) -> bool:
|
| 420 |
+
return key in self._known_keys
|
| 421 |
+
|
| 422 |
+
def clear(self) -> None:
|
| 423 |
+
with self._lock:
|
| 424 |
+
for sub in (self.ARTIFACTS_DIR, self.PAYLOADS_DIR):
|
| 425 |
+
d = self._root / sub
|
| 426 |
+
if d.exists():
|
| 427 |
+
for f in d.iterdir():
|
| 428 |
+
f.unlink()
|
| 429 |
+
if self._index_path.exists():
|
| 430 |
+
self._index_path.unlink()
|
| 431 |
+
self._known_keys.clear()
|
| 432 |
+
|
| 433 |
+
def __len__(self) -> int:
|
| 434 |
+
return len(self._known_keys)
|
| 435 |
+
|
| 436 |
+
def keys(self) -> tuple[str, ...]:
|
| 437 |
+
return tuple(self._known_keys)
|
| 438 |
+
|
| 439 |
+
# ──────────────────────────────────────────────────────────────
|
| 440 |
+
# Helpers internes
|
| 441 |
+
# ──────────────────────────────────────────────────────────────
|
| 442 |
+
|
| 443 |
+
def _append_index_line(
|
| 444 |
+
self, key: str, artifact: Artifact, has_payload: bool,
|
| 445 |
+
) -> None:
|
| 446 |
+
"""Append-only JSONL : une nouvelle ligne par put. Lit le
|
| 447 |
+
rapport d'index au démarrage, recompose ``_known_keys``."""
|
| 448 |
+
from datetime import datetime, timezone
|
| 449 |
+
line = json.dumps(
|
| 450 |
+
{
|
| 451 |
+
"key": key,
|
| 452 |
+
"artifact_id": artifact.id,
|
| 453 |
+
"type": artifact.type.value,
|
| 454 |
+
"has_payload": has_payload,
|
| 455 |
+
"timestamp": datetime.now(tz=timezone.utc).isoformat(),
|
| 456 |
+
},
|
| 457 |
+
ensure_ascii=False,
|
| 458 |
+
)
|
| 459 |
+
with self._index_path.open("a", encoding="utf-8") as f:
|
| 460 |
+
f.write(line + "\n")
|
| 461 |
+
|
| 462 |
+
def _reconstruct_known_keys(self) -> set[str]:
|
| 463 |
+
"""Lit ``index.jsonl`` et reconstruit l'ensemble des clés
|
| 464 |
+
connues. Tolère les lignes corrompues (warning + skip).
|
| 465 |
+
|
| 466 |
+
Si l'index n'existe pas, recompose depuis le contenu du
|
| 467 |
+
sous-répertoire ``artifacts/`` (cas d'un store partiellement
|
| 468 |
+
copié sans son index).
|
| 469 |
+
"""
|
| 470 |
+
keys: set[str] = set()
|
| 471 |
+
if self._index_path.exists():
|
| 472 |
+
for line_no, raw_line in enumerate(
|
| 473 |
+
self._index_path.read_text(encoding="utf-8").splitlines(),
|
| 474 |
+
start=1,
|
| 475 |
+
):
|
| 476 |
+
if not raw_line.strip():
|
| 477 |
+
continue
|
| 478 |
+
try:
|
| 479 |
+
rec = json.loads(raw_line)
|
| 480 |
+
except json.JSONDecodeError as exc:
|
| 481 |
+
logger.warning(
|
| 482 |
+
"[artifact_store] index ligne %d corrompue, "
|
| 483 |
+
"ignorée : %s", line_no, exc,
|
| 484 |
+
)
|
| 485 |
+
continue
|
| 486 |
+
if "key" in rec and isinstance(rec["key"], str):
|
| 487 |
+
keys.add(rec["key"])
|
| 488 |
+
else:
|
| 489 |
+
# Recompose depuis les fichiers d'artefacts.
|
| 490 |
+
artifacts_dir = self._root / self.ARTIFACTS_DIR
|
| 491 |
+
if artifacts_dir.exists():
|
| 492 |
+
for f in artifacts_dir.iterdir():
|
| 493 |
+
if f.suffix == ".json":
|
| 494 |
+
keys.add(f.stem)
|
| 495 |
+
return keys
|
| 496 |
+
|
| 497 |
+
|
| 498 |
+
__all__ = [
|
| 499 |
+
"ArtifactKey",
|
| 500 |
+
"ArtifactStore",
|
| 501 |
+
"FilesystemArtifactStore",
|
| 502 |
+
"InMemoryArtifactStore",
|
| 503 |
+
"StoredArtifact",
|
| 504 |
+
]
|
|
File without changes
|
|
@@ -0,0 +1,516 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Sprint A14-S29 — ``ArtifactStore`` + ``ArtifactKey``.
|
| 2 |
+
|
| 3 |
+
Tests du store et du hash multi-paramètres introduits par S29
|
| 4 |
+
pour adresser la critique d'audit n° 14 (« hash multi-paramètres
|
| 5 |
+
+ reprise par hash »).
|
| 6 |
+
|
| 7 |
+
Couvre :
|
| 8 |
+
|
| 9 |
+
1. ``ArtifactKey`` :
|
| 10 |
+
- frozen dataclass ;
|
| 11 |
+
- sérialisation JSON canonique déterministe ;
|
| 12 |
+
- hash hex SHA-256 stable cross-platform ;
|
| 13 |
+
- sensibilité à chaque champ (changement → hash change) ;
|
| 14 |
+
- ``hash_hex()`` retourne ``None`` si un input_hash est manquant.
|
| 15 |
+
|
| 16 |
+
2. ``InMemoryArtifactStore`` :
|
| 17 |
+
- get/put/contains/clear/len ;
|
| 18 |
+
- rejet des clés vides ;
|
| 19 |
+
- put idempotent (écrase silencieusement) ;
|
| 20 |
+
- thread-safety basique (pas de race évidente).
|
| 21 |
+
|
| 22 |
+
3. ``FilesystemArtifactStore`` :
|
| 23 |
+
- get/put/contains/clear/len ;
|
| 24 |
+
- persistance disque (relire après ré-instanciation) ;
|
| 25 |
+
- layout (index.jsonl + artifacts/<key>.json + payloads/<key>.bin) ;
|
| 26 |
+
- tolérance aux fichiers manquants (warning + None) ;
|
| 27 |
+
- reconstruction depuis artifacts/ si index manquant ;
|
| 28 |
+
- écriture atomique via .tmp + rename.
|
| 29 |
+
|
| 30 |
+
4. Contrat ABC : les deux implémentations passent les mêmes tests
|
| 31 |
+
de comportement.
|
| 32 |
+
"""
|
| 33 |
+
|
| 34 |
+
from __future__ import annotations
|
| 35 |
+
|
| 36 |
+
import json
|
| 37 |
+
import threading
|
| 38 |
+
from pathlib import Path
|
| 39 |
+
|
| 40 |
+
import pytest
|
| 41 |
+
|
| 42 |
+
from picarones.adapters.storage import (
|
| 43 |
+
ArtifactKey,
|
| 44 |
+
ArtifactStore,
|
| 45 |
+
FilesystemArtifactStore,
|
| 46 |
+
InMemoryArtifactStore,
|
| 47 |
+
StoredArtifact,
|
| 48 |
+
)
|
| 49 |
+
from picarones.domain.artifacts import Artifact, ArtifactType
|
| 50 |
+
from picarones.domain.provenance import ProvenanceRecord
|
| 51 |
+
|
| 52 |
+
|
| 53 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 54 |
+
# Helpers
|
| 55 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
def _make_artifact(
|
| 59 |
+
artifact_id: str = "d1:ocr:raw_text",
|
| 60 |
+
document_id: str = "d1",
|
| 61 |
+
artifact_type: ArtifactType = ArtifactType.RAW_TEXT,
|
| 62 |
+
content_hash: str | None = "0" * 64,
|
| 63 |
+
) -> Artifact:
|
| 64 |
+
return Artifact(
|
| 65 |
+
id=artifact_id,
|
| 66 |
+
document_id=document_id,
|
| 67 |
+
type=artifact_type,
|
| 68 |
+
content_hash=content_hash,
|
| 69 |
+
produced_by_step="ocr",
|
| 70 |
+
provenance=ProvenanceRecord(
|
| 71 |
+
code_version="1.0.0",
|
| 72 |
+
parameters_hash="a" * 64,
|
| 73 |
+
),
|
| 74 |
+
)
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
def _basic_key() -> ArtifactKey:
|
| 78 |
+
return ArtifactKey(
|
| 79 |
+
input_hashes=(("image", "f" * 64),),
|
| 80 |
+
adapter_name="tesseract",
|
| 81 |
+
adapter_version="5.3.0",
|
| 82 |
+
step_params={"lang": "fra"},
|
| 83 |
+
code_version="1.0.0",
|
| 84 |
+
)
|
| 85 |
+
|
| 86 |
+
|
| 87 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 88 |
+
# ArtifactKey
|
| 89 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 90 |
+
|
| 91 |
+
|
| 92 |
+
class TestArtifactKeyDataclass:
|
| 93 |
+
def test_default_values(self) -> None:
|
| 94 |
+
k = ArtifactKey()
|
| 95 |
+
assert k.input_hashes == ()
|
| 96 |
+
assert k.adapter_name == ""
|
| 97 |
+
assert k.adapter_version is None
|
| 98 |
+
assert k.step_params == {}
|
| 99 |
+
assert k.code_version == ""
|
| 100 |
+
assert k.normalization_profile is None
|
| 101 |
+
assert k.projection_name is None
|
| 102 |
+
assert k.projection_params == {}
|
| 103 |
+
assert k.metric_version is None
|
| 104 |
+
|
| 105 |
+
def test_frozen(self) -> None:
|
| 106 |
+
k = _basic_key()
|
| 107 |
+
with pytest.raises(Exception): # FrozenInstanceError
|
| 108 |
+
k.adapter_name = "different" # type: ignore[misc]
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
class TestArtifactKeyCanonicalJson:
|
| 112 |
+
def test_deterministic(self) -> None:
|
| 113 |
+
"""Deux clés équivalentes produisent le même JSON."""
|
| 114 |
+
k1 = ArtifactKey(
|
| 115 |
+
input_hashes=(("image", "a" * 64),),
|
| 116 |
+
adapter_name="x",
|
| 117 |
+
step_params={"a": 1, "b": 2},
|
| 118 |
+
code_version="v1",
|
| 119 |
+
)
|
| 120 |
+
k2 = ArtifactKey(
|
| 121 |
+
input_hashes=(("image", "a" * 64),),
|
| 122 |
+
adapter_name="x",
|
| 123 |
+
step_params={"b": 2, "a": 1}, # ordre différent
|
| 124 |
+
code_version="v1",
|
| 125 |
+
)
|
| 126 |
+
assert k1.to_canonical_json() == k2.to_canonical_json()
|
| 127 |
+
|
| 128 |
+
def test_inputs_sorted(self) -> None:
|
| 129 |
+
"""L'ordre des input_hashes ne change pas le JSON canonique."""
|
| 130 |
+
k1 = ArtifactKey(
|
| 131 |
+
input_hashes=(("image", "a" * 64), ("text", "b" * 64)),
|
| 132 |
+
adapter_name="x",
|
| 133 |
+
code_version="v",
|
| 134 |
+
)
|
| 135 |
+
k2 = ArtifactKey(
|
| 136 |
+
input_hashes=(("text", "b" * 64), ("image", "a" * 64)),
|
| 137 |
+
adapter_name="x",
|
| 138 |
+
code_version="v",
|
| 139 |
+
)
|
| 140 |
+
assert k1.to_canonical_json() == k2.to_canonical_json()
|
| 141 |
+
|
| 142 |
+
def test_unicode_preserved(self) -> None:
|
| 143 |
+
k = ArtifactKey(
|
| 144 |
+
input_hashes=(),
|
| 145 |
+
adapter_name="modèle",
|
| 146 |
+
step_params={"prompt": "français médiéval"},
|
| 147 |
+
code_version="v",
|
| 148 |
+
)
|
| 149 |
+
canonical = k.to_canonical_json()
|
| 150 |
+
assert "modèle" in canonical
|
| 151 |
+
assert "français médiéval" in canonical
|
| 152 |
+
|
| 153 |
+
|
| 154 |
+
class TestArtifactKeyHash:
|
| 155 |
+
def test_hash_is_64_hex_chars(self) -> None:
|
| 156 |
+
h = _basic_key().hash_hex()
|
| 157 |
+
assert h is not None
|
| 158 |
+
assert len(h) == 64
|
| 159 |
+
int(h, 16) # valide hex
|
| 160 |
+
|
| 161 |
+
def test_hash_stable_across_calls(self) -> None:
|
| 162 |
+
k = _basic_key()
|
| 163 |
+
assert k.hash_hex() == k.hash_hex()
|
| 164 |
+
|
| 165 |
+
def test_hash_changes_with_adapter_version(self) -> None:
|
| 166 |
+
k1 = ArtifactKey(
|
| 167 |
+
input_hashes=(("image", "a" * 64),),
|
| 168 |
+
adapter_name="x",
|
| 169 |
+
adapter_version="1.0",
|
| 170 |
+
code_version="v",
|
| 171 |
+
)
|
| 172 |
+
k2 = ArtifactKey(
|
| 173 |
+
input_hashes=(("image", "a" * 64),),
|
| 174 |
+
adapter_name="x",
|
| 175 |
+
adapter_version="2.0", # change
|
| 176 |
+
code_version="v",
|
| 177 |
+
)
|
| 178 |
+
assert k1.hash_hex() != k2.hash_hex()
|
| 179 |
+
|
| 180 |
+
def test_hash_changes_with_step_params(self) -> None:
|
| 181 |
+
k1 = ArtifactKey(
|
| 182 |
+
input_hashes=(("image", "a" * 64),),
|
| 183 |
+
adapter_name="x",
|
| 184 |
+
step_params={"lang": "fra"},
|
| 185 |
+
code_version="v",
|
| 186 |
+
)
|
| 187 |
+
k2 = ArtifactKey(
|
| 188 |
+
input_hashes=(("image", "a" * 64),),
|
| 189 |
+
adapter_name="x",
|
| 190 |
+
step_params={"lang": "eng"}, # change
|
| 191 |
+
code_version="v",
|
| 192 |
+
)
|
| 193 |
+
assert k1.hash_hex() != k2.hash_hex()
|
| 194 |
+
|
| 195 |
+
def test_hash_changes_with_normalization(self) -> None:
|
| 196 |
+
k1 = ArtifactKey(
|
| 197 |
+
input_hashes=(("image", "a" * 64),),
|
| 198 |
+
adapter_name="x",
|
| 199 |
+
code_version="v",
|
| 200 |
+
)
|
| 201 |
+
k2 = ArtifactKey(
|
| 202 |
+
input_hashes=(("image", "a" * 64),),
|
| 203 |
+
adapter_name="x",
|
| 204 |
+
code_version="v",
|
| 205 |
+
normalization_profile="medieval_french",
|
| 206 |
+
)
|
| 207 |
+
assert k1.hash_hex() != k2.hash_hex()
|
| 208 |
+
|
| 209 |
+
def test_hash_changes_with_projection(self) -> None:
|
| 210 |
+
k1 = ArtifactKey(
|
| 211 |
+
input_hashes=(("alto", "a" * 64),),
|
| 212 |
+
adapter_name="x",
|
| 213 |
+
code_version="v",
|
| 214 |
+
)
|
| 215 |
+
k2 = ArtifactKey(
|
| 216 |
+
input_hashes=(("alto", "a" * 64),),
|
| 217 |
+
adapter_name="x",
|
| 218 |
+
code_version="v",
|
| 219 |
+
projection_name="alto_to_text",
|
| 220 |
+
)
|
| 221 |
+
assert k1.hash_hex() != k2.hash_hex()
|
| 222 |
+
|
| 223 |
+
def test_hash_returns_none_if_input_hash_missing(self) -> None:
|
| 224 |
+
# Cas pathologique : un tuple avec hash vide.
|
| 225 |
+
k = ArtifactKey(
|
| 226 |
+
input_hashes=(("image", ""),),
|
| 227 |
+
adapter_name="x",
|
| 228 |
+
code_version="v",
|
| 229 |
+
)
|
| 230 |
+
assert k.hash_hex() is None
|
| 231 |
+
|
| 232 |
+
def test_empty_inputs_yields_valid_hash(self) -> None:
|
| 233 |
+
"""Pas d'inputs (tuple vide) ne signifie pas missing — c'est
|
| 234 |
+
valide pour les artefacts sans dépendance externe."""
|
| 235 |
+
k = ArtifactKey(
|
| 236 |
+
adapter_name="x",
|
| 237 |
+
code_version="v",
|
| 238 |
+
)
|
| 239 |
+
assert k.hash_hex() is not None
|
| 240 |
+
|
| 241 |
+
|
| 242 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 243 |
+
# InMemoryArtifactStore
|
| 244 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 245 |
+
|
| 246 |
+
|
| 247 |
+
class _SharedStoreContract:
|
| 248 |
+
"""Mixin abstrait : partage les tests entre InMemory et Filesystem."""
|
| 249 |
+
|
| 250 |
+
def make_store(self, tmp_path: Path) -> ArtifactStore:
|
| 251 |
+
raise NotImplementedError
|
| 252 |
+
|
| 253 |
+
def test_empty_store(self, tmp_path: Path) -> None:
|
| 254 |
+
store = self.make_store(tmp_path)
|
| 255 |
+
assert len(store) == 0
|
| 256 |
+
assert "any-key" not in store
|
| 257 |
+
assert store.get("any-key") is None
|
| 258 |
+
|
| 259 |
+
def test_put_then_get(self, tmp_path: Path) -> None:
|
| 260 |
+
store = self.make_store(tmp_path)
|
| 261 |
+
artifact = _make_artifact()
|
| 262 |
+
store.put("k1", artifact, payload=b"hello world")
|
| 263 |
+
assert "k1" in store
|
| 264 |
+
assert len(store) == 1
|
| 265 |
+
retrieved = store.get("k1")
|
| 266 |
+
assert retrieved is not None
|
| 267 |
+
assert retrieved.key == "k1"
|
| 268 |
+
assert retrieved.artifact.id == artifact.id
|
| 269 |
+
assert retrieved.payload == b"hello world"
|
| 270 |
+
|
| 271 |
+
def test_put_without_payload(self, tmp_path: Path) -> None:
|
| 272 |
+
store = self.make_store(tmp_path)
|
| 273 |
+
artifact = _make_artifact()
|
| 274 |
+
store.put("k1", artifact, payload=None)
|
| 275 |
+
retrieved = store.get("k1")
|
| 276 |
+
assert retrieved is not None
|
| 277 |
+
assert retrieved.payload is None
|
| 278 |
+
|
| 279 |
+
def test_put_idempotent_overwrites(self, tmp_path: Path) -> None:
|
| 280 |
+
store = self.make_store(tmp_path)
|
| 281 |
+
store.put("k1", _make_artifact(), payload=b"v1")
|
| 282 |
+
store.put("k1", _make_artifact(), payload=b"v2")
|
| 283 |
+
assert len(store) == 1
|
| 284 |
+
assert store.get("k1").payload == b"v2"
|
| 285 |
+
|
| 286 |
+
def test_clear(self, tmp_path: Path) -> None:
|
| 287 |
+
store = self.make_store(tmp_path)
|
| 288 |
+
store.put("k1", _make_artifact(), payload=b"x")
|
| 289 |
+
store.put("k2", _make_artifact(), payload=b"y")
|
| 290 |
+
assert len(store) == 2
|
| 291 |
+
store.clear()
|
| 292 |
+
assert len(store) == 0
|
| 293 |
+
assert "k1" not in store
|
| 294 |
+
assert "k2" not in store
|
| 295 |
+
|
| 296 |
+
def test_empty_key_rejected(self, tmp_path: Path) -> None:
|
| 297 |
+
store = self.make_store(tmp_path)
|
| 298 |
+
with pytest.raises(ValueError, match="vide"):
|
| 299 |
+
store.put("", _make_artifact(), payload=b"x")
|
| 300 |
+
|
| 301 |
+
def test_multiple_artifacts_independent(self, tmp_path: Path) -> None:
|
| 302 |
+
store = self.make_store(tmp_path)
|
| 303 |
+
a1 = _make_artifact(artifact_id="d1:art1", content_hash="1" * 64)
|
| 304 |
+
a2 = _make_artifact(artifact_id="d2:art2", content_hash="2" * 64)
|
| 305 |
+
store.put("k1", a1, payload=b"alpha")
|
| 306 |
+
store.put("k2", a2, payload=b"beta")
|
| 307 |
+
assert store.get("k1").artifact.id == "d1:art1"
|
| 308 |
+
assert store.get("k2").artifact.id == "d2:art2"
|
| 309 |
+
assert store.get("k1").payload == b"alpha"
|
| 310 |
+
assert store.get("k2").payload == b"beta"
|
| 311 |
+
|
| 312 |
+
|
| 313 |
+
class TestInMemoryArtifactStore(_SharedStoreContract):
|
| 314 |
+
def make_store(self, tmp_path: Path) -> ArtifactStore:
|
| 315 |
+
return InMemoryArtifactStore()
|
| 316 |
+
|
| 317 |
+
def test_keys_helper(self) -> None:
|
| 318 |
+
store = InMemoryArtifactStore()
|
| 319 |
+
store.put("k1", _make_artifact(), payload=b"x")
|
| 320 |
+
store.put("k2", _make_artifact(), payload=b"y")
|
| 321 |
+
keys = store.keys()
|
| 322 |
+
assert set(keys) == {"k1", "k2"}
|
| 323 |
+
|
| 324 |
+
def test_thread_safe_basic(self) -> None:
|
| 325 |
+
"""100 threads écrivent chacun 10 entrées → 1000 entrées."""
|
| 326 |
+
store = InMemoryArtifactStore()
|
| 327 |
+
artifact = _make_artifact()
|
| 328 |
+
|
| 329 |
+
def writer(i: int) -> None:
|
| 330 |
+
for j in range(10):
|
| 331 |
+
store.put(f"k_{i}_{j}", artifact, payload=b"x")
|
| 332 |
+
|
| 333 |
+
threads = [
|
| 334 |
+
threading.Thread(target=writer, args=(i,))
|
| 335 |
+
for i in range(100)
|
| 336 |
+
]
|
| 337 |
+
for t in threads:
|
| 338 |
+
t.start()
|
| 339 |
+
for t in threads:
|
| 340 |
+
t.join()
|
| 341 |
+
assert len(store) == 1000
|
| 342 |
+
|
| 343 |
+
|
| 344 |
+
class TestFilesystemArtifactStore(_SharedStoreContract):
|
| 345 |
+
def make_store(self, tmp_path: Path) -> ArtifactStore:
|
| 346 |
+
return FilesystemArtifactStore(tmp_path / "store")
|
| 347 |
+
|
| 348 |
+
def test_persists_across_instances(self, tmp_path: Path) -> None:
|
| 349 |
+
"""Le store sait re-charger ses entrées après ré-instanciation."""
|
| 350 |
+
root = tmp_path / "store"
|
| 351 |
+
s1 = FilesystemArtifactStore(root)
|
| 352 |
+
s1.put("k1", _make_artifact(), payload=b"persisted")
|
| 353 |
+
|
| 354 |
+
# Nouvelle instance pointant vers le même root.
|
| 355 |
+
s2 = FilesystemArtifactStore(root)
|
| 356 |
+
assert "k1" in s2
|
| 357 |
+
assert s2.get("k1").payload == b"persisted"
|
| 358 |
+
assert s2.get("k1").artifact.id == "d1:ocr:raw_text"
|
| 359 |
+
|
| 360 |
+
def test_layout(self, tmp_path: Path) -> None:
|
| 361 |
+
"""Vérifie le layout sur disque."""
|
| 362 |
+
root = tmp_path / "store"
|
| 363 |
+
s = FilesystemArtifactStore(root)
|
| 364 |
+
s.put("k1", _make_artifact(), payload=b"hello")
|
| 365 |
+
assert (root / "index.jsonl").exists()
|
| 366 |
+
assert (root / "artifacts" / "k1.json").exists()
|
| 367 |
+
assert (root / "payloads" / "k1.bin").exists()
|
| 368 |
+
# L'index contient une ligne JSON.
|
| 369 |
+
index_lines = (root / "index.jsonl").read_text(encoding="utf-8").splitlines()
|
| 370 |
+
assert len(index_lines) == 1
|
| 371 |
+
rec = json.loads(index_lines[0])
|
| 372 |
+
assert rec["key"] == "k1"
|
| 373 |
+
assert rec["artifact_id"] == "d1:ocr:raw_text"
|
| 374 |
+
assert rec["has_payload"] is True
|
| 375 |
+
|
| 376 |
+
def test_artifact_metadata_preserved(self, tmp_path: Path) -> None:
|
| 377 |
+
"""Les métadonnées de l'Artifact survivent au round-trip."""
|
| 378 |
+
root = tmp_path / "store"
|
| 379 |
+
s = FilesystemArtifactStore(root)
|
| 380 |
+
artifact = Artifact(
|
| 381 |
+
id="d1:complex",
|
| 382 |
+
document_id="d1",
|
| 383 |
+
type=ArtifactType.ALTO_XML,
|
| 384 |
+
content_hash="b" * 64,
|
| 385 |
+
uri="/tmp/some.xml",
|
| 386 |
+
produced_by_step="alto_step",
|
| 387 |
+
provenance=ProvenanceRecord(
|
| 388 |
+
code_version="2.5.1",
|
| 389 |
+
parameters_hash="c" * 64,
|
| 390 |
+
),
|
| 391 |
+
)
|
| 392 |
+
s.put("k1", artifact, payload=b"<alto/>")
|
| 393 |
+
s2 = FilesystemArtifactStore(root)
|
| 394 |
+
retrieved = s2.get("k1")
|
| 395 |
+
assert retrieved is not None
|
| 396 |
+
assert retrieved.artifact.id == artifact.id
|
| 397 |
+
assert retrieved.artifact.type == ArtifactType.ALTO_XML
|
| 398 |
+
assert retrieved.artifact.content_hash == artifact.content_hash
|
| 399 |
+
assert retrieved.artifact.uri == "/tmp/some.xml"
|
| 400 |
+
assert retrieved.artifact.provenance.code_version == "2.5.1"
|
| 401 |
+
assert retrieved.payload == b"<alto/>"
|
| 402 |
+
|
| 403 |
+
def test_corrupted_index_line_skipped(self, tmp_path: Path) -> None:
|
| 404 |
+
"""Une ligne corrompue de l'index ne plante pas le store."""
|
| 405 |
+
root = tmp_path / "store"
|
| 406 |
+
s1 = FilesystemArtifactStore(root)
|
| 407 |
+
s1.put("k1", _make_artifact(), payload=b"x")
|
| 408 |
+
# Corrompre l'index par ajout d'une ligne garbage.
|
| 409 |
+
(root / "index.jsonl").open("a", encoding="utf-8").write(
|
| 410 |
+
"this is not json\n"
|
| 411 |
+
)
|
| 412 |
+
s2 = FilesystemArtifactStore(root)
|
| 413 |
+
assert "k1" in s2 # Toujours présent malgré ligne corrompue
|
| 414 |
+
assert s2.get("k1") is not None
|
| 415 |
+
|
| 416 |
+
def test_artifact_file_missing_returns_none_with_warning(
|
| 417 |
+
self, tmp_path: Path, caplog: pytest.LogCaptureFixture,
|
| 418 |
+
) -> None:
|
| 419 |
+
"""Si l'index pointe vers un fichier supprimé, get retourne
|
| 420 |
+
None avec warning explicite (pas un crash)."""
|
| 421 |
+
root = tmp_path / "store"
|
| 422 |
+
s = FilesystemArtifactStore(root)
|
| 423 |
+
s.put("k1", _make_artifact(), payload=b"x")
|
| 424 |
+
# Supprimer le fichier d'artefact pour simuler corruption.
|
| 425 |
+
(root / "artifacts" / "k1.json").unlink()
|
| 426 |
+
result = s.get("k1")
|
| 427 |
+
assert result is None
|
| 428 |
+
assert any(
|
| 429 |
+
"n'existe plus" in r.message for r in caplog.records
|
| 430 |
+
)
|
| 431 |
+
|
| 432 |
+
def test_reconstruct_from_artifacts_dir_when_index_missing(
|
| 433 |
+
self, tmp_path: Path,
|
| 434 |
+
) -> None:
|
| 435 |
+
"""Si index.jsonl est manquant, reconstruction depuis
|
| 436 |
+
artifacts/."""
|
| 437 |
+
root = tmp_path / "store"
|
| 438 |
+
s1 = FilesystemArtifactStore(root)
|
| 439 |
+
s1.put("k1", _make_artifact(), payload=b"a")
|
| 440 |
+
s1.put("k2", _make_artifact(), payload=b"b")
|
| 441 |
+
# Effacer l'index, garder les artefacts.
|
| 442 |
+
(root / "index.jsonl").unlink()
|
| 443 |
+
s2 = FilesystemArtifactStore(root)
|
| 444 |
+
assert "k1" in s2
|
| 445 |
+
assert "k2" in s2
|
| 446 |
+
assert len(s2) == 2
|
| 447 |
+
|
| 448 |
+
def test_clear_removes_all_files(self, tmp_path: Path) -> None:
|
| 449 |
+
root = tmp_path / "store"
|
| 450 |
+
s = FilesystemArtifactStore(root)
|
| 451 |
+
s.put("k1", _make_artifact(), payload=b"x")
|
| 452 |
+
s.put("k2", _make_artifact(), payload=b"y")
|
| 453 |
+
s.clear()
|
| 454 |
+
assert len(s) == 0
|
| 455 |
+
# Les sous-répertoires existent toujours, juste vides.
|
| 456 |
+
assert (root / "artifacts").exists()
|
| 457 |
+
assert list((root / "artifacts").iterdir()) == []
|
| 458 |
+
assert list((root / "payloads").iterdir()) == []
|
| 459 |
+
assert not (root / "index.jsonl").exists()
|
| 460 |
+
|
| 461 |
+
|
| 462 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 463 |
+
# Intégration ArtifactKey + Store
|
| 464 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 465 |
+
|
| 466 |
+
|
| 467 |
+
class TestKeyStoreIntegration:
|
| 468 |
+
def test_store_keyed_by_artifact_key_hash(self, tmp_path: Path) -> None:
|
| 469 |
+
"""Le pattern d'usage attendu : compute key, then put with
|
| 470 |
+
key.hash_hex() as the store key."""
|
| 471 |
+
store = InMemoryArtifactStore()
|
| 472 |
+
key = _basic_key()
|
| 473 |
+
hash_hex = key.hash_hex()
|
| 474 |
+
assert hash_hex is not None
|
| 475 |
+
store.put(hash_hex, _make_artifact(), payload=b"raw text")
|
| 476 |
+
assert hash_hex in store
|
| 477 |
+
retrieved = store.get(hash_hex)
|
| 478 |
+
assert retrieved is not None
|
| 479 |
+
assert retrieved.payload == b"raw text"
|
| 480 |
+
|
| 481 |
+
def test_different_params_yield_different_keys_and_no_collision(
|
| 482 |
+
self, tmp_path: Path,
|
| 483 |
+
) -> None:
|
| 484 |
+
"""Deux clés conceptuellement différentes ne collisent pas."""
|
| 485 |
+
store = InMemoryArtifactStore()
|
| 486 |
+
k_fra = ArtifactKey(
|
| 487 |
+
input_hashes=(("image", "f" * 64),),
|
| 488 |
+
adapter_name="tess",
|
| 489 |
+
step_params={"lang": "fra"},
|
| 490 |
+
code_version="v",
|
| 491 |
+
)
|
| 492 |
+
k_eng = ArtifactKey(
|
| 493 |
+
input_hashes=(("image", "f" * 64),),
|
| 494 |
+
adapter_name="tess",
|
| 495 |
+
step_params={"lang": "eng"},
|
| 496 |
+
code_version="v",
|
| 497 |
+
)
|
| 498 |
+
store.put(k_fra.hash_hex(), _make_artifact(artifact_id="art:fra"))
|
| 499 |
+
store.put(k_eng.hash_hex(), _make_artifact(artifact_id="art:eng"))
|
| 500 |
+
assert len(store) == 2
|
| 501 |
+
assert store.get(k_fra.hash_hex()).artifact.id == "art:fra"
|
| 502 |
+
assert store.get(k_eng.hash_hex()).artifact.id == "art:eng"
|
| 503 |
+
|
| 504 |
+
|
| 505 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 506 |
+
# StoredArtifact dataclass
|
| 507 |
+
# ──────────────────────────────────────────────────────────────────────
|
| 508 |
+
|
| 509 |
+
|
| 510 |
+
class TestStoredArtifactDataclass:
|
| 511 |
+
def test_frozen(self) -> None:
|
| 512 |
+
sa = StoredArtifact(
|
| 513 |
+
key="k", artifact=_make_artifact(), payload=b"x",
|
| 514 |
+
)
|
| 515 |
+
with pytest.raises(Exception): # FrozenInstanceError
|
| 516 |
+
sa.payload = b"y" # type: ignore[misc]
|
|
@@ -84,6 +84,10 @@ FILE_BUDGETS: dict[str, int] = {
|
|
| 84 |
# plan immuable (validation + bindings + jonctions de métriques).
|
| 85 |
"picarones/pipeline/executor.py": 475, # actuel 413
|
| 86 |
"picarones/pipeline/planner.py": 465, # actuel 403
|
|
|
|
|
|
|
|
|
|
|
|
|
| 87 |
"picarones/core/corpus.py": 600, # actuel 511
|
| 88 |
"picarones/fixtures.py": 600, # actuel 510
|
| 89 |
"picarones/measurements/inter_engine.py": 575, # actuel 484
|
|
|
|
| 84 |
# plan immuable (validation + bindings + jonctions de métriques).
|
| 85 |
"picarones/pipeline/executor.py": 475, # actuel 413
|
| 86 |
"picarones/pipeline/planner.py": 465, # actuel 403
|
| 87 |
+
# Sprint A14-S29 — ArtifactStore (ABC + 2 implémentations) avec
|
| 88 |
+
# hash multi-paramètres pour adresser la critique d'audit n° 14
|
| 89 |
+
# « hash multi-paramètres + reprise par hash ».
|
| 90 |
+
"picarones/adapters/storage/artifact_store.py": 580, # actuel 504
|
| 91 |
"picarones/core/corpus.py": 600, # actuel 511
|
| 92 |
"picarones/fixtures.py": 600, # actuel 510
|
| 93 |
"picarones/measurements/inter_engine.py": 575, # actuel 484
|