# Validation Map: `replicalab/utils/validation.py`

> Deterministic protocol validation against scenario constraints.
> Pure functions: no LLM calls, no side effects.
>
> **Tasks implemented:** MOD 05

## Public API

### `validate_protocol(protocol: Protocol, scenario: NormalizedScenarioPack) -> ValidationResult`
Main entry point. Never raises; always returns a `ValidationResult`.

**Checks run (in order):**
1. `_check_obvious_impossibilities`: `sample_size < 1`, no controls, `duration_days < 1`
2. `_check_duration_vs_time_limit`: protocol days vs lab `time_limit_days`
3. `_check_equipment_vocabulary`: items vs available/booked/substitutable
4. `_check_reagent_vocabulary`: items vs in-stock/out-of-stock/substitutable
5. `_check_required_element_coverage`: protocol text vs `hidden_reference_spec.required_elements`

**Result:** `valid=True` only if zero ERROR-level issues.
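
The ordering and the `valid` rule above can be illustrated with a minimal sketch. It is not the module's code: `run_checks`, the tuple-based `Issue` shape, and the two toy checks are hypothetical stand-ins, and protocols are plain dicts rather than `Protocol` objects.

```python
from typing import Callable

# Hypothetical shapes for this sketch only.
Issue = tuple[str, str, str]  # (severity, category, message)
Check = Callable[[dict, dict], list[Issue]]


def run_checks(protocol: dict, scenario: dict, checks: list[Check]) -> tuple[bool, list[Issue]]:
    """Run checks in the given order and derive `valid` from the collected issues."""
    issues: list[Issue] = []
    for check in checks:
        issues.extend(check(protocol, scenario))
    # valid only if no issue is error-level; warnings do not block.
    valid = all(severity != "error" for severity, _, _ in issues)
    return valid, issues


def check_sample_size(protocol: dict, scenario: dict) -> list[Issue]:
    if protocol["sample_size"] < 1:
        return [("error", "sample_size", "sample_size must be at least 1")]
    return []


def check_controls(protocol: dict, scenario: dict) -> list[Issue]:
    if not protocol["controls"]:
        return [("warning", "controls", "no controls listed")]
    return []


checks = [check_sample_size, check_controls]
valid_a, issues_a = run_checks({"sample_size": 0, "controls": []}, {}, checks)
valid_b, issues_b = run_checks({"sample_size": 12, "controls": []}, {}, checks)
```

In this sketch the first call is invalid (one error plus one warning), while the second stays valid even though it still carries a warning.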

## Data Classes

### `IssueSeverity(str, Enum)`
| Value | Meaning |
|-------|---------|
| `error` | Hard failure: protocol cannot proceed |
| `warning` | Advisory: protocol is suboptimal but possible |

### `ValidationIssue(BaseModel)` (`extra="forbid"`)
| Field | Type | Example |
|-------|------|---------|
| `severity` | `IssueSeverity` | `ERROR` |
| `category` | `str` | `"equipment"`, `"duration"`, `"sample_size"` |
| `message` | `str` | `"Equipment 'X' is booked and has no substitution."` |

### `ValidationResult(BaseModel)` (`extra="forbid"`)
| Field | Type |
|-------|------|
| `valid` | `bool` |
| `issues` | `list[ValidationIssue]` |

**Properties:**
- `errors` → `list[ValidationIssue]` (severity=ERROR only)
- `warnings` → `list[ValidationIssue]` (severity=WARNING only)
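
As a rough illustration of how the two properties filter `issues`, here is a stand-in built on stdlib dataclasses. The real classes are described above as Pydantic models with `extra="forbid"`; this sketch does not reproduce that behavior, and the enum member names `ERROR` and `WARNING` are an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class IssueSeverity(str, Enum):
    ERROR = "error"      # member names are assumed; values come from the table above
    WARNING = "warning"


@dataclass(frozen=True)
class ValidationIssue:
    severity: IssueSeverity
    category: str
    message: str


@dataclass
class ValidationResult:
    valid: bool
    issues: list[ValidationIssue] = field(default_factory=list)

    @property
    def errors(self) -> list[ValidationIssue]:
        # Only ERROR-level issues.
        return [i for i in self.issues if i.severity is IssueSeverity.ERROR]

    @property
    def warnings(self) -> list[ValidationIssue]:
        # Only WARNING-level issues.
        return [i for i in self.issues if i.severity is IssueSeverity.WARNING]


result = ValidationResult(
    valid=False,
    issues=[
        ValidationIssue(IssueSeverity.ERROR, "equipment",
                        "Equipment 'X' is booked and has no substitution."),
        ValidationIssue(IssueSeverity.WARNING, "controls", "No controls listed."),
    ],
)
```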

## Check Details

### `_check_obvious_impossibilities`
| Condition | Severity | Category |
|-----------|----------|----------|
| `sample_size < 1` | ERROR | `sample_size` |
| `controls` empty | WARNING | `controls` |
| `duration_days < 1` | ERROR | `duration` |
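
The table above can be read as three independent conditions. A minimal sketch, assuming the fields are passed in directly (the real function takes a protocol object, and `obvious_impossibilities` is a hypothetical name):

```python
def obvious_impossibilities(sample_size: int, controls: list[str],
                            duration_days: int) -> list[tuple[str, str]]:
    """Return (severity, category) pairs following the table above."""
    issues: list[tuple[str, str]] = []
    if sample_size < 1:
        issues.append(("error", "sample_size"))
    if not controls:
        # Missing controls is advisory only, so it does not invalidate the protocol.
        issues.append(("warning", "controls"))
    if duration_days < 1:
        issues.append(("error", "duration"))
    return issues


all_bad = obvious_impossibilities(0, [], 0)
all_good = obvious_impossibilities(24, ["vehicle control"], 5)
```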

### `_check_duration_vs_time_limit`
| Condition | Severity | Category |
|-----------|----------|----------|
| `duration_days > time_limit_days` | ERROR | `duration` |

### `_check_equipment_vocabulary`
| Condition | Severity | Category |
|-----------|----------|----------|
| Item available | (pass) | n/a |
| Item booked + has substitution | WARNING | `equipment` |
| Item booked + no substitution | ERROR | `equipment` |
| Item unknown (not in inventory) | WARNING | `equipment` |

### `_check_reagent_vocabulary`
| Condition | Severity | Category |
|-----------|----------|----------|
| Item in stock | (pass) | n/a |
| Item out of stock + has substitution | WARNING | `reagent` |
| Item out of stock + no substitution | ERROR | `reagent` |
| Item unknown (not in inventory) | WARNING | `reagent` |
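
The equipment and reagent tables share the same decision shape, which can be sketched as one function. This is an illustration under assumptions: `classify_item` is a hypothetical name, and the inventory is modeled as three sets of already-normalized names (usable, unavailable, and substitutable), which may not match how the scenario pack stores them.

```python
from typing import Optional


def classify_item(item: str, usable: set[str], unavailable: set[str],
                  substitutable: set[str]) -> Optional[str]:
    """Return None for a pass, otherwise "warning" or "error" per the tables above."""
    key = " ".join(item.lower().split())  # lowercase, strip, collapse whitespace
    if key in usable:
        return None
    if key in unavailable:
        # Booked or out of stock: a substitution downgrades the error to a warning.
        return "warning" if key in substitutable else "error"
    # Unknown item, not in the inventory at all.
    return "warning"


usable = {"centrifuge"}
unavailable = {"plate reader", "qpcr machine"}
substitutable = {"plate reader"}

verdicts = [
    classify_item("Centrifuge", usable, unavailable, substitutable),
    classify_item("Plate  Reader", usable, unavailable, substitutable),
    classify_item("qPCR machine", usable, unavailable, substitutable),
    classify_item("laser cutter", usable, unavailable, substitutable),
]
```

Note that an unknown item and a substitutable unavailable item both yield a warning, so only the message text would distinguish them.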

### `_check_required_element_coverage`
Checks each `hidden_reference_spec.required_elements` against protocol text fields using token matching.

- **Protocol text searched:** technique, rationale, controls, equipment, reagents (joined, lowercased).
- **Token extraction:** `_element_tokens(element)` splits on spaces, keeps tokens with 3+ chars.
- **Match:** any token from element found in protocol text → covered.

| Condition | Severity | Category |
|-----------|----------|----------|
| Element not addressed | WARNING | `required_element` |
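
The matching rule above is lenient, and a small sketch makes that concrete. The function names here are hypothetical, and two details are assumptions rather than confirmed behavior: tokens are lowercased before comparison, and "found in protocol text" is treated as a substring test.

```python
def element_tokens(element: str) -> list[str]:
    """Split on whitespace and keep tokens of 3 or more characters (lowercased by assumption)."""
    return [tok for tok in element.lower().split() if len(tok) >= 3]


def is_covered(element: str, protocol_text: str) -> bool:
    """An element counts as covered if any one of its tokens appears in the text."""
    text = protocol_text.lower()
    return any(tok in text for tok in element_tokens(element))


tokens = element_tokens("use a positive control")
covered = is_covered("positive control", "We include a vehicle CONTROL group")
missing = is_covered("blinded scoring", "randomized sample allocation")
```

Because a single token is enough, "positive control" is marked covered by text that only mentions a vehicle control. Under these assumptions the check is a coarse signal, which is consistent with its WARNING severity.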

## Internal Helpers

| Function | Purpose |
|----------|---------|
| `_normalize(label)` | Lowercase, strip, collapse whitespace |
| `_element_tokens(element)` | Split element string into searchable tokens (3+ chars) |
| `_substitution_alternatives(scenario)` | Set of normalized original items from `allowed_substitutions` |
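
A minimal sketch of the normalization helper, plus one way the substitution set could be derived. The shape of `allowed_substitutions` is not specified here, so the dict keyed by original item is purely an assumption, as are both function names.

```python
def normalize(label: str) -> str:
    """Lowercase, strip, and collapse internal whitespace."""
    return " ".join(label.lower().split())


def substitutable_originals(allowed_substitutions: dict[str, list[str]]) -> set[str]:
    """Normalized original items that have at least one listed alternative.

    Assumes a hypothetical mapping of original item -> alternatives.
    """
    return {normalize(original) for original, alts in allowed_substitutions.items() if alts}


cleaned = normalize("  Plate   Reader ")
originals = substitutable_originals({
    "Plate  Reader": ["spectrophotometer"],
    "qPCR Machine": [],
})
```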

## Who Consumes This

- **`lab_manager_policy.py`**: `check_feasibility()` calls `validate_protocol()` and wraps the result in a `protocol` DimensionCheck
- **`scoring/`** (future): JDG 01 rigor score will reuse `_element_tokens` for required element matching