Upload 48 files
- .env.example +3 -0
- README.md +22 -10
- prompts/README.md +9 -4
- prompts/cma.md +16 -2
- prompts/fca.md +17 -2
- prompts/legal_basis.md +69 -0
- prompts/pra.md +17 -3
- prompts/validation.md +35 -9
- server.py +585 -197
.env.example
CHANGED
@@ -1,9 +1,12 @@
 GOOGLE_API_KEY=
 GEMINI_API_KEY=
 GEMINI_CLI_BINARY=gemini
+GEMINI_TIMEOUT_SEC=90
 MAX_IMAGE_BYTES=8388608
 MAX_BATCH_IMAGES=20
 MAX_PARALLEL_WORKERS=4
+PIPELINE_STAGE_WORKERS=4
+VALIDATION_RETRY_PASSES=1
 LANGSMITH_API_KEY=
 LANGSMITH_TRACING=true
 LANGSMITH_PROJECT=regtechdemo-hf-v2
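The three new variables follow the same pattern `server.py` already uses for `MAX_IMAGE_BYTES` and `MAX_PARALLEL_WORKERS`. A minimal sketch of reading them with the defaults above — the `env_int` helper is illustrative, not the actual server code:

```python
import os


def env_int(name: str, default: int, minimum: int = 0) -> int:
    """Read an integer environment variable, falling back to a default
    when unset or unparseable, and clamping to a minimum."""
    try:
        value = int(os.environ.get(name, str(default)))
    except ValueError:
        value = default
    return max(minimum, value)


GEMINI_TIMEOUT_SEC = env_int("GEMINI_TIMEOUT_SEC", 90, minimum=1)
PIPELINE_STAGE_WORKERS = env_int("PIPELINE_STAGE_WORKERS", 4, minimum=1)
VALIDATION_RETRY_PASSES = env_int("VALIDATION_RETRY_PASSES", 1, minimum=0)
```

Clamping `PIPELINE_STAGE_WORKERS` to at least 1 mirrors the existing `max(1, int(...))` guard on `MAX_PARALLEL_WORKERS`.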
README.md
CHANGED
@@ -10,24 +10,36 @@ pinned: false
 
 # TechReg Compliance Intelligence (HitSafe.ai)
 
-This
+This Space hosts the **HitSafe.ai** compliance screening engine for UK fintech teams.
 
-
+Current v2 runtime model:
+- parallel `legal_basis + FCA + CMA + PRA` first pass
+- validation arbitration
+- one full retry pass when validation flags unresolved applicability, citation, or conflict issues
 
-
+Architecture notes live in [ARCHITECTURE.md](./ARCHITECTURE.md).
 
-
+## Deployment
 
-
-Set the following secret in your Space settings:
-- `GEMINI_API_KEY`: Your Google Gemini API key.
-- `LANGSMITH_API_KEY`: Your LangSmith API key if you want full tracing.
+This Space uses the **Docker** SDK on port **8080**.
 
-
+### Required secrets
+- `GEMINI_API_KEY`: Your Google Gemini API key
+- `LANGSMITH_API_KEY`: Your LangSmith API key if tracing is enabled
+
+### Recommended runtime variables
 - `LANGSMITH_TRACING=true`
 - `LANGSMITH_PROJECT=regtechdemo-hf-v2`
 - `LANGSMITH_TRACE_USER_AD_COPY=true`
 - `LANGSMITH_TRACE_RAW_REQUEST=true`
+- `GEMINI_CLI_BINARY=gemini`
+- `MAX_IMAGE_BYTES=8388608`
+- `MAX_BATCH_IMAGES=20`
+- `MAX_PARALLEL_WORKERS=4`
+- `PIPELINE_STAGE_WORKERS=4`
+- `VALIDATION_RETRY_PASSES=1`
+- `GEMINI_TIMEOUT_SEC=90`
+
+## Guidance
 
-## 🛡️ Guidance
 This tool provides guidance only and is not legal advice.
prompts/README.md
CHANGED
@@ -3,11 +3,16 @@
 This folder is the repo-owned prompt registry for `HF_space_v2`.
 
 Design intent:
-- `
-- `fca.md` is the primary UK financial promotions review.
+- `legal_basis.md` determines legal basis, perimeter position, claimed exemptions, and regulator applicability.
+- `fca.md` is the primary UK financial promotions and customer-communications review.
 - `cma.md` is for consumer fairness, misleading practices, and pricing/UX risk.
-- `pra.md` is
-- `validation.md` performs
+- `pra.md` is for prudentially sensitive claims and prudentially regulated-firm representations.
+- `validation.md` performs the final arbitration, citation-quality check, applicability suppression, and retry decision.
 - `legacy_oft.md` exists only to map historical OFT references and should not be used as a live regulator module.
 
+Runtime model:
+- `legal_basis`, `fca`, `cma`, and `pra` run in parallel on pass one.
+- `validation` merges and arbitrates those results.
+- If validation identifies unresolved applicability, citation, or conflict issues, the runtime performs one full second pass with the first-pass history and validator critique included.
+
 These files are intended to be loaded by application code. They should not depend on user-home CLI command config.
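The runtime model described in `prompts/README.md` — parallel first pass, validation arbitration, at most one full retry — can be sketched with a thread pool. The runner callables (`run_module`, `run_validation`) are hypothetical stand-ins for the real server functions, which this diff only partially shows:

```python
from concurrent.futures import ThreadPoolExecutor

FIRST_PASS_MODULES = ["legal_basis", "fca", "cma", "pra"]


def run_pipeline(submission, run_module, run_validation, retry_passes=1, workers=4):
    """Run the parallel first pass, validate, and retry at most `retry_passes` times."""
    history = []
    critique = None
    verdict = {}
    for _ in range(retry_passes + 1):
        # Pass one (and any retry pass) runs all four modules in parallel.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {
                name: pool.submit(run_module, name, submission, history, critique)
                for name in FIRST_PASS_MODULES
            }
            outputs = {name: fut.result() for name, fut in futures.items()}
        # Validation merges and arbitrates; it decides whether a retry is needed.
        verdict = run_validation(submission, outputs, history, critique)
        history.append({"modules": outputs, "validation": verdict})
        if not verdict.get("retry_required"):
            break
        # The retry pass gets the first-pass history plus the validator critique.
        critique = verdict.get("retry_guidance")
    return verdict
```

With `retry_passes=1` this matches the `VALIDATION_RETRY_PASSES=1` default: one full second pass at most, triggered only when validation asks for it.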
prompts/cma.md
CHANGED
@@ -1,5 +1,5 @@
 Role
-You are the CMA review module for misleading consumer treatment, unfair presentation,
+You are the CMA review module for misleading consumer treatment, unfair presentation, pricing/fairness risk, and consumer-facing communications in the UK.
 
 Primary scope
 - Misleading consumer presentation

@@ -8,6 +8,11 @@ Primary scope
 - Dark-pattern style UX cues
 - Customer-facing wording and public documents where fairness/transparency issues are material
 
+Mandatory research behavior
+- Use `google_web_search` before finalizing when current CMA, consumer law, pricing/fairness duties, or scope questions require verification.
+- Prefer official CMA, gov.uk, legislation.gov.uk, and FCA sources when scope overlaps financial promotions.
+- If CMA-style analysis is weakly connected to the case, return `applicability="uncertain"` or `not_apply` rather than forcing findings.
+
 Method
 - Focus on consumer fairness and misleading commercial presentation.
 - Do not force FCA conclusions here; this module is complementary.

@@ -17,12 +22,15 @@ Output
 Return JSON only:
 {
 "module": "cma",
+"applicability": "apply|not_apply|uncertain",
+"why_applicable": "string",
 "summary": "string",
 "findings": [
 {
 "issue": "string",
 "rule_ref": "string",
-"
+"source_url": "string",
+"authority_type": "legislation|guidance|consumer_protection|cma_guidance",
 "severity": "CRITICAL|HIGH|ADVISORY",
 "confidence": 0,
 "why": "string",

@@ -30,5 +38,11 @@ Return JSON only:
 }
 ],
 "safe_rewrite": "string",
+"source_verification": {
+"verification_timestamp": "ISO-8601 string",
+"official_urls": ["string"],
+"google_web_search_used": true,
+"manual_review_required": false
+},
 "manual_review_required": false
 }
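Each module now self-reports `applicability`, and the validation stage later suppresses findings from non-applicable modules. A minimal sketch of that gating over the JSON shape above — the field names come from these prompts, but the function itself is illustrative:

```python
def usable_findings(module_output: dict) -> list[dict]:
    """Keep a module's findings only when its regime plausibly applies.

    `not_apply` modules contribute nothing; `uncertain` modules keep their
    findings, since the validator routes uncertainty to manual review
    rather than silently dropping it.
    """
    applicability = module_output.get("applicability", "uncertain")
    if applicability == "not_apply":
        return []
    # Defensive: the model may emit malformed entries; keep dicts only.
    return [f for f in module_output.get("findings", []) if isinstance(f, dict)]
```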
prompts/fca.md
CHANGED
@@ -6,23 +6,32 @@ Primary scope
 - PRIN 2A / Consumer Duty
 - FCA guidance relevant to social, digital, and customer-facing communications
 
+Mandatory research behavior
+- Use `google_web_search` before finalizing when rule applicability, exemptions, current citations, policy statements, or current FCA guidance need verification.
+- Prefer official FCA, FCA Handbook, legislation.gov.uk, and gov.uk sources.
+- If FCA applicability is unclear, say so explicitly and mark `applicability="uncertain"` rather than overstating breaches.
+
 Method
 - Review the submission for fair, clear, and not misleading issues.
 - Evaluate text and visual salience together where images are provided.
 - Use exact rule references where confidence is high.
 - Distinguish binding rules from guidance.
-
+- Do not assume FCA rules apply just because the communication looks risky.
+- If the legal basis is genuinely out of scope or unclear, say that.
 
 Output
 Return JSON only:
 {
 "module": "fca",
+"applicability": "apply|not_apply|uncertain",
+"why_applicable": "string",
 "summary": "string",
 "findings": [
 {
 "issue": "string",
 "rule_ref": "string",
-"
+"source_url": "string",
+"authority_type": "binding_rule|guidance|policy_statement|legislation",
 "severity": "CRITICAL|HIGH|ADVISORY",
 "confidence": 0,
 "why": "string",

@@ -30,5 +39,11 @@ Return JSON only:
 }
 ],
 "safe_rewrite": "string",
+"source_verification": {
+"verification_timestamp": "ISO-8601 string",
+"official_urls": ["string"],
+"google_web_search_used": true,
+"manual_review_required": false
+},
 "manual_review_required": false
 }
prompts/legal_basis.md
ADDED
@@ -0,0 +1,69 @@
+Role
+You are the legal-basis and perimeter review module for a UK regulated-communications pipeline.
+
+Primary objective
+Determine the strongest current legal basis for analysing the submission before any regulator-specific conclusions are relied on.
+
+Core tasks
+- Identify product type, channel, and audience.
+- Determine whether the communication appears to be:
+- an FCA-regulated financial promotion
+- a promotion claiming to rely on an exemption
+- out of FCA perimeter
+- legally uncertain
+- Check whether any claimed exemption is merely asserted or appears plausibly evidenced from the communication itself.
+- Decide whether FCA, CMA, and PRA analysis should be treated as applying, not applying, or remaining uncertain.
+
+Mandatory research behavior
+- Use `google_web_search` before finalizing whenever applicability, exemptions, rule scope, or temporal/current-law accuracy matters.
+- Use official sources where possible: FCA, FCA Handbook, legislation.gov.uk, gov.uk, Bank of England/PRA, CMA.
+- Do not rely on memory alone for exemption/perimeter conclusions.
+- If you cannot verify the legal basis confidently, set `manual_review_required=true` and mark applicability as `uncertain` rather than guessing.
+
+Method
+- Separate “exemption claimed in the ad” from “exemption verified as lawfully usable”.
+- Treat public, broad-distribution promotions with caution when exemptions depend on recipient status.
+- Flag missing or weak evidence for investor-status gating, approval route, or other perimeter conditions.
+- Prefer clear uncertainty over false certainty.
+
+Output
+Return JSON only:
+{
+"module": "legal_basis",
+"summary": "string",
+"input_mode": "text|image|text+image|file",
+"product_type": "string",
+"channel": "string",
+"audience": "string",
+"promotion_scope": "regulated_financial_promotion|claimed_exempt_promotion|out_of_scope|uncertain",
+"claimed_exemptions": [
+{
+"name": "string",
+"status": "claimed|not_claimed|uncertain",
+"evidence": "string"
+}
+],
+"applicability": {
+"fca": "apply|not_apply|uncertain",
+"cma": "apply|not_apply|uncertain",
+"pra": "apply|not_apply|uncertain"
+},
+"legal_basis_findings": [
+{
+"issue": "string",
+"rule_ref": "string",
+"source_url": "string",
+"severity": "CRITICAL|HIGH|ADVISORY",
+"confidence": 0,
+"why": "string",
+"fix": "string"
+}
+],
+"source_verification": {
+"verification_timestamp": "ISO-8601 string",
+"official_urls": ["string"],
+"google_web_search_used": true,
+"manual_review_required": false
+},
+"manual_review_required": false
+}
prompts/pra.md
CHANGED
@@ -7,21 +7,29 @@ Primary scope
 - Safety-of-funds framing
 - Prudential soundness representations by firms plausibly within PRA-relevant populations
 
+Mandatory research behavior
+- Use `google_web_search` before finalizing when prudential rule scope, current PRA expectations, or firm-type applicability need verification.
+- Prefer Bank of England / PRA, FCA, and legislation.gov.uk sources.
+- If prudential relevance is weak, return `applicability="not_apply"` or `uncertain` instead of manufacturing findings.
+
 Method
-
-- If the submission is mainly an FCA promotions matter, say that and
+- Keep findings narrow and tied to prudential claims.
+- If the submission is mainly an FCA promotions matter, say that and avoid broad PRA conclusions.
 - Avoid broad application to ordinary marketing copy unless prudential claims are explicit.
 
 Output
 Return JSON only:
 {
 "module": "pra",
+"applicability": "apply|not_apply|uncertain",
+"why_applicable": "string",
 "summary": "string",
 "findings": [
 {
 "issue": "string",
 "rule_ref": "string",
-"
+"source_url": "string",
+"authority_type": "rule|statement|guidance|legislation",
 "severity": "CRITICAL|HIGH|ADVISORY",
 "confidence": 0,
 "why": "string",

@@ -29,5 +37,11 @@ Return JSON only:
 }
 ],
 "safe_rewrite": "string",
+"source_verification": {
+"verification_timestamp": "ISO-8601 string",
+"official_urls": ["string"],
+"google_web_search_used": true,
+"manual_review_required": false
+},
 "manual_review_required": false
 }
prompts/validation.md
CHANGED
@@ -1,19 +1,29 @@
 Role
-You are the final validation stage for a multi-module UK regulatory review pipeline.
+You are the final validation and arbitration stage for a multi-module UK regulatory review pipeline.
 
 Inputs
-
+- legal_basis output
 - FCA module output
-- CMA module output
-- PRA module output
+- CMA module output
+- PRA module output
+- optional prior pass history
+- optional validator critique for retry pass
+
+Mandatory research behavior
+- Use `google_web_search` again before finalizing whenever you need to verify current applicability, exemptions, rule citations, source URLs, or conflicts across module outputs.
+- Prefer official FCA, FCA Handbook, legislation.gov.uk, gov.uk, Bank of England/PRA, and CMA sources.
+- If official verification remains inconclusive, set `manual_review_required=true` rather than pretending certainty.
 
 Task
+- Treat legal-basis/perimeter analysis as a first-class constraint.
 - Merge the module outputs into one final answer.
+- Suppress findings from modules that are not applicable.
+- If applicability is uncertain, prefer `MANUAL_REVIEW` over definitive breach claims.
 - Remove duplicates.
-
-- Check that
-
-
+- Resolve or surface conflicts.
+- Check that cited rules, source URLs, and severity levels are coherent.
+- Decide whether one full retry pass is needed.
+- On a retry pass, use the prior pass history and validator critique to correct the second-pass outcome, not to repeat the same mistake.
 
 Output
 Return JSON only:

@@ -21,11 +31,17 @@ Return JSON only:
 "overall_verdict": "PASS|FAIL|MANUAL_REVIEW",
 "risk_level": "low|medium|high",
 "summary": "string",
+"applicability_summary": {
+"fca": "apply|not_apply|uncertain",
+"cma": "apply|not_apply|uncertain",
+"pra": "apply|not_apply|uncertain"
+},
 "validated_findings": [
 {
-"module": "fca|cma|pra",
+"module": "legal_basis|fca|cma|pra",
 "issue": "string",
 "rule_ref": "string",
+"source_url": "string",
 "severity": "CRITICAL|HIGH|ADVISORY",
 "confidence": 0,
 "why": "string",

@@ -34,5 +50,15 @@ Return JSON only:
 ],
 "safe_rewrite": "string",
 "conflicts": ["string"],
+"retry_required": false,
+"retry_targets": ["legal_basis", "fca", "cma", "pra"],
+"retry_reason": "string",
+"retry_guidance": ["string"],
+"source_verification": {
+"verification_timestamp": "ISO-8601 string",
+"official_urls": ["string"],
+"google_web_search_used": true,
+"manual_review_required": false
+},
 "manual_review_required": false
 }
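The prompts repeatedly prefer official FCA, FCA Handbook, legislation.gov.uk, gov.uk, Bank of England/PRA, and CMA sources for `official_urls`. That preference can be checked mechanically; the allow-list below is an assumption drawn from the domains the prompts name, not an exhaustive or authoritative list:

```python
from urllib.parse import urlparse

# Assumed allow-list, mirroring the official sources named in the prompts.
OFFICIAL_DOMAINS = (
    "fca.org.uk",
    "handbook.fca.org.uk",
    "legislation.gov.uk",
    "gov.uk",
    "bankofengland.co.uk",
)


def is_official(url: str) -> bool:
    """True when the URL's host is, or is a subdomain of, an assumed official domain."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in OFFICIAL_DOMAINS)
```

A validator could flag any finding whose `source_url` fails this check for manual review rather than rejecting it outright, matching the prompts' "prefer, don't pretend certainty" stance.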
server.py
CHANGED
|
@@ -34,6 +34,8 @@ LOCKED_GEMINI_MODEL = "gemini-3-flash-preview"
|
|
| 34 |
MAX_IMAGE_BYTES = int(os.environ.get("MAX_IMAGE_BYTES", str(8 * 1024 * 1024)))
|
| 35 |
MAX_BATCH_IMAGES = int(os.environ.get("MAX_BATCH_IMAGES", "20"))
|
| 36 |
MAX_PARALLEL_WORKERS = max(1, int(os.environ.get("MAX_PARALLEL_WORKERS", "4")))
|
|
|
|
|
|
|
| 37 |
LANGSMITH_PROJECT = os.environ.get("LANGSMITH_PROJECT", "regtechdemo-hf-v2")
|
| 38 |
LANGSMITH_TRACE_USER_AD_COPY = (
|
| 39 |
os.environ.get("LANGSMITH_TRACE_USER_AD_COPY", "true").strip().lower() == "true"
|
|
@@ -76,13 +78,15 @@ JSON_SCHEMA_HINT = {
|
|
| 76 |
"safe_rewrite": "optional ad rewrite",
|
| 77 |
}
|
| 78 |
PROMPT_FILE_MAP = {
|
| 79 |
-
"
|
| 80 |
"fca": "fca.md",
|
| 81 |
"cma": "cma.md",
|
| 82 |
"pra": "pra.md",
|
| 83 |
"validation": "validation.md",
|
| 84 |
}
|
| 85 |
-
|
|
|
|
|
|
|
| 86 |
PROMPT_CACHE: dict[str, str] = {}
|
| 87 |
|
| 88 |
if os.environ.get("LANGSMITH_API_KEY") and ls is None:
|
|
@@ -204,79 +208,43 @@ def build_submission_block(
|
|
| 204 |
return "\n".join(parts)
|
| 205 |
|
| 206 |
|
| 207 |
-
def
|
| 208 |
-
|
| 209 |
-
ad_text: str,
|
| 210 |
-
extra_context: str,
|
| 211 |
-
image_at_path: str | None,
|
| 212 |
-
system_prompt: str,
|
| 213 |
-
request_id: str | None = None,
|
| 214 |
-
) -> str:
|
| 215 |
-
with traced_stage(
|
| 216 |
-
"build_router_prompt",
|
| 217 |
-
"tool",
|
| 218 |
-
inputs=sanitize_for_langsmith(
|
| 219 |
-
{
|
| 220 |
-
"ad_text": ad_text,
|
| 221 |
-
"extra_context": extra_context,
|
| 222 |
-
"image_at_path": image_at_path,
|
| 223 |
-
"system_prompt": system_prompt,
|
| 224 |
-
},
|
| 225 |
-
ad_text=ad_text,
|
| 226 |
-
),
|
| 227 |
-
metadata={"request_id": request_id},
|
| 228 |
-
tags=["prompt-build", "router"],
|
| 229 |
-
) as (_run, outputs):
|
| 230 |
-
operator_override = get_operator_override(system_prompt)
|
| 231 |
-
prompt = [
|
| 232 |
-
load_prompt_template("router"),
|
| 233 |
-
"",
|
| 234 |
-
build_submission_block(
|
| 235 |
-
ad_text=ad_text,
|
| 236 |
-
extra_context=extra_context,
|
| 237 |
-
image_at_path=image_at_path,
|
| 238 |
-
),
|
| 239 |
-
]
|
| 240 |
-
if operator_override:
|
| 241 |
-
prompt += ["", "Additional operator instructions:", operator_override]
|
| 242 |
-
full_prompt = "\n".join(prompt).strip()
|
| 243 |
-
outputs["prompt"] = sanitize_for_langsmith(full_prompt, ad_text=ad_text)
|
| 244 |
-
return full_prompt
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
def build_module_prompt(
|
| 248 |
-
module_name: str,
|
| 249 |
*,
|
| 250 |
ad_text: str,
|
| 251 |
extra_context: str,
|
| 252 |
image_at_path: str | None,
|
| 253 |
system_prompt: str,
|
| 254 |
-
|
|
|
|
|
|
|
| 255 |
request_id: str | None = None,
|
| 256 |
) -> str:
|
| 257 |
with traced_stage(
|
| 258 |
-
f"build_{
|
| 259 |
"tool",
|
| 260 |
inputs=sanitize_for_langsmith(
|
| 261 |
{
|
| 262 |
-
"
|
| 263 |
"ad_text": ad_text,
|
| 264 |
"extra_context": extra_context,
|
| 265 |
"image_at_path": image_at_path,
|
| 266 |
"system_prompt": system_prompt,
|
| 267 |
-
"
|
|
|
|
|
|
|
| 268 |
},
|
| 269 |
ad_text=ad_text,
|
| 270 |
),
|
| 271 |
-
metadata={"request_id": request_id, "
|
| 272 |
-
tags=["prompt-build",
|
| 273 |
) as (_run, outputs):
|
| 274 |
operator_override = get_operator_override(system_prompt)
|
| 275 |
prompt = [
|
| 276 |
-
load_prompt_template(
|
| 277 |
"",
|
| 278 |
-
"
|
| 279 |
-
|
| 280 |
"",
|
| 281 |
build_submission_block(
|
| 282 |
ad_text=ad_text,
|
|
@@ -284,6 +252,18 @@ def build_module_prompt(
|
|
| 284 |
image_at_path=image_at_path,
|
| 285 |
),
|
| 286 |
]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 287 |
if operator_override:
|
| 288 |
prompt += ["", "Additional operator instructions:", operator_override]
|
| 289 |
full_prompt = "\n".join(prompt).strip()
|
|
@@ -297,8 +277,11 @@ def build_validation_prompt(
|
|
| 297 |
extra_context: str,
|
| 298 |
image_at_path: str | None,
|
| 299 |
system_prompt: str,
|
| 300 |
-
|
|
|
|
| 301 |
module_outputs: dict[str, dict[str, Any]],
|
|
|
|
|
|
|
| 302 |
request_id: str | None = None,
|
| 303 |
) -> str:
|
| 304 |
with traced_stage(
|
|
@@ -310,20 +293,26 @@ def build_validation_prompt(
|
|
| 310 |
"extra_context": extra_context,
|
| 311 |
"image_at_path": image_at_path,
|
| 312 |
"system_prompt": system_prompt,
|
| 313 |
-
"
|
|
|
|
| 314 |
"module_outputs": module_outputs,
|
|
|
|
|
|
|
| 315 |
},
|
| 316 |
ad_text=ad_text,
|
| 317 |
),
|
| 318 |
-
metadata={"request_id": request_id},
|
| 319 |
tags=["prompt-build", "validation"],
|
| 320 |
) as (_run, outputs):
|
| 321 |
operator_override = get_operator_override(system_prompt)
|
| 322 |
prompt = [
|
| 323 |
load_prompt_template("validation"),
|
| 324 |
"",
|
| 325 |
-
"
|
| 326 |
-
|
|
|
|
|
|
|
|
|
|
| 327 |
"",
|
| 328 |
"Module outputs JSON:",
|
| 329 |
json.dumps(module_outputs, ensure_ascii=True, indent=2),
|
|
@@ -334,6 +323,18 @@ def build_validation_prompt(
|
|
| 334 |
image_at_path=image_at_path,
|
| 335 |
),
|
| 336 |
]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 337 |
if operator_override:
|
| 338 |
prompt += ["", "Additional operator instructions:", operator_override]
|
| 339 |
full_prompt = "\n".join(prompt).strip()
|
|
@@ -684,12 +685,6 @@ def dedupe_preserve_order(values: list[str]) -> list[str]:
|
|
| 684 |
output.append(key)
|
| 685 |
return output
|
| 686 |
|
| 687 |
-
|
| 688 |
-
def normalize_module_name(module_name: str) -> str:
|
| 689 |
-
value = str(module_name or "").strip().lower()
|
| 690 |
-
return value if value in PIPELINE_MODULE_ORDER else ""
|
| 691 |
-
|
| 692 |
-
|
| 693 |
def stage_result(
|
| 694 |
stage_name: str,
|
| 695 |
ok: bool,
|
|
@@ -725,48 +720,191 @@ def run_named_stage(
|
|
| 725 |
return stage_result(stage_name, ok, status, result)
|
| 726 |
|
| 727 |
|
| 728 |
-
def
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 729 |
return {
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 730 |
"input_mode": infer_input_mode(ad_text, image_at_path),
|
| 731 |
"product_type": "unknown",
|
| 732 |
"channel": "unknown",
|
| 733 |
"audience": "unknown",
|
| 734 |
-
"
|
| 735 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 736 |
"manual_review_required": True,
|
| 737 |
}
|
| 738 |
|
| 739 |
|
| 740 |
-
def
|
| 741 |
stage: dict[str, Any],
|
| 742 |
*,
|
| 743 |
ad_text: str,
|
| 744 |
image_at_path: str | None,
|
| 745 |
) -> dict[str, Any]:
|
| 746 |
parsed = stage.get("parsed_output")
|
| 747 |
-
fallback =
|
| 748 |
if not isinstance(parsed, dict):
|
| 749 |
return fallback
|
| 750 |
|
| 751 |
-
|
| 752 |
-
|
| 753 |
-
|
| 754 |
-
|
| 755 |
-
|
| 756 |
-
|
| 757 |
-
|
| 758 |
-
|
| 759 |
-
|
| 760 |
-
|
| 761 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 762 |
return {
|
|
|
|
|
|
|
| 763 |
"input_mode": str(parsed.get("input_mode") or infer_input_mode(ad_text, image_at_path)),
|
| 764 |
"product_type": str(parsed.get("product_type") or "unknown"),
|
| 765 |
"channel": str(parsed.get("channel") or "unknown"),
|
| 766 |
"audience": str(parsed.get("audience") or "unknown"),
|
| 767 |
-
"
|
| 768 |
-
"
|
| 769 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 770 |
}
|
| 771 |
|
| 772 |
|
|
@@ -774,56 +912,124 @@ def coerce_module_output(module_name: str, stage: dict[str, Any]) -> dict[str, A
|
|
| 774 |
parsed = stage.get("parsed_output")
|
| 775 |
fallback = {
|
| 776 |
"module": module_name,
|
|
|
|
|
|
|
| 777 |
"summary": f"{module_name.upper()} module did not return valid JSON.",
|
| 778 |
"findings": [],
|
| 779 |
"safe_rewrite": "",
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 780 |
"manual_review_required": True,
|
| 781 |
}
|
| 782 |
if not isinstance(parsed, dict):
|
| 783 |
return fallback
|
| 784 |
|
| 785 |
-
findings =
|
| 786 |
-
|
| 787 |
-
|
-                "issue": str(finding.get("issue") or "Unspecified issue"),
-                "rule_ref": str(finding.get("rule_ref") or "Unknown"),
-                "authority_type": str(finding.get("authority_type") or "unknown"),
-                "severity": str(finding.get("severity") or "ADVISORY").upper(),
-                "confidence": int(finding.get("confidence") or 0),
-                "why": str(finding.get("why") or "No explanation provided."),
-                "fix": str(finding.get("fix") or "No fix provided."),
-            }
         )

     return {
         "module": normalize_module_name(str(parsed.get("module") or module_name)) or module_name,
         "summary": str(parsed.get("summary") or f"{module_name.upper()} module completed."),
-        "findings":
         "safe_rewrite": str(parsed.get("safe_rewrite") or ""),
     }


 def synthesize_validation_output(
     module_outputs: dict[str, dict[str, Any]],
 ) -> dict[str, Any]:
     validated_findings: list[dict[str, Any]] = []
     safe_rewrite = ""

-    for module_name in
         module_output = module_outputs.get(module_name)
         if not module_output:
             continue
         if not safe_rewrite and module_output.get("safe_rewrite"):
             safe_rewrite = str(module_output.get("safe_rewrite"))
         for finding in module_output.get("findings", []):
             if not isinstance(finding, dict):
                 continue
@@ -832,25 +1038,82 @@ def synthesize_validation_output(
                 "module": module_name,
                 "issue": str(finding.get("issue") or "Unspecified issue"),
                 "rule_ref": str(finding.get("rule_ref") or "Unknown"),
                 "severity": str(finding.get("severity") or "ADVISORY").upper(),
-                "confidence":
                 "why": str(finding.get("why") or "No explanation provided."),
                 "fix": str(finding.get("fix") or "No fix provided."),
             }
         )

     has_high = any(severity_rank(item.get("severity", "")) >= 2 for item in validated_findings)

     return {
         "overall_verdict": overall_verdict,
         "risk_level": risk_level,
         "summary": summary,
         "validated_findings": validated_findings,
         "safe_rewrite": safe_rewrite,
-        "conflicts":
         "manual_review_required": manual_review_required,
     }
@@ -858,27 +1121,35 @@ def synthesize_validation_output(
 def coerce_validation_output(
     stage: dict[str, Any],
     *,
     module_outputs: dict[str, dict[str, Any]],
 ) -> dict[str, Any]:
     parsed = stage.get("parsed_output")
-    fallback = synthesize_validation_output(
     if not isinstance(parsed, dict):
         return fallback

     validated_findings: list[dict[str, Any]] = []
             validated_findings.append(
                 {
-                    "module":
                     "issue": str(finding.get("issue") or "Unspecified issue"),
                     "rule_ref": str(finding.get("rule_ref") or "Unknown"),
                     "severity": str(finding.get("severity") or "ADVISORY").upper(),
-                    "confidence":
                     "why": str(finding.get("why") or "No explanation provided."),
                     "fix": str(finding.get("fix") or "No fix provided."),
                 }
@@ -891,17 +1162,53 @@ def coerce_validation_output(
     if risk_level not in {"low", "medium", "high"}:
         risk_level = fallback["risk_level"]

     conflicts = parsed.get("conflicts")
     return {
         "overall_verdict": str(parsed.get("overall_verdict") or fallback["overall_verdict"]).upper(),
         "risk_level": risk_level,
         "summary": str(parsed.get("summary") or fallback["summary"]),
         "validated_findings": validated_findings,
         "safe_rewrite": str(parsed.get("safe_rewrite") or fallback["safe_rewrite"]),
         "conflicts": conflicts if isinstance(conflicts, list) else fallback["conflicts"],
-        ),
     }
@@ -919,7 +1226,8 @@ def build_legacy_output(validation_output: dict[str, Any]) -> dict[str, Any]:
             "fix": str(finding.get("fix") or "No fix provided."),
             "module": str(finding.get("module") or "unknown"),
             "severity": str(finding.get("severity") or "ADVISORY"),
-            "confidence":
         }
     )
@@ -931,9 +1239,51 @@ def build_legacy_output(validation_output: dict[str, Any]) -> dict[str, Any]:
         "overall_verdict": validation_output.get("overall_verdict", "MANUAL_REVIEW"),
         "manual_review_required": bool(validation_output.get("manual_review_required", False)),
         "conflicts": validation_output.get("conflicts", []),
     }


 def run_review_pipeline(
     *,
     ad_text: str,
@@ -959,95 +1309,133 @@ def run_review_pipeline(
         metadata={"request_id": request_id, **(trace_metadata or {})},
         tags=["review-pipeline"],
     ) as (_run, outputs):
-            ad_text=ad_text,
-            extra_context=extra_context,
-            image_at_path=image_at_path,
-            system_prompt=system_prompt,
-            request_id=request_id,
-        )
-            ad_text=ad_text,
-            request_id=request_id,
-            trace_metadata={"
-        )
-        module_stage_results[module_name] = module_stage
-        module_outputs[module_name] = coerce_module_output(module_name, module_stage)

         pipeline_output = {
             "request_id": request_id,
             "input_mode": infer_input_mode(ad_text, image_at_path),
-            "modules": {
-                module_name: {
-                    "stage": module_stage_results[module_name],
-                    "output": module_outputs[module_name],
-                }
-                for module_name in selected_modules
-            },
-            "validation": {
-                "stage": validation_stage,
-                "output": validation_output,
-            },
             "legacy_output": legacy_output,
         }
         outputs["pipeline_output"] = sanitize_for_langsmith(pipeline_output, ad_text=ad_text)
 MAX_IMAGE_BYTES = int(os.environ.get("MAX_IMAGE_BYTES", str(8 * 1024 * 1024)))
 MAX_BATCH_IMAGES = int(os.environ.get("MAX_BATCH_IMAGES", "20"))
 MAX_PARALLEL_WORKERS = max(1, int(os.environ.get("MAX_PARALLEL_WORKERS", "4")))
+PIPELINE_STAGE_WORKERS = max(1, int(os.environ.get("PIPELINE_STAGE_WORKERS", "4")))
+VALIDATION_RETRY_PASSES = max(0, int(os.environ.get("VALIDATION_RETRY_PASSES", "1")))
 LANGSMITH_PROJECT = os.environ.get("LANGSMITH_PROJECT", "regtechdemo-hf-v2")
 LANGSMITH_TRACE_USER_AD_COPY = (
     os.environ.get("LANGSMITH_TRACE_USER_AD_COPY", "true").strip().lower() == "true"
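The two new settings follow the same clamped-integer pattern as the existing worker constants. A minimal sketch of that pattern, assuming the defaults shown in `.env.example` (`clamped_env_int` is a hypothetical helper name, not part of the diff):

```python
import os


def clamped_env_int(name: str, default: int, minimum: int) -> int:
    # Read an integer setting from the environment, fall back to a default,
    # and clamp at a lower bound -- the same logic the module-level
    # constants above apply inline.
    return max(minimum, int(os.environ.get(name, str(default))))


PIPELINE_STAGE_WORKERS = clamped_env_int("PIPELINE_STAGE_WORKERS", 4, 1)
VALIDATION_RETRY_PASSES = clamped_env_int("VALIDATION_RETRY_PASSES", 1, 0)
```

The clamp guards against misconfiguration such as `PIPELINE_STAGE_WORKERS=0`, which would otherwise stall the thread pool.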
     "safe_rewrite": "optional ad rewrite",
 }
 PROMPT_FILE_MAP = {
+    "legal_basis": "legal_basis.md",
     "fca": "fca.md",
     "cma": "cma.md",
     "pra": "pra.md",
     "validation": "validation.md",
 }
+PIPELINE_STAGE_ORDER = ["legal_basis", "fca", "cma", "pra"]
+REGULATOR_STAGE_ORDER = ["fca", "cma", "pra"]
+ALL_REVIEW_STAGES = set(PIPELINE_STAGE_ORDER)
 PROMPT_CACHE: dict[str, str] = {}

 if os.environ.get("LANGSMITH_API_KEY") and ls is None:
     return "\n".join(parts)


+def build_parallel_stage_prompt(
+    stage_name: str,
     *,
     ad_text: str,
     extra_context: str,
     image_at_path: str | None,
     system_prompt: str,
+    pass_number: int,
+    prior_passes: list[dict[str, Any]] | None = None,
+    retry_context: dict[str, Any] | None = None,
     request_id: str | None = None,
 ) -> str:
     with traced_stage(
+        f"build_{stage_name}_prompt",
         "tool",
         inputs=sanitize_for_langsmith(
             {
+                "stage": stage_name,
                 "ad_text": ad_text,
                 "extra_context": extra_context,
                 "image_at_path": image_at_path,
                 "system_prompt": system_prompt,
+                "pass_number": pass_number,
+                "prior_passes": prior_passes or [],
+                "retry_context": retry_context or {},
             },
             ad_text=ad_text,
         ),
+        metadata={"request_id": request_id, "stage": stage_name, "pass_number": pass_number},
+        tags=["prompt-build", stage_name],
     ) as (_run, outputs):
         operator_override = get_operator_override(system_prompt)
         prompt = [
+            load_prompt_template(stage_name),
             "",
+            f"Pipeline pass: {pass_number}",
+            "This runtime uses Gemini CLI. When the prompt requires `google_web_search`, you must use it before finalizing if the tool is available.",
             "",
             build_submission_block(
                 ad_text=ad_text,
                 image_at_path=image_at_path,
             ),
         ]
+        if prior_passes:
+            prompt += [
+                "",
+                "Prior pipeline pass history JSON:",
+                json.dumps(prior_passes, ensure_ascii=True, indent=2),
+            ]
+        if retry_context:
+            prompt += [
+                "",
+                "Validator retry context JSON:",
+                json.dumps(retry_context, ensure_ascii=True, indent=2),
+            ]
         if operator_override:
             prompt += ["", "Additional operator instructions:", operator_override]
         full_prompt = "\n".join(prompt).strip()
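The stage prompt is built as a list of sections, with the pass-history and retry blocks appended only when the data exists. A condensed, self-contained sketch of that optional-section pattern (`assemble_prompt` is a hypothetical name; the real function also adds the template, submission block, and operator override):

```python
import json
from typing import Any


def assemble_prompt(base: str, prior_passes: list[Any], retry_context: dict[str, Any]) -> str:
    # Start from the base text, then append labelled JSON blocks only
    # when the corresponding data is non-empty, as the builder above does.
    parts = [base]
    if prior_passes:
        parts += ["", "Prior pipeline pass history JSON:",
                  json.dumps(prior_passes, ensure_ascii=True, indent=2)]
    if retry_context:
        parts += ["", "Validator retry context JSON:",
                  json.dumps(retry_context, ensure_ascii=True, indent=2)]
    return "\n".join(parts).strip()
```

Keeping the sections optional means a first-pass prompt stays short, while a retry pass carries the validator's feedback inline.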
     extra_context: str,
     image_at_path: str | None,
     system_prompt: str,
+    pass_number: int,
+    legal_basis_output: dict[str, Any],
     module_outputs: dict[str, dict[str, Any]],
+    prior_passes: list[dict[str, Any]] | None = None,
+    retry_context: dict[str, Any] | None = None,
     request_id: str | None = None,
 ) -> str:
     with traced_stage(
                 "extra_context": extra_context,
                 "image_at_path": image_at_path,
                 "system_prompt": system_prompt,
+                "pass_number": pass_number,
+                "legal_basis_output": legal_basis_output,
                 "module_outputs": module_outputs,
+                "prior_passes": prior_passes or [],
+                "retry_context": retry_context or {},
             },
             ad_text=ad_text,
         ),
+        metadata={"request_id": request_id, "pass_number": pass_number},
         tags=["prompt-build", "validation"],
     ) as (_run, outputs):
         operator_override = get_operator_override(system_prompt)
         prompt = [
             load_prompt_template("validation"),
             "",
+            f"Pipeline pass: {pass_number}",
+            "This runtime uses Gemini CLI. When the prompt requires `google_web_search`, you must use it before finalizing if the tool is available.",
+            "",
+            "Legal basis output JSON:",
+            json.dumps(legal_basis_output, ensure_ascii=True, indent=2),
             "",
             "Module outputs JSON:",
             json.dumps(module_outputs, ensure_ascii=True, indent=2),
                 image_at_path=image_at_path,
             ),
         ]
+        if prior_passes:
+            prompt += [
+                "",
+                "Prior pipeline pass history JSON:",
+                json.dumps(prior_passes, ensure_ascii=True, indent=2),
+            ]
+        if retry_context:
+            prompt += [
+                "",
+                "Validator retry context JSON:",
+                json.dumps(retry_context, ensure_ascii=True, indent=2),
+            ]
         if operator_override:
             prompt += ["", "Additional operator instructions:", operator_override]
         full_prompt = "\n".join(prompt).strip()
         output.append(key)
     return output


 def stage_result(
     stage_name: str,
     ok: bool,

     return stage_result(stage_name, ok, status, result)
+def normalize_stage_name(stage_name: str) -> str:
+    value = str(stage_name or "").strip().lower()
+    return value if value in ALL_REVIEW_STAGES else ""
+
+
+def normalize_module_name(module_name: str) -> str:
+    value = str(module_name or "").strip().lower()
+    return value if value in REGULATOR_STAGE_ORDER else ""
+
+
+def normalize_applicability(value: Any) -> str:
+    normalized = str(value or "").strip().lower()
+    if normalized in {"apply", "not_apply", "uncertain"}:
+        return normalized
+    return "uncertain"
+
+
+def normalize_confidence(value: Any) -> float:
+    try:
+        numeric = float(value)
+    except (TypeError, ValueError):
+        return 0.0
+    if numeric < 0:
+        return 0.0
+    if numeric > 100:
+        return 100.0
+    return round(numeric, 2)
+
+
+def normalize_string_list(value: Any) -> list[str]:
+    if not isinstance(value, list):
+        return []
+    items = [str(item).strip() for item in value if str(item).strip()]
+    return dedupe_preserve_order(items)
+
+
+def normalize_source_verification(value: Any) -> dict[str, Any]:
+    if not isinstance(value, dict):
+        return {
+            "verification_timestamp": "",
+            "official_urls": [],
+            "google_web_search_used": False,
+            "manual_review_required": True,
+        }
+
+    official_urls = normalize_string_list(
+        value.get("official_urls")
+        or value.get("source_urls")
+        or value.get("urls")
+        or []
+    )
+    if not official_urls:
+        official_urls = dedupe_preserve_order(
+            normalize_string_list(value.get("handbook_urls"))
+            + normalize_string_list(value.get("policy_urls"))
+            + normalize_string_list(value.get("policy_statement_urls"))
+            + normalize_string_list(value.get("legislation_urls"))
+        )
+
     return {
+        "verification_timestamp": str(value.get("verification_timestamp") or ""),
+        "official_urls": official_urls,
+        "google_web_search_used": bool(value.get("google_web_search_used", False)),
+        "manual_review_required": bool(value.get("manual_review_required", False)),
+    }
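The confidence normalizer above replaces the old `int(... or 0)` coercion with a clamp into `[0, 100]`. Its behavior can be exercised in isolation (body copied from the diff):

```python
from typing import Any


def normalize_confidence(value: Any) -> float:
    # Non-numeric or missing values collapse to 0.0; numeric values are
    # clamped into [0, 100] and rounded to two decimal places.
    try:
        numeric = float(value)
    except (TypeError, ValueError):
        return 0.0
    if numeric < 0:
        return 0.0
    if numeric > 100:
        return 100.0
    return round(numeric, 2)
```

For example, `normalize_confidence("85")` yields `85.0`, while `None`, `-3`, and `150` map to `0.0`, `0.0`, and `100.0` respectively, so a model returning a stray string or out-of-range score can no longer crash or skew the pipeline.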
+
+
+def normalize_finding(
+    finding: dict[str, Any],
+    *,
+    default_module: str,
+    default_authority_type: str = "unknown",
+) -> dict[str, Any]:
+    return {
+        "module": default_module,
+        "issue": str(finding.get("issue") or "Unspecified issue"),
+        "rule_ref": str(finding.get("rule_ref") or "Unknown"),
+        "source_url": str(finding.get("source_url") or ""),
+        "authority_type": str(finding.get("authority_type") or default_authority_type),
+        "severity": str(finding.get("severity") or "ADVISORY").upper(),
+        "confidence": normalize_confidence(finding.get("confidence")),
+        "why": str(finding.get("why") or "No explanation provided."),
+        "fix": str(finding.get("fix") or "No fix provided."),
+    }
+
+
+def default_legal_basis_output(ad_text: str, image_at_path: str | None) -> dict[str, Any]:
+    return {
+        "module": "legal_basis",
+        "summary": "Legal basis could not be determined reliably.",
         "input_mode": infer_input_mode(ad_text, image_at_path),
         "product_type": "unknown",
         "channel": "unknown",
         "audience": "unknown",
+        "promotion_scope": "uncertain",
+        "claimed_exemptions": [],
+        "applicability": {
+            "fca": "uncertain",
+            "cma": "uncertain",
+            "pra": "uncertain",
+        },
+        "legal_basis_findings": [
+            {
+                "module": "legal_basis",
+                "issue": "Legal basis could not be verified",
+                "rule_ref": "Perimeter / exemption verification required",
+                "source_url": "",
+                "authority_type": "verification",
+                "severity": "ADVISORY",
+                "confidence": 0.0,
+                "why": "The legal-basis stage failed or returned invalid JSON, so regulator applicability is uncertain.",
+                "fix": "Re-run with verified official sources or escalate to manual review.",
+            }
+        ],
+        "source_verification": {
+            "verification_timestamp": "",
+            "official_urls": [],
+            "google_web_search_used": False,
+            "manual_review_required": True,
+        },
         "manual_review_required": True,
     }
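`normalize_finding` gives every finding the same shape regardless of which keys the model returned. A condensed, self-contained sketch of the defaulting behavior (confidence clamping omitted; see the full helper above):

```python
from typing import Any


def normalize_finding(finding: dict[str, Any], *, default_module: str) -> dict[str, Any]:
    # Missing or falsy keys collapse to safe defaults; severity is
    # upper-cased so downstream ranking can compare it reliably.
    return {
        "module": default_module,
        "issue": str(finding.get("issue") or "Unspecified issue"),
        "rule_ref": str(finding.get("rule_ref") or "Unknown"),
        "severity": str(finding.get("severity") or "ADVISORY").upper(),
    }
```

Because the empty dict normalizes to a complete record, stages can append model output without per-key existence checks.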


+def coerce_legal_basis_output(
     stage: dict[str, Any],
     *,
     ad_text: str,
     image_at_path: str | None,
 ) -> dict[str, Any]:
     parsed = stage.get("parsed_output")
+    fallback = default_legal_basis_output(ad_text, image_at_path)
     if not isinstance(parsed, dict):
         return fallback

+    claimed_exemptions: list[dict[str, Any]] = []
+    for item in parsed.get("claimed_exemptions", []):
+        if not isinstance(item, dict):
+            continue
+        status = str(item.get("status") or "uncertain").strip().lower()
+        if status not in {"claimed", "not_claimed", "uncertain"}:
+            status = "uncertain"
+        claimed_exemptions.append(
+            {
+                "name": str(item.get("name") or "Unknown"),
+                "status": status,
+                "evidence": str(item.get("evidence") or ""),
+            }
+        )
+
+    legal_basis_findings: list[dict[str, Any]] = []
+    for finding in parsed.get("legal_basis_findings", []):
+        if isinstance(finding, dict):
+            legal_basis_findings.append(
+                normalize_finding(
+                    finding,
+                    default_module="legal_basis",
+                    default_authority_type="verification",
+                )
+            )
+
+    source_verification = normalize_source_verification(parsed.get("source_verification"))
+    manual_review_required = bool(
+        parsed.get("manual_review_required", False)
+        or source_verification.get("manual_review_required", False)
+        or not stage.get("ok")
+    )
+
     return {
+        "module": "legal_basis",
+        "summary": str(parsed.get("summary") or fallback["summary"]),
         "input_mode": str(parsed.get("input_mode") or infer_input_mode(ad_text, image_at_path)),
         "product_type": str(parsed.get("product_type") or "unknown"),
         "channel": str(parsed.get("channel") or "unknown"),
         "audience": str(parsed.get("audience") or "unknown"),
+        "promotion_scope": str(parsed.get("promotion_scope") or "uncertain"),
+        "claimed_exemptions": claimed_exemptions,
+        "applicability": {
+            "fca": normalize_applicability(parsed.get("applicability", {}).get("fca") if isinstance(parsed.get("applicability"), dict) else None),
+            "cma": normalize_applicability(parsed.get("applicability", {}).get("cma") if isinstance(parsed.get("applicability"), dict) else None),
+            "pra": normalize_applicability(parsed.get("applicability", {}).get("pra") if isinstance(parsed.get("applicability"), dict) else None),
+        },
+        "legal_basis_findings": legal_basis_findings or fallback["legal_basis_findings"],
+        "source_verification": source_verification,
+        "manual_review_required": manual_review_required,
     }
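The exemption-status whitelist above degrades anything unrecognized to `"uncertain"` rather than rejecting it. The check can be factored as a one-liner (`normalize_exemption_status` is a hypothetical name for the inline logic):

```python
from typing import Any


def normalize_exemption_status(raw: Any) -> str:
    # Trim and lower-case the model's value, then admit only the three
    # known states; anything else degrades to "uncertain".
    status = str(raw or "uncertain").strip().lower()
    return status if status in {"claimed", "not_claimed", "uncertain"} else "uncertain"
```

Degrading instead of raising keeps a single malformed exemption entry from discarding an otherwise valid legal-basis response.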


     parsed = stage.get("parsed_output")
     fallback = {
         "module": module_name,
+        "applicability": "uncertain",
+        "why_applicable": f"{module_name.upper()} applicability could not be verified.",
         "summary": f"{module_name.upper()} module did not return valid JSON.",
         "findings": [],
         "safe_rewrite": "",
+        "source_verification": {
+            "verification_timestamp": "",
+            "official_urls": [],
+            "google_web_search_used": False,
+            "manual_review_required": True,
+        },
         "manual_review_required": True,
     }
     if not isinstance(parsed, dict):
         return fallback

+    findings: list[dict[str, Any]] = []
+    for finding in parsed.get("findings", []):
+        if isinstance(finding, dict):
+            findings.append(
+                normalize_finding(
+                    finding,
+                    default_module=module_name,
+                )
             )

+    source_verification = normalize_source_verification(parsed.get("source_verification"))
+
     return {
         "module": normalize_module_name(str(parsed.get("module") or module_name)) or module_name,
+        "applicability": normalize_applicability(parsed.get("applicability")),
+        "why_applicable": str(parsed.get("why_applicable") or ""),
         "summary": str(parsed.get("summary") or f"{module_name.upper()} module completed."),
+        "findings": findings,
         "safe_rewrite": str(parsed.get("safe_rewrite") or ""),
+        "source_verification": source_verification,
+        "manual_review_required": bool(
+            parsed.get("manual_review_required", False)
+            or source_verification.get("manual_review_required", False)
+            or not stage.get("ok")
+        ),
     }
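The `manual_review_required` flag is sticky: the module's own claim, the source-verification block, or a failed stage each force it on. The inline boolean can be read as a small predicate (`needs_manual_review` is a hypothetical name for the expression above):

```python
from typing import Any


def needs_manual_review(parsed: dict[str, Any], source_verification: dict[str, Any], stage_ok: bool) -> bool:
    # Any one of the three signals is sufficient; only a clean stage with
    # neither flag set avoids manual review.
    return bool(
        parsed.get("manual_review_required", False)
        or source_verification.get("manual_review_required", False)
        or not stage_ok
    )
```

The OR-combination means a stage can never talk its way out of review that its own verification block requested.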


 def synthesize_validation_output(
+    legal_basis_output: dict[str, Any],
     module_outputs: dict[str, dict[str, Any]],
+    *,
+    pass_number: int,
 ) -> dict[str, Any]:
     validated_findings: list[dict[str, Any]] = []
+    conflicts: list[str] = []
     safe_rewrite = ""
+    source_urls = list(legal_basis_output.get("source_verification", {}).get("official_urls", []))
+    google_web_search_used = bool(
+        legal_basis_output.get("source_verification", {}).get("google_web_search_used", False)
+    )
+    applicability_summary = {
+        module_name: normalize_applicability(
+            legal_basis_output.get("applicability", {}).get(module_name)
+        )
+        for module_name in REGULATOR_STAGE_ORDER
+    }
+    manual_review_required = bool(legal_basis_output.get("manual_review_required", False))
+
+    for finding in legal_basis_output.get("legal_basis_findings", []):
+        if isinstance(finding, dict):
+            validated_findings.append(
+                {
+                    "module": "legal_basis",
+                    "issue": str(finding.get("issue") or "Unspecified issue"),
+                    "rule_ref": str(finding.get("rule_ref") or "Unknown"),
+                    "source_url": str(finding.get("source_url") or ""),
+                    "severity": str(finding.get("severity") or "ADVISORY").upper(),
+                    "confidence": normalize_confidence(finding.get("confidence")),
+                    "why": str(finding.get("why") or "No explanation provided."),
+                    "fix": str(finding.get("fix") or "No fix provided."),
+                }
+            )

+    for module_name in REGULATOR_STAGE_ORDER:
         module_output = module_outputs.get(module_name)
         if not module_output:
             continue
+
+        module_applicability = normalize_applicability(module_output.get("applicability"))
+        source_verification = module_output.get("source_verification", {})
+        source_urls.extend(source_verification.get("official_urls", []))
+        google_web_search_used = google_web_search_used or bool(source_verification.get("google_web_search_used", False))
+
+        legal_basis_applicability = applicability_summary.get(module_name, "uncertain")
+        effective_applicability = legal_basis_applicability
+        if effective_applicability == "uncertain" and module_applicability != "uncertain":
+            effective_applicability = module_applicability
+        applicability_summary[module_name] = module_applicability
+
+        if (
+            legal_basis_applicability != "uncertain"
+            and module_applicability != "uncertain"
+            and legal_basis_applicability != module_applicability
+        ):
+            conflicts.append(
+                f"{module_name.upper()} applicability conflict: legal_basis={legal_basis_applicability}, module={module_applicability}."
+            )
+            manual_review_required = True
+
+        if effective_applicability != "apply":
+            if module_output.get("findings"):
+                conflicts.append(
+                    f"{module_name.upper()} returned findings while applicability is {effective_applicability}."
+                )
+                manual_review_required = True
+            manual_review_required = manual_review_required or bool(module_output.get("manual_review_required", False))
+            continue
+
         if not safe_rewrite and module_output.get("safe_rewrite"):
             safe_rewrite = str(module_output.get("safe_rewrite"))
+
         for finding in module_output.get("findings", []):
             if not isinstance(finding, dict):
                 continue
                 "module": module_name,
                 "issue": str(finding.get("issue") or "Unspecified issue"),
                 "rule_ref": str(finding.get("rule_ref") or "Unknown"),
+                "source_url": str(finding.get("source_url") or ""),
                 "severity": str(finding.get("severity") or "ADVISORY").upper(),
+                "confidence": normalize_confidence(finding.get("confidence")),
                 "why": str(finding.get("why") or "No explanation provided."),
                 "fix": str(finding.get("fix") or "No fix provided."),
             }
         )

+        manual_review_required = manual_review_required or bool(module_output.get("manual_review_required", False))
+
+    deduped_findings: list[dict[str, Any]] = []
+    seen_finding_keys: set[tuple[str, str, str]] = set()
+    for finding in validated_findings:
+        key = (
+            str(finding.get("module") or ""),
+            str(finding.get("issue") or ""),
+            str(finding.get("rule_ref") or ""),
+        )
+        if key in seen_finding_keys:
+            continue
+        seen_finding_keys.add(key)
+        deduped_findings.append(finding)
+
+    validated_findings = deduped_findings
+    source_urls = dedupe_preserve_order([url for url in source_urls if url])
+    applicability_uncertain = any(
+        applicability_summary.get(module_name) == "uncertain" for module_name in REGULATOR_STAGE_ORDER
+    )
+    if applicability_uncertain:
+        manual_review_required = True
+
     has_high = any(severity_rank(item.get("severity", "")) >= 2 for item in validated_findings)
+    if validated_findings:
+        risk_level = "high" if has_high else "medium"
+        overall_verdict = "FAIL"
+        summary = "Validated issues remain after legal-basis and regulator arbitration."
+    elif manual_review_required:
+        risk_level = "medium"
+        overall_verdict = "MANUAL_REVIEW"
+        summary = "No definitive breach set can be returned safely; manual review is required."
+    else:
+        risk_level = "low"
+        overall_verdict = "PASS"
+        summary = "No material issues identified after legal-basis and regulator arbitration."
+
+    retry_required = pass_number <= VALIDATION_RETRY_PASSES and bool(
+        conflicts or applicability_uncertain or not google_web_search_used or not source_urls
+    )
+    retry_guidance: list[str] = []
+    if conflicts:
+        retry_guidance.append("Resolve applicability conflicts between legal basis and regulator modules.")
+    if applicability_uncertain:
+        retry_guidance.append("Verify whether any claimed exemption or perimeter route is actually available.")
+    if not google_web_search_used:
+        retry_guidance.append("Use google_web_search and cite official sources before finalizing.")
+    if not source_urls:
+        retry_guidance.append("Return official source URLs for legal basis and cited rules.")

     return {
         "overall_verdict": overall_verdict,
         "risk_level": risk_level,
         "summary": summary,
+        "applicability_summary": applicability_summary,
         "validated_findings": validated_findings,
         "safe_rewrite": safe_rewrite,
+        "conflicts": dedupe_preserve_order(conflicts),
+        "retry_required": retry_required,
+        "retry_targets": list(PIPELINE_STAGE_ORDER) if retry_required else [],
+        "retry_reason": "; ".join(dedupe_preserve_order(retry_guidance)),
+        "retry_guidance": dedupe_preserve_order(retry_guidance),
+        "source_verification": {
+            "verification_timestamp": "",
+            "official_urls": source_urls,
+            "google_web_search_used": google_web_search_used,
+            "manual_review_required": manual_review_required,
+        },
         "manual_review_required": manual_review_required,
     }
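The verdict logic above is a three-rung ladder: surviving findings force FAIL, otherwise a manual-review flag forces MANUAL_REVIEW, otherwise PASS. A self-contained condensation of that branch (`decide_verdict` is a hypothetical name; the real function also sets the summary string):

```python
def decide_verdict(has_findings: bool, has_high: bool, manual_review_required: bool) -> tuple[str, str]:
    # Returns (overall_verdict, risk_level). Findings outrank the manual
    # review flag, and manual review outranks a clean pass.
    if has_findings:
        return ("FAIL", "high" if has_high else "medium")
    if manual_review_required:
        return ("MANUAL_REVIEW", "medium")
    return ("PASS", "low")
```

Ordering the rungs this way means a run can never report PASS while any validated finding or uncertainty signal remains.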
|
| 1119 |
|
|
|
|
 def coerce_validation_output(
     stage: dict[str, Any],
     *,
+    legal_basis_output: dict[str, Any],
     module_outputs: dict[str, dict[str, Any]],
+    pass_number: int,
 ) -> dict[str, Any]:
     parsed = stage.get("parsed_output")
+    fallback = synthesize_validation_output(legal_basis_output, module_outputs, pass_number=pass_number)
     if not isinstance(parsed, dict):
         return fallback

+    applicability_summary_raw = parsed.get("applicability_summary")
+    applicability_summary = dict(fallback["applicability_summary"])
+    if isinstance(applicability_summary_raw, dict):
+        for module_name in REGULATOR_STAGE_ORDER:
+            applicability_summary[module_name] = normalize_applicability(applicability_summary_raw.get(module_name))
+
     validated_findings: list[dict[str, Any]] = []
+    for finding in parsed.get("validated_findings", []):
+        if isinstance(finding, dict):
+            normalized_module = str(finding.get("module") or "").strip().lower()
+            if normalized_module not in {"legal_basis", *REGULATOR_STAGE_ORDER}:
+                normalized_module = "legal_basis"
             validated_findings.append(
                 {
+                    "module": normalized_module,
                     "issue": str(finding.get("issue") or "Unspecified issue"),
                     "rule_ref": str(finding.get("rule_ref") or "Unknown"),
+                    "source_url": str(finding.get("source_url") or ""),
                     "severity": str(finding.get("severity") or "ADVISORY").upper(),
+                    "confidence": normalize_confidence(finding.get("confidence")),
                     "why": str(finding.get("why") or "No explanation provided."),
                     "fix": str(finding.get("fix") or "No fix provided."),
                 }
@@ … @@
     if risk_level not in {"low", "medium", "high"}:
         risk_level = fallback["risk_level"]

+    source_verification = normalize_source_verification(parsed.get("source_verification"))
+    manual_review_required = bool(
+        parsed.get("manual_review_required", False)
+        or fallback["manual_review_required"]
+        or source_verification.get("manual_review_required", False)
+    )
+    retry_required = bool(parsed.get("retry_required", False) or fallback["retry_required"])
+    if pass_number > VALIDATION_RETRY_PASSES:
+        retry_required = False
+
+    retry_targets = [
+        normalize_stage_name(item)
+        for item in parsed.get("retry_targets", [])
+        if normalize_stage_name(item)
+    ]
+    if retry_required and not retry_targets:
+        retry_targets = list(PIPELINE_STAGE_ORDER)
+
     conflicts = parsed.get("conflicts")
+    retry_guidance = parsed.get("retry_guidance")
+
     return {
         "overall_verdict": str(parsed.get("overall_verdict") or fallback["overall_verdict"]).upper(),
         "risk_level": risk_level,
         "summary": str(parsed.get("summary") or fallback["summary"]),
+        "applicability_summary": applicability_summary,
         "validated_findings": validated_findings,
         "safe_rewrite": str(parsed.get("safe_rewrite") or fallback["safe_rewrite"]),
         "conflicts": conflicts if isinstance(conflicts, list) else fallback["conflicts"],
+        "retry_required": retry_required,
+        "retry_targets": retry_targets,
+        "retry_reason": str(parsed.get("retry_reason") or fallback["retry_reason"]),
+        "retry_guidance": retry_guidance if isinstance(retry_guidance, list) else fallback["retry_guidance"],
+        "source_verification": {
+            "verification_timestamp": str(
+                source_verification.get("verification_timestamp")
+                or fallback["source_verification"]["verification_timestamp"]
+            ),
+            "official_urls": source_verification.get("official_urls")
+            or fallback["source_verification"]["official_urls"],
+            "google_web_search_used": bool(
+                source_verification.get("google_web_search_used")
+                or fallback["source_verification"]["google_web_search_used"]
+            ),
+            "manual_review_required": manual_review_required,
+        },
+        "manual_review_required": manual_review_required,
     }

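The coercion above treats the validator model's parsed JSON as untrusted: every field is checked for type and value, and each bad or missing field falls back individually to a synthesized default. A minimal standalone sketch of that pattern (here `coerce_output` and its fields are hypothetical stand-ins, not the real `coerce_validation_output` signature):

```python
from typing import Any

VALID_RISK_LEVELS = {"low", "medium", "high"}

def coerce_output(parsed: Any, fallback: dict[str, Any]) -> dict[str, Any]:
    # Anything that is not a dict is discarded wholesale.
    if not isinstance(parsed, dict):
        return fallback
    # Each field is validated independently; a bad field falls back alone
    # without discarding the rest of the model's output.
    risk_level = parsed.get("risk_level")
    if risk_level not in VALID_RISK_LEVELS:
        risk_level = fallback["risk_level"]
    conflicts = parsed.get("conflicts")
    return {
        "risk_level": risk_level,
        "summary": str(parsed.get("summary") or fallback["summary"]),
        "conflicts": conflicts if isinstance(conflicts, list) else fallback["conflicts"],
    }
```

The design choice here is per-field rather than all-or-nothing validation, so a partially well-formed model response still contributes everything it got right.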
@@ … @@
                 "fix": str(finding.get("fix") or "No fix provided."),
                 "module": str(finding.get("module") or "unknown"),
                 "severity": str(finding.get("severity") or "ADVISORY"),
+                "confidence": normalize_confidence(finding.get("confidence")),
+                "source_url": str(finding.get("source_url") or ""),
             }
         )
@@ … @@
         "overall_verdict": validation_output.get("overall_verdict", "MANUAL_REVIEW"),
         "manual_review_required": bool(validation_output.get("manual_review_required", False)),
         "conflicts": validation_output.get("conflicts", []),
+        "applicability_summary": validation_output.get("applicability_summary", {}),
+        "source_verification": validation_output.get("source_verification", {}),
     }


+def execute_parallel_stage_group(
+    stage_prompts: dict[str, str],
+    *,
+    ad_text: str,
+    request_id: str,
+    trace_metadata: dict[str, Any] | None = None,
+) -> dict[str, dict[str, Any]]:
+    stage_results: dict[str, dict[str, Any]] = {}
+    if not stage_prompts:
+        return stage_results
+
+    worker_count = min(PIPELINE_STAGE_WORKERS, len(stage_prompts))
+    with ThreadPoolExecutor(max_workers=worker_count) as executor:
+        future_map = {
+            executor.submit(
+                run_named_stage,
+                stage_name,
+                prompt,
+                ad_text=ad_text,
+                request_id=request_id,
+                trace_metadata={"parallel_group": True, **(trace_metadata or {})},
+            ): stage_name
+            for stage_name, prompt in stage_prompts.items()
+        }
+        for future in as_completed(future_map):
+            stage_name = future_map[future]
+            try:
+                stage_results[stage_name] = future.result()
+            except Exception as err:
+                stage_results[stage_name] = {
+                    "stage": stage_name,
+                    "ok": False,
+                    "status": 500,
+                    "parsed_output": None,
+                    "raw_output": None,
+                    "error": f"Unexpected stage error: {err}",
+                }
+    return stage_results
+
+
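`execute_parallel_stage_group` is the standard `ThreadPoolExecutor`/`as_completed` fan-out: submit one future per stage, map each future back to its stage name, and convert worker exceptions into error records so one failed stage cannot sink the batch. A self-contained sketch of the same shape, with `run_stage` standing in for the real `run_named_stage`:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_stage(name: str, prompt: str) -> dict:
    # Stand-in for the real per-stage model call.
    if not prompt:
        raise ValueError("empty prompt")
    return {"stage": name, "ok": True}

def run_group(stage_prompts: dict[str, str], max_workers: int = 4) -> dict[str, dict]:
    results: dict[str, dict] = {}
    if not stage_prompts:
        return results
    # Never spin up more workers than there are stages to run.
    workers = min(max_workers, len(stage_prompts))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Map each submitted future back to its stage name so results
        # can be keyed correctly regardless of completion order.
        future_map = {
            executor.submit(run_stage, name, prompt): name
            for name, prompt in stage_prompts.items()
        }
        for future in as_completed(future_map):
            name = future_map[future]
            try:
                results[name] = future.result()
            except Exception as err:
                # A failed stage becomes an error record, not a crash.
                results[name] = {"stage": name, "ok": False, "error": str(err)}
    return results
```

Because results are collected with `as_completed`, slow stages never block fast ones from being recorded, and the keyed dict makes the nondeterministic completion order irrelevant to callers.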
 def run_review_pipeline(
     *,
     ad_text: str,
@@ … @@
         metadata={"request_id": request_id, **(trace_metadata or {})},
         tags=["review-pipeline"],
     ) as (_run, outputs):
+        passes: list[dict[str, Any]] = []
+        retry_context: dict[str, Any] | None = None
+        final_validation_output: dict[str, Any] | None = None
+
+        for pass_number in range(1, VALIDATION_RETRY_PASSES + 2):
+            stage_prompts = {
+                stage_name: build_parallel_stage_prompt(
+                    stage_name,
+                    ad_text=ad_text,
+                    extra_context=extra_context,
+                    image_at_path=image_at_path,
+                    system_prompt=system_prompt,
+                    pass_number=pass_number,
+                    prior_passes=passes,
+                    retry_context=retry_context,
+                    request_id=request_id,
+                )
+                for stage_name in PIPELINE_STAGE_ORDER
+            }
+            stage_results = execute_parallel_stage_group(
+                stage_prompts,
+                ad_text=ad_text,
+                request_id=request_id,
+                trace_metadata={"pass_number": pass_number, **(trace_metadata or {})},
+            )

+            legal_basis_stage = stage_results.get("legal_basis") or {
+                "stage": "legal_basis",
+                "ok": False,
+                "status": 500,
+                "parsed_output": None,
+                "raw_output": None,
+                "error": "Legal basis stage missing.",
+            }
+            legal_basis_output = coerce_legal_basis_output(
+                legal_basis_stage,
+                ad_text=ad_text,
+                image_at_path=image_at_path,
+            )
+
+            module_stage_results: dict[str, dict[str, Any]] = {}
+            module_outputs: dict[str, dict[str, Any]] = {}
+            for module_name in REGULATOR_STAGE_ORDER:
+                module_stage = stage_results.get(module_name) or {
+                    "stage": module_name,
+                    "ok": False,
+                    "status": 500,
+                    "parsed_output": None,
+                    "raw_output": None,
+                    "error": f"{module_name.upper()} stage missing.",
+                }
+                module_stage_results[module_name] = module_stage
+                module_outputs[module_name] = coerce_module_output(module_name, module_stage)
+
+            validation_prompt = build_validation_prompt(
                 ad_text=ad_text,
                 extra_context=extra_context,
                 image_at_path=image_at_path,
                 system_prompt=system_prompt,
+                pass_number=pass_number,
+                legal_basis_output=legal_basis_output,
+                module_outputs=module_outputs,
+                prior_passes=passes,
+                retry_context=retry_context,
                 request_id=request_id,
             )
+            validation_stage = run_named_stage(
+                "validation",
+                validation_prompt,
                 ad_text=ad_text,
                 request_id=request_id,
+                trace_metadata={"pass_number": pass_number, **(trace_metadata or {})},
+            )
+            validation_output = coerce_validation_output(
+                validation_stage,
+                legal_basis_output=legal_basis_output,
+                module_outputs=module_outputs,
+                pass_number=pass_number,
             )

+            pass_record = {
+                "pass_number": pass_number,
+                "parallel_stage_order": list(PIPELINE_STAGE_ORDER),
+                "parallel_stages": {
+                    "legal_basis": {
+                        "stage": legal_basis_stage,
+                        "output": legal_basis_output,
+                    },
+                    **{
+                        module_name: {
+                            "stage": module_stage_results[module_name],
+                            "output": module_outputs[module_name],
+                        }
+                        for module_name in REGULATOR_STAGE_ORDER
+                    },
+                },
+                "validation": {
+                    "stage": validation_stage,
+                    "output": validation_output,
+                },
+            }
+            passes.append(pass_record)
+
+            if validation_output.get("retry_required") and pass_number <= VALIDATION_RETRY_PASSES:
+                retry_context = {
+                    "retry_reason": validation_output.get("retry_reason", ""),
+                    "retry_targets": validation_output.get("retry_targets", list(PIPELINE_STAGE_ORDER)),
+                    "retry_guidance": validation_output.get("retry_guidance", []),
+                    "prior_validation_output": validation_output,
+                }
+                continue
+
+            final_validation_output = validation_output
+            break
+
+        if final_validation_output is None:
+            final_validation_output = passes[-1]["validation"]["output"]

+        legacy_output = build_legacy_output(final_validation_output)
         pipeline_output = {
             "request_id": request_id,
             "input_mode": infer_input_mode(ad_text, image_at_path),
+            "parallel_stage_order": list(PIPELINE_STAGE_ORDER),
+            "retry_performed": len(passes) > 1,
+            "total_passes": len(passes),
+            "passes": passes,
+            "final_validation": final_validation_output,
             "legacy_output": legacy_output,
         }
         outputs["pipeline_output"] = sanitize_for_langsmith(pipeline_output, ad_text=ad_text)
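The control flow of the retry loop in `run_review_pipeline` — at most `VALIDATION_RETRY_PASSES` extra passes, the validator's retry context fed forward into the next pass, and the last pass used as the final answer if nothing ever settles — reduces to this sketch, where `validate` stands in for the whole prompt/stage/coerce round trip:

```python
RETRY_PASSES = 1  # stand-in for VALIDATION_RETRY_PASSES

def run_passes(validate) -> dict:
    passes = []
    retry_context = None
    final = None
    # Allow the initial pass plus RETRY_PASSES retries.
    for pass_number in range(1, RETRY_PASSES + 2):
        output = validate(pass_number, retry_context)
        passes.append(output)
        if output.get("retry_required") and pass_number <= RETRY_PASSES:
            # Feed the validator's reasoning forward into the next pass.
            retry_context = {"retry_reason": output.get("retry_reason", "")}
            continue
        final = output
        break
    if final is None:
        # Defensive fallback: settle on the last pass if the loop never broke.
        final = passes[-1]
    return {
        "total_passes": len(passes),
        "retry_performed": len(passes) > 1,
        "final": final,
    }
```

The `pass_number <= RETRY_PASSES` guard is what makes the loop bounded: a validator that keeps demanding retries is overruled once the budget is spent, and its final output is accepted as-is.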