File size: 28,653 Bytes
be32374
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
"""DriftCall demo Space β€” Gradio 5.x entrypoint.

Implements ``docs/modules/deploy_demo_space.md`` (sealed). Single-file demo:
mic β†’ ASR β†’ DriftCallEnv β†’ Gemma 3n E2B (base | trained LoRA) β†’ TTS β†’ speaker
with a live trace panel and a manual drift-injection dropdown.

Hard rules:
- Heavy deps (``gradio``, ``spaces``, ``peft``, ``torch``, ``transformers``,
  ``unsloth``) are imported lazily inside callables so this module imports
  cleanly in CI / on machines without GPUs / Gradio.
- ``infer_turn`` never writes to disk and never calls push-to-hub.
- Latency budget: ≀ 8 s on ZeroGPU warm, ≀ 12 s on A10G warm.
- All 9 error modes (deploy_demo_space.md Β§5) surface as ``status_msg`` and
  positional safe defaults; the UI never crashes.
"""

from __future__ import annotations

import contextlib
import logging
import os
import threading
import time
import uuid
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any, Literal

import numpy as np

from cells.step_04_models import ActionType, DriftCallAction
from cells.step_06_drift_injector import list_patterns
from cells.step_10_env import DriftCallEnv

if TYPE_CHECKING:
    import pandas as pd

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Public types + constants
# ---------------------------------------------------------------------------

CheckpointId = Literal["base", "trained"]

_MAX_SESSIONS: int = 10
_IDLE_TTL_S: int = 900
_GPU_DURATION_S: int = 60
_LATENCY_BUDGET_S: float = 8.0
_TRACE_COLUMNS: tuple[str, ...] = (
    "turn_idx",
    "actor",
    "action_or_event",
    "tool_response_preview",
    "reward_delta",
)
_BASE_MODEL_ID_DEFAULT: str = "unsloth/gemma-3n-E2B-it"
_TRAINED_ADAPTER_ID_DEFAULT: str = "DGXAI/gemma-3n-e2b-driftcall-lora"
_HARDWARE_ENV_VAR: str = "DRIFTCALL_HARDWARE"
_HARDWARE_FALLBACK_ENV_VAR: str = "DRIFTCALL_HARDWARE_FALLBACK"
_TRACE_PREVIEW_LEN: int = 120
_FALLBACK_SR_HZ: int = 16000
_FALLBACK_SILENCE_LEN: int = _FALLBACK_SR_HZ  # 1 s of silence
_DRIFT_PATTERN_IDS: tuple[str, ...] = tuple(p.id for p in list_patterns())


# ---------------------------------------------------------------------------
# Errors (deploy_demo_space.md Β§5)
# ---------------------------------------------------------------------------


class TrainedAdapterMissingError(RuntimeError):
    """5.2 β€” LoRA download failed at boot or adapter file corrupt."""


class CheckpointMismatchError(RuntimeError):
    """5.5 β€” LoRA was trained on a different base_model_id."""


class SessionCapacityError(RuntimeError):
    """5.7 β€” > 10 concurrent sessions."""


class EnvStepError(RuntimeError):
    """5.8 β€” env raised during step()."""


class ZeroGPUUnavailableError(RuntimeError):
    """5.1 β€” @spaces.GPU request rejected."""


class AudioDecodeError(RuntimeError):
    """5.6 β€” ASR could not decode mic audio."""


# ---------------------------------------------------------------------------
# DemoSessionState (deploy_demo_space.md Β§4.1)
# ---------------------------------------------------------------------------


@dataclass
class TraceRow:
    turn_idx: int
    actor: Literal["user", "agent", "env", "drift", "reward"]
    action_or_event: str
    tool_response_preview: str
    reward_delta: float


@dataclass
class DemoSessionState:
    """Per-tab state. Mutated only by ``demo.app_gradio`` itself."""

    session_id: str
    env: DriftCallEnv
    last_observation: Any | None = None
    episode_trace: list[TraceRow] = field(default_factory=list)
    audio_buffer: list[bytes] = field(default_factory=list)
    current_checkpoint: CheckpointId = "base"
    turn_idx: int = 0
    created_at_ms: int = 0
    last_activity_ms: int = 0


# ---------------------------------------------------------------------------
# Process-wide registry
# ---------------------------------------------------------------------------

_REGISTRY: dict[str, DemoSessionState] = {}
_REGISTRY_LOCK = threading.Lock()


def _now_ms() -> int:
    return int(time.time() * 1000)


def _make_env() -> DriftCallEnv:
    """Build a fresh env. Audio boundary is enabled in production but the
    constructor requires real engines; tests inject a stub via ``_make_env``
    monkeypatch."""

    return DriftCallEnv()


def get_session(session_id: str) -> DemoSessionState:
    """Return the session for this UUID or create a fresh one. Β§3.3."""

    with _REGISTRY_LOCK:
        existing = _REGISTRY.get(session_id)
        if existing is not None:
            existing.last_activity_ms = _now_ms()
            return existing
        if len(_REGISTRY) >= _MAX_SESSIONS:
            raise SessionCapacityError(
                f"demo at capacity ({_MAX_SESSIONS} concurrent sessions)"
            )
        env = _make_env()
        state = DemoSessionState(
            session_id=session_id,
            env=env,
            created_at_ms=_now_ms(),
            last_activity_ms=_now_ms(),
        )
        _REGISTRY[session_id] = state
        return state


def reset_session(session_id: str) -> DemoSessionState:
    """Hard-reset: close env, drop trace, re-allocate. Β§3.5."""

    with _REGISTRY_LOCK:
        existing = _REGISTRY.pop(session_id, None)
    if existing is not None:
        try:
            existing.env.close()
        except Exception:
            logger.exception("env.close raised on reset_session for %s", session_id)
    return get_session(session_id)


def gc_sessions(max_idle_s: int = _IDLE_TTL_S) -> int:
    """Evict sessions idle past ``max_idle_s``. Returns the count evicted."""

    cutoff_ms = _now_ms() - max_idle_s * 1000
    with _REGISTRY_LOCK:
        stale = [sid for sid, s in _REGISTRY.items() if s.last_activity_ms < cutoff_ms]
        for sid in stale:
            entry = _REGISTRY.pop(sid)
            try:
                entry.env.close()
            except Exception:
                logger.exception("env.close raised on gc for %s", sid)
    return len(stale)


def _clear_registry_for_tests() -> None:
    """Pytest helper β€” never used in production code paths."""

    with _REGISTRY_LOCK:
        for entry in _REGISTRY.values():
            with contextlib.suppress(Exception):
                entry.env.close()
        _REGISTRY.clear()


# ---------------------------------------------------------------------------
# DriftToggleBridge (deploy_demo_space.md Β§2.5, Β§3.8, Β§7.3)
# ---------------------------------------------------------------------------


class DriftToggleBridge:
    """One-slot per-session queue with last-write-wins coalescence."""

    def __init__(self) -> None:
        self._slots: dict[str, str] = {}
        self._lock = threading.Lock()

    def queue(self, session_id: str, pattern_id: str | None) -> None:
        with self._lock:
            if pattern_id is None:
                self._slots.pop(session_id, None)
            else:
                self._slots[session_id] = pattern_id

    def consume(self, session_id: str) -> str | None:
        with self._lock:
            return self._slots.pop(session_id, None)


_DEFAULT_BRIDGE = DriftToggleBridge()


def get_drift_bridge() -> DriftToggleBridge:
    return _DEFAULT_BRIDGE


# ---------------------------------------------------------------------------
# Trace panel (deploy_demo_space.md Β§2.6, Β§4.3)
# ---------------------------------------------------------------------------


def render_trace(state: DemoSessionState) -> pd.DataFrame:
    """Pure rendering β€” never mutates ``state``."""

    import pandas as pd

    rows = [
        {
            "turn_idx": r.turn_idx,
            "actor": r.actor,
            "action_or_event": r.action_or_event,
            "tool_response_preview": r.tool_response_preview,
            "reward_delta": float(r.reward_delta),
        }
        for r in state.episode_trace
    ]
    return pd.DataFrame(rows, columns=list(_TRACE_COLUMNS))


def _empty_trace_df() -> pd.DataFrame:
    import pandas as pd

    return pd.DataFrame([], columns=list(_TRACE_COLUMNS))


# ---------------------------------------------------------------------------
# ModelLoader (deploy_demo_space.md Β§2.3, Β§3.2)
# ---------------------------------------------------------------------------


class ModelLoader:
    """Process-wide singleton. Holds the 4-bit base model + LoRA adapter."""

    def __init__(
        self,
        *,
        base_model_id: str = _BASE_MODEL_ID_DEFAULT,
        trained_adapter_id: str = _TRAINED_ADAPTER_ID_DEFAULT,
        max_seq_length: int = 4096,
    ) -> None:
        self._base_model_id = base_model_id
        self._trained_adapter_id = trained_adapter_id
        self._max_seq_length = max_seq_length
        self._model: Any | None = None
        self._tokenizer: Any | None = None
        self._trained_available: bool = False
        self._lock = threading.Lock()

    def _load_base(self) -> tuple[Any, Any]:
        """Load the 4-bit base model. Patched by tests via ``_load_base``."""

        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok_cls: Any = AutoTokenizer
        model_cls: Any = AutoModelForCausalLM
        tokenizer = tok_cls.from_pretrained(self._base_model_id)
        model = model_cls.from_pretrained(self._base_model_id)
        return model, tokenizer

    def _try_mount_adapter(self, model: Any) -> Any | None:
        """Mount ``self._trained_adapter_id`` as the ``driftcall`` LoRA. None on miss."""

        try:
            from peft import PeftModel
        except ImportError as exc:
            logger.warning("peft import failed: %s", exc)
            return None
        try:
            return PeftModel.from_pretrained(
                model, self._trained_adapter_id, adapter_name="driftcall"
            )
        except CheckpointMismatchError as exc:
            logger.warning("checkpoint mismatch on LoRA mount: %s", exc)
            return None
        except Exception as exc:
            # Captures EntryNotFoundError, HTTPError(404), generic peft failures.
            logger.warning("LoRA mount failed: %s", exc)
            return None

    def ensure_loaded(self) -> None:
        """Lazy load β€” first ZeroGPU-decorated call invokes this."""

        with self._lock:
            if self._model is not None:
                return
            base_model, tokenizer = self._load_base()
            self._tokenizer = tokenizer
            wrapped = self._try_mount_adapter(base_model)
            if wrapped is None:
                self._model = base_model
                self._trained_available = False
            else:
                self._model = wrapped
                self._trained_available = True

    def is_trained_available(self) -> bool:
        return self._trained_available

    def generate(
        self,
        messages: list[dict[str, str]],
        *,
        checkpoint: CheckpointId,
        max_new_tokens: int = 256,
        temperature: float = 0.2,
        top_p: float = 0.95,
        seed: int = 0,
    ) -> str:
        """Run the correct adapter and return the completion text."""

        self.ensure_loaded()
        if checkpoint == "trained" and not self._trained_available:
            raise TrainedAdapterMissingError(
                "Trained adapter unavailable; falling back to base"
            )
        model = self._model
        if model is None:
            raise RuntimeError("model unexpectedly None")

        if checkpoint == "base":
            ctx = model.disable_adapter() if hasattr(model, "disable_adapter") else _NullCtx()
            with ctx:
                return self._run(model, messages, max_new_tokens, temperature, top_p, seed)
        # trained
        if hasattr(model, "set_adapter"):
            model.set_adapter("driftcall")
        if hasattr(model, "enable_adapter_layers"):
            model.enable_adapter_layers()
        return self._run(model, messages, max_new_tokens, temperature, top_p, seed)

    def _run(
        self,
        model: Any,
        messages: list[dict[str, str]],
        max_new_tokens: int,
        temperature: float,
        top_p: float,
        seed: int,
    ) -> str:
        """Default implementation hits HF generate. Tests stub via ``_run``."""

        try:
            return str(
                model.generate(
                    messages=messages,
                    max_new_tokens=max_new_tokens,
                    temperature=temperature,
                    top_p=top_p,
                    seed=seed,
                )
            )
        except Exception as exc:
            raise RuntimeError(f"model.generate raised: {exc}") from exc


class _NullCtx:
    """Trivial context manager fallback when peft.disable_adapter is absent."""

    def __enter__(self) -> _NullCtx:
        return self

    def __exit__(self, *exc: Any) -> Literal[False]:
        return False


_MODEL_LOADER: ModelLoader | None = None
_MODEL_LOADER_LOCK = threading.Lock()


def get_model_loader() -> ModelLoader:
    """Return process-wide singleton, instantiated on first call."""

    global _MODEL_LOADER
    with _MODEL_LOADER_LOCK:
        if _MODEL_LOADER is None:
            _MODEL_LOADER = ModelLoader()
        return _MODEL_LOADER


def _reset_model_loader_for_tests() -> None:
    global _MODEL_LOADER
    with _MODEL_LOADER_LOCK:
        _MODEL_LOADER = None


# ---------------------------------------------------------------------------
# Hardware probing (deploy_demo_space.md Β§3.1)
# ---------------------------------------------------------------------------


@dataclass(frozen=True)
class HardwareProbe:
    zerogpu: bool
    a10g: bool


def _probe_hardware() -> HardwareProbe:
    """Probe both targets. Patched in tests."""

    fallback = os.environ.get(_HARDWARE_FALLBACK_ENV_VAR, "")
    hardware = os.environ.get(_HARDWARE_ENV_VAR, "zero-gpu")
    return HardwareProbe(zerogpu=hardware == "zero-gpu", a10g=fallback == "a10g")


class DeploymentAbortedError(RuntimeError):
    """Raised when neither ZeroGPU nor A10G is available."""


def deploy_check() -> str:
    """Return the README front-matter ``hardware:`` value to write.

    Raises ``DeploymentAbortedError("both-gpus-unavailable")`` when both
    fallbacks fail; the pitch reverts to a pre-recorded video.
    """

    probe = _probe_hardware()
    if probe.zerogpu:
        return "zero-gpu"
    if probe.a10g:
        logger.info("zero-gpu unavailable; falling back to A10G small")
        return "a10g-small"
    logger.warning("Fall back to pre-recorded video β€” see risk_book.md")
    raise DeploymentAbortedError("both-gpus-unavailable")


# ---------------------------------------------------------------------------
# Audio helpers
# ---------------------------------------------------------------------------


def _safe_silence() -> tuple[int, np.ndarray]:
    """1 s of silence at 16 kHz β€” used as the safe-default audio output."""

    return _FALLBACK_SR_HZ, np.zeros(_FALLBACK_SILENCE_LEN, dtype=np.float32)


def _safe_defaults() -> tuple[str, tuple[int, np.ndarray], pd.DataFrame, dict[str, float], str]:
    return "", _safe_silence(), _empty_trace_df(), {}, ""


def _truncate_preview(payload: Any) -> str:
    text = str(payload)
    if len(text) <= _TRACE_PREVIEW_LEN:
        return text
    return text[: _TRACE_PREVIEW_LEN - 1] + "…"


# ---------------------------------------------------------------------------
# infer_turn (deploy_demo_space.md Β§2.2)
# ---------------------------------------------------------------------------


def infer_turn(
    audio_tuple: tuple[int, np.ndarray] | None,
    checkpoint: CheckpointId,
    manual_drift: str | None,
    session_id: str,
    *,
    text_input: str | None = None,
) -> tuple[str, tuple[int, np.ndarray], pd.DataFrame, dict[str, float], str]:
    """Handle one mic-to-speaker turn. Never raises β€” surfaces every error
    via the ``status_msg`` slot of the return tuple."""

    # 1. Session β€” error 5.7.
    try:
        state = get_session(session_id)
    except SessionCapacityError:
        empty, silence, df, defaults, _ = _safe_defaults()
        return empty, silence, df, defaults, "Demo at capacity β€” try again in a minute."

    # 2. Audio input β€” error 5.3.
    if audio_tuple is None and not (text_input and text_input.strip()):
        empty, silence, _df, empty_rewards, _ = _safe_defaults()
        return (
            empty,
            silence,
            render_trace(state),
            empty_rewards,
            "No audio received; press mic or type a brief.",
        )

    # 3. ASR β€” error 5.6.
    transcript: str = ""
    if audio_tuple is not None:
        try:
            transcript = _run_asr(audio_tuple)
        except AudioDecodeError:
            empty2, silence2, _df2, empty_rewards2, _ = _safe_defaults()
            return (
                empty2,
                silence2,
                render_trace(state),
                empty_rewards2,
                "Could not decode mic audio; please try again.",
            )
    else:
        transcript = (text_input or "").strip()

    # 4. Drift consume.
    forced = get_drift_bridge().consume(session_id)
    if manual_drift is not None:
        forced = manual_drift

    # 5. Build action β€” first-turn agent uses transcript as a SPEAK.
    state.turn_idx += 1
    state.episode_trace.append(
        TraceRow(
            turn_idx=state.turn_idx,
            actor="user",
            action_or_event=transcript,
            tool_response_preview="",
            reward_delta=0.0,
        )
    )
    if forced is not None:
        state.episode_trace.append(
            TraceRow(
                turn_idx=state.turn_idx,
                actor="drift",
                action_or_event=f"manual:{forced}",
                tool_response_preview="",
                reward_delta=0.0,
            )
        )

    action = DriftCallAction(
        action_type=ActionType.SPEAK,
        message=transcript or "(empty)",
    )

    # 6. Env step β€” error 5.8.
    try:
        if forced is not None:
            obs = state.env.step(action, force_drift_pattern=forced)
        else:
            obs = state.env.step(action)
        state.last_observation = obs
        state.episode_trace.append(
            TraceRow(
                turn_idx=state.turn_idx,
                actor="env",
                action_or_event="200 OK",
                tool_response_preview=_truncate_preview(obs.last_transcript),
                reward_delta=0.0,
            )
        )
    except Exception as exc:
        state.episode_trace.append(
            TraceRow(
                turn_idx=state.turn_idx,
                actor="env",
                action_or_event=f"rejected: {exc.__class__.__name__}",
                tool_response_preview=_truncate_preview(str(exc)),
                reward_delta=0.0,
            )
        )
        return (
            transcript,
            _safe_silence(),
            render_trace(state),
            {},
            f"Env rejected action: {exc}; episode unchanged.",
        )

    # 7. Generate β€” errors 5.1 / 5.2 / 5.4.
    loader = get_model_loader()
    use_checkpoint: CheckpointId = checkpoint
    if checkpoint == "trained" and not loader.is_trained_available():
        use_checkpoint = "base"
        status_warning = "Trained adapter unavailable; showing base model only."
    else:
        status_warning = ""

    reply: str
    try:
        reply = _generate_with_retries(loader, transcript, use_checkpoint)
    except TrainedAdapterMissingError:
        use_checkpoint = "base"
        status_warning = "Trained adapter unavailable; showing base model only."
        try:
            reply = _generate_with_retries(loader, transcript, use_checkpoint)
        except Exception as exc2:
            return (
                transcript,
                _safe_silence(),
                render_trace(state),
                {},
                f"Generate failed: {exc2}",
            )
    except _OOMRetryFailure:
        return (
            transcript,
            _safe_silence(),
            render_trace(state),
            {},
            "GPU out of memory this turn; reducing context and retrying.",
        )
    except _ZeroGPUFailure:
        return (
            transcript,
            _safe_silence(),
            render_trace(state),
            {},
            "GPU unavailable; the demo is running on CPU and will be slow.",
        )
    except _TimeoutFailure:
        return (
            transcript,
            _safe_silence(),
            render_trace(state),
            {},
            "Turn timed out after 60 s β€” the model was slow; try again.",
        )
    except Exception as exc:
        return (
            transcript,
            _safe_silence(),
            render_trace(state),
            {},
            f"Generate failed: {exc}",
        )

    state.episode_trace.append(
        TraceRow(
            turn_idx=state.turn_idx,
            actor="agent",
            action_or_event=f"SPEAK \"{reply[:60]}\"",
            tool_response_preview="",
            reward_delta=0.0,
        )
    )

    # 8. TTS.
    try:
        audio_out = _run_tts(reply, lang_hint="en")
    except Exception:
        audio_out = _safe_silence()

    rewards: dict[str, float] = {}
    state.last_activity_ms = _now_ms()
    return transcript, audio_out, render_trace(state), rewards, status_warning


# ---------------------------------------------------------------------------
# Subroutines (kept module-level so tests can patch each one)
# ---------------------------------------------------------------------------


class _OOMRetryFailure(RuntimeError):
    pass


class _ZeroGPUFailure(RuntimeError):
    pass


class _TimeoutFailure(RuntimeError):
    pass


def _run_asr(audio_tuple: tuple[int, np.ndarray]) -> str:
    """Default ASR path. Tests patch this to bypass the audio singleton."""

    sr, wav = audio_tuple
    if sr != 16000:
        # Silent fallback rather than raising β€” gradio mic clips are usually 16k.
        return ""
    pcm_bytes = wav.astype(np.float32).tobytes()
    from cells.step_09_audio import get_asr_engine

    asr = get_asr_engine()
    result = asr.transcribe(pcm_bytes, "en")
    return result.text


def _run_tts(text: str, *, lang_hint: str = "en") -> tuple[int, np.ndarray]:
    """Default TTS path."""

    from cells.step_09_audio import get_tts_engine

    tts = get_tts_engine()
    lang_for_tts: Any = lang_hint
    out: tuple[int, np.ndarray] = tts.synthesize_to_gradio(text, lang_for_tts)
    return out


def _generate_with_retries(
    loader: ModelLoader, transcript: str, checkpoint: CheckpointId
) -> str:
    """Wraps ``ModelLoader.generate`` with the OOM / ZeroGPU retry policy."""

    messages = [{"role": "user", "content": transcript or "(empty)"}]
    try:
        return loader.generate(messages, checkpoint=checkpoint, max_new_tokens=256)
    except ZeroGPUUnavailableError:
        # 5.1 β€” retry once, then fall back.
        time.sleep(0)  # advisory; tests patch
        try:
            return loader.generate(messages, checkpoint=checkpoint, max_new_tokens=256)
        except ZeroGPUUnavailableError as exc:
            raise _ZeroGPUFailure() from exc
    except TimeoutError as exc:
        raise _TimeoutFailure() from exc
    except Exception as exc:
        # Treat any CUDA OOM (real or stub) as 5.4.
        msg = str(exc).lower()
        if "out of memory" in msg or exc.__class__.__name__ == "OutOfMemoryError":
            try:
                _empty_cuda_cache()
                # Shrink context: drop oldest message + reduce tokens.
                shrunk = messages[1:] if len(messages) > 1 else messages
                return loader.generate(
                    shrunk, checkpoint=checkpoint, max_new_tokens=128
                )
            except Exception as exc2:
                msg2 = str(exc2).lower()
                if "out of memory" in msg2 or exc2.__class__.__name__ == "OutOfMemoryError":
                    raise _OOMRetryFailure() from exc2
                raise
        raise


def _empty_cuda_cache() -> None:
    """Best-effort CUDA cache clear. Tests patch this."""

    try:
        import torch

        if hasattr(torch, "cuda") and hasattr(torch.cuda, "empty_cache"):
            torch.cuda.empty_cache()
    except Exception:
        return


# ---------------------------------------------------------------------------
# Warmup + UI builder
# ---------------------------------------------------------------------------


def warmup_on_boot() -> None:
    """Page in CUDA kernels: ASR + TTS + dummy generate."""

    try:
        from cells.step_09_audio import get_asr_engine, get_tts_engine

        asr = get_asr_engine()
        tts = get_tts_engine()
        if hasattr(asr, "warmup"):
            asr.warmup()
        if hasattr(tts, "warmup"):
            tts.warmup()
    except Exception:
        logger.exception("audio warmup failed")
    try:
        loader = get_model_loader()
        loader.ensure_loaded()
        loader.generate(
            [{"role": "user", "content": "warmup"}], checkpoint="base", max_new_tokens=4
        )
    except Exception:
        logger.exception("model warmup failed")


def build_ui() -> Any:
    """Construct the Gradio Blocks graph. Idempotent and pure."""

    import gradio as gr

    loader = get_model_loader()
    trained_ok = False
    try:
        trained_ok = loader.is_trained_available()
    except Exception:
        trained_ok = False
    radio_choices = ["base", "trained"] if trained_ok else ["base"]
    radio_label = (
        "Checkpoint"
        if trained_ok
        else "Checkpoint (Trained adapter unavailable at boot β€” base only)"
    )
    drift_choices: list[Any] = [*_DRIFT_PATTERN_IDS, None]

    with gr.Blocks(title="DriftCall Demo") as demo:
        session_state = gr.State(value=str(uuid.uuid4()))
        with gr.Row():
            mic = gr.Audio(
                sources=["microphone"],
                type="numpy",
                label="Speak your brief",
            )
            speaker = gr.Audio(
                type="numpy",
                label="Speaker",
                interactive=False,
            )
        with gr.Row():
            checkpoint = gr.Radio(
                choices=radio_choices,
                value="base",
                label=radio_label,
            )
            drift = gr.Dropdown(
                choices=drift_choices,
                value=None,
                label="Manual drift",
            )
        textbox = gr.Textbox(
            placeholder="Or type a brief here",
            label="Text fallback",
        )
        transcript = gr.Textbox(label="Transcript", interactive=False)
        trace = gr.DataFrame(
            value=_empty_trace_df(),
            wrap=True,
            max_height=400,
            interactive=False,
            headers=list(_TRACE_COLUMNS),
        )
        rewards = gr.JSON(label="Rewards")
        status = gr.Textbox(label="Status", interactive=False)
        reset_btn = gr.Button("New episode")

        def _on_submit(
            mic_in: Any,
            ckpt: Any,
            drift_pat: Any,
            sid: Any,
            text_in: Any,
        ) -> Any:
            return infer_turn(mic_in, ckpt, drift_pat, sid, text_input=text_in)

        mic.stop_recording(
            _on_submit,
            inputs=[mic, checkpoint, drift, session_state, textbox],
            outputs=[transcript, speaker, trace, rewards, status],
        )

        def _on_reset(sid: Any) -> Any:
            reset_session(sid)
            return "", _empty_trace_df(), {}, "Episode reset."

        reset_btn.click(
            _on_reset,
            inputs=[session_state],
            outputs=[transcript, trace, rewards, status],
        )

    return demo


def _launch_for_production() -> None:
    warmup_on_boot()
    ui = build_ui()
    ui.launch(server_name="0.0.0.0", server_port=7860, ssr_mode=False)


if __name__ == "__main__":
    _launch_for_production()


__all__ = [
    "AudioDecodeError",
    "CheckpointId",
    "CheckpointMismatchError",
    "DemoSessionState",
    "DeploymentAbortedError",
    "DriftToggleBridge",
    "EnvStepError",
    "HardwareProbe",
    "ModelLoader",
    "SessionCapacityError",
    "TraceRow",
    "TrainedAdapterMissingError",
    "ZeroGPUUnavailableError",
    "build_ui",
    "deploy_check",
    "gc_sessions",
    "get_drift_bridge",
    "get_model_loader",
    "get_session",
    "infer_turn",
    "render_trace",
    "reset_session",
    "warmup_on_boot",
]