# HF Space Build Error Log: rubentuesday/vocal-mirror

This file is committed alongside every fix so the repo retains full context of what broke and why.

---
## Iteration 1 (2026-04-11)

**Stage:** CONFIG_ERROR

**Error:** `No candidate PyTorch version found for ZeroGPU`

**Root cause:** `requirements.txt` pinned `torch==2.5.1+cu121` and `torchaudio==2.5.1+cu121` with `--extra-index-url https://download.pytorch.org/whl/cu121`. ZeroGPU manages its own CUDA PyTorch installation and rejects Spaces that pin a `+cu121`-suffixed variant. The build fails at config-parse time, before any package is installed.

**Fix applied:**

- Removed `--extra-index-url https://download.pytorch.org/whl/cu121` from `requirements.txt`
- Removed `torch==2.5.1+cu121` and `torchaudio==2.5.1+cu121` from `requirements.txt` (ZeroGPU provides these)
- Changed `gradio>=5.0.0,<6.0` → `gradio==4.44.1` in `requirements.txt` (project rule: pin to 4.44.1)
- Changed `sdk_version: 5.0.0` → `sdk_version: 4.44.1` in the `README.md` YAML frontmatter

**Result:** FAIL. The CONFIG_ERROR was resolved, but the downgrade caused a new RUNTIME_ERROR (see Iteration 2). Gradio 4.44.1 was the wrong choice and was reverted.

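
This failure only appeared at config-parse time. A local check of `requirements.txt` before pushing could catch the same mistake earlier. Below is a minimal sketch. `find_zerogpu_conflicts` is a hypothetical helper, not part of any library. It assumes the only patterns that conflict with ZeroGPU are the ones this iteration removed: `+cuXXX` local-version pins on PyTorch packages and an index URL pointing at `download.pytorch.org`.

```python
import re

# "torch==2.5.1+cu121": the "+cu121" local version label marks a CUDA-specific wheel.
CUDA_PIN = re.compile(r"^(torch|torchaudio|torchvision)\s*==\s*[\w.]+\+cu\d+", re.IGNORECASE)
# "--extra-index-url https://download.pytorch.org/whl/cu121" (or a plain --index-url).
PYTORCH_INDEX = re.compile(r"^--(extra-)?index-url\s+https://download\.pytorch\.org/", re.IGNORECASE)
# pip treats "#" as a comment only at the start of a line or after whitespace.
COMMENT = re.compile(r"(^|\s)#.*$")


def find_zerogpu_conflicts(requirements_text):
    """Return (line_number, requirement) pairs that would fight ZeroGPU's own torch."""
    problems = []
    for number, line in enumerate(requirements_text.splitlines(), start=1):
        requirement = COMMENT.sub("", line).strip()
        if CUDA_PIN.match(requirement) or PYTORCH_INDEX.match(requirement):
            problems.append((number, requirement))
    return problems


if __name__ == "__main__":
    before = """--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.5.1+cu121
torchaudio==2.5.1+cu121
gradio>=5.0.0,<6.0
"""
    for number, requirement in find_zerogpu_conflicts(before):
        print(f"line {number}: {requirement}")
```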
---
## Iteration 2 (2026-04-11)

**Stage:** RUNTIME_ERROR

**Error 1 (first):** `TypeError: unhashable type: 'dict'` in `jinja2/utils.py` (Gradio 4.x Jinja template cache bug)

**Error 2:** `ValueError: When localhost is not accessible, a shareable link must be created. Please set share=True` (Gradio 4.x requires `share=True` on remote hosts)

**Root cause:** Downgrading to `gradio==4.44.1` reintroduced two known Gradio 4.x bugs. The commit history already shows that `c0a2ea8` explicitly upgraded to 5.x to fix the Jinja crash. Both errors occur only in 4.x and are fixed in 5.x. The "pin to 4.44.1" instruction in the task brief was outdated.

**Fix applied:**

- Reverted `requirements.txt`: `gradio==4.44.1` → `gradio>=5.0.0,<6.0`
- Reverted `README.md`: `sdk_version: 4.44.1` → `sdk_version: 5.0.0`

**Result:** PASS. The Space reached the RUNNING stage, but `/health_hf` returns 308 because the route is missing. Fixed in Iteration 3.

---
## Iteration 3 (2026-04-11)

**Stage:** RUNNING, but `/health_hf` returns 308 Permanent Redirect (no such route)

**Root cause:** `app.py` only calls `demo.launch()` and defines no custom routes. Gradio 5.x redirects unknown paths to `/`.

**Fix applied:** Switched from `demo.launch()` to the `gr.mount_gradio_app()` pattern:

- Added a `FastAPI` app with `@app.get("/health_hf")` returning `{"status": "ok"}`
- Replaced `demo.launch()` with `app = gr.mount_gradio_app(app, demo, path="/")`
- The `@spaces.GPU` decorator still handles ZeroGPU GPU allocation independently

**Result:** FAIL. RUNTIME_ERROR with exit code 0: `gr.mount_gradio_app()` returns immediately, so nothing blocks the process. Fixed in Iteration 4.

---
## Iteration 4 (2026-04-11)

**Stage:** RUNTIME_ERROR: `Exit code: 0. Reason: ` (clean exit; the process didn't stay alive)

**Root cause:** `gr.mount_gradio_app()` returns an ASGI app object but doesn't start a server. Without a blocking `demo.launch()`, `app.py` runs to completion and exits.

**Fix applied:** Added `uvicorn.run(app, host="0.0.0.0", port=7860)` after the mount call. This starts the ASGI server and blocks the process.

**Result:** FAIL. RUNTIME_ERROR "No @spaces.GPU function detected during startup". `uvicorn.run()` bypasses the `spaces.zero.gradio` launch wrapper that scans for GPU functions, and ZeroGPU requires `demo.launch()`. Fixed in Iteration 5.

---
## Iteration 5 (2026-04-11)

**Stage:** RUNTIME_ERROR: `No @spaces.GPU function detected during startup`

**Root cause:** `gr.mount_gradio_app()` + `uvicorn.run()` bypasses the `spaces.zero.gradio` interceptor of `demo.launch()`. ZeroGPU scans for `@spaces.GPU`-decorated functions inside that interceptor. Because the interceptor is never called, the GPU functions are never registered.

**Fix applied:** Reverted to a bare `demo.launch()`. Added `/health_hf` by monkey-patching `gradio.routes.App.create_app` to inject the route into Gradio's FastAPI app at creation time, before ZeroGPU starts the server.

**Result:** FAIL: "Application unable to start for an unknown reason". The access to `create_app.__func__` likely failed (AttributeError or TypeError) in Gradio 5.x and crashed startup silently. Fixed in Iteration 6.

---
## Iteration 6 (2026-04-11)

**Stage:** RUNTIME_ERROR: "Application unable to start for an unknown reason"

**Root cause:** Monkey-patching `gradio.routes.App.create_app.__func__` crashed at import/startup time in Gradio 5.x. The `__func__` access pattern assumes `create_app` is a classmethod. If the signature or descriptor changed in 5.x, this raises AttributeError and kills the process before any server starts.

**Fix applied:** Replaced the monkey-patch with a daemon thread that polls `demo.server` and injects `/health_hf` once it is available. Gradio sets `demo.server` after `demo.launch()` initializes the server. `demo.launch()` stays bare, so ZeroGPU detection works normally. The thread is a no-op if injection fails.

**Result:** FAIL. The Space is RUNNING, but `/health_hf` still returns 308. `demo.server` is never set in the polling thread's context because ZeroGPU runs the real server in a GPU worker, not in the same process. Fixed in Iteration 7.

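
A more defensive patch inspects the attribute before wrapping it, which avoids the `__func__` failure mode described in Iteration 6. The sketch below uses a toy `App` class instead of Gradio. `App` and `patch_factory` are hypothetical names, and this sketch makes no claim about which descriptor type Gradio 5.x actually uses for `create_app`. `inspect.getattr_static` returns the raw descriptor without invoking it, so the wrapper can be re-wrapped in the same kind of descriptor. A missing attribute returns `False` instead of crashing startup.

```python
import functools
import inspect


class App:
    """Toy stand-in for a framework class with a factory method (hypothetical)."""

    @classmethod
    def create_app(cls, name):
        return {"name": name, "routes": ["/{path_name:path}"]}


def patch_factory(owner, attr, after):
    """Wrap owner.<attr> so after(result) post-processes its return value.

    Handles classmethods, staticmethods and plain functions; returns False
    (instead of raising) if the attribute is missing or not callable.
    """
    try:
        raw = inspect.getattr_static(owner, attr)
    except AttributeError:
        return False

    if isinstance(raw, classmethod):
        original = raw.__func__

        @functools.wraps(original)
        def wrapper(cls, *args, **kwargs):
            return after(original(cls, *args, **kwargs))

        setattr(owner, attr, classmethod(wrapper))
    elif isinstance(raw, staticmethod):
        original = raw.__func__

        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            return after(original(*args, **kwargs))

        setattr(owner, attr, staticmethod(wrapper))
    elif callable(raw):
        original = raw

        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            return after(original(*args, **kwargs))

        setattr(owner, attr, wrapper)
    else:
        return False
    return True
```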
---
## Iteration 7 (2026-04-11)

**Stage:** RUNNING, but `/health_hf` still returns 308

**Root cause:** In ZeroGPU, the actual Gradio server runs in a separate GPU worker process. `demo.server` is never set in the main process, so the daemon thread's poll always fails and the route is never injected.

**Fix applied:** Use `demo.launch(prevent_thread_lock=True)`. The spaces interceptor still detects `@spaces.GPU` functions, then starts the server in a background thread in the same process and returns. After `launch()` returns, `demo.server.app` was expected to be accessible, so `/health_hf` is added there. The main thread is blocked via `threading.Event().wait()`, which avoids relying on `demo.block_thread()` existing in Gradio 5.x.

**Result:** FAIL: `AttributeError: 'Server' object has no attribute 'app'`. Gradio 5.x's `Server` wraps uvicorn, and the FastAPI app lives at `server.config.app`, not `server.app`. Fixed in Iteration 8.

---
## Iteration 8 (2026-04-11)

**Stage:** RUNTIME_ERROR: `AttributeError: 'Server' object has no attribute 'app'`

**Root cause:** `demo.server` is a Gradio `Server` that wraps uvicorn. In uvicorn, the ASGI app is stored in `server.config.app` (the `Config` object passed at construction), not directly on `server.app`.

**Fix applied:** Changed `demo.server.app.get(...)` → `demo.server.config.app.get(...)`.

**Result:** FAIL. The Space is RUNNING, but `/health_hf` still returns 308. `demo.server.config.app.get()` adds the route AFTER Gradio's catch-all `/{path_name:path}` is already registered. FastAPI matches routes in insertion order, so the catch-all, which was added first, wins. Fixed in Iteration 9.

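
The ordering problem can be reproduced without Gradio. Like Starlette's router, the toy router below walks its route list in registration order and dispatches to the first full match. `ToyRouter`, `build_router` and the 308 handler are illustrative stand-ins, not Gradio's code.

```python
import re


class ToyRouter:
    """First-match-wins router over routes kept in registration order."""

    def __init__(self):
        self.routes = []  # list of (compiled_regex, handler)

    def add(self, template, handler):
        # "{name:path}" matches anything (including "/"); "{name}" matches one segment.
        pattern = re.sub(r"\{(\w+):path\}", r"(?P<\1>.*)", template)
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
        self.routes.append((re.compile(f"^{pattern}$"), handler))

    def dispatch(self, path):
        for regex, handler in self.routes:
            match = regex.match(path)
            if match:
                return handler(**match.groupdict())
        return (404, "not found")


def build_router(health_first):
    """Register a catch-all and /health_hf in the requested order."""
    catch_all = ("/{path_name:path}", lambda path_name: (308, "/"))  # redirect everything to /
    health = ("/health_hf", lambda: (200, {"status": "ok"}))
    router = ToyRouter()
    for template, handler in ([health, catch_all] if health_first else [catch_all, health]):
        router.add(template, handler)
    return router
```

With the catch-all registered first (the Gradio 5.x situation described above), `/health_hf` is never reached. The only ways around that are inserting ahead of the catch-all or intercepting before routing happens at all.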
---
## Iteration 9 (2026-04-11)

**Stage:** RUNNING, but `/health_hf` returns 308

**Root cause:** Adding `@app.get("/health_hf")` after `create_app` appends the route AFTER Gradio's catch-all `/{path_name:path}`. FastAPI/Starlette matches routes in registration order. The catch-all was registered first, so it intercepts everything, including `/health_hf`.

**Fix applied:** Use Starlette middleware (`BaseHTTPMiddleware`) patched into Gradio's `create_app`. Middleware runs BEFORE any route matching, so `/health_hf` is intercepted before the catch-all. Reverted to a bare `demo.launch()` (ZeroGPU works). The entire patch is wrapped in `try/except`, so failures are silent and don't prevent startup.

**Result:** PASS. The Space is RUNNING, and `GET /health_hf` → `{"status":"ok"}` with HTTP 200. The health-check work was done after 9 iterations.

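
The same interception can be written as plain ASGI middleware, a lower-level alternative to `BaseHTTPMiddleware`. It needs only the standard library, which keeps it self-contained here. `HealthCheckMiddleware`, `redirect_everything` and `call` are hypothetical names. `redirect_everything` only imitates an app whose catch-all 308-redirects to `/`. On a Starlette/FastAPI app, a class of this shape is normally registered with `app.add_middleware(HealthCheckMiddleware)`. The log's actual approach of patching it into Gradio's `create_app` is not shown.

```python
import asyncio
import json


class HealthCheckMiddleware:
    """ASGI middleware that answers GET /health_hf before the wrapped app routes anything."""

    def __init__(self, app, path="/health_hf"):
        self.app = app
        self.path = path

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["method"] == "GET" and scope["path"] == self.path:
            body = json.dumps({"status": "ok"}, separators=(",", ":")).encode()
            await send({
                "type": "http.response.start",
                "status": 200,
                "headers": [(b"content-type", b"application/json"),
                            (b"content-length", str(len(body)).encode())],
            })
            await send({"type": "http.response.body", "body": body})
            return
        # Every other request (including the UI's own routes) passes through untouched.
        await self.app(scope, receive, send)


async def redirect_everything(scope, receive, send):
    """Stand-in for an app whose catch-all 308-redirects every path to '/'."""
    await send({"type": "http.response.start", "status": 308, "headers": [(b"location", b"/")]})
    await send({"type": "http.response.body", "body": b""})


async def call(app, path):
    """Drive an ASGI app with one GET request and return (status, body)."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await app({"type": "http", "method": "GET", "path": path, "headers": []}, receive, send)
    body = b"".join(m.get("body", b"") for m in messages[1:])
    return messages[0]["status"], body
```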
---
## Iteration 10 (2026-04-12)

**Stage:** RUNNING, but "Run Benchmark" throws an `OSError`

**Error:** `OSError: Could not load this library: /usr/local/lib/python3.10/site-packages/torchaudio/lib/_torchaudio.abi3.so`

**Root cause (via runtime logs):** `qwen-tts` depends on `torchaudio`. `pip install qwen-tts` upgraded `torchaudio` to the latest PyPI release, which was compiled against CUDA 13 (`libcudart.so.13`). The ZeroGPU A10G runs CUDA 12, so `libcudart.so.13` is not present. Full import chain: `from qwen_tts import Qwen3TTSModel` → `speech_vq.py` → `import torchaudio.compliance.kaldi` → `torchaudio/__init__.py` → `torchaudio._extension` → `torch.ops.load_library("_torchaudio.abi3.so")` → `OSError: libcudart.so.13`.

**Fix applied:** Pinned `torchaudio==2.5.1` in `requirements.txt` BEFORE the `qwen-tts` line. torchaudio 2.5.1 (Nov 2024) was compiled against CUDA 12, and the pin prevents pip from upgrading to a CUDA 13 build. `kaldi.fbank()` is the only torchaudio function qwen-tts calls on this path, and it is a CPU-only DSP operation that needs no GPU.

**Result:** PASS. The Space is RUNNING with new SHA 990b408, and `/health_hf` → 200. Benchmark fix deployed.

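
The breakage only surfaced when the benchmark first imported `torchaudio`. A startup check that logs the installed version would make a silent upgrade visible earlier. Below is a minimal sketch using `importlib.metadata`. `check_pinned` is a hypothetical helper. Note that a wheel from a CUDA-specific index can report a version such as `2.5.1+cu121`, which this exact string comparison would flag as a mismatch.

```python
from importlib import metadata


def check_pinned(package, expected):
    """Return None if `package` is installed at exactly `expected`, else a problem string."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed (expected {expected})"
    if installed != expected:
        return f"{package}=={installed} is installed, but requirements pin {expected}"
    return None


if __name__ == "__main__":
    problem = check_pinned("torchaudio", "2.5.1")
    print(problem or "torchaudio pin OK")
```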
---
## Iteration 11 (2026-04-12)

**Stage:** RUNNING (benchmark redesign, not a build error)

**Change:** Replaced the static `np.zeros` reference and arbitrary test text with a live microphone enrollment simulation. In the new UI, the user records one of the 3 frontend enrollment phrases via a Gradio `Audio` input. The benchmark then clones their voice and synthesizes an AI response ("Great job! Now let's keep the conversation going. How was your day?"). It returns the RTF (real-time factor) result plus playable audio output. This mirrors the actual frontend UX: enroll → clone → hear AI response.

**Files changed:** `app.py` only.

**Result:** FAIL. The Space is RUNNING and `/health_hf` returns 200, but the Gradio API returns 500 Internal Server Error. The UI loads, but the "Start" button fails. See Iteration 12.

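
For reference, the real-time factor is conventionally defined as synthesis wall-clock time divided by the duration of the audio produced. Values below 1.0 mean faster than real time. Below is a minimal sketch of that calculation. `synthesize` is a hypothetical callable that returns a sequence of samples, and the 24 kHz rate in the example is an assumption, not a value taken from this Space.

```python
import time


def real_time_factor(elapsed_s, num_samples, sample_rate):
    """RTF = processing time / audio duration, with duration = samples / sample rate."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    return elapsed_s / (num_samples / sample_rate)


def timed_synthesis(synthesize, text, sample_rate):
    """Call a (hypothetical) synthesize(text) -> samples and return (samples, rtf)."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return samples, real_time_factor(elapsed, len(samples), sample_rate)


if __name__ == "__main__":
    # 1.2 s spent producing 2.0 s of 24 kHz audio gives an RTF of 0.6.
    print(real_time_factor(1.2, 48_000, 24_000))
```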
---
## Iteration 12 (2026-04-13)

**Stage:** RUNNING, but the Gradio API `/gradio_api/info` returns 500 Internal Server Error

**Error:** `File "/usr/local/lib/python3.10/site-packages/gradio_client/utils.py", line 967, in _json_schema_to_python_type` (crash during API schema generation)

**Root cause (via runtime logs):** Gradio generates a JSON schema for all function signatures when serving `/gradio_api/info`. The `gpu_chat_turn` function had the type hints `ref: np.ndarray, history: list, turn_count: int, l1: str, l2: str`. `gradio_client`'s `json_schema_to_python_type`, through its internal `_json_schema_to_python_type`, cannot represent `numpy.ndarray` in a JSON schema. It crashes on the list comprehension at lines 967-968 while building property descriptions. The crash propagates through Starlette's middleware stack and produces a 500 on every request, including the frontend's queue/event polling calls.

**Fix applied:**

- Removed all type hints from the `gpu_enroll_and_greet` and `gpu_chat_turn` signatures, on the assumption that Gradio's schema generator only inspects annotated parameters
- Changed `gpu_enroll_and_greet` to return `ref.tolist()` (a plain Python list) instead of an `np.ndarray`, keeping the State JSON-serializable
- Changed `gpu_chat_turn` to accept `ref_list` (a plain list) and convert it internally to an `np.ndarray` via `np.array(ref_list, dtype=np.float32)` before passing it to `synthesize()`
- No changes to callbacks: `on_enroll` stores whatever the function returns, and `on_send` passes it through unchanged

**Files changed:** `app.py` only.

**Result:** FAIL. The same crash persists, so removing the `np.ndarray` type hints did not resolve it. The root cause was actually the `gr.State(dict)` itself, not the function signature. See Iteration 13.

---
## Iteration 13 (2026-04-13)

**Stage:** RUNNING, but `/gradio_api/info` still returns 500

**Error:** `TypeError: argument of type 'bool' is not iterable` at `gradio_client/utils.py:882` in `get_type`, on `if "const" in schema`

**Root cause:** Removing the `np.ndarray` type hints in Iteration 12 did not fix the crash. The actual source is `gr.State({"l1": "en", "l2": "es", "ref": None, "history": [], "turn_count": 0})`. When Gradio generates the API schema for this State, it calls `_json_schema_to_python_type` on the dict's schema. That schema contains `additionalProperties: True`, a boolean sub-schema that the JSON Schema spec allows. The schema generator then evaluates `if "const" in schema` with `schema` bound to the Python bool `True`. This raises `TypeError: argument of type 'bool' is not iterable` in `gradio_client/utils.py` at line 882. It happens regardless of function type hints because the State type itself triggers it.

**Fix applied:** Replaced the single `gr.State(dict)` with **5 flat, primitive `gr.State` objects**:

- `state_l1 = gr.State("en")`: string, safe
- `state_l2 = gr.State("es")`: string, safe
- `state_ref = gr.State([])`: empty list (no numpy), safe
- `state_history = gr.State([])`: list of dicts (plain JSON), safe
- `state_turn_count = gr.State(0)`: int, safe

All callbacks were updated to accept and return these flat states. `ref_list` (a Python list) is passed as `state_ref` and converted to `np.ndarray` only inside `gpu_chat_turn`. Full `app.py` rewrite.

**Files changed:** `app.py` only.

**Result:** PASS. The Space is RUNNING, and the Gradio UI is fully functional (language select → enrollment → chat → wall at turn 7). `/health_hf` → 200. See session 2 (2026-04-13) for the subsequent full-backend migration.

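
A simplified schema walker reproduces the failure mode described above. `naive_type` and `safe_type` are hypothetical. They are loosely modeled on the behavior described in this entry and are not copies of `gradio_client` code. `DICT_STATE_SCHEMA` is an assumed minimal shape for a free-form dict. JSON Schema allows `true` and `false` as complete sub-schemas, so a walker has to handle booleans before using `in` on them.

```python
def naive_type(schema):
    """Walker that assumes every sub-schema is a dict (reproduces the crash)."""
    if "const" in schema:  # TypeError when schema is the bool True or False
        return repr(schema["const"])
    if schema.get("type") == "object" and "additionalProperties" in schema:
        return f"dict(str, {naive_type(schema['additionalProperties'])})"
    return schema.get("type", "Any")


def safe_type(schema):
    """Same walk, but boolean sub-schemas are handled first."""
    if isinstance(schema, bool):
        return "Any" if schema else "None"  # true accepts anything, false accepts nothing
    if "const" in schema:
        return repr(schema["const"])
    if schema.get("type") == "object" and "additionalProperties" in schema:
        return f"dict(str, {safe_type(schema['additionalProperties'])})"
    return schema.get("type", "Any")


# Assumed minimal shape of the schema for a free-form dict State.
DICT_STATE_SCHEMA = {"type": "object", "additionalProperties": True}

if __name__ == "__main__":
    print(safe_type(DICT_STATE_SCHEMA))
    try:
        naive_type(DICT_STATE_SCHEMA)
    except TypeError as exc:
        print(f"naive walker failed: {exc}")
```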
---
## Iteration 14 (2026-04-13, session 2)

**Stage:** Full backend migration attempt using the `gr.mount_gradio_app()` approach

**Goal:** Serve the FastAPI REST API (all `/session/*` endpoints) alongside the Gradio UI so the Vercel React frontend can talk directly to the HF Space instead of Railway.

**Approach:** Replaced `demo.launch()` with `app = gr.mount_gradio_app(api, demo, path="/ui")`, where `api` is a standalone `FastAPI()` instance with all endpoints defined as routes.

**Error:** `RUNTIME_ERROR`: the Space exits with code 0 (clean exit).

**Root cause:** HF Spaces with `sdk: gradio` require `demo.launch()` to start and block the server. `gr.mount_gradio_app()` returns an ASGI app object but does not start a server. As in Iteration 4, the process runs to completion and exits immediately.

**Fix applied:** See Iteration 15.

---
## Iteration 15 (2026-04-13, session 2)

**Stage:** RUNTIME_ERROR: the Space exits with code 0 after `gr.mount_gradio_app()`

**Approach:** Switched to the `include_router()` pattern. Patched `gradio.routes.App.create_app` to call `gapp.include_router(_vmr)`, which adds all API routes to Gradio's internal FastAPI app, then ended with `demo.launch()` to keep the process alive.

**Error:** `GET /health` → `HTTP 308 Permanent Redirect` (location: `/`). All API routes return 308.

**Root cause:** Gradio 5.x registers a catch-all SPA route `/{path_name:path}` during `create_app`. FastAPI matches routes in insertion order. The catch-all is registered first, inside Gradio's own `create_app` logic, so routes added afterward via `include_router()` are never matched. Every unknown path is 308-redirected to `/` before our routes are evaluated.

**Key lesson:** `include_router()` appends routes AFTER the catch-all, so in Gradio 5.x they will never be reached.

**Fix applied:** See Iteration 16.

---
## Iteration 16 (2026-04-13, session 2)

**Stage:** RUNNING, but all API routes added via `include_router()` return 308

**Root cause:** This is the same Gradio 5.x SPA catch-all issue as Iteration 9, but for custom routes instead of `/health_hf`. `include_router()` is append-only and cannot insert before the catch-all.

**Fix applied:** Implemented all REST API endpoints as a single `BaseHTTPMiddleware` subclass (`_VocalMirrorAPI`) with regex-based path dispatch. Middleware runs BEFORE any route matching, which is the same pattern that fixed `/health_hf` in Iteration 9. `demo.launch()` stays bare. Session state lives in an in-memory dict and audio in `/tmp/`. Enrollment runs in a background thread. `gpu_tts()` is called from async context via the event loop's `run_in_executor` (`asyncio.get_running_loop().run_in_executor(...)`).

**Result:** PASS

- `GET /health` → `{"status":"ok"}` HTTP 200
- `GET /vm-config` → `{"wall_turn_count":7}` (named `/vm-config` to avoid shadowing Gradio's own `/config`)
- `POST /session/start` → returns session_id + word_list
- `GET /session/{id}/wall_status` → `{"show_wall":false,"turn_count":0}`
- Gradio product UI at `/` still fully functional

The Space is RUNNING on zero-a10g with SHA ef1ba6a.
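
The regex dispatch described in Iteration 16 can be sketched without Starlette. It uses a table of (method, pattern, handler) entries and returns `None` for unmatched requests, so a middleware's `dispatch` can fall through to `call_next` and Gradio. All of this is hypothetical scaffolding, not the Space's code: the `SESSIONS` dict, `blocking_tts` standing in for `gpu_tts()`, the `/session/{id}/speak` route, and the `show_wall` rule (turn count reaching the configured 7). The `word_list` returned by the real `/session/start` is omitted. The blocking call is offloaded with `loop.run_in_executor` so it does not stall the event loop.

```python
import asyncio
import re
import time
import uuid

WALL_TURN_COUNT = 7  # value reported by GET /vm-config above
SESSIONS = {}  # hypothetical in-memory store: session_id -> {"turn_count": int}


def blocking_tts(text):
    """Hypothetical stand-in for a blocking GPU call such as gpu_tts()."""
    time.sleep(0.01)
    return f"audio-for:{text}"


async def health(params, body):
    return 200, {"status": "ok"}


async def vm_config(params, body):
    return 200, {"wall_turn_count": WALL_TURN_COUNT}


async def session_start(params, body):
    session_id = uuid.uuid4().hex
    SESSIONS[session_id] = {"turn_count": 0}
    return 200, {"session_id": session_id}


async def wall_status(params, body):
    session = SESSIONS.get(params["session_id"])
    if session is None:
        return 404, {"detail": "unknown session"}
    turns = session["turn_count"]
    return 200, {"show_wall": turns >= WALL_TURN_COUNT, "turn_count": turns}


async def speak(params, body):
    if params["session_id"] not in SESSIONS:
        return 404, {"detail": "unknown session"}
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool instead of on the event loop.
    audio = await loop.run_in_executor(None, blocking_tts, body.get("text", ""))
    return 200, {"audio": audio}


ROUTES = [
    ("GET", re.compile(r"^/health$"), health),
    ("GET", re.compile(r"^/vm-config$"), vm_config),
    ("POST", re.compile(r"^/session/start$"), session_start),
    ("GET", re.compile(r"^/session/(?P<session_id>[^/]+)/wall_status$"), wall_status),
    ("POST", re.compile(r"^/session/(?P<session_id>[^/]+)/speak$"), speak),
]


async def dispatch(method, path, body=None):
    """Return (status, payload) for API paths, or None to let Gradio handle the request."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if match and method == route_method:
            return await handler(match.groupdict(), body or {})
    return None
```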