| # Background batch jobs (launcher) | |
| Large launcher queues can be processed **server-side** in **chunks**: each chunk queries SerpAPI for Google results, then fetches and scores the listings (with OpenRouter AI, or a heuristic fallback). State lives in **Upstash Redis**; **Upstash QStash** wakes your deployment (e.g. a Hugging Face Space) with HTTPS callbacks, so work continues after you close the tab. | |
| ## Troubleshooting: “Job not found” right after starting a batch | |
| The Upstash Redis client **parses JSON on `GET` by default**. This app stores each job as a **JSON string** with `SET` and expects a **string** back on `GET`. If deserialization is left on, `GET` returns an **object**, the loader treats the job as missing, and the status page shows **Job not found** even though Redis has the key. The code sets `automaticDeserialization: false` on the Redis client (`lib/batch-jobs/redis.ts`). After upgrading, restart the deployment so a new process picks up the client config. | |
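The failure mode described above can be reproduced without Redis. The following is a minimal sketch, not the app's actual loader: `StoredJob`, `parseStoredJob`, and the sample values are hypothetical names chosen for illustration. It only assumes what the text states, namely that the loader accepts a JSON string and treats anything else as a missing job.

```typescript
// Hypothetical job shape; the real fields stored by the app may differ.
interface StoredJob {
  id: string;
  status: string;
}

// Mirrors the behaviour described above: only a JSON *string* counts as a job.
function parseStoredJob(raw: unknown): StoredJob | null {
  if (typeof raw !== "string") {
    // An already-deserialized object (or null) is treated as "missing".
    return null;
  }
  try {
    return JSON.parse(raw) as StoredJob;
  } catch {
    // Corrupt payloads are also reported as missing.
    return null;
  }
}

// Placeholder values, used only to show the two GET outcomes.
const saved = JSON.stringify({ id: "job_1", status: "example" });

// With automaticDeserialization: false, GET hands back the stored string.
const fromStringGet = parseStoredJob(saved);

// With deserialization left on, GET hands back an object instead.
const fromObjectGet = parseStoredJob(JSON.parse(saved));
```

Here `fromStringGet` is a parsed job while `fromObjectGet` is `null`, which is the case that surfaces as **Job not found** on the status page.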
| ## Required services | |
| 1. **SerpAPI** — `SERPAPI_API_KEY`. | |
| 2. **Upstash Redis** — REST URL and token: | |
| - `UPSTASH_REDIS_REST_URL` | |
| - `UPSTASH_REDIS_REST_TOKEN` | |
| 3. **Upstash QStash** — publish token and signing keys (for verifying callbacks): | |
| - `QSTASH_TOKEN` — used to **enqueue** the next chunk after each run. | |
| - `QSTASH_CURRENT_SIGNING_KEY` — used to **verify** incoming requests to the chunk endpoint. | |
| - `QSTASH_NEXT_SIGNING_KEY` — optional; use during key rotation. | |
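A startup check can catch a missing variable before the first batch is enqueued. This is a sketch under assumptions: `missingBatchEnv` and `REQUIRED_BATCH_ENV` are hypothetical helpers, not part of the app, and the list only covers the variables named above. `QSTASH_NEXT_SIGNING_KEY` is left out because it is optional.

```typescript
// Variables named in the list above; the optional rotation key is not checked.
const REQUIRED_BATCH_ENV = [
  "SERPAPI_API_KEY",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_TOKEN",
  "QSTASH_CURRENT_SIGNING_KEY",
] as const;

type BatchEnv = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingBatchEnv(env: BatchEnv): string[] {
  return REQUIRED_BATCH_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node deployment this could be called as `missingBatchEnv(process.env)` at boot, logging the returned names if the array is not empty.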
| ## Public URL (critical for QStash) | |
| QStash must call a **stable HTTPS URL** that reaches your app. Set: | |
| - **`BATCH_PUBLIC_APP_URL`** — origin only, no trailing slash, e.g. `https://your-space.hf.space` or your Vercel URL. | |
| ### Hugging Face Spaces (important) | |
| Do **not** use the gallery link as the public URL: | |
| - Wrong: `https://huggingface.co/spaces/zimejin/Job-Scorer` (Space page on hf.co — this is **not** your container’s API host). | |
| - Right: `https://<subdomain>.hf.space` — the direct app URL shown when you open the Space (**App** tab / embedded app). It usually looks like `https://zimejin-job-scorer.hf.space` (check your Space’s **Settings → Details** for the exact `*.hf.space` value). | |
| Use that **`https://....hf.space`** value (no path) for `BATCH_PUBLIC_APP_URL`, and use the **same** URL in the browser when you run batch jobs so `/api/...` calls hit your app. | |
| The app publishes chunk work to: | |
| `{BATCH_PUBLIC_APP_URL}/api/internal/batch-jobs/chunk` | |
| Signing verification uses this same base URL. If verification fails, confirm the URL matches exactly what QStash calls (scheme, host, path). | |
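The URL rules above can be checked in code before anything is published to QStash. The sketch below is illustrative only: `chunkCallbackUrl` is a hypothetical helper and the validation rules are assumptions drawn from this section (HTTPS, origin only, not the `huggingface.co` Space page). The chunk path is the one given above.

```typescript
// Path taken from the text above; the validation rules are assumptions.
const CHUNK_PATH = "/api/internal/batch-jobs/chunk";

function chunkCallbackUrl(publicAppUrl: string): string {
  const parsed = new URL(publicAppUrl.trim());
  if (parsed.protocol !== "https:") {
    throw new Error("BATCH_PUBLIC_APP_URL must use https");
  }
  if (parsed.hostname === "huggingface.co") {
    throw new Error("Use the direct *.hf.space URL, not the huggingface.co Space page");
  }
  if (parsed.pathname !== "/" || parsed.search !== "" || parsed.hash !== "") {
    throw new Error("BATCH_PUBLIC_APP_URL must be an origin only (no path, query, or fragment)");
  }
  // URL.origin never carries a trailing slash, so a stray one in the input is harmless.
  return parsed.origin + CHUNK_PATH;
}
```

For example, `chunkCallbackUrl("https://zimejin-job-scorer.hf.space")` yields `https://zimejin-job-scorer.hf.space/api/internal/batch-jobs/chunk`, while the `huggingface.co/spaces/...` page URL is rejected.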
| ## Optional: OpenRouter (AI scoring) | |
| - `OPENROUTER_API_KEY` | |
| If unset, scoring falls back to heuristics. | |
| ## Optional tuning (environment) | |
| | Variable | Purpose | | |
| |----------|---------| | |
| | `BATCH_SERP_DELAY_MS` | Delay between SerpAPI calls within a chunk (default 1200). | | |
| | `BATCH_SCORE_DELAY_MS` | Delay between listing fetch/score steps (default 1500). | | |
| | `BATCH_QUERIES_PER_CHUNK` | Max Serp queries processed per QStash delivery (default 5). | | |
| | `BATCH_SCORES_PER_CHUNK` | Max listings scored per delivery (default 3). | | |
| | `BATCH_MAX_STORED_HITS` | Cap on stored passing hits in Redis (default 500). | | |
| | `BATCH_MIN_ROLE_FIT` | Default min role score (overridable in API body). | | |
| | `BATCH_MIN_GLOBAL_REMOTE` | Default min remote score. | | |
| | `BATCH_ALLOWED_VERDICTS` | Comma-separated list of allowed verdicts, chosen from `Strong Match`, `Possible`, `Skip`. | | |
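One way to read these variables is to parse each one with its documented default as the fallback. This is a sketch, not the app's configuration code: `readIntEnv`, `parseVerdicts`, `readBatchTuning`, and the returned property names are hypothetical. The numeric defaults come from the table above. The two minimum-score variables are omitted because the table gives no default for them.

```typescript
type TuningEnv = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the documented default on bad input.
function readIntEnv(env: TuningEnv, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  const isWholeNumber = isFinite(parsed) && Math.floor(parsed) === parsed;
  return isWholeNumber && parsed >= 0 ? parsed : fallback;
}

// Split a comma-separated verdict list; an unset variable yields an empty list.
function parseVerdicts(raw: string | undefined): string[] {
  if (raw === undefined) return [];
  return raw
    .split(",")
    .map((verdict) => verdict.trim())
    .filter((verdict) => verdict !== "");
}

function readBatchTuning(env: TuningEnv) {
  return {
    serpDelayMs: readIntEnv(env, "BATCH_SERP_DELAY_MS", 1200),
    scoreDelayMs: readIntEnv(env, "BATCH_SCORE_DELAY_MS", 1500),
    queriesPerChunk: readIntEnv(env, "BATCH_QUERIES_PER_CHUNK", 5),
    scoresPerChunk: readIntEnv(env, "BATCH_SCORES_PER_CHUNK", 3),
    maxStoredHits: readIntEnv(env, "BATCH_MAX_STORED_HITS", 500),
    allowedVerdicts: parseVerdicts(env["BATCH_ALLOWED_VERDICTS"]),
  };
}
```

Falling back on invalid input keeps a typo in one variable from stopping the batch, at the cost of hiding the mistake. A stricter deployment might prefer to throw instead.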
| ## Manual resume (debug / recovery) | |
| If QStash delivery fails, you can advance the job one chunk with: | |
| `POST /api/launcher/batch-jobs/{id}/resume` | |
| Header: | |
| `Authorization: Bearer {BATCH_RESUME_SECRET}` | |
| Set `BATCH_RESUME_SECRET` in the deployment environment. This runs the same logic as the QStash callback without signature verification. | |
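The resume call can be assembled as follows. This is a sketch: `buildResumeRequest` and `ResumeRequest` are hypothetical names, and it assumes the route needs no request body, which the text above does not specify. The path and header format are the ones given in this section.

```typescript
interface ResumeRequest {
  url: string;
  method: "POST";
  headers: { Authorization: string };
}

// Builds the manual-resume request described above for one job.
function buildResumeRequest(appOrigin: string, jobId: string, resumeSecret: string): ResumeRequest {
  // Tolerate a trailing slash on the origin so the path is not doubled up.
  const origin = appOrigin.replace(/\/+$/, "");
  return {
    url: origin + "/api/launcher/batch-jobs/" + encodeURIComponent(jobId) + "/resume",
    method: "POST",
    headers: { Authorization: "Bearer " + resumeSecret },
  };
}
```

The result can be passed to `fetch(request.url, { method: request.method, headers: request.headers })`. Each call advances the job by one chunk, so a stalled job may need several calls.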
| ## API | |
| - **`POST /api/launcher/batch-jobs`** — body: `{ queries: [{ q, tbs? }], options }`. Returns `{ id, status }` and enqueues the first chunk. | |
| - **`GET /api/launcher/batch-jobs/{id}`** — progress + top hits (sorted, capped by `topK`). | |
| - **`POST /api/internal/batch-jobs/chunk`** — QStash-only (or manual resume via the resume route above); body `{ jobId }`. | |
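The request and response shapes listed above can be written down as types. This is a sketch under assumptions: the interface and function names are hypothetical, the contents of `options` are not documented here and so are typed loosely, and `status` is kept as a plain string because the possible values are not listed.

```typescript
// Shapes taken from the endpoint list above; anything not listed there is left loose.
interface BatchQuery {
  q: string;
  tbs?: string;
}

interface CreateBatchBody {
  queries: BatchQuery[];
  options: Record<string, unknown>;
}

interface CreateBatchResponse {
  id: string;
  status: string;
}

// Builds the POST body, dropping blank search strings.
function buildCreateBody(
  searches: string[],
  tbs?: string,
  options: Record<string, unknown> = {}
): CreateBatchBody {
  const queries = searches
    .map((search) => search.trim())
    .filter((search) => search !== "")
    .map((q): BatchQuery => (tbs === undefined ? { q } : { q, tbs }));
  return { queries, options };
}

// Runtime check that a parsed response matches `{ id, status }`.
function isCreateBatchResponse(value: unknown): value is CreateBatchResponse {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "string" && typeof record["status"] === "string";
}

// Path of the status endpoint for a given job id.
function statusPath(id: string): string {
  return "/api/launcher/batch-jobs/" + encodeURIComponent(id);
}
```

A client would send `JSON.stringify(buildCreateBody([...]))` to `POST /api/launcher/batch-jobs`, validate the reply with `isCreateBatchResponse`, and then poll `statusPath(reply.id)`.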
| ## UI | |
| **Job search launcher** → **Background batch job**: set the precision checkbox and caps, then click **Start background batch**, which opens `/job-launcher/batch/{id}` for polling. | |
| ### Recent jobs and JSON export (browser) | |
| - **Recent batch jobs** on the launcher is stored in **`localStorage` only** (this browser / profile). Clearing site data removes the list; it is not synced to a server. | |
| - On the batch status page, **Download JSON** saves an archival snapshot of the current job payload (plus `exportedAt` metadata). Server-side jobs **expire from Redis after about 7 days**, so export any job you need to keep longer. | |
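The two browser-side behaviours above can be sketched as pure functions. Everything here is an assumption made for illustration: `buildExportSnapshot` and `rememberJob` are hypothetical helpers, `exportedAt` is assumed to be a top-level ISO timestamp, and the recent-jobs list is assumed to be a JSON array of job ids. The app's actual storage format may differ.

```typescript
// Adds export metadata to a copy of the job payload (assumed top-level field).
function buildExportSnapshot(job: Record<string, unknown>, now: Date): Record<string, unknown> {
  return { ...job, exportedAt: now.toISOString() };
}

// Returns the new JSON value for a recent-jobs list, newest first and capped at `limit`.
function rememberJob(storedJson: string | null, jobId: string, limit: number): string {
  let ids: string[] = [];
  if (storedJson !== null) {
    try {
      const parsed: unknown = JSON.parse(storedJson);
      if (Array.isArray(parsed)) {
        ids = parsed.filter((item): item is string => typeof item === "string");
      }
    } catch {
      // Unreadable stored data is discarded rather than surfaced to the user.
      ids = [];
    }
  }
  const next = [jobId].concat(ids.filter((id) => id !== jobId)).slice(0, limit);
  return JSON.stringify(next);
}
```

In the browser, `rememberJob` would be fed the current `localStorage.getItem(...)` value and its result written back with `localStorage.setItem(...)`, under whatever key the app uses.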