# Background batch jobs (launcher)
Large launcher queues can be processed **server-side** in **chunks**: SerpAPI for Google results, then fetch + score (OpenRouter AI or heuristic). State lives in **Upstash Redis**; **Upstash QStash** wakes your deployment (e.g. Hugging Face Space) with HTTPS callbacks so work continues after you close the tab.
## Troubleshooting: “Job not found” right after starting a batch
The Upstash Redis client **parses JSON on `GET` by default**. This app stores each job as a **JSON string** with `SET` and expects a **string** back on `GET`. If deserialization is left on, `GET` returns an **object**, the loader treats the job as missing, and the status page shows **Job not found** even though Redis has the key. The code sets `automaticDeserialization: false` on the Redis client (`lib/batch-jobs/redis.ts`). After upgrading, restart the deployment so a new process picks up the client config.
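To make the failure mode concrete, here is a minimal sketch of a loader that expects a JSON string. `parseStoredJob` and the `StoredJob` shape are hypothetical and are not taken from the app's code; they only illustrate why an already-parsed object ends up looking like a missing job.

```typescript
// Hypothetical job shape, for illustration only.
interface StoredJob {
  id: string;
  status: string;
}

// Hypothetical loader: it expects the raw JSON string written with SET.
// Anything that is not a string (including an already-parsed object)
// is reported as missing.
function parseStoredJob(raw: unknown): StoredJob | null {
  if (typeof raw !== "string") return null;
  try {
    return JSON.parse(raw) as StoredJob;
  } catch {
    return null; // malformed JSON is also treated as missing in this sketch
  }
}

const stored = JSON.stringify({ id: "job-1", status: "running" });

// Deserialization off: GET hands back the stored string, so the job loads.
const asString = parseStoredJob(stored);

// Deserialization on: GET hands back an object, so the loader returns null.
const asObject = parseStoredJob(JSON.parse(stored));
```

In this sketch `asString` is a job object while `asObject` is `null`, which mirrors the **Job not found** symptom.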
## Required services
1. **SerpAPI**: `SERPAPI_API_KEY`.
2. **Upstash Redis** — REST URL and token:
- `UPSTASH_REDIS_REST_URL`
- `UPSTASH_REDIS_REST_TOKEN`
3. **Upstash QStash** — publish token and signing keys (for verifying callbacks):
- `QSTASH_TOKEN` — used to **enqueue** the next chunk after each run.
- `QSTASH_CURRENT_SIGNING_KEY` — used to **verify** incoming requests to the chunk endpoint.
- `QSTASH_NEXT_SIGNING_KEY` — optional; use during key rotation.
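A startup check can catch a missing variable before the first batch is enqueued. The sketch below is one possible shape: the variable names come from the list above, while `missingBatchEnv` and `BatchEnv` are hypothetical names introduced here.

```typescript
// Variable names come from the list above; the helper itself is hypothetical.
const REQUIRED_BATCH_ENV = [
  "SERPAPI_API_KEY",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_TOKEN",
  "QSTASH_CURRENT_SIGNING_KEY",
] as const;

type BatchEnv = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
// QSTASH_NEXT_SIGNING_KEY is optional, so it is not checked here.
function missingBatchEnv(env: BatchEnv): string[] {
  return REQUIRED_BATCH_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node deployment this could be called as `missingBatchEnv(process.env)` during boot, logging or failing fast when the returned list is not empty.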
## Public URL (critical for QStash)
QStash must call a **stable HTTPS URL** that reaches your app. Set:
- **`BATCH_PUBLIC_APP_URL`** — origin only, no trailing slash, e.g. `https://your-space.hf.space` or your Vercel URL.
### Hugging Face Spaces (important)
Do **not** use the gallery link as the public URL:
- Wrong: `https://huggingface.co/spaces/zimejin/Job-Scorer` (Space page on hf.co — this is **not** your container’s API host).
- Right: `https://<subdomain>.hf.space` — the direct app URL shown when you open the Space (**App** tab / embedded app). It usually looks like `https://zimejin-job-scorer.hf.space` (check your Space’s **Settings → Details** for the exact `*.hf.space` value).
Use that **`https://....hf.space`** value (no path) for `BATCH_PUBLIC_APP_URL`, and use the **same** URL in the browser when you run batch jobs so `/api/...` calls hit your app.
The app publishes chunk work to:
`{BATCH_PUBLIC_APP_URL}/api/internal/batch-jobs/chunk`
Signing verification uses this same base URL. If verification fails, confirm the URL matches exactly what QStash calls (scheme, host, path).
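The URL rules above can be checked mechanically. The following sketch assumes a hypothetical helper, `chunkCallbackUrl`, that validates the configured origin and derives the callback URL; it is not the app's implementation, and it tolerates a trailing slash only because `URL.origin` drops it.

```typescript
const CHUNK_PATH = "/api/internal/batch-jobs/chunk";

// Hypothetical helper: validates BATCH_PUBLIC_APP_URL and returns the
// chunk callback URL that QStash would be asked to call.
function chunkCallbackUrl(publicAppUrl: string): string {
  const url = new URL(publicAppUrl); // throws on malformed input
  if (url.protocol !== "https:") {
    throw new Error("BATCH_PUBLIC_APP_URL must use https");
  }
  if (url.hostname === "huggingface.co") {
    throw new Error("Use the *.hf.space app URL, not the huggingface.co Space page");
  }
  if (url.pathname !== "/" || url.search !== "" || url.hash !== "") {
    throw new Error("BATCH_PUBLIC_APP_URL must be an origin only (no path, query, or fragment)");
  }
  // url.origin has no trailing slash, so the result has exactly one separator.
  return url.origin + CHUNK_PATH;
}
```

Passing `https://zimejin-job-scorer.hf.space` yields the callback URL on that host, while the `huggingface.co/spaces/...` page link is rejected.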
## Optional: OpenRouter (AI scoring)
- `OPENROUTER_API_KEY`
If unset, scoring falls back to heuristics.
## Optional tuning (environment)
| Variable | Purpose |
|----------|---------|
| `BATCH_SERP_DELAY_MS` | Delay between SerpAPI calls within a chunk (default 1200). |
| `BATCH_SCORE_DELAY_MS` | Delay between listing fetch/score steps (default 1500). |
| `BATCH_QUERIES_PER_CHUNK` | Max Serp queries processed per QStash delivery (default 5). |
| `BATCH_SCORES_PER_CHUNK` | Max listings scored per delivery (default 3). |
| `BATCH_MAX_STORED_HITS` | Cap on stored passing hits in Redis (default 500). |
| `BATCH_MIN_ROLE_FIT` | Default min role score (overridable in API body). |
| `BATCH_MIN_GLOBAL_REMOTE` | Default min remote score. |
| `BATCH_ALLOWED_VERDICTS` | Comma list: `Strong Match`, `Possible`, `Skip`. |
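As a rough illustration of how these variables might be read, the sketch below applies the defaults from the table. The function names, the returned field names, and the choice to fall back to the default on invalid input are assumptions made for this example, not the app's documented behavior. The two minimum-score variables are omitted because the table does not list their defaults.

```typescript
type TuningEnv = Record<string, string | undefined>;

// Parses a non-negative integer, falling back to the default when the
// variable is unset, blank, or not a valid integer (assumed behavior).
function intFromEnv(env: TuningEnv, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Defaults are the ones listed in the table above.
function readBatchTuning(env: TuningEnv) {
  return {
    serpDelayMs: intFromEnv(env, "BATCH_SERP_DELAY_MS", 1200),
    scoreDelayMs: intFromEnv(env, "BATCH_SCORE_DELAY_MS", 1500),
    queriesPerChunk: intFromEnv(env, "BATCH_QUERIES_PER_CHUNK", 5),
    scoresPerChunk: intFromEnv(env, "BATCH_SCORES_PER_CHUNK", 3),
    maxStoredHits: intFromEnv(env, "BATCH_MAX_STORED_HITS", 500),
  };
}

// Splits the comma list used by BATCH_ALLOWED_VERDICTS and trims each entry.
// Returns undefined when the variable is unset, since no default is listed.
function parseAllowedVerdicts(raw: string | undefined): string[] | undefined {
  if (raw === undefined) return undefined;
  return raw
    .split(",")
    .map((verdict) => verdict.trim())
    .filter((verdict) => verdict.length > 0);
}
```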
## Manual resume (debug / recovery)
If QStash delivery fails, you can advance the job one chunk with:
`POST /api/launcher/batch-jobs/{id}/resume`
Header:
`Authorization: Bearer {BATCH_RESUME_SECRET}`
Set `BATCH_RESUME_SECRET` in the deployment environment. This runs the same logic as the QStash callback without signature verification.
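For scripting the recovery step, the request can be assembled as in the sketch below. `buildResumeRequest` is a hypothetical helper; the route and header format are the ones described above, and the base URL is assumed to be the same origin as `BATCH_PUBLIC_APP_URL`.

```typescript
interface ResumeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

// Hypothetical helper that assembles the manual resume call.
// baseUrl is assumed to be an origin with no trailing slash.
function buildResumeRequest(baseUrl: string, jobId: string, secret: string): ResumeRequest {
  return {
    url: `${baseUrl}/api/launcher/batch-jobs/${encodeURIComponent(jobId)}/resume`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${secret}` },
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. Each call advances the job by one chunk, so a stalled job may need several calls.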
## API
- **`POST /api/launcher/batch-jobs`** — body: `{ queries: [{ q, tbs? }], options }`. Returns `{ id, status }` and enqueues the first chunk.
- **`GET /api/launcher/batch-jobs/{id}`** — progress + top hits (sorted, capped by `topK`).
- **`POST /api/internal/batch-jobs/chunk`** — QStash-only (or manual resume via the resume route above); body `{ jobId }`.
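A client for the two launcher routes might start from types like the ones below. This is a sketch under assumptions: the `options` fields are not listed in this document, so they are left as an open record, and the rule that at least one non-empty query is required is a guess rather than documented server behavior. `buildStartBatchBody` and `statusPath` are hypothetical names.

```typescript
interface BatchQuery {
  q: string;
  tbs?: string; // optional, as in the body shape above
}

interface StartBatchBody {
  queries: BatchQuery[];
  options: Record<string, unknown>; // option fields are not listed in this document
}

// Hypothetical helper: trims queries, drops blank ones, and builds the POST body.
function buildStartBatchBody(
  queries: BatchQuery[],
  options: Record<string, unknown> = {},
): StartBatchBody {
  const cleaned = queries
    .map((query) => ({ ...query, q: query.q.trim() }))
    .filter((query) => query.q.length > 0);
  if (cleaned.length === 0) {
    throw new Error("At least one non-empty query is required");
  }
  return { queries: cleaned, options };
}

// Path of the status route to poll for a given job id.
function statusPath(jobId: string): string {
  return `/api/launcher/batch-jobs/${encodeURIComponent(jobId)}`;
}
```

The body would be sent as JSON to `POST /api/launcher/batch-jobs`, and the `id` from the response passed to `statusPath` for polling.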
## UI
**Job search launcher** → **Background batch job**: precision checkbox, caps, and **Start background batch**, which opens `/job-launcher/batch/{id}` for polling.
### Recent jobs and JSON export (browser)
- **Recent batch jobs** on the launcher is stored in **`localStorage` only** (this browser / profile). Clearing site data removes the list; it is not synced to a server.
- On the batch status page, **Download JSON** saves an archival snapshot of the current job payload (plus `exportedAt` metadata). Server-side jobs still **expire from Redis after about 7 days** as before, so export if you need a long-term copy.