Job-Scorer / docs /BATCH_JOBS.md

zimejin

Remove launcher Smart Search; background batch is the only async path

e20deee 3 months ago

Background batch jobs (launcher)

Large launcher queues can be processed server-side in chunks. Each chunk runs SerpAPI queries for Google results, then fetches and scores the listings (OpenRouter AI or heuristic). State lives in Upstash Redis; Upstash QStash wakes your deployment (e.g. Hugging Face Space) with HTTPS callbacks so work continues after you close the tab.

Troubleshooting: “Job not found” right after starting a batch

The Upstash Redis client parses JSON on GET by default. This app stores each job as a JSON string with SET and expects a string back on GET. If deserialization is left on, GET returns an object, the loader treats the job as missing, and the status page shows Job not found even though Redis has the key. The code sets automaticDeserialization: false on the Redis client (lib/batch-jobs/redis.ts). After upgrading, restart the deployment so a new process picks up the client config.
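
The failure mode can be reproduced without Redis. The sketch below is illustrative only: the BatchJob shape and the loadJob helper are hypothetical stand-ins, not the app's actual loader. It shows why a loader that expects a string reports a job as missing once the client has already parsed the value.

```typescript
// Hypothetical job shape; the app's real record is not shown in this document.
interface BatchJob {
  id: string;
  status: string;
}

// What a GET may hand back: a raw string (deserialization off),
// an already-parsed object (deserialization on), or null (no such key).
type RedisGetResult = string | object | null;

// A loader that, like the one described above, only accepts a JSON string.
function loadJob(raw: RedisGetResult): BatchJob | null {
  if (typeof raw !== "string") {
    // An already-parsed object lands here too, so the job looks missing.
    return null;
  }
  try {
    return JSON.parse(raw) as BatchJob;
  } catch {
    return null;
  }
}

const stored = JSON.stringify({ id: "job-1", status: "running" });

// Deserialization off: the client returns the stored string unchanged.
const withStringReply = loadJob(stored);

// Deserialization on: the client has already parsed the value into an object.
const withObjectReply = loadJob(JSON.parse(stored) as object);
```

Here withStringReply is the parsed job, while withObjectReply is null even though the key exists. That null is what surfaces as Job not found.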

Required services

SerpAPI — SERPAPI_API_KEY.
Upstash Redis — REST URL and token:
- UPSTASH_REDIS_REST_URL
- UPSTASH_REDIS_REST_TOKEN
Upstash QStash — publish token and signing keys (for verifying callbacks):
- QSTASH_TOKEN — used to enqueue the next chunk after each run.
- QSTASH_CURRENT_SIGNING_KEY — used to verify incoming requests to the chunk endpoint.
- QSTASH_NEXT_SIGNING_KEY — optional; use during key rotation.
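
A startup check along the following lines can catch a missing variable before the first batch is enqueued. This is a minimal sketch, not code from the app: missingRequiredVars is a hypothetical helper, and the environment is passed in as a plain record (in a Node deployment that would typically be process.env).

```typescript
type Env = Record<string, string | undefined>;

// Required names as listed above; QSTASH_NEXT_SIGNING_KEY is optional and left out.
const REQUIRED_VARS: string[] = [
  "SERPAPI_API_KEY",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_TOKEN",
  "QSTASH_CURRENT_SIGNING_KEY",
];

// Returns the names of required variables that are unset or blank.
function missingRequiredVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with placeholder values: one variable is blank, one is absent.
const exampleMissing = missingRequiredVars({
  SERPAPI_API_KEY: "serp-key",
  UPSTASH_REDIS_REST_URL: "https://example.upstash.io",
  UPSTASH_REDIS_REST_TOKEN: "token",
  QSTASH_TOKEN: "",
});
// exampleMissing is ["QSTASH_TOKEN", "QSTASH_CURRENT_SIGNING_KEY"]
```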

Public URL (critical for QStash)

QStash must call a stable HTTPS URL that reaches your app. Set:

BATCH_PUBLIC_APP_URL — origin only, no trailing slash, e.g. https://your-space.hf.space or your Vercel URL.

Hugging Face Spaces (important)

Do not use the gallery link as the public URL:

Wrong: https://huggingface.co/spaces/zimejin/Job-Scorer (Space page on hf.co — this is not your container’s API host).
Right: https://<subdomain>.hf.space — the direct app URL shown when you open the Space (App tab / embedded app). It usually looks like https://zimejin-job-scorer.hf.space (check your Space’s Settings → Details for the exact *.hf.space value).

Use that https://....hf.space value (no path) for BATCH_PUBLIC_APP_URL, and use the same URL in the browser when you run batch jobs so /api/... calls hit your app.

The app publishes chunk work to:

{BATCH_PUBLIC_APP_URL}/api/internal/batch-jobs/chunk

Signing verification uses this same base URL. If verification fails, confirm the URL matches exactly what QStash calls (scheme, host, path).
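
The URL rules above can be checked mechanically. The sketch below is an assumption-laden illustration, not the app's code: checkPublicAppUrl and chunkEndpoint are hypothetical helpers, and only the chunk path itself comes from this document.

```typescript
// Path taken from this document; the helpers around it are illustrative.
const CHUNK_PATH = "/api/internal/batch-jobs/chunk";

// Returns a list of problems with a candidate BATCH_PUBLIC_APP_URL (empty means it looks fine).
function checkPublicAppUrl(value: string): string[] {
  const problems: string[] = [];
  if (!/^https:\/\//.test(value)) {
    problems.push("must start with https://");
  }
  if (/\/$/.test(value)) {
    problems.push("must not end with a trailing slash");
  }
  // Strip the scheme, then split the remainder into host and path.
  const rest = value.replace(/^https?:\/\//, "");
  const slash = rest.indexOf("/");
  const host = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash);
  if (path !== "" && path !== "/") {
    problems.push("must be an origin only (no path)");
  }
  if (host.toLowerCase() === "huggingface.co") {
    problems.push("huggingface.co is the Space page, not the app host; use the *.hf.space URL");
  }
  return problems;
}

// Builds the URL that QStash is asked to call.
function chunkEndpoint(baseUrl: string): string {
  return `${baseUrl}${CHUNK_PATH}`;
}

const good = checkPublicAppUrl("https://zimejin-job-scorer.hf.space");
const bad = checkPublicAppUrl("https://huggingface.co/spaces/zimejin/Job-Scorer");
const endpoint = chunkEndpoint("https://zimejin-job-scorer.hf.space");
```

The direct *.hf.space origin passes with no problems, while the Space page URL is flagged twice: once for carrying a path and once for pointing at huggingface.co.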

Optional: OpenRouter (AI scoring)

OPENROUTER_API_KEY

If unset, scoring falls back to heuristics.
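
As a rough sketch of that fallback, assuming a blank key counts as unset (the document does not say) and using a hypothetical pickScorer helper:

```typescript
type Env = Record<string, string | undefined>;
type ScorerKind = "openrouter" | "heuristic";

// Chooses AI scoring only when a non-blank key is present.
// Treating a blank value as unset is an assumption of this sketch.
function pickScorer(env: Env): ScorerKind {
  const key = env["OPENROUTER_API_KEY"];
  return key !== undefined && key.trim() !== "" ? "openrouter" : "heuristic";
}

const withKey = pickScorer({ OPENROUTER_API_KEY: "placeholder-key" });
const withoutKey = pickScorer({});
```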

Optional tuning (environment)

| Variable | Purpose |
| --- | --- |
| `BATCH_SERP_DELAY_MS` | Delay between SerpAPI calls within a chunk, in ms (default 1200). |
| `BATCH_SCORE_DELAY_MS` | Delay between listing fetch/score steps, in ms (default 1500). |
| `BATCH_QUERIES_PER_CHUNK` | Max SerpAPI queries processed per QStash delivery (default 5). |
| `BATCH_SCORES_PER_CHUNK` | Max listings scored per QStash delivery (default 3). |
| `BATCH_MAX_STORED_HITS` | Cap on passing hits stored in Redis (default 500). |
| `BATCH_MIN_ROLE_FIT` | Default minimum role score (overridable in the API body). |
| `BATCH_MIN_GLOBAL_REMOTE` | Default minimum remote score. |
| `BATCH_ALLOWED_VERDICTS` | Comma-separated list of verdicts: `Strong Match`, `Possible`, `Skip`. |
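
Reading these variables with their documented defaults might look like the sketch below. The helper names (intFromEnv, listFromEnv, readTuning) and the choice to fall back on invalid input are assumptions for illustration. The two minimum-score variables are omitted because their defaults are not stated here.

```typescript
type Env = Record<string, string | undefined>;

// Parses a non-negative integer, falling back when the value is unset or invalid.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  const valid = isFinite(parsed) && parsed >= 0 && Math.floor(parsed) === parsed;
  return valid ? parsed : fallback;
}

// Splits a comma-separated list, trimming whitespace and dropping empty entries.
function listFromEnv(env: Env, name: string): string[] {
  const raw = env[name];
  if (raw === undefined) return [];
  return raw
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item !== "");
}

// Defaults are the ones listed in the table above.
function readTuning(env: Env) {
  return {
    serpDelayMs: intFromEnv(env, "BATCH_SERP_DELAY_MS", 1200),
    scoreDelayMs: intFromEnv(env, "BATCH_SCORE_DELAY_MS", 1500),
    queriesPerChunk: intFromEnv(env, "BATCH_QUERIES_PER_CHUNK", 5),
    scoresPerChunk: intFromEnv(env, "BATCH_SCORES_PER_CHUNK", 3),
    maxStoredHits: intFromEnv(env, "BATCH_MAX_STORED_HITS", 500),
    allowedVerdicts: listFromEnv(env, "BATCH_ALLOWED_VERDICTS"),
  };
}

const tuning = readTuning({
  BATCH_SERP_DELAY_MS: "2000",
  BATCH_QUERIES_PER_CHUNK: "not-a-number",
  BATCH_ALLOWED_VERDICTS: "Strong Match, Possible",
});
```

In this example the SerpAPI delay is overridden to 2000, the invalid per-chunk query count falls back to 5, and the verdict list becomes two trimmed entries.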

Manual resume (debug / recovery)

If QStash delivery fails, you can advance the job one chunk with:

POST /api/launcher/batch-jobs/{id}/resume

Header:

Authorization: Bearer {BATCH_RESUME_SECRET}

Set BATCH_RESUME_SECRET in the deployment environment. This runs the same logic as the QStash callback without signature verification.
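
For scripting a recovery, the request can be assembled as in the sketch below. buildResumeRequest and the ResumeRequest type are hypothetical; the route and header format are the ones shown above, and the job id and secret are placeholders.

```typescript
interface ResumeRequest {
  url: string;
  method: "POST";
  headers: { Authorization: string };
}

// Assembles the manual resume call for one job.
function buildResumeRequest(baseUrl: string, jobId: string, secret: string): ResumeRequest {
  return {
    url: `${baseUrl}/api/launcher/batch-jobs/${encodeURIComponent(jobId)}/resume`,
    method: "POST",
    headers: { Authorization: `Bearer ${secret}` },
  };
}

// Placeholder values; the secret should match BATCH_RESUME_SECRET on the deployment.
const resumeRequest = buildResumeRequest(
  "https://zimejin-job-scorer.hf.space",
  "abc123",
  "example-secret",
);
```

The result can be handed to any HTTP client, for example fetch(resumeRequest.url, { method: resumeRequest.method, headers: resumeRequest.headers }). Each call advances the job by one chunk, so a stalled job may need several calls.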

API

POST /api/launcher/batch-jobs: body { queries: [{ q, tbs? }], options }. Returns { id, status } and enqueues the first chunk.
GET /api/launcher/batch-jobs/{id}: progress plus top hits (sorted, capped by topK).
POST /api/internal/batch-jobs/chunk: body { jobId }. Called by QStash only; to advance a job by hand, use the resume route above instead.
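
The request and response shapes can be written down as types. This is a sketch under assumptions: the field names come from the routes above, but the field types, the contents of options, and the buildStartBody helper are guesses for illustration. The tbs value "qdr:w" is just an example of Google's time filter (past week).

```typescript
interface BatchQuery {
  q: string;
  tbs?: string; // optional; assumed to be passed through to SerpAPI
}

interface StartBatchBody {
  queries: BatchQuery[];
  options: Record<string, unknown>; // shape is not documented here
}

interface StartBatchResponse {
  id: string;
  status: string;
}

// Trims query text and drops blank queries before building the POST body.
function buildStartBody(
  queries: BatchQuery[],
  options: Record<string, unknown> = {},
): StartBatchBody {
  const cleaned = queries
    .map((query) => ({ ...query, q: query.q.trim() }))
    .filter((query) => query.q !== "");
  return { queries: cleaned, options };
}

// Path of the status route for a job returned by the POST.
function statusPath(response: StartBatchResponse): string {
  return `/api/launcher/batch-jobs/${encodeURIComponent(response.id)}`;
}

const startBody = buildStartBody([
  { q: "  remote typescript engineer ", tbs: "qdr:w" },
  { q: "   " },
]);
const pollPath = statusPath({ id: "abc123", status: "queued" });
```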

UI

Job search launcher → Background batch job: set the precision checkbox and caps, then click Start background batch, which opens /job-launcher/batch/{id} to poll progress.

Recent jobs and JSON export (browser)

The Recent batch jobs list on the launcher is stored in localStorage only (this browser / profile). Clearing site data removes the list; it is not synced to a server.
On the batch status page, Download JSON saves an archival snapshot of the current job payload (plus exportedAt metadata). Server-side jobs still expire from Redis after about 7 days, so export a job if you need a long-term copy.
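
A snapshot of this kind could be assembled as in the sketch below. It assumes exportedAt is an ISO timestamp added at the top level of the payload and uses a flat 7-day window to estimate expiry. Both are guesses based on the description above, and the helper names are hypothetical.

```typescript
// Approximate server-side retention, per the "about 7 days" note above.
const JOB_TTL_MS = 7 * 24 * 60 * 60 * 1000;

// Copies the job payload and stamps it with the export time.
function buildExportSnapshot<T extends object>(job: T, now: Date): T & { exportedAt: string } {
  return { ...job, exportedAt: now.toISOString() };
}

// Rough estimate of when Redis will drop a job created at the given time.
function approxExpiry(createdAt: Date): Date {
  return new Date(createdAt.getTime() + JOB_TTL_MS);
}

const exportTime = new Date("2024-01-01T00:00:00.000Z");
const snapshot = buildExportSnapshot({ id: "job-1", status: "done" }, exportTime);
const expiresAround = approxExpiry(exportTime).toISOString();
```

The original job object is left unchanged, and the snapshot can be serialized with JSON.stringify(snapshot, null, 2) for download.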