
Background batch jobs (launcher)

Large launcher queues can be processed server-side in chunks: SerpAPI for Google results, then fetch + score (OpenRouter AI or heuristic). State lives in Upstash Redis; Upstash QStash wakes your deployment (e.g. Hugging Face Space) with HTTPS callbacks so work continues after you close the tab.

Troubleshooting: “Job not found” right after starting a batch

The Upstash Redis client parses JSON on GET by default. This app stores each job as a JSON string with SET and expects a string back on GET. If deserialization is left on, GET returns an object, the loader treats the job as missing, and the status page shows Job not found even though Redis has the key. The code sets automaticDeserialization: false on the Redis client (lib/batch-jobs/redis.ts). After upgrading, restart the deployment so a new process picks up the client config.
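
The failure mode can be reproduced without a Redis connection. The sketch below is a hypothetical stand-in for the job loader, not the code in lib/batch-jobs/redis.ts; the `StoredJob` shape and the `loadJob` name are assumptions made for illustration. It only assumes what the paragraph above states: the loader accepts a string and treats anything else as missing.

```typescript
// Hypothetical job shape; the real payload likely has more fields.
interface StoredJob {
  id: string;
  status: string;
}

// Stand-in loader: it only accepts the raw JSON string that SET wrote.
// Anything else, including an already-parsed object, counts as "missing".
function loadJob(raw: unknown): StoredJob | null {
  if (typeof raw !== "string") return null;
  return JSON.parse(raw) as StoredJob;
}

const written = JSON.stringify({ id: "job_1", status: "running" });

// Deserialization off: GET hands back the string exactly as written.
const withFlagOff = loadJob(written);

// Deserialization on (the client default): GET hands back a parsed object.
const withFlagOn = loadJob(JSON.parse(written));

console.log(withFlagOff?.id); // prints: job_1
console.log(withFlagOn); // prints: null
```

With the @upstash/redis client, the option is passed to the constructor alongside the REST URL and token, for example `new Redis({ url, token, automaticDeserialization: false })`.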

Required services

  1. SerpAPI: SERPAPI_API_KEY.
  2. Upstash Redis — REST URL and token:
    • UPSTASH_REDIS_REST_URL
    • UPSTASH_REDIS_REST_TOKEN
  3. Upstash QStash — publish token and signing keys (for verifying callbacks):
    • QSTASH_TOKEN — used to enqueue the next chunk after each run.
    • QSTASH_CURRENT_SIGNING_KEY — used to verify incoming requests to the chunk endpoint.
    • QSTASH_NEXT_SIGNING_KEY — optional; use during key rotation.
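
The role of the two signing keys can be sketched as a simple fallback: try the current key, then the next key if one is set. This is a minimal illustration under that assumption, not the app's verification code. `VerifyFn`, `verifyWithRotation`, and the toy verifier are hypothetical names, and the toy check is not cryptographic; a real deployment would delegate the actual signature check to the QStash SDK.

```typescript
// Hypothetical verifier: returns true when `signature` is valid for
// `body` under `key`. The real check is done by the QStash SDK.
type VerifyFn = (key: string, signature: string, body: string) => boolean;

// Try the current key first, then the next key if one is configured.
function verifyWithRotation(
  verify: VerifyFn,
  signature: string,
  body: string,
  currentKey: string,
  nextKey?: string,
): boolean {
  if (verify(currentKey, signature, body)) return true;
  return nextKey !== undefined && verify(nextKey, signature, body);
}

// Toy verifier for demonstration only (NOT a real signature scheme).
const toyVerify: VerifyFn = (key, signature, body) =>
  signature === `${key}:${body.length}`;

const callbackBody = JSON.stringify({ jobId: "job_1" });
const signedWithNext = `next-key:${callbackBody.length}`;

// Accepted during rotation because the next key is configured.
const acceptedDuringRotation = verifyWithRotation(
  toyVerify, signedWithNext, callbackBody, "current-key", "next-key");

// Rejected when only the current key is configured.
const rejectedWithoutNext = verifyWithRotation(
  toyVerify, signedWithNext, callbackBody, "current-key");
```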

Public URL (critical for QStash)

QStash must call a stable HTTPS URL that reaches your app. Set:

  • BATCH_PUBLIC_APP_URL — origin only, no trailing slash, e.g. https://your-space.hf.space or your Vercel URL.

Hugging Face Spaces (important)

Do not use the gallery link as the public URL:

  • Wrong: https://huggingface.co/spaces/zimejin/Job-Scorer (Space page on hf.co — this is not your container’s API host).
  • Right: https://<subdomain>.hf.space — the direct app URL shown when you open the Space (App tab / embedded app). It usually looks like https://zimejin-job-scorer.hf.space (check your Space’s Settings → Details for the exact *.hf.space value).

Use that https://....hf.space value (no path) for BATCH_PUBLIC_APP_URL, and use the same URL in the browser when you run batch jobs so /api/... calls hit your app.

The app publishes chunk work to:

{BATCH_PUBLIC_APP_URL}/api/internal/batch-jobs/chunk

Signature verification uses this same base URL. If verification fails, confirm that the configured URL matches exactly what QStash calls (scheme, host, path).
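
The URL rules above can be checked before a job is started. The helper below is a hypothetical sketch (`buildChunkUrl` is not a function from this repository); it rejects the huggingface.co Space page, anything with a path or trailing slash, and non-HTTPS values, then appends the chunk path.

```typescript
const CHUNK_PATH = "/api/internal/batch-jobs/chunk";

// Returns the chunk callback URL, or throws if the value breaks a rule above.
function buildChunkUrl(publicAppUrl: string): string {
  const url = new URL(publicAppUrl); // throws on malformed input
  if (url.protocol !== "https:") {
    throw new Error("BATCH_PUBLIC_APP_URL must use https");
  }
  if (url.hostname === "huggingface.co") {
    throw new Error("Use the *.hf.space app URL, not the huggingface.co Space page");
  }
  // url.origin has no path and no trailing slash, so any difference from
  // the raw value means extra characters were supplied.
  if (publicAppUrl !== url.origin) {
    throw new Error("BATCH_PUBLIC_APP_URL must be an origin only (no path, no trailing slash)");
  }
  return url.origin + CHUNK_PATH;
}

console.log(buildChunkUrl("https://zimejin-job-scorer.hf.space"));
// prints: https://zimejin-job-scorer.hf.space/api/internal/batch-jobs/chunk
```

Because the comparison is against the normalized origin, a value with uppercase host characters or an explicit `:443` port is also rejected. That strictness is a design choice of this sketch, intended to keep the configured value identical to what QStash calls.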

Optional: OpenRouter (AI scoring)

  • OPENROUTER_API_KEY

If unset, scoring falls back to heuristics.

Optional tuning (environment)

| Variable | Purpose |
| --- | --- |
| BATCH_SERP_DELAY_MS | Delay between SerpAPI calls within a chunk, in milliseconds (default 1200). |
| BATCH_SCORE_DELAY_MS | Delay between listing fetch/score steps, in milliseconds (default 1500). |
| BATCH_QUERIES_PER_CHUNK | Max SerpAPI queries processed per QStash delivery (default 5). |
| BATCH_SCORES_PER_CHUNK | Max listings scored per delivery (default 3). |
| BATCH_MAX_STORED_HITS | Cap on stored passing hits in Redis (default 500). |
| BATCH_MIN_ROLE_FIT | Default min role score (overridable in API body). |
| BATCH_MIN_GLOBAL_REMOTE | Default min remote score. |
| BATCH_ALLOWED_VERDICTS | Comma-separated list: Strong Match, Possible, Skip. |
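
As an illustration of how these variables and their documented defaults might be read, here is a minimal sketch. `readBatchTuning` and `intFromEnv` are hypothetical names, not the app's own config loader. Falling back to the default on an invalid value, and returning an empty verdict list when the variable is unset, are assumptions of this sketch. The two score thresholds are omitted because their defaults are not documented here.

```typescript
type Env = Record<string, string | undefined>;

// Parse a non-negative integer; fall back to the default when the
// variable is unset, empty, or not a valid non-negative integer.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

function readBatchTuning(env: Env) {
  return {
    serpDelayMs: intFromEnv(env, "BATCH_SERP_DELAY_MS", 1200),
    scoreDelayMs: intFromEnv(env, "BATCH_SCORE_DELAY_MS", 1500),
    queriesPerChunk: intFromEnv(env, "BATCH_QUERIES_PER_CHUNK", 5),
    scoresPerChunk: intFromEnv(env, "BATCH_SCORES_PER_CHUNK", 3),
    maxStoredHits: intFromEnv(env, "BATCH_MAX_STORED_HITS", 500),
    // Split the comma-separated list, trimming spaces around each verdict.
    allowedVerdicts: (env["BATCH_ALLOWED_VERDICTS"] ?? "")
      .split(",")
      .map((v) => v.trim())
      .filter((v) => v.length > 0),
  };
}

const tuning = readBatchTuning({
  BATCH_QUERIES_PER_CHUNK: "8",
  BATCH_ALLOWED_VERDICTS: "Strong Match, Possible",
});
console.log(tuning.queriesPerChunk); // prints: 8
console.log(tuning.serpDelayMs); // prints: 1200
```

In a Node deployment the same function could be called with `process.env`.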

Manual resume (debug / recovery)

If QStash delivery fails, you can advance the job one chunk with:

POST /api/launcher/batch-jobs/{id}/resume

Header:

Authorization: Bearer {BATCH_RESUME_SECRET}

Set BATCH_RESUME_SECRET in the deployment environment. This runs the same logic as the QStash callback without signature verification.
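
A resume call can be assembled as follows. This is a sketch: `buildResumeRequest` is a hypothetical helper, and the base URL, job id, and secret in the example are placeholders. Only the route and the Authorization header come from the description above.

```typescript
interface ResumeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

// Build the URL and fetch options for advancing a job by one chunk.
function buildResumeRequest(
  baseUrl: string,
  jobId: string,
  secret: string,
): ResumeRequest {
  return {
    url: `${baseUrl}/api/launcher/batch-jobs/${encodeURIComponent(jobId)}/resume`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${secret}` },
    },
  };
}

// Placeholder values for illustration only.
const resume = buildResumeRequest(
  "https://zimejin-job-scorer.hf.space",
  "job_1",
  "example-secret",
);
console.log(resume.url);
// prints: https://zimejin-job-scorer.hf.space/api/launcher/batch-jobs/job_1/resume

// To send it: await fetch(resume.url, resume.init);
```

Each call advances the job by one chunk, so a stalled job may need several calls.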

API

  • POST /api/launcher/batch-jobs — body: { queries: [{ q, tbs? }], options }. Returns { id, status } and enqueues the first chunk.
  • GET /api/launcher/batch-jobs/{id} — progress + top hits (sorted, capped by topK).
  • POST /api/internal/batch-jobs/chunk — QStash-only (or manual resume via the resume route above); body { jobId }.
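
The request body for starting a batch can be typed as below. The `queries` entries follow the `{ q, tbs? }` shape listed above. The keys inside `options` are not documented here, so the sketch keeps them opaque, and the `topK` value in the example is an assumed key. `buildStartBatchBody` is a hypothetical helper, and trimming and dropping empty queries on the client is an assumption rather than documented server behavior.

```typescript
interface BatchQuery {
  q: string; // search query text
  tbs?: string; // optional Google "tbs" filter, e.g. a time range
}

interface StartBatchBody {
  queries: BatchQuery[];
  options: Record<string, unknown>; // option keys are app-specific
}

// Trim query text, drop empty queries, and refuse an empty batch.
function buildStartBatchBody(
  queries: BatchQuery[],
  options: Record<string, unknown> = {},
): StartBatchBody {
  const cleaned = queries
    .map((query) => ({ ...query, q: query.q.trim() }))
    .filter((query) => query.q.length > 0);
  if (cleaned.length === 0) {
    throw new Error("at least one non-empty query is required");
  }
  return { queries: cleaned, options };
}

const startBody = buildStartBatchBody(
  [{ q: "  remote typescript engineer ", tbs: "qdr:w" }, { q: "   " }],
  { topK: 20 }, // assumed option key, for illustration only
);
console.log(startBody.queries.length); // prints: 1

// To send it:
// await fetch("/api/launcher/batch-jobs", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(startBody),
// });
```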

UI

Job search launcher → Background batch job: precision checkbox, caps, and Start background batch, which opens /job-launcher/batch/{id} for polling.

Recent jobs and JSON export (browser)

  • The Recent batch jobs list on the launcher is stored in localStorage only (this browser / profile). Clearing site data removes the list; it is not synced to a server.
  • On the batch status page, Download JSON saves an archival snapshot of the current job payload (plus exportedAt metadata). Server-side jobs still expire from Redis after about 7 days, so export if you need a long-term copy.
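
One way such a snapshot could be assembled is sketched below. The exact layout of the exported file is not specified here, so the flat merge of `exportedAt` into the job payload is an assumption, and `buildExportSnapshot` is a hypothetical name.

```typescript
// Copy the job payload and stamp it with the export time (ISO 8601, UTC).
function buildExportSnapshot<T extends object>(
  job: T,
  now: Date = new Date(),
): T & { exportedAt: string } {
  return { ...job, exportedAt: now.toISOString() };
}

const snapshot = buildExportSnapshot(
  { id: "job_1", status: "done" }, // placeholder payload
  new Date("2024-01-02T03:04:05.000Z"),
);
console.log(snapshot.exportedAt); // prints: 2024-01-02T03:04:05.000Z
```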