Resource Cycle Runbook
Use this runbook whenever you want to repeat the resource update process without re-explaining it.
Daily automation (already enabled)
- Workflow: ../.github/workflows/resource_sync.yml
- Schedule: every day at 04:00 UTC via GitHub Actions cron.
- Output: reviews existing resources for stale/deleted links and non-Pashto/low-value entries (removing only with strong logged reasons), updates ../resources/catalog/pending_candidates.json, auto-promotes valid non-duplicate entries into ../resources/catalog/resources.json, regenerates views, and opens a review PR.
Manual run (single command)
Run from repository root:
python scripts/run_resource_cycle.py --limit 25
What it executes:
python scripts/review_existing_resources.pypython scripts/sync_resources.py --limit 25python scripts/promote_candidates.pypython scripts/validate_resource_catalog.pypython scripts/generate_resource_views.pypython scripts/check_links.pypython -m pytest -q
Candidate sources in the sync step include Kaggle datasets, Hugging Face datasets/models/spaces, GitHub repositories, GitLab repositories, Zenodo records, Dataverse datasets, DataCite DOI records, and paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref).
Discovery-only mode + manual promotion
If you want fresh candidates without auto-promotion:
- Run
python scripts/run_resource_cycle.py --discover-only --limit 25. - Review ../resources/catalog/pending_candidates.json.
- Manually move selected entries into ../resources/catalog/resources.json.
- Re-run
python scripts/run_resource_cycle.py --skip-pytest. - Commit and push.
Guardrails
- Auto-promotion accepts only entries that pass dedupe, URL-availability checks, and catalog validation checks.
- Existing resources are auto-removed only for strong reasons (for example confirmed hard-missing links, duplicates, or missing Pashto relevance), with reasons stored in
resources/catalog/removal_log.json. - Keep
status: verifiedfor entries that pass automation checks and repository review. - Do not promote "reference-only" resources where Pashto is incidental; only Pashto-centric resources are eligible.
- Treat spelling variants as valid Pashto markers during review (
pashto,pukhto,pushto,pakhto,pashto-script). - Generated files must be committed after catalog updates.
Versioning for Daily Bot Updates
- Daily candidate-sync updates from GitHub Actions (
resource_sync.yml) are resource updates. - When those updates are reviewed and released, increment the third figure in
vMAJOR.CODE.RESOURCE. - Example sequence:
v1.1.1(code fix) ->v1.1.2(bot resource release).