
Resource Cycle Runbook

Use this runbook whenever you want to repeat the resource update process without re-explaining it.

Daily automation (already enabled)

Manual run (single command)

Run from repository root:

python scripts/run_resource_cycle.py --limit 25

What it executes:

  1. python scripts/review_existing_resources.py
  2. python scripts/sync_resources.py --limit 25
  3. python scripts/promote_candidates.py
  4. python scripts/validate_resource_catalog.py
  5. python scripts/generate_resource_views.py
  6. python scripts/check_links.py
  7. python -m pytest -q
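
The sequence above can be sketched as a small sequential runner. This is an illustration only, not the contents of scripts/run_resource_cycle.py: the function names `build_steps` and `run_steps` are hypothetical, and stopping at the first failing step is an assumption about how the real script behaves.

```python
import subprocess
import sys


def build_steps(limit: int = 25) -> list[list[str]]:
    """Return the seven commands from the list above, in order."""
    py = sys.executable  # use the interpreter that is running this script
    return [
        [py, "scripts/review_existing_resources.py"],
        [py, "scripts/sync_resources.py", "--limit", str(limit)],
        [py, "scripts/promote_candidates.py"],
        [py, "scripts/validate_resource_catalog.py"],
        [py, "scripts/generate_resource_views.py"],
        [py, "scripts/check_links.py"],
        [py, "-m", "pytest", "-q"],
    ]


def run_steps(steps, runner=subprocess.run) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            # Assumption: a failed step aborts the rest of the cycle.
            return result.returncode
    return 0
```

From the repository root, `sys.exit(run_steps(build_steps(limit=25)))` would mirror the single-command run. The `runner` parameter exists only so the loop can be exercised without launching real processes.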

Candidate sources in the sync step include Kaggle datasets, Hugging Face datasets/models/spaces, GitHub repositories, GitLab repositories, Zenodo records, Dataverse datasets, DataCite DOI records, and paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref).

Discovery-only mode + manual promotion

If you want fresh candidates without auto-promotion:

  1. Run python scripts/run_resource_cycle.py --discover-only --limit 25.
  2. Review resources/catalog/pending_candidates.json.
  3. Manually move selected entries into resources/catalog/resources.json.
  4. Re-run python scripts/run_resource_cycle.py --skip-pytest.
  5. Commit and push.
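
The manual move in step 3 can also be scripted. The sketch below is hypothetical: it assumes both files hold a flat JSON list of objects with a unique "id" field, which this runbook does not confirm, so check the real schema before adapting it. `promote_selected` is an illustrative name, not an existing helper in the repository.

```python
def promote_selected(pending, catalog, selected_ids):
    """Move entries whose "id" is in selected_ids from pending to catalog.

    Assumes each entry is a dict with a unique "id" key (hypothetical schema).
    Returns (remaining_pending, updated_catalog) as new lists; the inputs
    are left unmodified.
    """
    selected = set(selected_ids)
    existing = {entry["id"] for entry in catalog}
    updated = list(catalog)
    remaining = []
    for entry in pending:
        if entry["id"] not in selected:
            remaining.append(entry)  # not chosen: stays pending
        elif entry["id"] not in existing:
            updated.append(entry)  # chosen and new: promote
            existing.add(entry["id"])
        # chosen but already in the catalog: drop it without duplicating
    return remaining, updated
```

After loading the two files with `json.load`, pass the lists through this function and write both results back before re-running the cycle in step 4.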

Guardrails

  • Auto-promotion accepts only entries that pass dedupe, URL-availability checks, and catalog validation checks.
  • Existing resources are auto-removed only for strong reasons (for example, confirmed hard-missing links, duplicates, or no Pashto relevance), and each removal reason is recorded in resources/catalog/removal_log.json.
  • Keep status: verified for entries that pass automation checks and repository review.
  • Do not promote "reference-only" resources where Pashto is incidental; only Pashto-centric resources are eligible.
  • Treat spelling variants as valid Pashto markers during review (pashto, pukhto, pushto, pakhto, pashto-script).
  • Generated files must be committed after catalog updates.
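
The spelling-variant guardrail can be illustrated with a small matcher. This is a minimal sketch, not the check the repository scripts actually use: `has_pashto_marker` is a hypothetical name, and matching case-insensitively on letter boundaries is an assumption.

```python
import re

# Variants listed in the guardrail above.
PASHTO_MARKERS = ("pashto", "pukhto", "pushto", "pakhto", "pashto-script")

# Match a variant only when it is not embedded in a longer run of letters.
_MARKER_RE = re.compile(
    r"(?<![a-z])(?:" + "|".join(re.escape(m) for m in PASHTO_MARKERS) + r")(?![a-z])",
    re.IGNORECASE,
)


def has_pashto_marker(text: str) -> bool:
    """Return True if text contains any accepted Pashto spelling variant."""
    return _MARKER_RE.search(text) is not None
```

For example, `has_pashto_marker("Pukhto speech corpus")` is true, while a title that never mentions any variant is not flagged. A marker match only shows that Pashto is mentioned; whether a resource is Pashto-centric still needs review.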

Versioning for Daily Bot Updates

  • Daily candidate-sync updates from GitHub Actions (resource_sync.yml) are resource updates.
  • When those updates are reviewed and released, increment the third component (RESOURCE) of vMAJOR.CODE.RESOURCE.
  • Example sequence: v1.1.1 (code fix) -> v1.1.2 (bot resource release).
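
The bump rule above can be written as a short helper. This is a sketch under the assumption that tags always have the exact form vMAJOR.CODE.RESOURCE with numeric parts; `bump_resource_version` is a hypothetical name, not an existing script.

```python
import re

_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def bump_resource_version(tag: str) -> str:
    """Increment the RESOURCE (third) component of a vMAJOR.CODE.RESOURCE tag."""
    match = _VERSION_RE.match(tag)
    if match is None:
        raise ValueError(f"expected vMAJOR.CODE.RESOURCE, got {tag!r}")
    major, code, resource = (int(part) for part in match.groups())
    return f"v{major}.{code}.{resource + 1}"
```

With the example sequence above, `bump_resource_version("v1.1.1")` returns `"v1.1.2"`.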