pashto-language-resources / docs /resource_cycle_runbook.md

musaw

sync(hf): snapshot origin main after resource audit cycle

194828a 12 days ago

2.93 kB

	# Resource Cycle Runbook

	Use this runbook whenever you want to repeat the resource update process without re-explaining it.

	## Daily automation (already enabled)
	- Workflow: [../.github/workflows/resource_sync.yml](../.github/workflows/resource_sync.yml)
	- Schedule: every day at 04:00 UTC via GitHub Actions cron.
	- Output: reviews existing resources for stale/deleted links and non-Pashto/low-value entries (removing only with strong logged reasons), updates [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json), auto-promotes valid non-duplicate entries into [../resources/catalog/resources.json](../resources/catalog/resources.json), regenerates views, and opens a review PR.

	## Manual run (single command)
	Run from repository root:

	```bash
	python scripts/run_resource_cycle.py --limit 25
	```

	What it executes:
	1. `python scripts/review_existing_resources.py`
	2. `python scripts/sync_resources.py --limit 25`
	3. `python scripts/promote_candidates.py`
	4. `python scripts/validate_resource_catalog.py`
	5. `python scripts/generate_resource_views.py`
	6. `python scripts/check_links.py`
	7. `python -m pytest -q`

	Candidate sources in the sync step include Kaggle datasets, Hugging Face datasets/models/spaces, GitHub repositories, GitLab repositories, Zenodo records, Dataverse datasets, DataCite DOI records, and paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref).

	## Discovery-only mode + manual promotion
	If you want fresh candidates without auto-promotion:
	1. Run `python scripts/run_resource_cycle.py --discover-only --limit 25`.
	2. Review [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json).
	3. Manually move selected entries into [../resources/catalog/resources.json](../resources/catalog/resources.json).
	4. Re-run `python scripts/run_resource_cycle.py --skip-pytest`.
	5. Commit and push.

	## Guardrails
	- Auto-promotion accepts only entries that pass dedupe, URL-availability checks, and catalog validation checks.
	- Existing resources are auto-removed only for strong reasons (for example confirmed hard-missing links, duplicates, or missing Pashto relevance), with reasons stored in `resources/catalog/removal_log.json`.
	- Keep `status: verified` for entries that pass automation checks and repository review.
	- Do not promote "reference-only" resources where Pashto is incidental; only Pashto-centric resources are eligible.
	- Treat spelling variants as valid Pashto markers during review (`pashto`, `pukhto`, `pushto`, `pakhto`, `pashto-script`).
	- Generated files must be committed after catalog updates.

	## Versioning for Daily Bot Updates

	- Daily candidate-sync updates from GitHub Actions (`resource_sync.yml`) are resource updates.
	- When those updates are reviewed and released, increment the third figure in `vMAJOR.CODE.RESOURCE`.
	- Example sequence: `v1.1.1` (code fix) -> `v1.1.2` (bot resource release).