pashto-language-resources / docs /resource_cycle_runbook.md
musaw
sync(hf): snapshot origin main after resource audit cycle
194828a
# Resource Cycle Runbook
Use this runbook whenever you want to repeat the resource update process without re-explaining it.
## Daily automation (already enabled)
- Workflow: [../.github/workflows/resource_sync.yml](../.github/workflows/resource_sync.yml)
- Schedule: every day at 04:00 UTC via GitHub Actions cron.
- Output: reviews existing resources for stale/deleted links and non-Pashto/low-value entries (removing only with strong logged reasons), updates [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json), auto-promotes valid non-duplicate entries into [../resources/catalog/resources.json](../resources/catalog/resources.json), regenerates views, and opens a review PR.
## Manual run (single command)
Run from repository root:
```bash
python scripts/run_resource_cycle.py --limit 25
```
What it executes:
1. `python scripts/review_existing_resources.py`
2. `python scripts/sync_resources.py --limit 25`
3. `python scripts/promote_candidates.py`
4. `python scripts/validate_resource_catalog.py`
5. `python scripts/generate_resource_views.py`
6. `python scripts/check_links.py`
7. `python -m pytest -q`
Candidate sources in the sync step include Kaggle datasets, Hugging Face datasets/models/spaces, GitHub repositories, GitLab repositories, Zenodo records, Dataverse datasets, DataCite DOI records, and paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref).
## Discovery-only mode + manual promotion
If you want fresh candidates without auto-promotion:
1. Run `python scripts/run_resource_cycle.py --discover-only --limit 25`.
2. Review [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json).
3. Manually move selected entries into [../resources/catalog/resources.json](../resources/catalog/resources.json).
4. Re-run `python scripts/run_resource_cycle.py --skip-pytest`.
5. Commit and push.
## Guardrails
- Auto-promotion accepts only entries that pass dedupe, URL-availability checks, and catalog validation checks.
- Existing resources are auto-removed only for strong reasons (for example confirmed hard-missing links, duplicates, or missing Pashto relevance), with reasons stored in `resources/catalog/removal_log.json`.
- Keep `status: verified` for entries that pass automation checks and repository review.
- Do not promote "reference-only" resources where Pashto is incidental; only Pashto-centric resources are eligible.
- Treat spelling variants as valid Pashto markers during review (`pashto`, `pukhto`, `pushto`, `pakhto`, `pashto-script`).
- Generated files must be committed after catalog updates.
## Versioning for Daily Bot Updates
- Daily candidate-sync updates from GitHub Actions (`resource_sync.yml`) are resource updates.
- When those updates are reviewed and released, increment the third figure in `vMAJOR.CODE.RESOURCE`.
- Example sequence: `v1.1.1` (code fix) -> `v1.1.2` (bot resource release).