| # Scripts | |
| Automation scripts for quality checks, resource catalog validation, and search index generation. | |
| ## Available scripts | |
| - `validate_normalization.py`: validate normalization seed TSV format and rules. | |
| - `check_links.py`: ensure markdown links are clickable (optional online reachability check). | |
| - `validate_resource_catalog.py`: validate `resources/catalog/resources.json`. | |
| - `generate_resource_views.py`: generate `resources/*/README.md`, `resources/README.md`, and `docs/search/resources.json` from the catalog. | |
| - `sync_resources.py`: collect new candidate Pashto resources from Kaggle, Hugging Face (datasets/models/spaces), GitHub, GitLab, OpenAlex, Crossref, Zenodo, Dataverse, DataCite, arXiv, and Semantic Scholar into `resources/catalog/pending_candidates.json`. | |
| - `promote_candidates.py`: auto-promote valid non-duplicate entries from `pending_candidates.json` into `resources/catalog/resources.json`. | |
| - `review_existing_resources.py`: review current catalog resources, remove stale/removed entries only with strong reasons, and log removals in `resources/catalog/removal_log.json`. | |
| - `run_resource_cycle.py`: run the full repeatable resource cycle with one command. | |
| ## Usage | |
| Validate normalization seed file: | |
| ```bash | |
| python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv | |
| ``` | |
| Validate resource catalog: | |
| ```bash | |
| python scripts/validate_resource_catalog.py | |
| ``` | |
| Generate markdown and search index from catalog: | |
| ```bash | |
| python scripts/generate_resource_views.py | |
| ``` | |
| Sync candidate resources for maintainer review: | |
| ```bash | |
| python scripts/sync_resources.py --limit 20 | |
| ``` | |
| Review existing resources and remove stale entries before discovery: | |
| ```bash | |
| python scripts/review_existing_resources.py | |
| ``` | |
| Run stricter relevance cleanup mode: | |
| ```bash | |
| python scripts/review_existing_resources.py --enforce-pashto-relevance | |
| ``` | |
| Auto-promote valid candidates into verified catalog: | |
| ```bash | |
| python scripts/promote_candidates.py | |
| ``` | |
| Auto-promote while skipping online URL availability checks: | |
| ```bash | |
| python scripts/promote_candidates.py --skip-url-check | |
| ``` | |
| Run full repeatable cycle: | |
| ```bash | |
| python scripts/run_resource_cycle.py --limit 25 | |
| ``` | |
| Run discovery only: | |
| ```bash | |
| python scripts/run_resource_cycle.py --discover-only --limit 25 | |
| ``` | |
| Check markdown links format: | |
| ```bash | |
| python scripts/check_links.py | |
| ``` | |
| Check markdown links and verify URLs online: | |
| ```bash | |
| python scripts/check_links.py --online | |
| ``` | |