# Scripts

Automation scripts for quality checks, resource catalog validation, and search index generation.

## Available scripts

- `validate_normalization.py`: validate the format and rules of a normalization seed TSV file.
- `check_links.py`: check that markdown links are well-formed, with an optional online reachability check.
- `validate_resource_catalog.py`: validate `resources/catalog/resources.json`.
- `generate_resource_views.py`: generate `resources/*/README.md`, `resources/README.md`, and `docs/search/resources.json` from the catalog.
- `sync_resources.py`: collect new candidate Pashto resources from Kaggle, Hugging Face (datasets/models/spaces), GitHub repositories, and paper endpoints into `resources/catalog/pending_candidates.json`.
- `run_resource_cycle.py`: run the full repeatable resource cycle with one command.

## Usage

Validate the normalization seed file:

```bash
python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
```

Validate the resource catalog:

```bash
python scripts/validate_resource_catalog.py
```

Generate the markdown views and search index from the catalog:

```bash
python scripts/generate_resource_views.py
```

Sync candidate resources for maintainer review:

```bash
python scripts/sync_resources.py --limit 20
```

Run the full repeatable cycle:

```bash
python scripts/run_resource_cycle.py --limit 25
```

Run discovery only:

```bash
python scripts/run_resource_cycle.py --discover-only --limit 25
```

Check markdown link format:

```bash
python scripts/check_links.py
```

Check markdown link format and verify URLs online:

```bash
python scripts/check_links.py --online
```
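The offline checks above can also be chained so that a failure stops the run early. The following is a minimal sketch, not part of the repository: the `run_checks` function and the `PYTHON` override variable are hypothetical names, and the sketch assumes that each script exits with a non-zero status on failure and that validating the catalog before generating views is the preferred order.

```bash
# Hypothetical helper (not shipped with the repository): run the offline
# checks in sequence and stop at the first one that fails.

# PYTHON is an assumed override hook for choosing the interpreter;
# the scripts themselves do not read it.
PYTHON="${PYTHON:-python}"

run_checks() {
  # Assumption: each script returns a non-zero exit status on failure,
  # so the && chain short-circuits at the first failing step.
  "$PYTHON" scripts/validate_resource_catalog.py &&
  "$PYTHON" scripts/generate_resource_views.py &&
  "$PYTHON" scripts/check_links.py
}
```

After sourcing the snippet from the repository root, calling `run_checks` runs the three steps in order and returns the exit status of the first failing step, or zero if all of them succeed.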