Scripts
Automation scripts for quality checks, resource catalog validation, and search index generation.
Available scripts
- validate_normalization.py: validate normalization seed TSV format and rules.
- check_links.py: ensure markdown links are clickable (optional online reachability check).
- validate_resource_catalog.py: validate resources/catalog/resources.json.
- generate_resource_views.py: generate resources/*/README.md, resources/README.md, and docs/search/resources.json from the catalog.
- sync_resources.py: collect new candidate Pashto resources from Kaggle, Hugging Face (datasets/models/spaces), GitHub repositories, and paper endpoints into resources/catalog/pending_candidates.json.
- run_resource_cycle.py: run the full repeatable resource cycle with one command.
Usage
Validate normalization seed file:
python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
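For orientation, the kind of structural check a TSV validator might perform can be sketched as follows. This is not the code in validate_normalization.py: the function name `check_tsv` and the two rules shown (consistent field count, no empty cells) are assumptions for illustration, and the real script's rules may differ.

```python
import csv
import io


def check_tsv(text: str) -> list[str]:
    """Return a list of structural problems found in tab-separated text.

    Hypothetical rules: every row has as many fields as the header,
    and no cell is empty or whitespace-only.
    """
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
        elif any(not cell.strip() for cell in row):
            problems.append(f"line {line_no}: empty cell")
    return problems
```

An empty result means the text passed both checks; each entry otherwise names the offending line.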
Validate resource catalog:
python scripts/validate_resource_catalog.py
Generate markdown and search index from catalog:
python scripts/generate_resource_views.py
Sync candidate resources for maintainer review:
python scripts/sync_resources.py --limit 20
Run full repeatable cycle:
python scripts/run_resource_cycle.py --limit 25
Run discovery only:
python scripts/run_resource_cycle.py --discover-only --limit 25
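As a rough picture of what a single-command cycle could chain together, the sketch below builds the individual commands listed in this README and runs them in sequence. The step order (sync, validate, generate) and the helper names `build_cycle_commands` and `run_cycle` are assumptions; the steps that run_resource_cycle.py actually performs are not documented here.

```python
import subprocess
import sys


def build_cycle_commands(limit: int, discover_only: bool = False) -> list[list[str]]:
    """Build the command lines for one cycle (assumed order, for illustration)."""
    commands = [[sys.executable, "scripts/sync_resources.py", "--limit", str(limit)]]
    if not discover_only:
        # Assumption: a full cycle also validates the catalog and regenerates views.
        commands.append([sys.executable, "scripts/validate_resource_catalog.py"])
        commands.append([sys.executable, "scripts/generate_resource_views.py"])
    return commands


def run_cycle(limit: int, discover_only: bool = False) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in build_cycle_commands(limit, discover_only):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_cycle(25)` from the repository root would run the three steps in order, while `run_cycle(25, discover_only=True)` would run only the sync step.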
Check markdown link format:
python scripts/check_links.py
Check markdown links and verify URLs online:
python scripts/check_links.py --online
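To illustrate what an offline format check can catch, here is a minimal sketch that scans markdown text for inline links with an empty target or empty link text. The function name `find_bad_links` and both rules are assumptions for illustration, not the behavior of check_links.py, and the simple regular expression does not handle nested brackets or reference-style links.

```python
import re

# Matches inline links like [text](target); the lookbehind skips images (![alt](src)).
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)]*)\)")


def find_bad_links(markdown: str) -> list[str]:
    """Return a description of each inline link that would not be clickable."""
    problems = []
    for match in LINK_RE.finditer(markdown):
        text = match.group(1).strip()
        target = match.group(2).strip()
        if not target:
            problems.append(f"{match.group(0)}: empty target")
        elif not text:
            problems.append(f"{match.group(0)}: empty link text")
    return problems
```

A check like this needs no network access, which is why reachability is a separate, optional step.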