# Scripts
Automation scripts for quality checks, resource discovery, resource catalog validation, and generation of the resource pages and search index.
## Available scripts
- `validate_normalization.py`: validate the format and rules of the normalization seed TSV file.
- `check_links.py`: check that markdown links are well-formed; pass `--online` to also verify that the linked URLs are reachable.
- `validate_resource_catalog.py`: validate `resources/catalog/resources.json`.
- `generate_resource_views.py`: generate `resources/*/README.md`, `resources/README.md`, and `docs/search/resources.json` from the catalog.
- `sync_resources.py`: collect new candidate Pashto resources from Kaggle, Hugging Face (datasets, models, and spaces), GitHub repositories, and paper endpoints, and write them to `resources/catalog/pending_candidates.json` for maintainer review.
- `run_resource_cycle.py`: run the full, repeatable resource cycle with a single command; pass `--discover-only` to run only the discovery step.
## Usage
Validate the normalization seed file:
```bash
python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
```
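The exact rules enforced by `validate_normalization.py` are not documented here. As a rough illustration only, the sketch below shows the kind of structural checks a TSV seed validator might run: a fixed header, a fixed column count, no empty cells, and no duplicate source forms. The two-column layout (`variant`, `normalized`) and the function name `validate_seed_tsv` are hypothetical assumptions, not the script's actual format or API.

```python
import csv
import io

# Hypothetical column layout; the real seed file may use different columns.
EXPECTED_COLUMNS = ["variant", "normalized"]


def validate_seed_tsv(text: str) -> list[str]:
    """Return human-readable errors for the contents of a TSV seed file."""
    # QUOTE_NONE: treat quote characters as ordinary text, split only on tabs.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["file is empty"]

    errors: list[str] = []
    if rows[0] != EXPECTED_COLUMNS:
        errors.append(f"unexpected header: {rows[0]!r}")

    seen: set[str] = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            errors.append(f"line {line_no}: expected {len(EXPECTED_COLUMNS)} columns, got {len(row)}")
            continue
        if any(not cell.strip() for cell in row):
            errors.append(f"line {line_no}: empty cell")
        # A source form should map to exactly one normalized form.
        if row[0] in seen:
            errors.append(f"line {line_no}: duplicate variant {row[0]!r}")
        seen.add(row[0])
    return errors
```

Read the file with `encoding="utf-8"` before passing its text in, since Pashto entries are not ASCII; an empty result means no problems were found by these (hypothetical) checks.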
Validate the resource catalog:
```bash
python scripts/validate_resource_catalog.py
```
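The catalog schema is defined by `resources/catalog/resources.json` and its validator, neither of which is reproduced here. A minimal sketch of typical catalog checks, assuming each entry is a JSON object with hypothetical `id`, `title`, and `url` fields, might look like this; `validate_catalog` and `validate_catalog_file` are illustrative names only.

```python
import json
from urllib.parse import urlparse

# Hypothetical required fields; check resources.json for the real schema.
REQUIRED_FIELDS = ("id", "title", "url")


def validate_catalog(entries: list[dict]) -> list[str]:
    """Return errors for missing fields, duplicate ids, and non-HTTP(S) URLs."""
    errors = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                errors.append(f"entry {index}: missing {field!r}")
        entry_id = entry.get("id")
        if entry_id in seen_ids:
            errors.append(f"entry {index}: duplicate id {entry_id!r}")
        elif entry_id:
            seen_ids.add(entry_id)
        url = str(entry.get("url") or "")
        parsed = urlparse(url)
        if url and (parsed.scheme not in ("http", "https") or not parsed.netloc):
            errors.append(f"entry {index}: invalid url {url!r}")
    return errors


def validate_catalog_file(path: str) -> list[str]:
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: entries are either the top-level list or under "resources".
    entries = data.get("resources", []) if isinstance(data, dict) else data
    return validate_catalog(entries)
```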
Generate the markdown pages and search index from the catalog:
```bash
python scripts/generate_resource_views.py
```
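`generate_resource_views.py` derives per-category `README.md` files from the catalog. The sketch below illustrates only that grouping-and-rendering idea, assuming hypothetical `category`, `title`, and `url` fields on each entry and a simple bullet-list layout; the real script's page format, and its search index output, may look quite different.

```python
from collections import defaultdict


def render_category_readmes(entries: list[dict]) -> dict[str, str]:
    """Map each (hypothetical) category name to the markdown for its README."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for entry in entries:
        grouped[entry.get("category", "uncategorized")].append(entry)

    pages = {}
    for category, items in sorted(grouped.items()):
        lines = [f"# {category.title()}", ""]
        # Case-insensitive sort keeps the output stable between runs.
        for item in sorted(items, key=lambda e: e["title"].lower()):
            lines.append(f"- [{item['title']}]({item['url']})")
        pages[category] = "\n".join(lines) + "\n"
    return pages
```

Sorting both categories and entries makes regenerated files diff cleanly, which matters when the views are rebuilt on every cycle.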
Sync candidate resources for maintainer review:
```bash
python scripts/sync_resources.py --limit 20
```
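Collecting *new* candidates implies filtering out resources that are already known. A minimal deduplication sketch follows; it assumes candidates are dicts with a hypothetical `url` field and treats `limit` as a cap on new candidates overall, which may not match how `--limit` is interpreted by `sync_resources.py` (it could, for instance, apply per source).

```python
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    # Simplifying assumption: compare host + path only, ignoring scheme,
    # query string, fragment, host case, and trailing slashes.
    parsed = urlparse(url.strip())
    return f"{parsed.netloc.lower()}{parsed.path.rstrip('/')}"


def new_candidates(candidates: list[dict], known_urls: list[str], limit: int) -> list[dict]:
    """Keep candidates whose URL is not already known, up to `limit` items."""
    seen = {normalize_url(u) for u in known_urls}
    fresh = []
    for candidate in candidates:
        key = normalize_url(candidate["url"])
        if key in seen:
            continue
        seen.add(key)  # also drops duplicates within this batch
        fresh.append(candidate)
        if len(fresh) >= limit:
            break
    return fresh
```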
Run the full repeatable cycle:
```bash
python scripts/run_resource_cycle.py --limit 25
```
Run discovery only:
```bash
python scripts/run_resource_cycle.py --discover-only --limit 25
```
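`run_resource_cycle.py` remains the supported entry point. To illustrate the general fail-fast pattern such a runner can follow, here is a sketch that runs commands in order and stops at the first failure; the stage list in `DEFAULT_STEPS` is a hypothetical guess built from the scripts documented in this README, not the actual stages of the real cycle.

```python
import subprocess
import sys

# Hypothetical stage list modeled on the scripts documented in this README.
# The real run_resource_cycle.py may use different stages, order, or flags.
DEFAULT_STEPS = [
    [sys.executable, "scripts/sync_resources.py", "--limit", "25"],
    [sys.executable, "scripts/validate_resource_catalog.py"],
    [sys.executable, "scripts/generate_resource_views.py"],
]


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order and return the first non-zero exit code."""
    for step in steps:
        print("+", " ".join(step), flush=True)
        result = subprocess.run(step, check=False)
        if result.returncode != 0:
            return result.returncode
    return 0
```

From the repository root, `sys.exit(run_steps(DEFAULT_STEPS))` would run the hypothetical stages in sequence. Stopping at the first failure means a catalog that fails validation is never rendered into pages.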
Check the format of markdown links:
```bash
python scripts/check_links.py
```
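What counts as a malformed link in `check_links.py` is not specified here. As an illustration only, this sketch scans inline `[text](target)` links and flags empty text, empty targets, and targets containing spaces; `find_link_problems` is a hypothetical name, and the simplified regex ignores images, reference-style links, autolinks, and link titles such as `[x](url "title")`.

```python
import re

# Inline links only; the lookbehind skips images like ![alt](src).
INLINE_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)]*)\)")


def find_link_problems(markdown: str) -> list[str]:
    """Return one message per suspicious inline link, with its line number."""
    problems = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        for match in INLINE_LINK.finditer(line):
            text, target = match.group(1), match.group(2).strip()
            if not target:
                problems.append(f"line {line_no}: link has empty target")
            elif " " in target:
                problems.append(f"line {line_no}: target contains a space: {target!r}")
            if not text.strip():
                problems.append(f"line {line_no}: link has empty text")
    return problems
```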
Check markdown links and verify URLs online:
```bash
python scripts/check_links.py --online
```
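The online mode needs some notion of "reachable". One common approach, sketched here with the standard library and not necessarily what `check_links.py` does, is to send a `HEAD` request, retry once with `GET` when a server rejects `HEAD`, and treat any status below 400 as success. The function name `is_reachable`, the retry rule, and the User-Agent string are assumptions.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a status below 400."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(
                url, method=method, headers={"User-Agent": "link-check-sketch"}
            )
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status < 400
        except urllib.error.HTTPError as err:
            # Some servers reject HEAD outright; retry once with GET.
            if method == "HEAD" and err.code in (403, 405):
                continue
            return False
        except (OSError, ValueError):
            # URLError, timeouts, refused connections, or a malformed URL.
            return False
    return False
```

Note that `HTTPError` must be caught before `OSError`, because it is a subclass of `URLError`, which in turn subclasses `OSError`.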