# Scripts
Automation scripts for quality checks, resource catalog validation, and search index generation.
## Available scripts
- `validate_normalization.py`: validate normalization seed TSV format and rules.
- `check_links.py`: check that Markdown links are well-formed (optionally verify that URLs are reachable online).
- `validate_resource_catalog.py`: validate `resources/catalog/resources.json`.
- `generate_resource_views.py`: generate `resources/*/README.md`, `resources/README.md`, and `docs/search/resources.json` from the catalog.
- `sync_resources.py`: collect new candidate Pashto resources from Kaggle, Hugging Face (datasets/models/spaces), GitHub, GitLab, OpenAlex, Crossref, Zenodo, Dataverse, DataCite, arXiv, and Semantic Scholar into `resources/catalog/pending_candidates.json`.
- `promote_candidates.py`: auto-promote valid non-duplicate entries from `pending_candidates.json` into `resources/catalog/resources.json`.
- `review_existing_resources.py`: review current catalog resources, remove stale or no-longer-available entries only when there is a strong reason, and log removals in `resources/catalog/removal_log.json`.
- `run_resource_cycle.py`: run the full repeatable resource cycle with one command.
## Usage
Validate normalization seed file:
```bash
python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
```
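To give a feel for what a TSV format check can look like, here is a minimal sketch. It is not the logic of `validate_normalization.py`: the function name `validate_tsv`, the two-column default, and the "no empty cells" rule are all assumptions made for illustration.

```python
import csv
import io


def validate_tsv(text: str, expected_columns: int = 2) -> list[str]:
    """Return a list of human-readable problems found in TSV text.

    Hypothetical rules (assumptions, not the project's real ones):
    every row has `expected_columns` fields and no field is blank.
    """
    errors = []
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            errors.append(
                f"line {lineno}: expected {expected_columns} columns, got {len(row)}"
            )
        elif any(not cell.strip() for cell in row):
            errors.append(f"line {lineno}: empty cell")
    return errors
```

An empty return value means the text passed these illustrative checks.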
Validate resource catalog:
```bash
python scripts/validate_resource_catalog.py
```
Generate markdown and search index from catalog:
```bash
python scripts/generate_resource_views.py
```
Sync candidate resources for maintainer review:
```bash
python scripts/sync_resources.py --limit 20
```
Review existing resources and remove stale entries before discovery:
```bash
python scripts/review_existing_resources.py
```
Run stricter relevance cleanup mode:
```bash
python scripts/review_existing_resources.py --enforce-pashto-relevance
```
Auto-promote valid candidates into the verified catalog:
```bash
python scripts/promote_candidates.py
```
Auto-promote while skipping online URL availability checks:
```bash
python scripts/promote_candidates.py --skip-url-check
```
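The idea of promoting only "valid non-duplicate" entries can be sketched as follows. This is an illustration under stated assumptions, not the code in `promote_candidates.py`: the entry fields `title` and `url`, the validity rule, and the URL normalization used for duplicate detection are all hypothetical, and no online URL check is performed here.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host and drop a trailing slash (assumed dedupe key)."""
    parts = urlsplit(url.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def is_valid(entry: dict) -> bool:
    """Hypothetical validity rule: non-empty title and an http(s) URL."""
    title = str(entry.get("title", "")).strip()
    url = str(entry.get("url", "")).strip()
    return bool(title) and url.startswith(("http://", "https://"))


def promote(candidates: list[dict], catalog: list[dict]) -> list[dict]:
    """Return the catalog plus every valid candidate not already present."""
    seen = {normalize_url(entry["url"]) for entry in catalog}
    result = list(catalog)
    for candidate in candidates:
        if not is_valid(candidate):
            continue
        key = normalize_url(candidate["url"])
        if key in seen:
            continue  # duplicate of an existing or earlier-promoted entry
        seen.add(key)
        result.append(candidate)
    return result
```

Keying duplicates on a normalized URL rather than on the title avoids promoting the same resource twice when it was discovered under slightly different names.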
Run full repeatable cycle:
```bash
python scripts/run_resource_cycle.py --limit 25
```
Run discovery only:
```bash
python scripts/run_resource_cycle.py --discover-only --limit 25
```
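For readers who want to see how a one-command cycle could chain the individual scripts, here is a rough sketch. The step order and the meaning of `--discover-only` (run only `sync_resources.py`) are assumptions; the real `run_resource_cycle.py` may order or combine steps differently. The helper names `build_cycle_commands` and `run_cycle` are hypothetical.

```python
import subprocess
import sys


def build_cycle_commands(limit: int, discover_only: bool = False) -> list[list[str]]:
    """Build the command lines for one resource cycle (assumed order)."""
    py = sys.executable
    discover = [py, "scripts/sync_resources.py", "--limit", str(limit)]
    if discover_only:
        # Assumption: discovery-only mode runs just the sync step.
        return [discover]
    return [
        [py, "scripts/review_existing_resources.py"],
        discover,
        [py, "scripts/promote_candidates.py"],
        [py, "scripts/validate_resource_catalog.py"],
        [py, "scripts/generate_resource_views.py"],
    ]


def run_cycle(limit: int, discover_only: bool = False) -> None:
    """Run each step in order, stopping at the first failure."""
    for command in build_cycle_commands(limit, discover_only):
        subprocess.run(command, check=True)
```

Using `check=True` makes a failing step raise `subprocess.CalledProcessError`, so later steps such as view generation do not run against a catalog that failed validation.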
Check Markdown link formatting:
```bash
python scripts/check_links.py
```
Check markdown links and verify URLs online:
```bash
python scripts/check_links.py --online
```
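A format-only link check can be approximated with a regular expression. The sketch below is an illustration, not the implementation of `check_links.py`: the pattern, the function name `find_bad_links`, and the two reported problem types are assumptions, and the online reachability check is left out.

```python
import re

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")


def find_bad_links(markdown_text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for links with empty text or target."""
    problems = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        for match in LINK_RE.finditer(line):
            text, target = match.group(1), match.group(2)
            if not text.strip():
                problems.append((lineno, "empty link text"))
            elif not target:
                problems.append((lineno, "empty link target"))
    return problems
```

For example, `find_bad_links("Broken [here]() link")` reports an empty link target on line 1.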