Add selective HF parquet shard download support (--hf-files, --hf-subdir, --max-shards, --list-shards) ab68c56 verified dignity045 committed 1 day ago
Fix config validation: text_column lives under source.text_column f1d3097 verified dignity045 committed 1 day ago
Initial GrandLine implementation: deterministic shard-first dataset preprocessing for LLM pretraining ed59144 verified dignity045 committed 1 day ago