dignity045/grandline
dataset-preprocessing
llm-pretraining
tokenization
deduplication
data-pipeline
ml-intern
License: apache-2.0
main
grandline
/
state
245 Bytes
1 contributor
History:
1 commit
dignity045
Initial GrandLine implementation: deterministic shard-first dataset preprocessing for LLM pretraining
ed59144
verified
1 day ago
.gitkeep
245 Bytes
Initial GrandLine implementation: deterministic shard-first dataset preprocessing for LLM pretraining
1 day ago