Post
33
Pro tip2: You can treat HF datasets as versioned repos by pinning a specific revision (tag, branch or commit) when downloading files. ๐ง
This ensures your data processing pipelines always use the exact dataset state before passing the data to the model. It enables reproducible pipelines and allows for reliable outputs of your ML system.
This ensures your data processing pipelines always use the exact dataset state before passing the data to the model. It enables reproducible pipelines and allows for reliable outputs of your ML system.
from huggingface_hub import hf_hub_download
data_path = hf_hub_download(
repo_id="lysandre/arxiv-nlp",
filename="train.parquet",
repo_type="dataset",
revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a"
)