Parag Ekbote PRO
AINovice2005
AI & ML interests
ML Engineer passionate about taking models from research to production. 1 year supporting tech startups. Active OSS contributor.
Recent Activity
posted an update about 5 hours ago
Pro tip 2: You can treat HF datasets as versioned repos by pinning a specific revision (tag, branch, or commit) when downloading files. 🧠
This ensures your data processing pipelines always read the exact same dataset state before the data reaches the model, which makes the pipelines reproducible and the outputs of your ML system more reliable.
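Of the three revision types, a commit hash is the strictest pin, since a branch name moves with every new commit. As a minimal sketch (the helper names `is_full_commit_sha` and `require_pinned_revision` are hypothetical, not part of `huggingface_hub`), a pipeline could refuse anything that is not a full commit hash before downloading:

```python
import re

# A full Git commit hash is 40 lowercase hexadecimal characters (SHA-1).
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(revision: str) -> bool:
    """Return True if `revision` looks like a full 40-character commit hash."""
    return _FULL_SHA.fullmatch(revision) is not None


def require_pinned_revision(revision: str) -> str:
    """Hypothetical guard: reject branch names, tags, and short hashes."""
    if not is_full_commit_sha(revision):
        raise ValueError(f"revision {revision!r} is not a full commit hash")
    return revision
```

The returned string can then be passed as the `revision` argument of the download call.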
```python
from huggingface_hub import hf_hub_download

# Pin the download to one commit so the same file contents are fetched every time.
data_path = hf_hub_download(
    repo_id="lysandre/arxiv-nlp",
    filename="train.parquet",
    repo_type="dataset",
    revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
)
```
upvoted an article about 9 hours ago
Ulysses Sequence Parallelism: Training with Million-Token Contexts