The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text • Paper 2506.05209 • Published Jun 5, 2025
Towards Best Practices for Open Datasets for LLM Training • Paper 2501.08365 • Published Jan 14, 2025