Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability Paper • 2506.08300 • Published Jun 10, 2025 • 9
institutional/institutional-books-topic-classifier-bert Text Classification • 0.2B • Updated Jun 12, 2025 • 19 • 12
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14, 2025 • 62