Collections

Discover the best community collections!

Collections trending this week
Dual Channel Global Customer-Agent Interaction Datasets
Sample datasets of dual-channel call center audio with separate agent and customer channels for ASR, diarization, and conversational AI training.
STEM & Non-STEM Q&A Datasets for LLM Training
Sample datasets from a 6.5M+ enterprise-grade Q&A corpus across STEM and Non-STEM domains, built for LLM training, instruction tuning, and evaluation.
Academic Textbook Corpora for LLM Training
Sample of a 2.2B+ word textbook corpus across 32K+ books, 5K+ subjects, and 14 languages for LLM training and multilingual knowledge modeling.