Drawing inspiration from how human programmers “selectively skim” source code during development and debugging, SWE-Pruner performs task-aware adaptive pruning for long contexts.
We’ve released two conversational speech datasets from oto on Hugging Face 🤗 Both are based on real, casual, full-duplex conversations, but with slightly different focuses.
Dataset 1: Processed / curated subset otoearth/otoSpeech-full-duplex-processed-141h * Full-duplex, spontaneous multi-speaker conversations * Participants filtered for high audio quality * PII removal and audio enhancement applied * Designed for training and benchmarking S2S or dialogue models
Dataset 2: Larger raw(er) release otoearth/otoSpeech-full-duplex-280h * Same collection pipeline, with broader coverage * More diversity in speakers, accents, and conversation styles * Useful for analysis, filtering, or custom preprocessing experiments
We intentionally split the release to support different research workflows: clean and ready-to-use vs. more exploratory and research-oriented use.
The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.
If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.