AI & ML interests

None defined yet.

Recent Activity

leq6c  updated a dataset 4 days ago
otoearth/otoSpeech-full-duplex-280h
leq6c  updated a collection 4 days ago
otoSpeech Full-duplex Dataset
View all activity

consome2 
posted an update 2 days ago
view post
Post
4576
We’ve released two conversational speech datasets from oto on Hugging Face 🤗
Both are based on real, casual, full-duplex conversations, but with slightly different focuses.

Dataset 1: Processed / curated subset
otoearth/otoSpeech-full-duplex-processed-141h
* Full-duplex, spontaneous multi-speaker conversations
* Participants filtered for high audio quality
* PII removal and audio enhancement applied
* Designed for training and benchmarking S2S or dialogue models

Dataset 2: Larger raw(er) release
otoearth/otoSpeech-full-duplex-280h
* Same collection pipeline, with broader coverage
* More diversity in speakers, accents, and conversation styles
* Useful for analysis, filtering, or custom preprocessing experiments

We intentionally split the release to support different research workflows:
clean and ready-to-use vs. more exploratory and research-oriented use.

The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.

If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.

Feedback and ideas are very welcome!
  • 2 replies
·