Data Collection Protocol

by consome2 - opened Oct 8, 2025

oto org Oct 8, 2025

This thread finalizes the technical spec for data collection: sampling/bit depth, per-speaker channels, background noise policy, file naming, and minimal metadata (region, accent, age band, topic tags).


edited Oct 8, 2025

Data Collection Protocol – Draft Proposal (v0.1)

Summary
Proposed spec for collecting spontaneous two-speaker conversations: audio parameters, per-speaker channeling, metadata, random topic prompts, and an in-recording self-redaction feature.

A. Audio & Channeling

Format: PCM WAV, 48 kHz, 16-bit
Channeling: Dual-mono, i.e. one mono track per speaker (Speaker A / Speaker B) per session; optional mixed-down reference track
Environment: Everyday ambient noise allowed; copyrighted audio (music/TV/podcasts) is not allowed
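
To make the format concrete, here is a minimal sketch of how an uploaded per-speaker file could be checked against the parameters above, using Python's standard `wave` module. The function name `check_speaker_wav` and the assumption that each speaker track arrives as its own single-channel file are illustrative only and not part of the proposal.

```python
import wave

# Values taken from section A; one mono file per speaker is an assumption.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_CHANNELS = 1


def check_speaker_wav(path):
    """Return a list of problems found; an empty list means the file conforms."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
            channels = wav.getnchannels()
    except wave.Error as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]

    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8}-bit, expected 16-bit")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

For sizing purposes: at these settings, a 25-minute mono track holds 25 × 60 × 48,000 × 2 = 144,000,000 bytes of sample data, so a dual-mono session is roughly twice that before any optional reference mix.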

B. Session Design

Session length: 25 minutes per session (fixed)
Chunking: Not used. If 5-minute chunking becomes operationally necessary, we will open a separate discussion before adopting it
Starter prompts: Show 1–3 random topics at session start (e.g., favorite foods, countries visited, recent books)
Task-oriented dialogs: Out of scope for now; may be added if community consensus emerges (e.g., trip planning, brainstorming, tongue-twisters, word-chain)
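
As a rough illustration of the starter-prompt behavior, the sketch below draws between one and three distinct topics at random. The pool contains only the examples listed above, and the names `TOPIC_POOL` and `pick_starter_prompts` are hypothetical; the real topic list and selection logic are still to be decided.

```python
import random

# Placeholder pool built from the examples in the draft; not a final list.
TOPIC_POOL = ["favorite foods", "countries visited", "recent books"]


def pick_starter_prompts(pool, rng=random):
    """Pick 1 to 3 distinct topics to show at session start."""
    if not pool:
        raise ValueError("topic pool must not be empty")
    count = min(rng.randint(1, 3), len(pool))
    return rng.sample(pool, count)


print(pick_starter_prompts(TOPIC_POOL))
```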

C. File Naming (TBD)

Example (session-level files):
YYYYMMDD_sessionID_speaker{A|B}.wav
Companion JSON (session-level): YYYYMMDD_sessionID.meta.json
Do not place any personally identifying information in file names.
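
Since the naming scheme is still TBD, the following is only a sketch of how the example pattern could be built and parsed. It assumes the session ID is a plain alphanumeric string, which the draft does not specify; `build_speaker_filename` and `parse_speaker_filename` are hypothetical helper names.

```python
import re
from datetime import date, datetime

# Assumption: sessionID is alphanumeric only (the draft leaves this open).
FILENAME_RE = re.compile(r"(\d{8})_([A-Za-z0-9]+)_speaker([AB])\.wav")


def build_speaker_filename(day, session_id, speaker):
    """Format YYYYMMDD_sessionID_speaker{A|B}.wav from its parts."""
    if speaker not in ("A", "B"):
        raise ValueError("speaker must be 'A' or 'B'")
    if not re.fullmatch(r"[A-Za-z0-9]+", session_id):
        raise ValueError("session_id must be alphanumeric")
    return f"{day:%Y%m%d}_{session_id}_speaker{speaker}.wav"


def parse_speaker_filename(name):
    """Return (date, session_id, speaker), or None if the name does not match."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    try:
        day = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real calendar date
    return day, match.group(2), match.group(3)


name = build_speaker_filename(date(2025, 10, 8), "s0001", "A")
print(name, parse_speaker_filename(name))
```

Restricting the session ID to an opaque alphanumeric token also makes it harder for personally identifying text to slip into a file name.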

D. Metadata (per participant / session)
Every profile field can be opted out of individually. Fields may also be masked or withheld at release time, depending on re-identification risk.

speaker_id — stable pseudonymous public ID per participant (e.g., spk_6Z4G3Y9Q). A separate internal ID is maintained privately for withdrawal/compliance workflows
age (e.g., 32)
gender (self-identified)
nationality
birth_country and birth_state/prefecture
accent (e.g., Japanese English)
first_language, second_language
education_level (e.g., bachelor’s degree)
MBTI (optional)
occupation (e.g., entrepreneur)
residence_country
interests (e.g., AI, Crypto)
device/OS/microphone (technical metadata)
network/latency logs (quality metadata)
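
One way the per-field opt-out and release-time masking could work is sketched below. The field names come from the list above; the `age_band` field, the 10-year band width, and the function names are assumptions made for illustration (the thread description mentions an age band, but the draft does not define one).

```python
import json


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '30-39' (band width is assumed)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def public_profile(profile, opted_out):
    """Drop opted-out fields and replace the exact age with a coarse band."""
    released = {key: value for key, value in profile.items() if key not in opted_out}
    if "age" in released:
        released["age_band"] = age_band(released.pop("age"))
    return released


profile = {
    "speaker_id": "spk_6Z4G3Y9Q",
    "age": 32,
    "accent": "Japanese English",
    "occupation": "entrepreneur",
    "interests": ["AI", "Crypto"],
}
print(json.dumps(public_profile(profile, opted_out={"occupation"}), indent=2))
```

The same filtering step could run when writing the companion `.meta.json`, so that opted-out values never reach the released file.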

E. In-Recording Privacy Control

Self-redaction: During recording, a speaker can delete the most recent 10 seconds of their own speech, which is removed on-device and not uploaded
UI hint: A “rewind 10s → delete” control with a confirmation dialog
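
A minimal sketch of the on-device deletion, assuming the speaker's own audio is held in memory as raw 16-bit mono PCM at 48 kHz before upload. The buffer layout and the name `redact_tail` are assumptions; a real client would more likely work on a streaming buffer.

```python
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM (assumed in-memory layout)


def redact_tail(pcm, seconds=10):
    """Delete the last `seconds` of audio from a bytearray in place.

    Returns the number of bytes removed. If the buffer is shorter than
    the requested window, everything recorded so far is removed.
    """
    window = seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
    removed = min(window, len(pcm))
    if removed:
        del pcm[len(pcm) - removed:]
    return removed


buffer = bytearray(b"\x01\x00" * SAMPLE_RATE_HZ * 12)  # 12 seconds of audio
print(redact_tail(buffer), len(buffer) // (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE))
```

This sketch truncates the buffer. Overwriting the same span with silence instead would keep Speaker A and Speaker B time-aligned; the draft does not yet say which behavior is intended.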

Open Questions

Naming convention: Start with plain YYYYMMDD_sessionID_speakerX, or also include short codes for language/locale/topic?
Chunking trigger: Under what operational conditions (if any) should we introduce 5-minute chunking (e.g., moderation backlog thresholds)?
High-risk fields: For items like alma mater, should we collect them but keep them out of the public release, either withheld by default or published only on request?
Self-redaction duration: Keep 10s fixed, or offer selectable 5/10/20s?
Non-dual-mono submissions: Should we accept WebRTC-style separated streams as an alternative to dual-mono?
