# 📚 Dataset Guidelines

## 🏷️ Minimum metadata

- Speaker ID (anonymized)
- Approximate age band
- Gender (optional/self-declared)
- Dialect/region
- Recording environment and device class

## 🎧 Audio quality basics

- Prefer 16 kHz+ clean speech
- Avoid clipping and heavy background noise
- Keep transcript aligned with spoken content

## ✍️ Text policy

- Use agreed normalization rules
- Keep punctuation consistent
- Track alternate spellings in glossary
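
The metadata and audio rules above could be checked automatically before a recording is accepted. The following is a minimal sketch, not a prescribed tool. The field names (`speaker_id`, `age_band`, `dialect_region`, `environment`, `device_class`), the clipping level, and the clipped-sample tolerance are all assumptions made for illustration. Only the 16 kHz floor and the optional status of gender come from the guidelines. Samples are assumed to be floats normalized to the range -1.0 to 1.0, and background noise is not measured here.

```python
from typing import Sequence

MIN_SAMPLE_RATE_HZ = 16_000    # from the guideline "16 kHz+"
CLIP_LEVEL = 0.999             # assumption: |sample| at or above this counts as clipped
MAX_CLIPPED_FRACTION = 0.001   # assumption: tolerated share of clipped samples

# Hypothetical field names; gender is optional, so it is not listed here.
REQUIRED_FIELDS = (
    "speaker_id",
    "age_band",
    "dialect_region",
    "environment",
    "device_class",
)


def missing_metadata(record: dict) -> list:
    """Return the required field names that are absent or empty in a record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


def audio_issues(sample_rate_hz: int, samples: Sequence[float]) -> list:
    """Return human-readable problems found in one recording."""
    issues = []
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        issues.append("sample rate below 16 kHz")
    if samples:
        # Count samples that sit at (or beyond) full scale.
        clipped = sum(1 for s in samples if abs(s) >= CLIP_LEVEL)
        if clipped / len(samples) > MAX_CLIPPED_FRACTION:
            issues.append("clipping detected")
    return issues


if __name__ == "__main__":
    record = {
        "speaker_id": "spk_0001",
        "age_band": "25-34",
        "dialect_region": "north",
        "environment": "quiet room",
        "device_class": "headset",
    }
    print(missing_metadata(record))
    print(audio_issues(8_000, [0.0, 0.1, -0.1]))
```

Returning lists of problems rather than raising on the first failure lets a reviewer see everything wrong with a recording in one pass.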