# 📚 Dataset Guidelines

## 🏷️ Minimum metadata

- Speaker ID (anonymized)
- Approximate age band
- Gender (optional, self-declared)
- Dialect or region
- Recording environment and device class
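
As an illustration of the metadata list above, here is a minimal sketch of a per-recording record with a completeness check. The field names, the `RecordingMetadata` class, and the `missing_fields` helper are hypothetical; these guidelines do not prescribe a schema, so adapt the names to whatever your project has agreed on.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical field names; gender is deliberately not required.
REQUIRED_FIELDS = (
    "speaker_id",
    "age_band",
    "dialect_region",
    "recording_environment",
    "device_class",
)


@dataclass
class RecordingMetadata:
    speaker_id: str                # anonymized ID, never a real name
    age_band: str                  # approximate band, e.g. "25-34"
    dialect_region: str
    recording_environment: str     # e.g. "quiet room"
    device_class: str              # e.g. "smartphone"
    gender: Optional[str] = None   # optional, self-declared


def missing_fields(meta: RecordingMetadata) -> List[str]:
    """Return the names of required fields that are empty or whitespace."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(getattr(meta, name)).strip()
    ]
```

A record with every required field filled in returns an empty list, so a submission script could reject any recording for which `missing_fields` is non-empty.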

## 🎧 Audio quality basics

- Prefer clean speech sampled at 16 kHz or higher
- Avoid clipping and heavy background noise
- Keep each transcript aligned with the spoken content
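
The sample-rate and clipping points above can be checked automatically. The sketch below uses only the Python standard library and assumes 16-bit PCM WAV input. The `check_wav` function and the `max_clipped_fraction` threshold are illustrative assumptions, not values set by these guidelines. It does not measure background noise or transcript alignment.

```python
import array
import sys
import wave


def check_wav(path, min_rate=16000, max_clipped_fraction=0.001):
    """Check sample rate and clipping for a 16-bit PCM WAV file.

    max_clipped_fraction is an assumed threshold; tune it for your data.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())

    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM audio")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV data is little-endian

    # Count samples that sit at the extremes of the 16-bit range.
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    fraction = clipped / len(samples) if len(samples) else 0.0

    return {
        "sample_rate_ok": rate >= min_rate,
        "clipping_ok": fraction <= max_clipped_fraction,
        "clipped_fraction": fraction,
    }
```

Counting full-scale samples is a crude proxy for clipping, so flagged files are best confirmed by listening.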

## ✍️ Text policy

- Apply the agreed normalization rules
- Keep punctuation consistent
- Track alternate spellings in the glossary
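
To make the text policy concrete, here is a minimal sketch of a normalization pass with a glossary lookup. The specific rules (Unicode NFC, lowercasing, whitespace collapsing) and the `GLOSSARY` entries are placeholders chosen for illustration; the rules your project has actually agreed on take precedence.

```python
import re
import unicodedata

# Hypothetical glossary: alternate spelling -> preferred spelling.
GLOSSARY = {
    "colour": "color",
    "e-mail": "email",
}


def normalize_text(text, glossary=GLOSSARY):
    """Apply example normalization rules, then map glossary variants."""
    # Compose characters consistently and lowercase (assumed rules).
    text = unicodedata.normalize("NFC", text).lower()
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Replace whole tokens found in the glossary. Tokens with attached
    # punctuation (e.g. "colour,") are left unchanged in this sketch.
    words = [glossary.get(word, word) for word in text.split(" ")]
    return " ".join(words)
```

For example, `normalize_text("  Colour   of the E-mail ")` returns `"color of the email"`. Keeping the glossary as data means new alternate spellings can be added without touching the code.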