# 📚 Dataset Guidelines

## 🏷️ Minimum metadata
- Speaker ID (anonymized)
- Approximate age band
- Gender (optional, self-declared)
- Dialect/region
- Recording environment and device class
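The metadata fields above could be captured as a typed record so that missing values are caught when a recording is ingested. This is a minimal Python sketch. The class name `SpeakerMetadata`, the field names, and the example values are hypothetical illustrations, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeakerMetadata:
    """Minimum per-recording metadata (hypothetical field names)."""
    speaker_id: str              # anonymized ID, never a real name
    age_band: str                # approximate band, e.g. "30-39"
    dialect_region: str
    recording_environment: str   # e.g. "quiet room", "car"
    device_class: str            # e.g. "headset", "phone"
    gender: Optional[str] = None  # optional; only if self-declared

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are empty or blank."""
        required = {
            "speaker_id": self.speaker_id,
            "age_band": self.age_band,
            "dialect_region": self.dialect_region,
            "recording_environment": self.recording_environment,
            "device_class": self.device_class,
        }
        return [name for name, value in required.items() if not value.strip()]


record = SpeakerMetadata(
    speaker_id="spk_0042",
    age_band="30-39",
    dialect_region="",
    recording_environment="quiet room",
    device_class="headset",
)
print(record.missing_fields())  # ['dialect_region']
```

Gender defaults to `None` so that leaving it unset is the normal case, matching its optional status.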

## 🎧 Audio quality basics
- Prefer clean speech sampled at 16 kHz or higher
- Avoid clipping and heavy background noise
- Keep each transcript aligned with its spoken content
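The sample-rate and clipping rules can be checked automatically; noise level and transcript alignment usually still need other tooling or human review. This is a minimal Python sketch using only the standard-library `wave` and `array` modules, assuming 16-bit PCM WAV input. The function names and the 0.1% clipping threshold (`max_clip_ratio`) are hypothetical choices, not an agreed standard.

```python
import array
import wave

MIN_SAMPLE_RATE = 16_000  # guideline: 16 kHz or higher
FULL_SCALE = 32767        # peak magnitude for signed 16-bit PCM


def check_pcm16(rate, samples, max_clip_ratio=0.001):
    """Return a list of problems for 16-bit PCM samples at the given rate."""
    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if len(samples) > 0:
        # Count samples sitting at (or beyond) full scale as likely clipped.
        clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE)
        ratio = clipped / len(samples)
        if ratio > max_clip_ratio:
            problems.append(f"{ratio:.2%} of samples at full scale (likely clipping)")
    return problems


def check_wav(path):
    """Read a WAV file with the wave module and run check_pcm16 on it."""
    with wave.open(path, "rb") as wf:
        width = wf.getsampwidth()
        if width != 2:
            return [f"expected 16-bit samples, got {8 * width}-bit"]
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    # Interleaved channels are checked together as one stream of samples.
    return check_pcm16(rate, array.array("h", frames))
```

For example, `check_wav("sample.wav")` (a hypothetical file) returns an empty list when no problems are found. Keeping the checks in `check_pcm16` separate from file reading makes them easy to reuse with other audio loaders.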

## ✍️ Text policy
- Apply the agreed text normalization rules
- Keep punctuation consistent
- Track alternate spellings in the glossary
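The text policy can be applied as a single normalization pass over each transcript. This is a minimal Python sketch. The specific rules (Unicode NFC, straight quotes, collapsed whitespace) and the `GLOSSARY` entries are hypothetical placeholders for whatever the agreed rules and glossary actually contain.

```python
import re
import unicodedata

# Hypothetical glossary: lowercase alternate spelling -> canonical spelling.
GLOSSARY = {
    "okay": "OK",
    "e-mail": "email",
}

# Hypothetical punctuation rule: replace curly quotes with straight quotes.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
})


def normalize_transcript(text: str) -> str:
    """Apply the (assumed) normalization rules and glossary to one transcript."""
    text = unicodedata.normalize("NFC", text)  # one Unicode form for accents
    text = text.translate(PUNCTUATION_MAP)     # consistent quote characters
    text = re.sub(r"\s+", " ", text).strip()   # single spaces, no padding

    def to_canonical(match: re.Match) -> str:
        word = match.group(0)
        return GLOSSARY.get(word.lower(), word)

    # Words may contain apostrophes or hyphens, e.g. "don't", "e-mail".
    return re.sub(r"[\w'-]+", to_canonical, text)


print(normalize_transcript("  Okay,\tsend the  E-mail \u201cnow\u201d "))
# OK, send the email "now"
```

Glossary matches are case-insensitive and replace the word with its canonical form as written in the glossary, so the glossary stays the single source of truth for spelling.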