Project Purpose
Why this project exists
Pashto remains underrepresented in open speech and language resources for AI. This project exists to close that gap through community collaboration.
Mission
Create high-quality open resources that enable Pashto to work reliably in:
- Speech recognition (ASR)
- Text-to-speech (TTS)
- Translation and NLP tooling
What success looks like
- Public Pashto datasets with clear quality standards
- Reproducible baseline models and training pipelines
- A public benchmark and leaderboard for fair model comparison
- Open desktop and API demos that real users can run
Non-commercial commitment
This initiative is community-first and oriented toward public benefit. It is not being built for proprietary lock-in or short-term commercialization.
Principles
- Openness: transparent data, models, and processes
- Inclusivity: coverage of diverse dialects and accents
- Quality: strong labeling and review standards
- Reproducibility: scripts, configs, and documented experiments
- Continuity: release cadence and long-term maintenance
Scope (v1 foundation)
- Build core repository and contributor workflows
- Launch Pashto data collection and validation pipeline
- Publish ASR and TTS baselines
- Publish first benchmark set and metrics
Out of scope (for now)
- Closed, paid APIs as the only path
- Private datasets without reproducible provenance
- Productization before core language quality is established