---
task_categories:
- automatic-speech-recognition
- text-to-speech
language:
- rw
- en
- fr
tags:
- numeracy
- synthetic
- education
- rwanda
---
Early Numeracy Math Curriculum (Rwanda)
Dataset Description
This dataset contains a 75-item synthetic curriculum for training and evaluating offline AI math tutors for early learners (P1-P3) in Rwanda. It includes localized math question stems, Kinyarwanda interrogative phrasing (e.g., "zingahe?", "how many?"), and synthetic child-voice audio.
Curriculum Structure
The JSON curriculum is divided into five core skill bands (Difficulty 1-10):
- counting
- addition
- subtraction
- number_sense
- word_problems
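The card does not document the JSON schema, so the sketch below is illustrative only. It assumes each item is an object with hypothetical "id", "skill", "difficulty" and "stem" fields, checks that every item belongs to one of the five skill bands and has a difficulty between 1 and 10, and counts items per band. The sample items are made up for the example and are not taken from the dataset.

```python
import json
from collections import Counter

# The five skill bands listed in the card.
SKILL_BANDS = ("counting", "addition", "subtraction", "number_sense", "word_problems")

# Hypothetical sample: the field names and items are assumptions,
# not the dataset's documented format or contents.
SAMPLE_JSON = """
[
  {"id": 1, "skill": "counting", "difficulty": 1, "stem": "How many apples?"},
  {"id": 2, "skill": "addition", "difficulty": 3, "stem": "20 + 10 = ?"},
  {"id": 3, "skill": "subtraction", "difficulty": 4, "stem": "9 - 4 = ?"}
]
"""


def load_curriculum(text: str) -> list[dict]:
    """Parse the curriculum and validate skill band and difficulty range."""
    items = json.loads(text)
    for item in items:
        if item["skill"] not in SKILL_BANDS:
            raise ValueError(f"unknown skill band: {item['skill']!r}")
        if not 1 <= item["difficulty"] <= 10:
            raise ValueError(f"difficulty out of range: {item['difficulty']}")
    return items


def items_per_band(items: list[dict]) -> dict[str, int]:
    """Count items in each band, reporting empty bands as 0."""
    counts = Counter(item["skill"] for item in items)
    return {band: counts.get(band, 0) for band in SKILL_BANDS}


if __name__ == "__main__":
    print(items_per_band(load_curriculum(SAMPLE_JSON)))
```

Validating on load keeps a mislabeled band or an out-of-range difficulty from silently skewing per-band evaluation.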
Synthetic Audio Generation
To safely simulate early-learner interactions without recording real children, this dataset includes a synthetic audio baseline:
- Base Model: Meta MMS text-to-speech (VITS architecture) voices for English (EN), French (FR), and Kinyarwanda (KIN).
- Perturbation: Audio is pitch-shifted (+6 semitones) and tempo-adjusted (1.15x) to mimic the acoustic profile of a 6-to-9-year-old child.
- Text Normalization: Digits and operators are mapped to words before synthesis to prevent silent audio outputs when equations are voiced (e.g., "20 + 10" is mapped to "Twenty plus ten").
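The arithmetic behind the perturbation and normalization steps can be sketched as follows. This is not the author's pipeline. It assumes equal-tempered semitones, reads "1.15x" as 15% faster playback, and uses a hypothetical English-only normalizer for integers 0-99 whose tokens are separated by spaces. The French and Kinyarwanda mappings are not covered. To apply the actual signal processing, a library such as librosa offers `librosa.effects.pitch_shift(y, sr=sr, n_steps=6)` and `librosa.effects.time_stretch(y, rate=1.15)`, though the card does not say which tool was used.

```python
# Perturbation settings stated in the card.
SEMITONES = 6
TEMPO = 1.15  # assumption: 1.15x means 15% faster playback


def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of n equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def stretched_duration(seconds: float, tempo: float) -> float:
    """Clip length after a tempo change that leaves pitch untouched."""
    return seconds / tempo


# Hypothetical English-only normalizer covering integers 0-99.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


def number_to_words(n: int) -> str:
    """Spell out 0-99 in English words."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize_equation(text: str) -> str:
    """Replace space-separated digits and operators with words."""
    words = []
    for token in text.split():
        if token.isdecimal():
            words.append(number_to_words(int(token)))
        elif token in OPERATORS:
            words.append(OPERATORS[token])
        else:
            words.append(token)  # leave any other token unchanged
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:]  # capitalize like the card's example


if __name__ == "__main__":
    print(round(pitch_ratio(SEMITONES), 3))           # 1.414
    print(round(stretched_duration(2.0, TEMPO), 3))   # 1.739
    print(normalize_equation("20 + 10"))              # Twenty plus ten
```

A +6 semitone shift multiplies every frequency by the square root of 2, and at 1.15x a 2-second clip shrinks to about 1.74 seconds.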
Use Cases
Intended for optimizing low-parameter ASR models (such as Whisper-Tiny) to recognize high-pitched, code-switched numerical answers in extremely low-resource settings (tablets without a GPU or internet access).
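After transcription, a tutor still has to pull a number out of a possibly code-switched answer. The sketch below shows one simple approach and does not describe how this dataset is used. It relies on a hypothetical lexicon for one to ten in English, French, and what are assumed to be the bare Kinyarwanda counting forms. Forms that agree with the noun class of the question, such as those used in answers to "zingahe?", are not included, so the word list should be checked by a native speaker before any real use.

```python
import re
from typing import Optional

# Hypothetical lexicons for 1-10. The Kinyarwanda entries are assumed
# bare counting forms; noun-class-agreeing forms are not included.
EN = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
FR = ["un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf", "dix"]
RW = ["rimwe", "kabiri", "gatatu", "kane", "gatanu", "gatandatu",
      "karindwi", "umunani", "icyenda", "icumi"]

NUMBER_WORDS: dict[str, int] = {}
for lexicon in (EN, FR, RW):
    for value, word in enumerate(lexicon, start=1):
        NUMBER_WORDS[word] = value
NUMBER_WORDS["une"] = 1  # French feminine form of "one"


def parse_answer(transcript: str) -> Optional[int]:
    """Return the first number (digits or known word) in a transcript, else None."""
    for token in re.findall(r"\w+", transcript.lower()):
        if token.isdecimal():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None


if __name__ == "__main__":
    for answer in ["the answer is gatanu", "c'est cinq", "ni 7", "I don't know"]:
        print(answer, "->", parse_answer(answer))
```

Merging all three languages into one lookup table lets a mixed-language answer resolve without first detecting which language the child used.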