task_categories:
- automatic-speech-recognition
- text-to-speech
language:
- rw
- en
- fr
tags:
- numeracy
- synthetic
- education
- rwanda

📊 Early Numeracy Math Curriculum (Rwanda)

Dataset Description

This dataset contains a 75-item synthetic curriculum designed for training and evaluating offline AI Math Tutors for early learners (P1-P3) in Rwanda. It includes localized math stems, interrogative phrasing (e.g., using "zingahe?"), and synthetic child audio representations.

Curriculum Structure

The JSON curriculum is divided into five core skill bands, with difficulty rated on a 1-10 scale:

counting
addition
subtraction
number_sense
word_problems
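
As a rough illustration of working with the skill bands, the sketch below groups curriculum items by skill and orders each band from easiest to hardest. The field names (`id`, `skill`, `difficulty`) and the sample items are hypothetical; the actual JSON schema is not documented here and may differ.

```python
import json
from collections import defaultdict

# The five skill bands listed in this card.
SKILLS = ("counting", "addition", "subtraction", "number_sense", "word_problems")

# Hypothetical sample items. Field names are assumptions, not the dataset's schema.
SAMPLE_JSON = """
[
  {"id": "w-001", "skill": "word_problems", "difficulty": 7},
  {"id": "a-002", "skill": "addition", "difficulty": 5},
  {"id": "a-001", "skill": "addition", "difficulty": 3},
  {"id": "c-001", "skill": "counting", "difficulty": 1}
]
"""


def group_by_skill(items):
    """Group items by skill band and sort each band by ascending difficulty."""
    grouped = defaultdict(list)
    for item in items:
        if item["skill"] not in SKILLS:
            raise ValueError(f"unknown skill band: {item['skill']!r}")
        if not 1 <= item["difficulty"] <= 10:
            raise ValueError(f"difficulty out of range: {item['difficulty']!r}")
        grouped[item["skill"]].append(item)
    return {
        skill: sorted(rows, key=lambda row: row["difficulty"])
        for skill, rows in grouped.items()
    }


bands = group_by_skill(json.loads(SAMPLE_JSON))
```

Validating the skill name and difficulty range at load time is a design choice for the sketch: it surfaces malformed items early instead of letting them reach a tutor running offline.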

Synthetic Audio Generation

To safely simulate early-learner interactions without recording real children, this dataset includes a synthetic audio baseline:

Base Model: Meta MMS (VITS-based TTS) for English (EN), French (FR), and Kinyarwanda (KIN).
Perturbation: Audio is pitch-shifted up by 6 semitones and tempo-adjusted to 1.15x to approximate the acoustic profile of a 6-to-9-year-old child.
Text Normalization: Digits and operator symbols are mapped to words before synthesis (e.g., 20 + 10 is mapped to "Twenty plus ten"), which prevents the silent audio outputs that raw equations would otherwise produce.
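
The Text Normalization and Perturbation steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the pipeline used to build the dataset: the function names are hypothetical, the number range (0-99) and operator table are assumptions, only English is covered, and a 1.15x tempo is read here as playback that is 15% faster.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}
# Assumed operator vocabulary; the dataset's own mapping may differ.
OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


def number_to_words(n):
    """Spell out an integer in English (sketch limited to 0-99)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize_equation(text):
    """Replace digits and operator symbols with words so the TTS input is speakable."""
    words = []
    for token in text.split():
        if token.isdecimal():
            words.append(number_to_words(int(token)))
        else:
            words.append(OPERATORS.get(token, token))
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:]


def pitch_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2 ** (semitones / 12)


def perturbed_duration(seconds, tempo):
    """Clip length after a tempo change (tempo > 1 shortens the clip)."""
    return seconds / tempo


print(normalize_equation("20 + 10"))  # Twenty plus ten
```

Under these assumptions, a +6 semitone shift multiplies every frequency by 2^(6/12), roughly 1.41, and a 2.3-second clip at 1.15x tempo lasts about 2.0 seconds.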

Use Cases

This dataset is intended for optimizing low-parameter ASR models (such as Whisper-Tiny) to recognize high-pitched, code-switched numerical answers in extremely low-resource environments, such as tablets without a GPU or internet access.
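
To make "code-switched numerical answers" concrete, the sketch below maps an ASR transcript to an integer so it can be compared with the expected answer. Everything in it is an assumption for illustration: the function names are hypothetical, the vocabulary covers only 1-10, and the Kinyarwanda entries are the commonly cited counting forms, which should be verified by a native speaker because numerals change form to agree with noun class.

```python
# Assumed number vocabularies (1-10 only), keyed by language code.
NUMBER_WORDS = {
    "en": ["one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"],
    "fr": ["un", "deux", "trois", "quatre", "cinq",
           "six", "sept", "huit", "neuf", "dix"],
    "rw": ["rimwe", "kabiri", "gatatu", "kane", "gatanu",
           "gatandatu", "karindwi", "umunani", "icyenda", "icumi"],
}

# Flatten into a single lookup so a transcript may mix languages freely.
LOOKUP = {
    word: value
    for words in NUMBER_WORDS.values()
    for value, word in enumerate(words, start=1)
}


def extract_answer(transcript):
    """Return the first number found in the transcript, or None if there is none."""
    for raw in transcript.lower().split():
        token = raw.strip(".,!?")
        if token.isdecimal():
            return int(token)
        if token in LOOKUP:
            return LOOKUP[token]
    return None


def is_correct(transcript, expected):
    """Check a spoken answer against the expected integer."""
    return extract_answer(transcript) == expected
```

For example, `is_correct("C'est cinq.", 5)` and `is_correct("gatatu", 3)` both evaluate to `True` under this vocabulary, while a transcript with no recognizable number yields `None` from `extract_answer`.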
