task_categories:
  - automatic-speech-recognition
  - text-to-speech
language:
  - rw
  - en
  - fr
tags:
  - numeracy
  - synthetic
  - education
  - rwanda

📊 Early Numeracy Math Curriculum (Rwanda)

Dataset Description

This dataset contains a 75-item synthetic curriculum designed for training and evaluating offline AI Math Tutors for early learners (P1-P3) in Rwanda. It includes localized math stems, interrogative phrasing (e.g., using "zingahe?"), and synthetic child audio representations.

Curriculum Structure

The JSON curriculum is divided into five core skill bands, with item difficulty graded from 1 to 10:

  • counting
  • addition
  • subtraction
  • number_sense
  • word_problems
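As an illustration of how the skill bands and difficulty range above might be used, the following is a minimal sketch that validates a few items and selects the easy ones from a single band. The field names (`id`, `skill`, `difficulty`, `stem`) and the sample items are assumptions made for this example; the actual JSON schema may differ.

```python
import json
from collections import Counter

# The five skill bands listed in this card.
SKILL_BANDS = ("counting", "addition", "subtraction", "number_sense", "word_problems")

# Hypothetical sample items; field names and stems are illustrative only.
SAMPLE_JSON = """
[
  {"id": "c-001", "skill": "counting", "difficulty": 1, "stem": "Count the mangoes."},
  {"id": "a-001", "skill": "addition", "difficulty": 2, "stem": "2 + 3"},
  {"id": "a-002", "skill": "addition", "difficulty": 6, "stem": "20 + 10"},
  {"id": "w-001", "skill": "word_problems", "difficulty": 8, "stem": "A short story problem."}
]
"""


def load_items(raw):
    """Parse curriculum JSON and check each item against the card's structure."""
    items = json.loads(raw)
    for item in items:
        if item["skill"] not in SKILL_BANDS:
            raise ValueError(f"unknown skill band: {item['skill']}")
        if not 1 <= item["difficulty"] <= 10:
            raise ValueError(f"difficulty out of range: {item['difficulty']}")
    return items


def select(items, skill, max_difficulty):
    """Return the items of one skill band at or below a difficulty ceiling."""
    return [
        item for item in items
        if item["skill"] == skill and item["difficulty"] <= max_difficulty
    ]


items = load_items(SAMPLE_JSON)
band_counts = Counter(item["skill"] for item in items)
easy_addition = select(items, "addition", 3)
```

A tutor running offline could use a filter like `select` to step a learner up through a band one difficulty level at a time.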

Synthetic Audio Generation

To safely simulate early-learner interactions without recording real children, this dataset includes a synthetic audio baseline:

  • Base Model: Meta MMS (VITS) text-to-speech, used for English (EN), French (FR), and Kinyarwanda (KIN).
  • Perturbation: Audio is pitch-shifted (+6 semitones) and tempo-adjusted (1.15x) to mimic the acoustic profile of a 6-to-9-year-old child.
  • Text Normalization: Includes raw digit-to-word mapping to prevent silent audio outputs during equation generation (e.g., 20 + 10 mapped to Twenty plus ten).
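The text normalization step above can be sketched as follows. This is a minimal English-only illustration, assuming equations contain only the integers 0 to 99 and the operators `+`, `-`, and `=`; the function names are hypothetical and this is not necessarily the mapping used to build the dataset. French and Kinyarwanda would need their own word tables.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
    60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety",
}
OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}


def number_to_words(n):
    """Spell out an integer from 0 to 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError(f"number out of supported range: {n}")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    word = TENS[tens * 10]
    return word if ones == 0 else f"{word}-{ONES[ones]}"


def normalize_equation(text):
    """Replace digits and operators with words so the TTS model has text to speak."""
    # Keep only digit runs and supported operators; anything else is ignored,
    # so this sketch is meant for bare equations, not full sentences.
    tokens = re.findall(r"\d+|[+\-=]", text)
    words = [
        OPERATORS[token] if token in OPERATORS else number_to_words(int(token))
        for token in tokens
    ]
    return " ".join(words).capitalize()


print(normalize_equation("20 + 10"))    # Twenty plus ten
print(normalize_equation("7 - 3 = 4"))  # Seven minus three equals four
```

Spelling out every token before synthesis means the model never receives a bare symbol such as `+`, which is the case the card describes as producing silent audio.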

Use Cases

This dataset is intended for optimizing low-parameter ASR models (such as Whisper-Tiny) to recognize high-pitched, code-switched numerical answers in extremely low-resource environments, such as tablets without a GPU or internet access.
