HAPTOS VIBRA V1

HAPTOS VIBRA V1 is a Whisper Large-v3 fine-tune for direct speech-to-Unicode-Braille generation. It maps spoken audio into normalized Braille text without an intermediate Latin transcription step.

The current release targets:

| Language | Output system | Control token |
|---|---|---|
| English | UEB Grade 1, Unicode Braille | `< |
| French | BFU integral 6-dot, Unicode Braille | `< |

Intended Use

This model is intended for research and controlled product prototyping around accessible audio-to-Braille workflows:

  • speech-to-Braille experiments for English UEB Grade 1 and French BFU integral output
  • evaluation of direct Braille decoding versus ASR-plus-translator cascades
  • internal accessibility tooling where model output is reviewed or post-processed before use

This release is not a certified Braille transcription system and should not be used as the sole source for legal, medical, educational, or high-stakes accessible documents.

Model Details

  • Base model: openai/whisper-large-v3
  • Architecture: Whisper encoder-decoder, fine-tuned for Braille token generation
  • Checkpoint format: sharded safetensors
  • Tokenizer: Whisper tokenizer extended with HAPTOS language/control tokens and Braille output tokens
  • Primary output: Unicode Braille text
  • Raw training data: not redistributed in this repository

Evaluation Snapshot

This card reports the strongest held-out evaluation result currently packaged with the project.

| Split | Language | Samples | Braille CER | Dot error rate | Valid Braille | Latin leakage |
|---|---|---|---|---|---|---|
| held-out | English | 2,000 | 3.47% | 9.23% | 100.00% | 0.00% |

Additional smoke evaluations were run during the training workflow:

| Run | Language | Samples | Braille CER | Dot error rate | Valid Braille | Latin leakage |
|---|---|---|---|---|---|---|
| vibra_alpha_large/eval_smoke | English | 40 | 1.23% | 3.16% | 100.00% | 0.00% |
| vibra_alpha_stage1/eval_smoke | English | 40 | 1.98% | 5.97% | 100.00% | 0.00% |

Metrics are computed on Unicode Braille output:

  • Braille CER: character error rate over Braille characters
  • Dot error rate: mismatch rate over six-dot Braille dot patterns
  • Valid Braille rate: share of outputs containing valid Braille characters
  • Latin leakage rate: share of outputs leaking Latin text
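The four metrics above can be approximated with a few lines of standard-library Python. This is a minimal sketch, not the project's evaluation code: the function names are hypothetical, and the dot-level metric in particular is only one possible reading of "mismatch rate over six-dot Braille dot patterns" (here, edit distance over each cell's raised dots plus a cell-boundary marker). It may not reproduce the figures reported in this card.

```python
import unicodedata

BRAILLE_BASE = 0x2800  # start of the Unicode Braille Patterns block (U+2800..U+28FF)

def is_braille(ch):
    return 0x2800 <= ord(ch) <= 0x28FF

def raised_dots(cell):
    """List the raised dots (1-6) of one Braille cell; dots 7-8 are ignored."""
    bits = ord(cell) - BRAILLE_BASE
    return [dot for dot in range(1, 7) if bits & (1 << (dot - 1))]

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def braille_cer(ref, hyp):
    """Character error rate over Braille characters."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def dot_sequence(text):
    # ASSUMPTION: each cell is expanded to its raised dots, and 0 marks a cell boundary.
    seq = []
    for ch in text:
        if is_braille(ch):
            seq.extend(raised_dots(ch))
        seq.append(0)
    return seq

def dot_error_rate(ref, hyp):
    """Edit distance over dot sequences, normalized by the reference length (assumed definition)."""
    r, h = dot_sequence(ref), dot_sequence(hyp)
    return edit_distance(r, h) / max(len(r), 1)

def valid_braille_rate(outputs):
    # ASSUMPTION: an output is valid if it is non-empty and holds only Braille cells or whitespace.
    ok = sum(1 for o in outputs if o.strip() and all(is_braille(c) or c.isspace() for c in o))
    return ok / max(len(outputs), 1)

def latin_leakage_rate(outputs):
    # ASSUMPTION: any character whose Unicode name starts with "LATIN" counts as leakage.
    leaked = sum(1 for o in outputs if any(unicodedata.name(c, "").startswith("LATIN") for c in o))
    return leaked / max(len(outputs), 1)
```

Checking for Latin characters by Unicode name rather than by ASCII range also catches accented letters, which matters for French output.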

Usage

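from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

repo_id = "annaelmoussa/haptos-vibra"

# Load the processor (tokenizer + feature extractor) and the fine-tuned weights.
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(repo_id)

# Wrap both in the standard ASR pipeline; the decoded "text" is Unicode Braille.
asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Transcribe a local audio file directly to Braille.
result = asr("sample.wav")
print(result["text"])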

For deterministic language routing, prepend or force the corresponding HAPTOS control token in the generation path used by your application.
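As a first step toward forcing a control token, it helps to confirm that the token exists in the extended tokenizer and to resolve it to an id. The helper below is a hedged sketch: `build_prompt_ids` is a hypothetical name, and the prompt layout (Whisper's start-of-transcript token followed by one HAPTOS control token) is an assumption about how the checkpoint was trained, not something this card confirms.

```python
START_TOKEN = "<|startoftranscript|>"  # standard Whisper start-of-transcript token

def build_prompt_ids(tokenizer, control_token, start_token=START_TOKEN):
    """Return [start_id, control_id], raising if either token is missing from the vocabulary."""
    vocab = tokenizer.get_vocab()  # includes added special tokens
    ids = []
    for token in (start_token, control_token):
        if token not in vocab:
            raise ValueError(f"token {token!r} is not in the tokenizer vocabulary")
        ids.append(vocab[token])
    return ids

# Example (not run here): take the control token string from the repository's
# tokenizer files, then resolve it against the loaded processor.
# prompt_ids = build_prompt_ids(processor.tokenizer, control_token)
```

How the resulting ids are supplied to generation (for example as forced decoder tokens or as an explicit decoder prompt) depends on the installed transformers version and on the checkpoint's training setup, so verify both before relying on this layout.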

Data

Training and evaluation data are derived from Mozilla Data Collective / Common Voice 25 validated speech for English and French. The repository contains model artifacts only. Raw audio archives and licensed source data are intentionally not included.

Data preparation applies:

  • language-specific Braille table conversion with Liblouis
  • duration and text filters
  • speaker-safe train/dev/test splitting
  • output validation for Braille-only generation
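The speaker-safe splitting step can be illustrated with a deterministic hash of the speaker identifier, so that every clip from one speaker lands in the same split. This is a sketch under stated assumptions rather than the project's pipeline: the function names and the 10% dev / 10% test fractions are hypothetical, and `client_id` is assumed to be the speaker field, as in Common Voice metadata.

```python
import hashlib

def assign_split(speaker_id, dev_frac=0.1, test_frac=0.1):
    """Map a speaker id to 'train', 'dev', or 'test' via a stable hash bucket in [0, 1)."""
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + dev_frac:
        return "dev"
    return "train"

def split_clips(clips):
    """Group clip records by split; each record needs a 'client_id' key (assumed field name)."""
    splits = {"train": [], "dev": [], "test": []}
    for clip in clips:
        splits[assign_split(clip["client_id"])].append(clip)
    return splits
```

Because the assignment depends only on the speaker id, rerunning the split on a larger data release keeps existing speakers in their original partition.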

Limitations

  • The metrics reported in this card cover English held-out evaluation only. French evaluation should be expanded before any production release.
  • The model may produce Braille errors on names, abbreviations, punctuation-heavy utterances, accents, code-switching, noisy audio, or out-of-domain speech.
  • Generated Braille should be validated by downstream rules or expert review before distribution to end users.
  • This checkpoint inherits Whisper-family behavior and may still hallucinate or normalize content unexpectedly under poor audio conditions.

Release Hygiene

This Hub repository is scoped to deployable model assets:

  • model config and generation config
  • tokenizer and processor files
  • sharded safetensors weights
  • model card assets

Optimizer states, trainer checkpoints, raw audio, manifests, temporary logs, and local experiment outputs are deliberately excluded.