HAPTOS VIBRA V1

HAPTOS VIBRA V1 is a Whisper Large-v3 fine-tune for direct speech-to-Unicode-Braille generation. It maps spoken audio straight to normalized Braille text, with no intermediate Latin-script transcription step.

The current release targets:

Language	Output system	Control token
English	UEB Grade 1, Unicode Braille	`<
French	BFU integral 6-dot, Unicode Braille	`<

Intended Use

This model is intended for research and controlled product prototyping around accessible audio-to-Braille workflows:

speech-to-Braille experiments for English UEB Grade 1 and French BFU integral output
evaluation of direct Braille decoding versus ASR-plus-translator cascades
internal accessibility tooling where model output is reviewed or post-processed before use

This release is not a certified Braille transcription system and should not be used as the sole source for legal, medical, educational, or high-stakes accessible documents.

Model Details

Base model: openai/whisper-large-v3
Architecture: Whisper encoder-decoder, fine-tuned for Braille token generation
Checkpoint format: sharded safetensors
Tokenizer: Whisper tokenizer extended with HAPTOS language/control tokens and Braille output tokens
Primary output: Unicode Braille text
Raw training data: not redistributed in this repository

Evaluation Snapshot

This card reports the best held-out evaluation result currently packaged with the project.

Split	Language	Samples	Braille CER	Dot error rate	Valid Braille	Latin leakage
held-out	English	2,000	3.47%	9.23%	100.00%	0.00%

Additional smoke evaluations were run during the training workflow:

Run	Language	Samples	Braille CER	Dot error rate	Valid Braille	Latin leakage
`vibra_alpha_large/eval_smoke`	English	40	1.23%	3.16%	100.00%	0.00%
`vibra_alpha_stage1/eval_smoke`	English	40	1.98%	5.97%	100.00%	0.00%

Metrics are computed on Unicode Braille output:

Braille CER: character error rate over Braille characters
Dot error rate: mismatch rate over six-dot Braille dot patterns
Valid Braille rate: share of outputs containing valid Braille characters
Latin leakage rate: share of outputs leaking Latin text
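
The four metrics above could be computed along the following lines. This is a minimal sketch and not the project's evaluation code: the function names are hypothetical, the dot error rate compares cells position by position with no alignment (padding the shorter string with blank cells), an output counts as valid when it is non-empty and contains only Unicode Braille cells or whitespace, and Latin leakage is detected with a simple letter range. All of these are assumptions, since the card does not spell out the exact definitions.

```python
import re
from itertools import zip_longest

BRAILLE_BASE = 0x2800   # U+2800, the blank Braille cell
BRAILLE_LAST = 0x28FF   # end of the Unicode Braille Patterns block
# Assumption: "Latin text" means basic and extended Latin letters.
LATIN_RE = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]")


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]


def braille_cer(ref, hyp):
    """Character error rate over Braille characters."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def dot_mask(ch):
    """Six-bit mask for dots 1-6; non-Braille characters count as blank."""
    code = ord(ch)
    if BRAILLE_BASE <= code <= BRAILLE_LAST:
        return (code - BRAILLE_BASE) & 0x3F
    return 0


def dot_error_rate(ref, hyp):
    """Share of mismatched dots, comparing cells position by position (assumed definition)."""
    n = max(len(ref), len(hyp))
    if n == 0:
        return 0.0
    mismatched = 0
    for r, h in zip_longest(ref, hyp, fillvalue="\u2800"):
        mismatched += bin(dot_mask(r) ^ dot_mask(h)).count("1")
    return mismatched / (6 * n)


def is_valid_braille(text):
    """Non-empty and made only of Braille cells or whitespace (assumed definition)."""
    return bool(text) and all(
        BRAILLE_BASE <= ord(c) <= BRAILLE_LAST or c.isspace() for c in text
    )


def valid_braille_rate(outputs):
    return sum(is_valid_braille(o) for o in outputs) / max(len(outputs), 1)


def latin_leakage_rate(outputs):
    return sum(bool(LATIN_RE.search(o)) for o in outputs) / max(len(outputs), 1)
```

An alignment-based dot metric (for example, scoring dots only within cells matched by the edit-distance alignment) would give different numbers, so figures from this sketch should not be compared directly with the table above.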

Usage

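from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

repo_id = "annaelmoussa/haptos-vibra"

# Load the processor (tokenizer + feature extractor) and the fine-tuned weights.
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(repo_id)

# Wrap both in a standard speech recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Transcribe a local audio file; "text" holds the Unicode Braille output.
result = asr("sample.wav")
print(result["text"])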

For deterministic language routing, prepend or force the corresponding HAPTOS control token in the generation path used by your application.
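
Before forcing a control token, it can help to confirm that the tokenizer actually knows it. The helper below is a hedged sketch: `resolve_control_token` is a hypothetical name, and the control token string is passed in by the caller because this card does not give it in full. It relies only on `convert_tokens_to_ids` and `unk_token_id`, which Hugging Face tokenizers provide.

```python
def resolve_control_token(tokenizer, control_token: str) -> int:
    """Return the vocabulary id of control_token, or raise if it is unknown.

    Hypothetical helper: `tokenizer` is expected to behave like a
    Hugging Face tokenizer (convert_tokens_to_ids, unk_token_id).
    """
    token_id = tokenizer.convert_tokens_to_ids(control_token)
    # Unknown tokens map to the unk id (or None when no unk token is set).
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"{control_token!r} is not in the tokenizer vocabulary")
    return token_id
```

With the processor loaded as above, this would be called as `resolve_control_token(processor.tokenizer, token)`. How the resulting id is then forced during decoding (decoder prompt, generation config, or another mechanism) depends on how the checkpoint was trained and is not specified here.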

Data

Training and evaluation data are derived from Mozilla Data Collective / Common Voice 25 validated speech for English and French. The repository contains model artifacts only. Raw audio archives and licensed source data are intentionally not included.

Data preparation applies:

language-specific Braille table conversion with Liblouis
duration and text filters
speaker-safe train/dev/test splitting
output validation for Braille-only generation
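
Speaker-safe splitting means that every clip from a given speaker lands in the same split, so no voice appears in both training and evaluation data. One common way to get this is to hash the speaker identifier into a bucket, as sketched below. The function names and the 5% dev / 5% test proportions are hypothetical and chosen for illustration; the project's actual splitting procedure is not described on this card.

```python
import hashlib


def assign_split(speaker_id, dev_pct=5, test_pct=5):
    """Map a speaker id to 'train', 'dev' or 'test' deterministically.

    Hashing the id (rather than shuffling clips) guarantees that a
    speaker always lands in the same split. Percentages are illustrative.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "train"


def split_clips(clips):
    """Group clip paths by split. `clips` is an iterable of (speaker_id, clip_path)."""
    splits = {"train": [], "dev": [], "test": []}
    for speaker_id, clip_path in clips:
        splits[assign_split(speaker_id)].append(clip_path)
    return splits
```

For Common Voice data, the `client_id` column would be the natural speaker identifier to pass in.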

Limitations

The metrics on this card cover English only; no French results are reported. French evaluation should be expanded before any production release.
The model may produce Braille errors on names, abbreviations, punctuation-heavy utterances, accents, code-switching, noisy audio, or out-of-domain speech.
Generated Braille should be validated by downstream rules or expert review before distribution to end users.
This checkpoint inherits Whisper-family behavior and may still hallucinate or normalize content unexpectedly under poor audio conditions.

Release Hygiene

This Hub repository is scoped to deployable model assets:

model config and generation config
tokenizer and processor files
sharded safetensors weights
model card assets

Optimizer states, trainer checkpoints, raw audio, manifests, temporary logs, and local experiment outputs are deliberately excluded.

Model size: 2B params (Safetensors)
Tensor type: F32
