HAPTOS VIBRA V1
HAPTOS VIBRA V1 is a Whisper Large-v3 fine-tune for direct speech-to-Unicode-Braille generation. It maps spoken audio into normalized Braille text without an intermediate Latin transcription step.
The current release targets:
| Language | Output system | Control token |
|---|---|---|
| English | UEB Grade 1, Unicode Braille | `< |
| French | BFU integral 6-dot, Unicode Braille | `< |
Intended Use
This model is intended for research and controlled product prototyping around accessible audio-to-Braille workflows:
- speech-to-Braille experiments for English UEB Grade 1 and French BFU integral output
- evaluation of direct Braille decoding versus ASR-plus-translator cascades
- internal accessibility tooling where model output is reviewed or post-processed before use
This release is not a certified Braille transcription system and should not be used as the sole source for legal, medical, educational, or high-stakes accessible documents.
Model Details
- Base model: `openai/whisper-large-v3`
- Architecture: Whisper encoder-decoder, fine-tuned for Braille token generation
- Checkpoint format: sharded `safetensors`
- Tokenizer: Whisper tokenizer extended with HAPTOS language/control tokens and Braille output tokens
- Primary output: Unicode Braille text
- Raw training data: not redistributed in this repository
Evaluation Snapshot
The table below reports the strongest held-out evaluation result currently packaged with the project.
| Split | Language | Samples | Braille CER | Dot error rate | Valid Braille | Latin leakage |
|---|---|---|---|---|---|---|
| held-out | English | 2,000 | 3.47% | 9.23% | 100.00% | 0.00% |
Additional smoke evaluations were run during the training workflow:
| Run | Language | Samples | Braille CER | Dot error rate | Valid Braille | Latin leakage |
|---|---|---|---|---|---|---|
| `vibra_alpha_large/eval_smoke` | English | 40 | 1.23% | 3.16% | 100.00% | 0.00% |
| `vibra_alpha_stage1/eval_smoke` | English | 40 | 1.98% | 5.97% | 100.00% | 0.00% |
Metrics are computed on Unicode Braille output:
- Braille CER: character error rate over Braille characters
- Dot error rate: mismatch rate over six-dot Braille dot patterns
- Valid Braille rate: share of outputs containing valid Braille characters
- Latin leakage rate: share of outputs leaking Latin text
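The metric definitions above can be sketched in a few lines of Python. This is an illustrative approximation, not the project's evaluation code: the function names are hypothetical, CER is assumed to be Levenshtein distance divided by reference length, and the dot comparison is shown only for a single pair of cells because the card does not say how cells are aligned. It relies on the Unicode Braille Patterns block (U+2800 to U+28FF), where the low six bits of the code point encode dots 1 to 6.

```python
import unicodedata


def is_braille_cell(ch: str) -> bool:
    """True if ch lies in the Unicode Braille Patterns block (U+2800..U+28FF)."""
    return 0x2800 <= ord(ch) <= 0x28FF


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def braille_cer(ref: str, hyp: str) -> float:
    """Assumed definition: edit distance over the number of reference characters."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def dot_mismatches(a: str, b: str) -> int:
    """Number of differing dots (out of six) between two Braille cells."""
    return bin((ord(a) ^ ord(b)) & 0x3F).count("1")


def is_valid_braille(text: str) -> bool:
    """Assumed definition: every character is a Braille cell or whitespace."""
    return all(is_braille_cell(ch) or ch.isspace() for ch in text)


def has_latin_leakage(text: str) -> bool:
    """True if any character is a Latin letter, accented letters included."""
    return any("LATIN" in unicodedata.name(ch, "") for ch in text)
```

Per-utterance values from helpers like these would then be averaged or pooled over the evaluation set; which aggregation the reported numbers use is not stated in this card.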
Usage
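from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

repo_id = "annaelmoussa/haptos-vibra"

# Load the processor (tokenizer + feature extractor) and the fine-tuned model.
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(repo_id)

# Build a speech-recognition pipeline from the loaded components.
asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Run on a local audio file; "text" holds the generated Unicode Braille.
result = asr("sample.wav")
print(result["text"])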
For deterministic language routing, prepend or force the corresponding HAPTOS control token in the generation path used by your application.
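Before forcing a control token, it helps to confirm that the token actually exists in the extended vocabulary. The sketch below is a hypothetical helper, not part of the released code: `resolve_control_token_id` is an illustrative name, and the token string in the usage comment is a placeholder because the exact HAPTOS control tokens are not spelled out in this card. It uses only `convert_tokens_to_ids` and `unk_token_id`, which Transformers tokenizers expose.

```python
def resolve_control_token_id(tokenizer, control_token: str) -> int:
    """Return the vocabulary id of control_token, or raise if it is unknown.

    Hypothetical helper: works with any object that offers
    convert_tokens_to_ids and, optionally, unk_token_id.
    """
    token_id = tokenizer.convert_tokens_to_ids(control_token)
    unk_id = getattr(tokenizer, "unk_token_id", None)
    # Unknown tokens resolve to None or to the unknown-token id.
    if token_id is None or token_id == unk_id:
        raise ValueError(f"{control_token!r} is not in the tokenizer vocabulary")
    return token_id


# Usage, assuming `processor` was loaded as in the Usage section and that
# "<placeholder_control_token>" is replaced by a real HAPTOS control token:
# token_id = resolve_control_token_id(processor.tokenizer, "<placeholder_control_token>")
```

How the resolved id is then injected (as a decoder prompt, a forced token, or another mechanism) depends on the generation path and the Transformers version in use, so that step is left to the application.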
Data
Training and evaluation data are derived from Mozilla Data Collective / Common Voice 25 validated speech for English and French. The repository contains model artifacts only. Raw audio archives and licensed source data are intentionally not included.
Data preparation applies:
- language-specific Braille table conversion with Liblouis
- duration and text filters
- speaker-safe train/dev/test splitting
- output validation for Braille-only generation
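Two of the preparation steps above, duration/text filtering and speaker-safe splitting, can be illustrated with a short sketch. It is not the project's pipeline: the function names, the duration thresholds, and the split percentages are all assumptions chosen for illustration. The idea is to hash each speaker identifier (for Common Voice, the `client_id` field) into a fixed bucket so that every clip from one speaker lands in the same split.

```python
import hashlib


def keep_clip(duration_s: float, text: str,
              min_s: float = 1.0, max_s: float = 30.0) -> bool:
    """Keep clips within an assumed duration range that have non-empty text."""
    return min_s <= duration_s <= max_s and bool(text.strip())


def assign_split(speaker_id: str, dev_pct: int = 5, test_pct: int = 5) -> str:
    """Deterministically map a speaker to train/dev/test.

    All clips from the same speaker share a split because the decision
    depends only on a hash of the speaker identifier.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "train"
```

Hashing avoids keeping a speaker-to-split lookup table and gives the same assignment on every run, at the cost of split sizes that only approximate the requested percentages.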
Limitations
- The release card metrics currently emphasize English held-out evaluation. French evaluation should be expanded before any production release.
- The model may produce Braille errors on names, abbreviations, punctuation-heavy utterances, accents, code-switching, noisy audio, or out-of-domain speech.
- Generated Braille should be validated by downstream rules or expert review before distribution to end users.
- This checkpoint inherits Whisper-family behavior and may still hallucinate or normalize content unexpectedly under poor audio conditions.
Release Hygiene
This Hub repository is scoped to deployable model assets:
- model config and generation config
- tokenizer and processor files
- sharded `safetensors` weights
- professional model card assets
Optimizer states, trainer checkpoints, raw audio, manifests, temporary logs, and local experiment outputs are deliberately excluded.