Braille256-v3: Grade Infinity Universal Braille Model

A 27.8M-parameter language model trained natively on Braille Unicode, using SentencePiece Unigram tokenization for likelihood-optimized compression.

Key Features

  • SentencePiece Unigram: Likelihood-optimized tokenization (chosen over BPE; see Research Goals)
  • 4096 Vocabulary: Learned contractions across 7 languages
  • Multilingual: English, French, German, Spanish, Italian, Portuguese, Dutch
  • Cross-Linguistic Patterns: Discovers universal Braille compressions

Learned Contractions

Token   Braille       Languages
15      ⠞⠓⠑ (the)     English
22      ⠟⠥⠑ (que)     Spanish/French/Portuguese
23      ⠁⠝⠙ (and)     English
25      ⠕⠋ (of)       English
17      ⠙⠑ (de)       Spanish/French/Italian/Portuguese
18      ⠇⠁ (la)       Spanish/French/Italian
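Contractions like these emerge because Unigram tokenization selects the segmentation of a string that maximizes its likelihood under a learned token distribution. A minimal sketch of that search (Viterbi decoding over a hypothetical toy vocabulary with illustrative probabilities, not the model's actual 4096-entry vocab):

```python
import math

# Hypothetical toy vocabulary: a few contraction tokens from the table
# above plus single-cell fallbacks. Probabilities are made up for
# illustration only.
vocab = {
    "⠞⠓⠑": 0.05, "⠟⠥⠑": 0.04, "⠁⠝⠙": 0.04, "⠕⠋": 0.03,
    "⠙⠑": 0.05, "⠇⠁": 0.04,
    # single-cell fallbacks so any string stays segmentable
    "⠞": 0.01, "⠓": 0.01, "⠑": 0.02, "⠟": 0.005, "⠥": 0.01,
    "⠁": 0.02, "⠝": 0.01, "⠙": 0.015, "⠕": 0.015, "⠋": 0.01,
    "⠇": 0.01, "⠀": 0.05,
}

def unigram_segment(text, vocab):
    """Viterbi search for the max-likelihood segmentation under a
    unigram model: best[i] = best log-prob of text[:i]."""
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - 3), i):  # toy tokens span at most 3 cells
            piece = text[j:i]
            if piece in vocab and best[j] + math.log(vocab[piece]) > best[i]:
                best[i] = best[j] + math.log(vocab[piece])
                back[i] = j
    pieces, i = [], n
    while i > 0:
        pieces.append(text[back[i]:i])
        i = back[i]
    return pieces[::-1]

print(unigram_segment("⠞⠓⠑⠀⠟⠥⠑", vocab))
# → ['⠞⠓⠑', '⠀', '⠟⠥⠑']  (contractions beat cell-by-cell splits)
```

Because the contraction tokens carry more probability mass than their cell-by-cell decompositions, the search prefers them, which is the mechanism by which multi-cell patterns like ⠞⠓⠑ become single tokens.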

Training Details

Metric           Value
Parameters       27.8M
Vocabulary       4096 (Unigram)
Training Steps   15,000
Final Loss       2.17
Training Time    3h 22m (MPS)
Corpus           32M Braille chars (7 languages)
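Assuming the reported final loss is mean cross-entropy per token in nats (the usual convention, though the card does not say), it maps to a token-level perplexity as follows:

```python
import math

# Assumption: final loss 2.17 is mean per-token cross-entropy in nats,
# so perplexity = exp(loss).
final_loss = 2.17
perplexity = math.exp(final_loss)
print(f"Perplexity: {perplexity:.2f}")  # ≈ 8.76
```

A perplexity near 9 over a 4096-token vocabulary suggests the model has learned substantial structure, though without a stated baseline this is only a rough reading.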

Architecture

Hidden Size: 512
Layers: 8
Attention Heads: 8
Max Sequence Length: 1024
Tokenizer: SentencePiece Unigram
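The hyperparameters above are consistent with the stated 27.8M total. A rough sanity check, assuming a standard GPT-style decoder block (4x MLP, two LayerNorms per block, learned positions, tied output embedding); the actual Braille256UnigramModel internals may differ:

```python
# Architecture values from the table above
vocab, hidden, layers, seq = 4096, 512, 8, 1024

embeddings = vocab * hidden + seq * hidden             # token + position tables
attn = 4 * (hidden * hidden + hidden)                  # Q, K, V, output proj
mlp = 2 * (hidden * 4 * hidden) + 4 * hidden + hidden  # two linears, 4x width
norms = 2 * 2 * hidden                                 # two LayerNorms per block
per_layer = attn + mlp + norms
total = embeddings + layers * per_layer + 2 * hidden   # + final LayerNorm
print(f"{total / 1e6:.1f}M parameters")                # → 27.8M
```

The count matches the reported 27.8M, which supports (but does not confirm) these layout assumptions.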

Usage

from braille_unigram_model import Braille256UnigramModel, BrailleUnigramTokenizer

model = Braille256UnigramModel.from_pretrained("ryanscottbarrett/braille256-v3")
tokenizer = BrailleUnigramTokenizer.from_pretrained("ryanscottbarrett/braille256-v3/tokenizer")

# Encode Braille text
text = "⠞⠓⠑⠀⠟⠥⠊⠉⠅⠀⠃⠗⠕⠺⠝⠀⠋⠕⠭"
tokens = tokenizer.encode(text)
print(f"Compression: {len(text)}/{len(tokens)} = {len(text)/len(tokens):.2f}x")

Research Goals

This model is part of the Grade Infinity Braille research project:

  1. Can neural networks discover universal Braille contractions?
  2. Do cross-linguistic patterns emerge from multilingual training?
  3. Is SentencePiece Unigram superior to BPE for Braille?

Citation

@misc{braille256v3,
  author = {Ryan Barrett},
  title = {Braille256-v3: Grade Infinity Universal Braille Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ryanscottbarrett/braille256-v3}
}

License

MIT
