Braille256-v4: 8-dot Grade Infinity Universal Braille Model
The first generative language model trained natively on 8-dot Braille Unicode (U+2800-U+28FF). The 8-dot block provides 256 base patterns, one per byte value, compared with 64 for 6-dot Braille.
Why 8-dot Braille?
| Feature | 6-dot (v3) | 8-dot (v4) |
|---|---|---|
| Base patterns | 64 | 256 |
| Byte encoding | Partial | Full |
| Computer Braille | Limited | Native |
| Compression potential | Lower | Higher |
| Multimodal computation | Basic | Rich |
8-dot Braille enables direct byte-to-Braille mapping, making it ideal for:
- Binary data encoding in Braille
- Computer Braille compatibility
- Multi-layer computation with richer pattern space
- Multimodal AI with tactile output
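The jump from 64 to 256 patterns follows from how the Unicode Braille Patterns block is laid out: the offset from U+2800 is an 8-bit mask in which bit 0 is dot 1 and bit 7 is dot 8, so 6-dot cells occupy offsets 0-63. A minimal sketch of that layout follows; the helper names `dots_of` and `uses_dots_7_or_8` are illustrative and not part of the model code.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block

def dots_of(cell: str) -> list[int]:
    """Return the raised dot numbers (1-8) of a single Braille cell."""
    offset = ord(cell) - BRAILLE_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError(f"not a Braille pattern: {cell!r}")
    # Bit (dot - 1) of the offset is set when that dot is raised.
    return [dot for dot in range(1, 9) if offset & (1 << (dot - 1))]

def uses_dots_7_or_8(cell: str) -> bool:
    """True for cells that exist only in 8-dot Braille."""
    return any(dot >= 7 for dot in dots_of(cell))

print(dots_of("\u281e"))  # ⠞ -> [2, 3, 4, 5]
print(dots_of("\u28c2"))  # ⣂ -> [2, 7, 8]
```

Counting the cells for which `uses_dots_7_or_8` is false over the whole block gives the 64 patterns available to a 6-dot model.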
Model Details
| Metric | Value |
|---|---|
| Parameters | 29.9M |
| Vocabulary | 8192 (Unigram) |
| Base Patterns | 256 (8-dot) |
| Training Steps | 15,000 |
| Training Time | 3h 37m (MPS) |
| Corpus | 163M chars (7 languages) |
| Tokens using dots 7/8 | 913 |
Architecture
Hidden Size: 512
Layers: 8
Attention Heads: 8
Max Sequence Length: 1024
Tokenizer: SentencePiece Unigram (8192 vocab)
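As a rough sanity check on the reported parameter count, the figures above can be plugged into a standard transformer estimate. The sketch below assumes a GPT-style decoder with a 4x MLP expansion, learned positional embeddings, and an output head tied to the token embeddings. None of these details are stated in this card, so treat the result as an approximation and `estimate_params` as a hypothetical helper.

```python
def estimate_params(vocab: int, hidden: int, layers: int, max_len: int,
                    mlp_ratio: int = 4) -> int:
    """Count weight-matrix parameters only (biases and layer norms omitted)."""
    token_emb = vocab * hidden             # assumed shared with the output head
    pos_emb = max_len * hidden             # assumed learned absolute positions
    attention = 4 * hidden * hidden        # Q, K, V and output projections
    mlp = 2 * mlp_ratio * hidden * hidden  # up- and down-projection
    return token_emb + pos_emb + layers * (attention + mlp)

total = estimate_params(vocab=8192, hidden=512, layers=8, max_len=1024)
print(f"{total:,} (~{total / 1e6:.1f}M)")  # 29,884,416 (~29.9M)
```

Under these assumptions the estimate lands close to the reported 29.9M, though the actual layer layout may differ.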
Learned Contractions (8-dot)
| Token | Braille | Meaning |
|---|---|---|
| 6 | ⠞⠓⠑ | the |
| 10 | ⠁⠝⠙ | and |
| 11 | ⠕⠋ | of |
| 12 | ⠞⠕ | to |
| 18 | ⠞⠓⠁⠞ | that |
| 13 | ⣂ | 8-dot pattern |
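The first five rows are sequences of ordinary 6-dot letter cells that the tokenizer has merged into single tokens. To read such tokens back as Latin text, a small transliteration table is enough. The sketch below builds it from the standard dot numbers of the 26 basic Braille letters; the names `LETTER_DOTS`, `cell_from_dots`, and `braille_to_latin` are illustrative and not part of the released tokenizer.

```python
# Dot numbers for the 26 basic Braille letters (standard 6-dot alphabet).
LETTER_DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15", "f": "124",
    "g": "1245", "h": "125", "i": "24", "j": "245", "k": "13", "l": "123",
    "m": "134", "n": "1345", "o": "135", "p": "1234", "q": "12345",
    "r": "1235", "s": "234", "t": "2345", "u": "136", "v": "1236",
    "w": "2456", "x": "1346", "y": "13456", "z": "1356",
}

def cell_from_dots(dots: str) -> str:
    """Build the Unicode Braille cell whose raised dots are listed in `dots`."""
    return chr(0x2800 + sum(1 << (int(d) - 1) for d in dots))

CELL_TO_LETTER = {cell_from_dots(dots): letter for letter, dots in LETTER_DOTS.items()}

def braille_to_latin(cells: str) -> str:
    """Transliterate letter cells; any other character is kept unchanged."""
    return "".join(CELL_TO_LETTER.get(c, c) for c in cells)

print(braille_to_latin("\u281e\u2813\u2811"))  # ⠞⠓⠑ -> the
```

Cells outside the basic alphabet, such as the 8-dot pattern in the last row, pass through untouched.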
Usage
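import torch
from huggingface_hub import snapshot_download
from train_grade_infinity_8dot import Braille8DotModel, Braille8DotConfig, Braille8DotTokenizer
# Download the repository files first: torch.load below needs a local filesystem path
# (assumes the repo ships pytorch_model.bin and tokenizer.model, as the paths here imply)
model_dir = snapshot_download("ryanscottbarrett/braille256-v4")
config = Braille8DotConfig.from_pretrained(model_dir)
model = Braille8DotModel(config)
model.load_state_dict(torch.load(f"{model_dir}/pytorch_model.bin", map_location="cpu"))
model.eval()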
# Load tokenizer
tokenizer = Braille8DotTokenizer(f"{model_dir}/tokenizer.model")
# Generate from the prompt "the" plus a trailing blank cell (U+2800)
prompt = "⠞⠓⠑⠀"
input_ids = torch.tensor([tokenizer.encode(prompt)])
output = model.generate(input_ids, max_length=50, temperature=0.8)
print(tokenizer.decode(output[0].tolist()))
Byte-to-Braille Encoding
Each byte maps to the code point U+2800 plus its value, so any byte string can be written as a sequence of Braille cells:
def byte_to_braille(byte: int) -> str:
    # Map one byte value (0-255) to the Braille pattern at U+2800 + value
    return chr(0x2800 + byte)

def bytes_to_braille(data: bytes) -> str:
    return ''.join(byte_to_braille(b) for b in data)
# Encode any binary data as Braille!
text = "Hello"
braille = bytes_to_braille(text.encode('utf-8'))
print(braille) # ⡈⡥⡬⡬⡯
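Because the mapping is one-to-one, it can be inverted to recover the original bytes. The following is a minimal sketch of the reverse direction; `braille_to_bytes` is a hypothetical helper, not a function shipped with the model.

```python
def braille_to_bytes(cells: str) -> bytes:
    """Invert the byte-to-Braille mapping: U+2800 + n becomes byte n."""
    offsets = [ord(c) - 0x2800 for c in cells]
    if any(not 0 <= o <= 0xFF for o in offsets):
        raise ValueError("input contains characters outside U+2800-U+28FF")
    return bytes(offsets)

# Round trip: text -> UTF-8 bytes -> Braille cells -> bytes -> text
encoded = "".join(chr(0x2800 + b) for b in "Hello".encode("utf-8"))
print(braille_to_bytes(encoded).decode("utf-8"))  # Hello
```

Rejecting characters outside the Braille block keeps a mixed string from silently decoding to garbage.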
Research Goals
This model advances the Grade Infinity Braille research:
- 256-pattern vocabulary: Full 8-dot Braille utilization
- Multimodal computation: Richer pattern space for AI
- Universal encoding: Any data → Braille → AI processing
- Accessibility-first AI: Native Braille computation
Comparison with v3
| Model | Patterns | Vocab | Params | Use Case |
|---|---|---|---|---|
| v3 (6-dot) | 64 | 4096 | 27.8M | Literary Braille |
| v4 (8-dot) | 256 | 8192 | 29.9M | Computer/Multimodal |
Citation
@misc{braille256v4,
author = {Ryan Barrett},
title = {Braille256-v4: 8-dot Grade Infinity Universal Braille Model},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/ryanscottbarrett/braille256-v4}
}
License
MIT