Upload braille256-v6: Lattice-aware multimodal Braille model

Files changed in this commit:

- README.md (+186 -0)
- braille_lattice_theory.py (+1010 -0)
- config.json (+18 -0)
- final_eval.json (+5 -0)
- pytorch_model.bin (+3 -0)
- tokenizer.model (+3 -0)
- train_lattice_v6.py (+990 -0)
- training_log.json (+602 -0)

---

README.md (ADDED, +186 lines)
# braille256-v6: Lattice-Aware Multimodal Braille Model

**The first LLM with explicit dot-lattice structure in its architecture.**

## Model Description

braille256-v6 builds on the multimodal foundation of v5, integrating formal lattice theory into the training pipeline. It is not just a Braille-native model; it is a **lattice-native** model that understands the mathematical structure of Braille at the architectural level.

### Key Innovations

| Feature | Description |
|---------|-------------|
| **Lattice Attention** | Attention scores incorporate Hamming-based similarity between Braille cells |
| **Lattice Embeddings** | Token embeddings initialized to respect the Boolean lattice structure |
| **Morphological Regularization** | Training loss includes an equivariance term under erosion/dilation |
| **Haptic Evaluation** | New metrics for the tactile quality of outputs |

## Architecture

```
Parameters: ~12M
Layers:     4
Heads:      4
Hidden:     256
Vocab:      32,000 (SentencePiece)
Context:    512
```

### Lattice Attention

Standard transformer attention computes:

```
Attention(Q, K, V) = softmax(QK^T / √d) V
```

Lattice attention blends this with Braille-aware similarity:

```
LatticeAttn = (1 - λ) * StandardAttn + λ * HammingAttn

where HammingAttn[i, j] = 8 - popcount(token[i] XOR token[j])
```

This gives the model an inductive bias toward understanding Braille structure.
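The blend above can be sketched in a few lines of plain Python. This is an illustrative reading of the formula, not the training script's actual implementation; `lattice_scores` and its λ default are hypothetical names, and normalization details in the real model may differ.

```python
def popcount(x: int) -> int:
    """Number of set bits in an 8-bit Braille cell."""
    return bin(x).count("1")

def hamming_sim(a: int, b: int) -> int:
    """HammingAttn entry: 8 minus the Hamming distance between two cells."""
    return 8 - popcount(a ^ b)

def lattice_scores(std_scores, token_ids, lam=0.1):
    """Blend standard attention scores with Hamming similarity, entry-wise."""
    n = len(token_ids)
    return [
        [(1 - lam) * std_scores[i][j] + lam * hamming_sim(token_ids[i], token_ids[j])
         for j in range(n)]
        for i in range(n)
    ]
```

For identical tokens the Hamming term contributes its maximum of 8, so repeated cells attract extra attention mass regardless of the learned scores.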
### Lattice Embeddings

For the first 256 tokens (corresponding to Braille cells), embeddings are initialized as:

```
embedding[i] = Σ basis[b] for each raised dot b in cell i
```

This means similar Braille cells (low Hamming distance) start with similar embeddings.
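A minimal sketch of that initialization, assuming one learned basis vector per dot; the function name, dimensions, and random initialization are illustrative, not taken from `train_lattice_v6.py`.

```python
import numpy as np

def init_lattice_embeddings(dim: int = 256, seed: int = 0) -> np.ndarray:
    """Initialize the 256 cell embeddings as sums of 8 dot-basis vectors."""
    rng = np.random.default_rng(seed)
    basis = rng.normal(0, 1 / np.sqrt(8), size=(8, dim))  # one vector per dot
    emb = np.zeros((256, dim))
    for cell in range(256):
        for b in range(8):
            if (cell >> b) & 1:          # dot b is raised in this cell
                emb[cell] += basis[b]
    return emb
```

By construction the embedding map is additive over disjoint dot sets, so cells at Hamming distance 1 differ by exactly one basis vector.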
### Morphological Regularization

Training includes a regularization term:

```
L_morph = ReLU(||emb - erode(emb)|| - ||emb - dilate(emb)||)
```

This encourages embeddings to respect the lattice ordering: `erode(x) ≤ x ≤ dilate(x)`.
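One possible reading of that term, where erosion/dilation act on cell indices (AND/OR with a structuring-element mask) and the hinge compares distances in embedding space. The structuring element and the mean reduction are assumptions; the actual loss in the training script may be defined differently.

```python
import numpy as np

def morph_reg(emb: np.ndarray, se: int = 0b00111111) -> float:
    """Hinge loss pushing each cell's embedding closer to its erosion
    than to its dilation. `se` is an 8-bit structuring-element mask
    (here the six-dot subset, an illustrative choice)."""
    cells = np.arange(256)
    eroded = emb[cells & se]    # embedding of the eroded cell
    dilated = emb[cells | se]   # embedding of the dilated cell
    d_erode = np.linalg.norm(emb - eroded, axis=1)
    d_dilate = np.linalg.norm(emb - dilated, axis=1)
    return float(np.maximum(d_erode - d_dilate, 0.0).mean())  # ReLU hinge
```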
## Theoretical Foundation

This model implements the formal theory from:

**"Theoretical Foundations for 8-Dot Braille-Native LLMs"**

Key theoretical components:

1. **Braille Lattice**: Boolean algebra (B⁸, ∧, ∨, ¬) with 256 elements
2. **Morphological Operators**: Erosion, dilation, opening, closing
3. **Modality-Invariant Representation**: (modality, sequence, embedding) triple
4. **Lattice Metrics**: Hamming distance, Jaccard similarity

See `braille_lattice_theory.py` for the full implementation.
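The two lattice metrics are simple bit operations on cell values, treating each cell as its set of raised dots:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance: number of dot positions where two cells differ."""
    return bin(a ^ b).count("1")

def jaccard(a: int, b: int) -> float:
    """Jaccard similarity of the raised-dot sets; two empty cells count as 1.0."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0
    return bin(a & b).count("1") / union
```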
## Modality Support

| Modality | Header | Status |
|----------|--------|--------|
| TEXT | ⣿⠁ | ✅ Trained |
| IMAGE | ⣿⠃ | ✅ Trained |
| AUDIO | ⣿⠇ | ✅ Trained |
| BINARY | ⣿⠏ | ✅ Trained |
| VIDEO | ⣿⠗ | 🔄 Framework ready |

## Haptic Evaluation Metrics

v6 introduces new evaluation metrics for tactile quality:

| Metric | Description | Target | **Achieved** |
|--------|-------------|--------|--------------|
| **Lattice Coherence** | Adjacent tokens have low Hamming distance | > 0.7 | **0.743** ✅ |
| **Morphological Stability** | Outputs stable under erosion/dilation | > 0.5 | **0.453** |
| **Haptic Score** | Combined tactile quality metric | > 0.5 | **0.598** ✅ |
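Lattice coherence can be sketched as the mean normalized Hamming similarity between adjacent tokens. This definition is an illustration of the idea in the table above, not necessarily the exact formula behind the `0.743` figure in `final_eval.json`.

```python
def lattice_coherence(token_ids) -> float:
    """Mean similarity of adjacent 8-bit tokens, scaled to [0, 1]
    (1.0 = every adjacent pair identical, 0.0 = every pair maximally distant)."""
    if len(token_ids) < 2:
        return 1.0
    sims = [
        1.0 - bin((a ^ b) & 0xFF).count("1") / 8.0
        for a, b in zip(token_ids, token_ids[1:])
    ]
    return sum(sims) / len(sims)
```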
## Training Results

| Metric | Value |
|--------|-------|
| Final Loss | 1.23 |
| Training Steps | 10,000 |
| Training Time | 2h 7m |
| Corpus | Balanced multimodal (25% each: text, image, audio, binary) |
| Corpus Size | 164M chars |
## Usage

```python
import json

import torch
from train_lattice_v6 import Braille256LatticeModel, LatticeConfig

# Load model
with open("config.json") as f:
    config = LatticeConfig.from_dict(json.load(f))
model = Braille256LatticeModel(config)
model.load_state_dict(torch.load("pytorch_model.bin"))

# Generate
input_ids = torch.tensor([[0x28, 0x29, 0x2A]])  # some Braille tokens
output = model.generate(input_ids, max_length=100)
```
## Training

```bash
python train_lattice_v6.py \
    --corpus corpus/braille_multimodal_corpus.txt \
    --tokenizer tokenizers/braille_8dot_32k/braille_8dot_32k.model \
    --output models/braille256_v6_lattice \
    --steps 10000
```

### Training Options

| Flag | Description |
|------|-------------|
| `--no-lattice-attention` | Disable lattice attention (ablation) |
| `--no-lattice-embeddings` | Disable lattice embeddings (ablation) |
| `--no-morph-regularization` | Disable morphological regularization (ablation) |
## Model Family

| Version | Focus | Parameters | Key Feature |
|---------|-------|------------|-------------|
| v1-v3 | 6-dot Braille | ~10M | Basic Braille LM |
| v4 | 8-dot Braille | 29.9M | Full byte encoding |
| v5 | Multimodal | 11.5M | TEXT/IMAGE/AUDIO/BINARY |
| **v6** | **Lattice-aware** | **11.5M** | **Hamming attention, morphological regularization, balanced multimodal corpus** |

## Why Lattice-Aware?

Standard LLMs treat tokens as arbitrary symbols. braille256-v6 knows that:

1. **Braille cells form a lattice**: 256 elements with meet (∧) and join (∨)
2. **Similar cells should have similar representations**: Hamming distance matters
3. **Morphological operations preserve meaning**: erosion/dilation are semantic
4. **Tactile quality is measurable**: haptic metrics evaluate output quality

This makes v6 the first LLM designed for **tactile-first AI**.
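Point 1 is concrete at the character level: because 8-dot Braille occupies the contiguous Unicode block starting at U+2800, meet and join are bitwise AND/OR on code-point offsets. A small standalone illustration:

```python
# Meet (∧) and join (∨) of two Braille characters via the U+2800 offset.
BASE = 0x2800

def meet(a: str, b: str) -> str:
    """Dots raised in both cells (bitwise AND)."""
    return chr(BASE + ((ord(a) - BASE) & (ord(b) - BASE)))

def join(a: str, b: str) -> str:
    """Dots raised in either cell (bitwise OR)."""
    return chr(BASE + ((ord(a) - BASE) | (ord(b) - BASE)))
```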
## Citation

```bibtex
@misc{braille256v6,
  author    = {Barrett, Ryan},
  title     = {braille256-v6: Lattice-Aware Multimodal Braille Model},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ryanscottbarrett/braille256-v6}
}
```

## License

MIT

## Links

- [braille256-v5](https://huggingface.co/ryanscottbarrett/braille256-v5)
- [braille256-v4](https://huggingface.co/ryanscottbarrett/braille256-v4)
- [Theoretical Paper](docs/BRAILLE_NATIVE_LLM_THEORY.md)
- [Lattice Theory Implementation](src/braille_lattice_theory.py)

---

⣿ *The first LLM where Braille is not just the output format, but the computational substrate.* ⣿
---

braille_lattice_theory.py (ADDED, +1010 lines; truncated in this view)
#!/usr/bin/env python3
"""
Braille Dot-Lattice Theory: Formal Mathematical Framework

This module formalizes the missing theoretical components for 8-dot Braille-native LLMs:

1. DOT-LATTICE MORPHOLOGICAL OPERATORS
   - Boolean algebra on the 8-bit Braille lattice (B⁸, ∧, ∨, ¬, ⊕)
   - Morphological operations: erosion, dilation, opening, closing
   - Dot-wise transformations preserving semantic structure

2. MODALITY-INVARIANT BRAILLE REASONING LOOPS
   - Unified representation across text, image, audio, binary
   - Cross-modal attention mechanisms in Braille space
   - Semantic preservation under modality transformation

Mathematical Foundation:
- The 8-dot Braille cell forms a Boolean lattice (B⁸, ≤) where B = {0, 1}
- Each cell is an 8-dimensional binary vector: c ∈ {0,1}⁸
- The lattice has 2⁸ = 256 elements with meet (∧) and join (∨) operations
- This isomorphism to bytes enables direct computational semantics

Author: Ryan Barrett & Cascade
Date: December 2024
"""

from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Dict, Tuple, Set, Callable, Optional, Iterator
from enum import Enum, auto
import numpy as np
from functools import reduce
import operator

# =============================================================================
# SECTION 1: BRAILLE LATTICE FUNDAMENTALS
# =============================================================================

# Unicode range for 8-dot Braille
BRAILLE_BASE = 0x2800
BRAILLE_MAX = 0x28FF

# Dot position bit values (standard 8-dot layout)
# Layout:  1 4
#          2 5
#          3 6
#          7 8
DOT_BITS = {
    1: 0b00000001,  # bit 0
    2: 0b00000010,  # bit 1
    3: 0b00000100,  # bit 2
    4: 0b00001000,  # bit 3
    5: 0b00010000,  # bit 4
    6: 0b00100000,  # bit 5
    7: 0b01000000,  # bit 6
    8: 0b10000000,  # bit 7
}

# Inverse mapping
BIT_TO_DOT = {v: k for k, v in DOT_BITS.items()}

@dataclass(frozen=True)
class BrailleCell:
    """
    A single 8-dot Braille cell as an element of the Boolean lattice B⁸.

    The cell is represented as an 8-bit integer where each bit corresponds
    to a dot position. This enables efficient lattice operations.

    Properties:
    - Immutable (frozen dataclass)
    - Hashable (can be used in sets/dicts)
    - Supports all Boolean lattice operations
    """
    value: int  # 0-255, representing the 8 dots as bits

    def __post_init__(self):
        if not 0 <= self.value <= 255:
            raise ValueError(f"BrailleCell value must be 0-255, got {self.value}")

    # --- Lattice Element Properties ---

    @property
    def dots(self) -> Tuple[int, ...]:
        """Return tuple of active dot numbers (1-8)."""
        return tuple(d for d in range(1, 9) if self.has_dot(d))

    @property
    def unicode(self) -> str:
        """Return the Unicode Braille character."""
        return chr(BRAILLE_BASE + self.value)

    @property
    def vector(self) -> np.ndarray:
        """Return as 8-dimensional binary vector."""
        return np.array([(self.value >> i) & 1 for i in range(8)], dtype=np.uint8)

    @property
    def cardinality(self) -> int:
        """Number of raised dots (Hamming weight)."""
        return bin(self.value).count('1')

    @property
    def is_bottom(self) -> bool:
        """Check if this is ⊥ (empty cell, no dots)."""
        return self.value == 0

    @property
    def is_top(self) -> bool:
        """Check if this is ⊤ (all dots raised)."""
        return self.value == 255

    def has_dot(self, dot: int) -> bool:
        """Check if a specific dot (1-8) is raised."""
        return bool(self.value & DOT_BITS[dot])

    # --- Boolean Lattice Operations ---

    def meet(self, other: BrailleCell) -> BrailleCell:
        """
        Lattice meet (∧): Greatest lower bound.
        Equivalent to bitwise AND - keeps only dots present in BOTH cells.

        Semantically: intersection of dot patterns.
        """
        return BrailleCell(self.value & other.value)

    def join(self, other: BrailleCell) -> BrailleCell:
        """
        Lattice join (∨): Least upper bound.
        Equivalent to bitwise OR - raises dots present in EITHER cell.

        Semantically: union of dot patterns.
        """
        return BrailleCell(self.value | other.value)

    def complement(self) -> BrailleCell:
        """
        Lattice complement (¬): Invert all dots.
        Equivalent to bitwise NOT (masked to 8 bits).

        Semantically: tactile negative.
        """
        return BrailleCell((~self.value) & 0xFF)

    def symmetric_difference(self, other: BrailleCell) -> BrailleCell:
        """
        Symmetric difference (⊕): XOR operation.
        Dots present in exactly one of the two cells.

        Semantically: tactile contrast/difference.
        """
        return BrailleCell(self.value ^ other.value)

    def implies(self, other: BrailleCell) -> BrailleCell:
        """
        Material implication (→): ¬self ∨ other.
        In lattice terms: self ≤ other iff (self → other) = ⊤
        """
        return self.complement().join(other)

    # --- Partial Order ---

    def __le__(self, other: BrailleCell) -> bool:
        """Lattice ordering: self ≤ other iff self ∧ other = self."""
        return (self.value & other.value) == self.value

    def __lt__(self, other: BrailleCell) -> bool:
        """Strict ordering: self < other iff self ≤ other and self ≠ other."""
        return self <= other and self.value != other.value

    def __ge__(self, other: BrailleCell) -> bool:
        return other <= self

    def __gt__(self, other: BrailleCell) -> bool:
        return other < self

    # --- Operator Overloads ---

    def __and__(self, other: BrailleCell) -> BrailleCell:
        return self.meet(other)

    def __or__(self, other: BrailleCell) -> BrailleCell:
        return self.join(other)

    def __invert__(self) -> BrailleCell:
        return self.complement()

    def __xor__(self, other: BrailleCell) -> BrailleCell:
        return self.symmetric_difference(other)

    def __repr__(self) -> str:
        return f"BrailleCell({self.unicode}, dots={self.dots}, value={self.value})"

    # --- Constructors ---

    @classmethod
    def from_unicode(cls, char: str) -> BrailleCell:
        """Create from Unicode Braille character."""
        code = ord(char)
        if not BRAILLE_BASE <= code <= BRAILLE_MAX:
            raise ValueError(f"Not a Braille character: {char}")
        return cls(code - BRAILLE_BASE)

    @classmethod
    def from_dots(cls, *dots: int) -> BrailleCell:
        """Create from dot numbers (1-8)."""
        value = 0
        for d in dots:
            if 1 <= d <= 8:
                value |= DOT_BITS[d]
        return cls(value)

    @classmethod
    def from_byte(cls, byte: int) -> BrailleCell:
        """Create from byte value (0-255)."""
        return cls(byte & 0xFF)

    @classmethod
    def from_vector(cls, vec: np.ndarray) -> BrailleCell:
        """Create from 8-dimensional binary vector."""
        value = sum(int(vec[i]) << i for i in range(8))
        return cls(value)

    @classmethod
    def bottom(cls) -> BrailleCell:
        """Return ⊥ (empty cell)."""
        return cls(0)

    @classmethod
    def top(cls) -> BrailleCell:
        """Return ⊤ (all dots raised)."""
        return cls(255)

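The lattice laws the BrailleCell class encodes can be checked with a standalone miniature; `Cell` below is a simplified stand-in for `BrailleCell`, kept self-contained for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """Simplified 8-bit Braille cell supporting meet, join, complement, order."""
    value: int  # 0-255

    def __and__(self, o): return Cell(self.value & o.value)   # meet (∧)
    def __or__(self, o):  return Cell(self.value | o.value)   # join (∨)
    def __invert__(self): return Cell(~self.value & 0xFF)     # complement (¬)
    def __le__(self, o):  return (self.value & o.value) == self.value

a, b = Cell(0b011), Cell(0b110)          # dot sets {1,2} and {2,3}
assert (a & b) == Cell(0b010)            # meet keeps the shared dot
assert (a | b) == Cell(0b111)            # join raises all dots
assert (a & b) <= a and a <= (a | b)     # lattice ordering holds
assert ~Cell(0) == Cell(255)             # ¬⊥ = ⊤
```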
# =============================================================================
# SECTION 2: DOT-LATTICE MORPHOLOGICAL OPERATORS
# =============================================================================

class MorphologicalOperator(Enum):
    """Morphological operations on the Braille lattice."""
    EROSION = auto()    # Shrink patterns
    DILATION = auto()   # Expand patterns
    OPENING = auto()    # Erosion then dilation (remove small protrusions)
    CLOSING = auto()    # Dilation then erosion (fill small gaps)
    GRADIENT = auto()   # Dilation - Erosion (edge detection)
    TOP_HAT = auto()    # Original - Opening (extract bright details)
    BLACK_HAT = auto()  # Closing - Original (extract dark details)


@dataclass
class StructuringElement:
    """
    A structuring element for morphological operations on Braille cells.

    In classical morphology, the structuring element defines the neighborhood.
    For Braille, we define it as a set of dot positions that form the "kernel".

    Common structuring elements:
    - COLUMN_LEFT: dots 1,2,3,7 (left column)
    - COLUMN_RIGHT: dots 4,5,6,8 (right column)
    - ROW_TOP: dots 1,4 (top row)
    - CROSS: dots 2,4,5 (cross pattern)
    - FULL: all 8 dots
    """
    dots: Set[int]
    name: str = ""

    @property
    def cell(self) -> BrailleCell:
        """Convert to BrailleCell."""
        return BrailleCell.from_dots(*self.dots)

    # Predefined structuring elements
    @classmethod
    def column_left(cls) -> StructuringElement:
        return cls({1, 2, 3, 7}, "COLUMN_LEFT")

    @classmethod
    def column_right(cls) -> StructuringElement:
        return cls({4, 5, 6, 8}, "COLUMN_RIGHT")

    @classmethod
    def row_top(cls) -> StructuringElement:
        return cls({1, 4}, "ROW_TOP")

    @classmethod
    def row_middle(cls) -> StructuringElement:
        return cls({2, 5}, "ROW_MIDDLE")

    @classmethod
    def row_bottom(cls) -> StructuringElement:
        return cls({3, 6}, "ROW_BOTTOM")

    @classmethod
    def row_extension(cls) -> StructuringElement:
        return cls({7, 8}, "ROW_EXTENSION")

    @classmethod
    def cross(cls) -> StructuringElement:
        return cls({2, 4, 5}, "CROSS")

    @classmethod
    def full(cls) -> StructuringElement:
        return cls({1, 2, 3, 4, 5, 6, 7, 8}, "FULL")

    @classmethod
    def six_dot(cls) -> StructuringElement:
        """Traditional 6-dot Braille subset."""
        return cls({1, 2, 3, 4, 5, 6}, "SIX_DOT")

class BrailleMorphology:
|
| 315 |
+
"""
|
| 316 |
+
Morphological operators on the Braille dot-lattice.
|
| 317 |
+
|
| 318 |
+
These operators enable pattern transformation while preserving
|
| 319 |
+
structural relationships in the lattice.
|
| 320 |
+
|
| 321 |
+
Key insight: Morphological operations on Braille cells can be
|
| 322 |
+
computed efficiently using Boolean operations on the underlying
|
| 323 |
+
8-bit representation.
|
| 324 |
+
"""
|
| 325 |
+
|
| 326 |
+
@staticmethod
|
| 327 |
+
def erode(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 328 |
+
"""
|
| 329 |
+
Erosion: Keep only dots that have ALL structuring element dots present.
|
| 330 |
+
|
| 331 |
+
ε_B(X) = {x : B_x ⊆ X}
|
| 332 |
+
|
| 333 |
+
For single cell: result has dot d iff for all dots s in SE,
|
| 334 |
+
the cell has dot at position (d + s - 1) mod 8 + 1
|
| 335 |
+
|
| 336 |
+
Simplified for single cell: AND with structuring element.
|
| 337 |
+
"""
|
| 338 |
+
return cell & se.cell
|
| 339 |
+
|
| 340 |
+
@staticmethod
|
| 341 |
+
def dilate(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 342 |
+
"""
|
| 343 |
+
Dilation: Raise dots if ANY structuring element dot is present.
|
| 344 |
+
|
| 345 |
+
δ_B(X) = {x : B_x ∩ X ≠ ∅}
|
| 346 |
+
|
| 347 |
+
For single cell: OR with structuring element.
|
| 348 |
+
"""
|
| 349 |
+
return cell | se.cell
|
| 350 |
+
|
| 351 |
+
@staticmethod
|
| 352 |
+
def opening(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 353 |
+
"""
|
| 354 |
+
Opening: Erosion followed by dilation.
|
| 355 |
+
|
| 356 |
+
γ_B(X) = δ_B(ε_B(X))
|
| 357 |
+
|
| 358 |
+
Effect: Removes small protrusions, smooths from outside.
|
| 359 |
+
"""
|
| 360 |
+
eroded = BrailleMorphology.erode(cell, se)
|
| 361 |
+
return BrailleMorphology.dilate(eroded, se)
|
| 362 |
+
|
| 363 |
+
@staticmethod
|
| 364 |
+
def closing(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 365 |
+
"""
|
| 366 |
+
Closing: Dilation followed by erosion.
|
| 367 |
+
|
| 368 |
+
φ_B(X) = ε_B(δ_B(X))
|
| 369 |
+
|
| 370 |
+
Effect: Fills small gaps, smooths from inside.
|
| 371 |
+
"""
|
| 372 |
+
dilated = BrailleMorphology.dilate(cell, se)
|
| 373 |
+
return BrailleMorphology.erode(dilated, se)
|
| 374 |
+
|
| 375 |
+
@staticmethod
|
| 376 |
+
def gradient(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 377 |
+
"""
|
| 378 |
+
Morphological gradient: Dilation - Erosion (via XOR).
|
| 379 |
+
|
| 380 |
+
ρ_B(X) = δ_B(X) - ε_B(X)
|
| 381 |
+
|
| 382 |
+
Effect: Edge detection - dots that differ between dilation and erosion.
|
| 383 |
+
"""
|
| 384 |
+
dilated = BrailleMorphology.dilate(cell, se)
|
| 385 |
+
eroded = BrailleMorphology.erode(cell, se)
|
| 386 |
+
return dilated ^ eroded
|
| 387 |
+
|
| 388 |
+
@staticmethod
|
| 389 |
+
def top_hat(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 390 |
+
"""
|
| 391 |
+
Top-hat transform: Original - Opening.
|
| 392 |
+
|
| 393 |
+
Effect: Extracts bright details smaller than structuring element.
|
| 394 |
+
"""
|
| 395 |
+
opened = BrailleMorphology.opening(cell, se)
|
| 396 |
+
return cell ^ opened # Difference via XOR
|
| 397 |
+
|
| 398 |
+
@staticmethod
|
| 399 |
+
def black_hat(cell: BrailleCell, se: StructuringElement) -> BrailleCell:
|
| 400 |
+
"""
|
| 401 |
+
Black-hat transform: Closing - Original.
|
| 402 |
+
|
| 403 |
+
Effect: Extracts dark details smaller than structuring element.
|
| 404 |
+
"""
|
| 405 |
+
closed = BrailleMorphology.closing(cell, se)
|
| 406 |
+
return closed ^ cell # Difference via XOR
|
| 407 |
+
|
| 408 |
+
@staticmethod
|
| 409 |
+
def hit_or_miss(cell: BrailleCell,
|
| 410 |
+
foreground: StructuringElement,
|
| 411 |
+
background: StructuringElement) -> bool:
|
| 412 |
+
"""
|
| 413 |
+
Hit-or-miss transform: Pattern matching.
|
| 414 |
+
|
| 415 |
+
Returns True iff:
|
| 416 |
+
- All foreground dots are present in cell
|
| 417 |
+
- All background dots are absent from cell
|
| 418 |
+
|
| 419 |
+
This is the foundation for pattern recognition in Braille.
|
| 420 |
+
"""
|
| 421 |
+
fg_match = (cell & foreground.cell) == foreground.cell
|
| 422 |
+
bg_match = (cell & background.cell).is_bottom
|
| 423 |
+
return fg_match and bg_match


# =============================================================================
# SECTION 3: BRAILLE SEQUENCE MORPHOLOGY
# =============================================================================

@dataclass
class BrailleSequence:
    """
    A sequence of Braille cells with morphological operations.

    This extends single-cell morphology to sequences, enabling
    operations on Braille text/data streams.
    """
    cells: List[BrailleCell] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.cells)

    def __getitem__(self, idx: int) -> BrailleCell:
        return self.cells[idx]

    def __iter__(self) -> Iterator[BrailleCell]:
        return iter(self.cells)

    @property
    def unicode(self) -> str:
        """Return as Unicode Braille string."""
        return ''.join(c.unicode for c in self.cells)

    @property
    def bytes(self) -> bytes:
        """Return as byte sequence."""
        return bytes(c.value for c in self.cells)

    def apply(self, op: Callable[[BrailleCell], BrailleCell]) -> BrailleSequence:
        """Apply a cell-wise operation to the sequence."""
        return BrailleSequence([op(c) for c in self.cells])

    def apply_morphology(self,
                         operator: MorphologicalOperator,
                         se: StructuringElement) -> BrailleSequence:
        """Apply morphological operation to each cell."""
        ops = {
            MorphologicalOperator.EROSION: BrailleMorphology.erode,
            MorphologicalOperator.DILATION: BrailleMorphology.dilate,
            MorphologicalOperator.OPENING: BrailleMorphology.opening,
            MorphologicalOperator.CLOSING: BrailleMorphology.closing,
            MorphologicalOperator.GRADIENT: BrailleMorphology.gradient,
            MorphologicalOperator.TOP_HAT: BrailleMorphology.top_hat,
            MorphologicalOperator.BLACK_HAT: BrailleMorphology.black_hat,
        }
        op_func = ops[operator]
        return BrailleSequence([op_func(c, se) for c in self.cells])

    def convolve(self, kernel: List[BrailleCell],
                 op: Callable[[BrailleCell, BrailleCell], BrailleCell] = lambda a, b: a & b) -> BrailleSequence:
        """
        Convolve sequence with a kernel using specified operation.

        This enables sliding-window pattern matching and transformation.
        """
        if not kernel:
            return self

        k_len = len(kernel)
        result = []

        for i in range(len(self.cells)):
            # Apply kernel centered at position i
            acc = BrailleCell.bottom()
            for j, k_cell in enumerate(kernel):
                idx = i - k_len // 2 + j
                if 0 <= idx < len(self.cells):
                    acc = acc | op(self.cells[idx], k_cell)
            result.append(acc)

        return BrailleSequence(result)

    def reduce(self,
               op: Callable[[BrailleCell, BrailleCell], BrailleCell] = lambda a, b: a | b) -> BrailleCell:
        """Reduce sequence to single cell using operation."""
        if not self.cells:
            return BrailleCell.bottom()
        return reduce(op, self.cells)

    @classmethod
    def from_unicode(cls, text: str) -> BrailleSequence:
        """Create from Unicode Braille string."""
        cells = []
        for char in text:
            code = ord(char)
            if BRAILLE_BASE <= code <= BRAILLE_MAX:
                cells.append(BrailleCell(code - BRAILLE_BASE))
        return cls(cells)

    @classmethod
    def from_bytes(cls, data: bytes) -> BrailleSequence:
        """Create from byte sequence (direct mapping)."""
        return cls([BrailleCell(b) for b in data])


# =============================================================================
# SECTION 4: MODALITY-INVARIANT BRAILLE REPRESENTATION
# =============================================================================

class Modality(Enum):
    """Supported modalities for Braille encoding."""
    TEXT = auto()
    IMAGE = auto()
    AUDIO = auto()
    BINARY = auto()
    VIDEO = auto()
    SEMANTIC = auto()  # Abstract semantic content


# Modality headers (from braille256-v5)
MODALITY_HEADERS = {
    Modality.TEXT: BrailleCell.from_dots(1, 2, 3, 4, 5, 6, 7, 8),    # ⣿ + ⠁ = ⣿⠁
    Modality.IMAGE: BrailleCell.from_dots(1, 2, 3, 4, 5, 6, 7, 8),   # ⣿ + ⠃ = ⣿⠃
    Modality.AUDIO: BrailleCell.from_dots(1, 2, 3, 4, 5, 6, 7, 8),   # ⣿ + ⠇ = ⣿⠇
    Modality.BINARY: BrailleCell.from_dots(1, 2, 3, 4, 5, 6, 7, 8),  # ⣿ + ⠏ = ⣿⠏
    Modality.VIDEO: BrailleCell.from_dots(1, 2, 3, 4, 5, 6, 7, 8),   # ⣿ + ⠗ = ⣿⠗
}


@dataclass
class ModalityInvariantRepresentation:
    """
    A modality-invariant representation in Braille space.

    Key insight: All modalities can be encoded as byte sequences,
    and all byte sequences map bijectively to 8-dot Braille.
    Therefore, Braille provides a universal representation space.

    The representation consists of:
    1. Modality header (identifies source modality)
    2. Semantic embedding (modality-invariant meaning)
    3. Raw Braille sequence (the actual data)

    Cross-modal operations preserve semantic content while
    allowing modality-specific transformations.
    """
    modality: Modality
    sequence: BrailleSequence
    semantic_embedding: Optional[np.ndarray] = None  # d-dimensional semantic vector
    metadata: Dict = field(default_factory=dict)

    @property
    def header(self) -> BrailleCell:
        """Get modality header cell."""
        return MODALITY_HEADERS.get(self.modality, BrailleCell.top())

    def to_semantic_space(self, encoder: Callable[[BrailleSequence], np.ndarray]) -> np.ndarray:
        """
        Project Braille sequence to semantic embedding space.

        This is where the LLM's learned embeddings come in.
        The encoder maps Braille tokens to semantic vectors.
        """
        if self.semantic_embedding is None:
            self.semantic_embedding = encoder(self.sequence)
        return self.semantic_embedding

    def transform_modality(self,
                           target: Modality,
                           transformer: Callable[[BrailleSequence, Modality, Modality], BrailleSequence]
                           ) -> ModalityInvariantRepresentation:
        """
        Transform to a different modality while preserving semantics.

        The transformer function handles modality-specific conversion
        while the semantic embedding remains invariant.
        """
        new_sequence = transformer(self.sequence, self.modality, target)
        return ModalityInvariantRepresentation(
            modality=target,
            sequence=new_sequence,
            semantic_embedding=self.semantic_embedding,  # Preserved!
            metadata={**self.metadata, 'source_modality': self.modality}
        )


# =============================================================================
# SECTION 5: BRAILLE REASONING LOOPS
# =============================================================================

@dataclass
class ReasoningState:
    """
    State of a Braille reasoning loop.

    The reasoning loop operates entirely in Braille space:
    1. Input: Braille sequence (any modality)
    2. Transform: Apply morphological/semantic operations
    3. Attend: Cross-modal attention in Braille space
    4. Output: Braille sequence (any modality)

    This enables modality-invariant reasoning where the same
    operations work regardless of input/output modality.
    """
    sequence: BrailleSequence
    attention_weights: Optional[np.ndarray] = None
    hidden_state: Optional[np.ndarray] = None
    step: int = 0

    def apply_attention(self,
                        query: BrailleSequence,
                        key: BrailleSequence,
                        value: BrailleSequence) -> BrailleSequence:
        """
        Cross-modal attention in Braille space.

        Attention is computed on the lattice structure:
        - Query, Key, Value are all Braille sequences
        - Similarity is measured via lattice distance
        - Output is weighted combination in Braille space

        Lattice distance: d(a, b) = |a ⊕ b| (Hamming distance)
        """
        if len(query) == 0 or len(key) == 0:
            return value

        # Compute attention scores based on lattice similarity
        scores = np.zeros((len(query), len(key)))
        for i, q in enumerate(query):
            for j, k in enumerate(key):
                # Similarity = 8 - Hamming distance (higher = more similar)
                diff = q ^ k
                scores[i, j] = 8 - diff.cardinality

        # Softmax normalization
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        self.attention_weights = scores / scores.sum(axis=1, keepdims=True)

        # Weighted combination of values
        result = []
        for i in range(len(query)):
            # Combine values weighted by attention
            combined = BrailleCell.bottom()
            for j, v in enumerate(value):
                if self.attention_weights[i, j] > 0.1:  # Threshold
                    combined = combined | v
            result.append(combined)

        return BrailleSequence(result)
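The score computation above is 8 minus the Hamming distance, followed by a row-wise softmax. A standalone NumPy sketch of just the weight matrix, operating on plain 8-bit dot masks (an illustrative reduction, not the ReasoningState API):

```python
import numpy as np

def lattice_attention_weights(query, key):
    """Row-softmax of (8 - Hamming distance) between 8-bit dot masks."""
    q = np.asarray(query)[:, None]                               # shape (nq, 1)
    k = np.asarray(key)[None, :]                                 # shape (1, nk)
    hamming = np.vectorize(lambda x: bin(x).count("1"))(q ^ k)   # popcount of XOR
    scores = 8 - hamming
    e = np.exp(scores - scores.max(axis=1, keepdims=True))       # stable softmax
    return e / e.sum(axis=1, keepdims=True)

w = lattice_attention_weights([0b00001011], [0b00001011, 0b11110100])
assert w.shape == (1, 2)
assert abs(w.sum() - 1.0) < 1e-9
assert w[0, 0] > w[0, 1]  # the identical cell gets the larger weight
```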


class BrailleReasoningLoop:
    """
    A modality-invariant reasoning loop operating in Braille space.

    The loop implements the following cycle:

    1. ENCODE: Any modality → Braille sequence
    2. TRANSFORM: Morphological operations on Braille
    3. ATTEND: Cross-sequence attention in lattice space
    4. REASON: Apply learned transformations (LLM layers)
    5. DECODE: Braille sequence → Any modality

    Key property: Steps 2-4 are MODALITY-INVARIANT.
    The same operations work for text, images, audio, etc.
    """

    def __init__(self,
                 hidden_dim: int = 256,
                 num_heads: int = 8,
                 morphology_se: StructuringElement = None):
        self.hidden_dim = hidden_dim
        self.num_heads = num_heads
        self.morphology_se = morphology_se or StructuringElement.six_dot()
        self.state = None

    def encode(self,
               data: bytes,
               modality: Modality) -> ModalityInvariantRepresentation:
        """
        Encode any modality to Braille representation.

        This is the entry point: raw bytes → Braille sequence.
        The modality header is prepended for downstream processing.
        """
        sequence = BrailleSequence.from_bytes(data)
        return ModalityInvariantRepresentation(
            modality=modality,
            sequence=sequence
        )

    def transform(self,
                  rep: ModalityInvariantRepresentation,
                  operator: MorphologicalOperator = MorphologicalOperator.OPENING
                  ) -> ModalityInvariantRepresentation:
        """
        Apply morphological transformation.

        This step is modality-invariant: the same operation
        works regardless of whether the input is text, image, etc.
        """
        transformed = rep.sequence.apply_morphology(operator, self.morphology_se)
        return ModalityInvariantRepresentation(
            modality=rep.modality,
            sequence=transformed,
            semantic_embedding=rep.semantic_embedding,
            metadata=rep.metadata
        )

    def attend(self,
               query_rep: ModalityInvariantRepresentation,
               context_rep: ModalityInvariantRepresentation
               ) -> ModalityInvariantRepresentation:
        """
        Cross-modal attention between two representations.

        This enables reasoning across modalities:
        - Text attending to image
        - Audio attending to text
        - Any modality attending to any other

        The attention operates in Braille lattice space.
        """
        if self.state is None:
            self.state = ReasoningState(sequence=query_rep.sequence)

        attended = self.state.apply_attention(
            query=query_rep.sequence,
            key=context_rep.sequence,
            value=context_rep.sequence
        )

        return ModalityInvariantRepresentation(
            modality=query_rep.modality,
            sequence=attended,
            semantic_embedding=query_rep.semantic_embedding,
            metadata={**query_rep.metadata, 'attended_modality': context_rep.modality}
        )

    def reason(self,
               rep: ModalityInvariantRepresentation,
               transform_fn: Callable[[BrailleSequence], BrailleSequence] = None
               ) -> ModalityInvariantRepresentation:
        """
        Apply learned reasoning transformation.

        In a full LLM, this would be the transformer layers.
        Here we provide a hook for custom transformations.

        The key insight: reasoning happens in Braille space,
        making it inherently modality-invariant.
        """
        if transform_fn is None:
            # Default: identity with morphological smoothing
            transform_fn = lambda seq: seq.apply_morphology(
                MorphologicalOperator.CLOSING,
                self.morphology_se
            )

        reasoned = transform_fn(rep.sequence)

        return ModalityInvariantRepresentation(
            modality=rep.modality,
            sequence=reasoned,
            semantic_embedding=rep.semantic_embedding,
            metadata=rep.metadata
        )

    def decode(self,
               rep: ModalityInvariantRepresentation,
               target_modality: Modality = None
               ) -> bytes:
        """
        Decode Braille representation to bytes.

        This is the exit point: Braille sequence → raw bytes.
        The target modality determines any post-processing.
        """
        return rep.sequence.bytes

    def full_loop(self,
                  input_data: bytes,
                  input_modality: Modality,
                  context_data: bytes = None,
                  context_modality: Modality = None,
                  output_modality: Modality = None
                  ) -> bytes:
        """
        Execute a complete reasoning loop.

        Input → Encode → Transform → Attend → Reason → Decode → Output

        All intermediate steps are modality-invariant.
        """
        # Encode input
        rep = self.encode(input_data, input_modality)

        # Transform
        rep = self.transform(rep)

        # Attend to context if provided
        if context_data is not None:
            context_rep = self.encode(
                context_data,
                context_modality or input_modality
            )
            rep = self.attend(rep, context_rep)

        # Reason
        rep = self.reason(rep)

        # Decode
        return self.decode(rep, output_modality or input_modality)


# =============================================================================
# SECTION 6: LATTICE DISTANCE METRICS
# =============================================================================

class BrailleMetrics:
    """
    Distance and similarity metrics on the Braille lattice.

    These metrics enable:
    - Semantic similarity measurement
    - Clustering in Braille space
    - Loss functions for training
    """

    @staticmethod
    def hamming_distance(a: BrailleCell, b: BrailleCell) -> int:
        """
        Hamming distance: number of differing dots.

        d_H(a, b) = |a ⊕ b| = popcount(a XOR b)

        Range: [0, 8]
        """
        return (a ^ b).cardinality

    @staticmethod
    def jaccard_similarity(a: BrailleCell, b: BrailleCell) -> float:
        """
        Jaccard similarity: intersection over union.

        J(a, b) = |a ∧ b| / |a ∨ b|

        Range: [0, 1]
        """
        intersection = (a & b).cardinality
        union = (a | b).cardinality
        if union == 0:
            return 1.0  # Both empty
        return intersection / union

    @staticmethod
    def lattice_distance(a: BrailleCell, b: BrailleCell) -> int:
        """
        Lattice distance: length of shortest path in Hasse diagram.

        For Boolean lattice: d_L(a, b) = |a ⊕ b| (same as Hamming)
        """
        return BrailleMetrics.hamming_distance(a, b)
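Both metrics above reduce to cheap bit operations. A standalone sketch on 8-bit dot masks (hypothetical helpers mirroring, but not importing, BrailleMetrics):

```python
def hamming(a: int, b: int) -> int:
    """Number of differing dots: popcount of XOR."""
    return bin(a ^ b).count("1")

def jaccard(a: int, b: int) -> float:
    """Intersection over union of set dots; both-empty defined as 1.0."""
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union

a, b = 0b00001011, 0b00011010        # dots {1, 2, 4} and {2, 4, 5}
assert hamming(a, b) == 2            # dots 1 and 5 differ
assert abs(jaccard(a, b) - 0.5) < 1e-9   # |{2, 4}| / |{1, 2, 4, 5}|
```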

    @staticmethod
    def semantic_distance(a: BrailleCell, b: BrailleCell,
                          embeddings: Dict[int, np.ndarray] = None) -> float:
        """
        Semantic distance using learned embeddings.

        If embeddings are provided, uses cosine distance in embedding space.
        Otherwise, falls back to normalized Hamming distance.
        """
        if embeddings is not None and a.value in embeddings and b.value in embeddings:
            vec_a = embeddings[a.value]
            vec_b = embeddings[b.value]
            cos_sim = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
            return 1.0 - cos_sim
        else:
            return BrailleMetrics.hamming_distance(a, b) / 8.0

    @staticmethod
    def sequence_distance(a: BrailleSequence, b: BrailleSequence,
                          cell_metric: Callable[[BrailleCell, BrailleCell], float] = None
                          ) -> float:
        """
        Distance between two Braille sequences.

        Uses dynamic time warping or simple alignment depending on lengths.
        """
        if cell_metric is None:
            cell_metric = lambda x, y: BrailleMetrics.hamming_distance(x, y) / 8.0

        if len(a) == 0 and len(b) == 0:
            return 0.0
        if len(a) == 0 or len(b) == 0:
            return 1.0

        # Simple aligned distance for equal lengths
        if len(a) == len(b):
            total = sum(cell_metric(a[i], b[i]) for i in range(len(a)))
            return total / len(a)

        # DTW for unequal lengths
        n, m = len(a), len(b)
        dtw = np.full((n + 1, m + 1), np.inf)
        dtw[0, 0] = 0

        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = cell_metric(a[i-1], b[j-1])
                dtw[i, j] = cost + min(dtw[i-1, j], dtw[i, j-1], dtw[i-1, j-1])

        return dtw[n, m] / max(n, m)
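For unequal-length sequences, the code above fills a standard DTW table with a normalized-Hamming cell cost. A self-contained sketch of the same recurrence, with plain ints standing in for BrailleCell:

```python
import numpy as np

def dtw_distance(a, b):
    """DTW over 8-bit dot masks with normalized Hamming cell cost, scaled by max length."""
    cost = lambda x, y: bin(x ^ y).count("1") / 8.0
    n, m = len(a), len(b)
    dtw = np.full((n + 1, m + 1), np.inf)
    dtw[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Match, insertion, or deletion - take the cheapest predecessor
            dtw[i, j] = cost(a[i - 1], b[j - 1]) + min(dtw[i - 1, j], dtw[i, j - 1], dtw[i - 1, j - 1])
    return dtw[n, m] / max(n, m)

assert dtw_distance([3, 5], [3, 3, 5]) == 0.0   # a repeated cell aligns at zero cost
assert dtw_distance([0], [255]) == 1.0          # all 8 dots differ
```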


# =============================================================================
# SECTION 7: DEMONSTRATION AND TESTING
# =============================================================================

def demonstrate_lattice_operations():
    """Demonstrate the Braille lattice operations."""
    print("=" * 60)
    print("BRAILLE DOT-LATTICE THEORY DEMONSTRATION")
    print("=" * 60)

    # Create some cells
    cell_a = BrailleCell.from_dots(1, 2, 4)  # ⠋
    cell_b = BrailleCell.from_dots(2, 4, 5)  # ⠚

    print(f"\n1. BASIC LATTICE OPERATIONS")
    print(f"   Cell A: {cell_a}")
    print(f"   Cell B: {cell_b}")
    print(f"   A ∧ B (meet): {cell_a & cell_b}")
    print(f"   A ∨ B (join): {cell_a | cell_b}")
    print(f"   ¬A (complement): {~cell_a}")
    print(f"   A ⊕ B (xor): {cell_a ^ cell_b}")
    print(f"   A ≤ B: {cell_a <= cell_b}")

    # Morphological operations
    print(f"\n2. MORPHOLOGICAL OPERATIONS")
    se = StructuringElement.column_left()
    print(f"   Structuring element: {se.name} = dots {se.dots}")
    print(f"   Erosion(A, SE): {BrailleMorphology.erode(cell_a, se)}")
    print(f"   Dilation(A, SE): {BrailleMorphology.dilate(cell_a, se)}")
    print(f"   Opening(A, SE): {BrailleMorphology.opening(cell_a, se)}")
    print(f"   Closing(A, SE): {BrailleMorphology.closing(cell_a, se)}")
    print(f"   Gradient(A, SE): {BrailleMorphology.gradient(cell_a, se)}")

    # Sequence operations
    print(f"\n3. SEQUENCE OPERATIONS")
    text = "Hello"
    seq = BrailleSequence.from_bytes(text.encode())
    print(f"   Input text: '{text}'")
    print(f"   As Braille: {seq.unicode}")
    print(f"   Dilated: {seq.apply_morphology(MorphologicalOperator.DILATION, se).unicode}")
    print(f"   Eroded: {seq.apply_morphology(MorphologicalOperator.EROSION, se).unicode}")

    # Distance metrics
    print(f"\n4. LATTICE METRICS")
    print(f"   Hamming(A, B): {BrailleMetrics.hamming_distance(cell_a, cell_b)}")
    print(f"   Jaccard(A, B): {BrailleMetrics.jaccard_similarity(cell_a, cell_b):.3f}")
    print(f"   Lattice(A, B): {BrailleMetrics.lattice_distance(cell_a, cell_b)}")

    # Modality-invariant reasoning
    print(f"\n5. MODALITY-INVARIANT REASONING LOOP")
    loop = BrailleReasoningLoop()

    input_text = b"Test input"
    context = b"Context data"

    output = loop.full_loop(
        input_data=input_text,
        input_modality=Modality.TEXT,
        context_data=context,
        context_modality=Modality.TEXT
    )

    print(f"   Input: {input_text}")
    print(f"   Context: {context}")
    print(f"   Output: {output}")
    print(f"   (Output differs due to morphological transformations)")

    print("\n" + "=" * 60)
    print("THEORETICAL FRAMEWORK COMPLETE")
    print("=" * 60)


if __name__ == "__main__":
    demonstrate_lattice_operations()
config.json
ADDED
@@ -0,0 +1,18 @@
{
    "vocab_size": 32000,
    "hidden_size": 256,
    "num_layers": 4,
    "num_heads": 4,
    "intermediate_size": 1024,
    "max_position_embeddings": 512,
    "dropout": 0.1,
    "use_lattice_attention": true,
    "lattice_attention_weight": 0.4,
    "use_morphological_regularization": true,
    "morphological_weight": 0.005000000000000001,
    "use_lattice_embeddings": true,
    "structuring_element": "six_dot",
    "embedding_dropout": 0.15,
    "modality_embedding_dim": 32,
    "num_modalities": 5
}
final_eval.json
ADDED
@@ -0,0 +1,5 @@
{
    "lattice_coherence": 0.7428203609814639,
    "morphological_stability": 0.4530333221703768,
    "haptic_score": 0.5979268415759204
}
|
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7721b549cb7724d1f110f0221b98044a1472b95ab01ac4a586dd2ae2cbfa0704
size 47273908
|
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec5b8b6fbd8985a97c74d377a83f58ff59f3860d02a343eb15146da467da40ae
size 1155082
|
train_lattice_v6.py
ADDED
@@ -0,0 +1,990 @@
#!/usr/bin/env python3
"""
Train braille256-v6: Lattice-Aware Multimodal Braille Model

This is the first LLM with explicit dot-lattice structure in its architecture:
1. Lattice-aware attention (Hamming-based similarity)
2. Morphological regularization (erosion/dilation as inductive bias)
3. Lattice-structured embeddings (respecting Boolean algebra)
4. Modality-invariant reasoning loops

Building on v5's multimodal foundation, v6 integrates the formal theory
from braille_lattice_theory.py into the training pipeline.

Author: Ryan Barrett & Cascade
Date: December 2024
"""

import os
import sys
import json
import math
import logging
import argparse
from dataclasses import dataclass, field
from typing import Optional, List, Tuple, Dict
from enum import Enum

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
import numpy as np

import sentencepiece as spm

# Import lattice theory
from braille_lattice_theory import (
    BrailleCell, BrailleMorphology, BrailleSequence,
    StructuringElement, MorphologicalOperator, BrailleMetrics,
    BRAILLE_BASE, BRAILLE_MAX
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# =============================================================================
# Configuration
# =============================================================================

@dataclass
class LatticeConfig:
    """Configuration for the lattice-aware model."""
    # Model architecture
    vocab_size: int = 32000
    hidden_size: int = 256
    num_layers: int = 4
    num_heads: int = 4
    intermediate_size: int = 1024
    max_position_embeddings: int = 512
    dropout: float = 0.1

    # Lattice-specific settings
    use_lattice_attention: bool = True
    lattice_attention_weight: float = 0.4  # Blend with standard attention (increased)
    use_morphological_regularization: bool = True
    morphological_weight: float = 0.05  # Regularization strength (increased 5x)
    use_lattice_embeddings: bool = True
    structuring_element: str = "six_dot"  # Which structuring element to use
    embedding_dropout: float = 0.15  # Dropout on embeddings to prevent overfitting

    # Modality settings
    modality_embedding_dim: int = 32
    num_modalities: int = 5  # TEXT, IMAGE, AUDIO, BINARY, VIDEO

    def to_dict(self):
        return dict(self.__dict__)

    @classmethod
    def from_dict(cls, d):
        return cls(**{k: v for k, v in d.items() if k in cls.__dataclass_fields__})


# =============================================================================
# Lattice-Aware Attention
# =============================================================================

class LatticeAttention(nn.Module):
    """
    Attention mechanism that incorporates Braille lattice structure.

    Key innovation: combines standard softmax attention with lattice-based
    similarity computed via Hamming distance on the underlying Braille cells.

    For tokens that map to Braille cells, we compute:
        lattice_sim(a, b) = 8 - popcount(a XOR b)

    This is then blended with standard QK^T attention.
    """

    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.config = config
        self.num_heads = config.num_heads
        self.head_dim = config.hidden_size // config.num_heads
        self.lattice_weight = config.lattice_attention_weight

        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.k_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.v_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.dropout)

        # Learnable lattice attention temperature
        self.lattice_temperature = nn.Parameter(torch.ones(1))

        # Precompute Hamming distance matrix for all 256 Braille cells
        self._precompute_hamming_matrix()

    def _precompute_hamming_matrix(self):
        """Precompute pairwise Hamming distances for efficiency."""
        hamming = torch.zeros(256, 256)
        for i in range(256):
            for j in range(256):
                # XOR the two cells and count the differing bits
                hamming[i, j] = bin(i ^ j).count('1')

        # Convert to similarity: 8 - hamming (range [0, 8])
        self.register_buffer('lattice_similarity', 8 - hamming)

    def _get_braille_values(self, token_ids: torch.Tensor) -> torch.Tensor:
        """
        Extract Braille cell values from token IDs (vectorized).

        Tokens that decode to Braille characters keep their cell value;
        all others become -1 and are masked out of the lattice attention.
        """
        # Vectorized: tokens < 256 are treated as Braille cells
        return torch.where(
            token_ids < 256,
            token_ids,
            torch.full_like(token_ids, -1)
        )

    def compute_lattice_attention(self, braille_values: torch.Tensor) -> torch.Tensor:
        """
        Compute attention scores based on lattice similarity (fully vectorized).

        Returns attention logits of shape (B, T, T).
        """
        B, T = braille_values.shape

        # Mask for valid Braille values
        valid_mask = (braille_values >= 0).float()

        # Clamp to valid range for indexing
        safe_values = braille_values.clamp(0, 255).long()

        # Vectorized lookup via advanced indexing:
        # lattice_similarity is (256, 256); select one similarity row per token
        sim_rows = self.lattice_similarity[safe_values.view(-1)]  # (B*T, 256)
        sim_rows = sim_rows.view(B, T, 256)

        # Column indices: indices[b, i, j] = safe_values[b, j]
        indices = safe_values.unsqueeze(1).expand(B, T, T)

        # Gather: lattice_attn[b, i, j] = sim_rows[b, i, safe_values[b, j]]
        # (an earlier version transposed the index tensor, which collapsed the
        # scores to sim(token_i, token_i); gathering directly is correct)
        lattice_attn = torch.gather(sim_rows, 2, indices)

        # Apply temperature (clamped so a learned non-positive value cannot flip signs)
        lattice_attn = lattice_attn / self.lattice_temperature.clamp(min=1e-6)

        # Mask invalid positions
        valid_2d = valid_mask.unsqueeze(2) * valid_mask.unsqueeze(1)  # (B, T, T)
        lattice_attn = lattice_attn * valid_2d

        return lattice_attn

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None,
                token_ids: torch.Tensor = None) -> torch.Tensor:
        B, T, C = x.shape

        # Standard multi-head projections
        q = self.q_proj(x).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        # Standard QK^T attention
        standard_attn = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)

        # Lattice attention (if enabled and token_ids provided)
        if self.config.use_lattice_attention and token_ids is not None:
            braille_values = self._get_braille_values(token_ids)
            lattice_attn = self.compute_lattice_attention(braille_values)

            # Expand for heads: (B, T, T) -> (B, num_heads, T, T)
            lattice_attn = lattice_attn.unsqueeze(1).expand(-1, self.num_heads, -1, -1)

            # Blend standard and lattice attention
            attn = (1 - self.lattice_weight) * standard_attn + self.lattice_weight * lattice_attn
        else:
            attn = standard_attn

        # Apply causal mask
        if mask is not None:
            attn = attn.masked_fill(mask == 0, float('-inf'))

        attn = F.softmax(attn, dim=-1)
        attn = self.dropout(attn)

        out = (attn @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.out_proj(out)


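The gather-based lookup above reduces, per batch element, to indexing a 256×256 similarity table with the row and column cell values. A minimal pure-Python sketch of that logic (illustrative helper names, not part of the training script):

```python
def lattice_similarity_table():
    """sim[a][b] = 8 - popcount(a XOR b), for all 256 x 256 cell pairs."""
    return [[8 - bin(a ^ b).count('1') for b in range(256)] for a in range(256)]

def pairwise_lattice_scores(cells):
    """For a cell sequence, score[i][j] = sim(cells[i], cells[j])."""
    sim = lattice_similarity_table()
    return [[sim[a][b] for b in cells] for a in cells]
```

Identical cells score 8, complementary cells score 0, and the tensorized `gather` in `compute_lattice_attention` produces exactly this pairwise matrix per batch element.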
# =============================================================================
# Lattice-Aware Embeddings
# =============================================================================

class LatticeEmbedding(nn.Module):
    """
    Token embeddings that respect Braille lattice structure.

    Key insight: initialize embeddings so that similar Braille cells
    (low Hamming distance) have similar embeddings.

    This provides an inductive bias that helps the model learn
    patterns in the lattice structure.
    """

    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.config = config

        # Standard embedding
        self.embedding = nn.Embedding(config.vocab_size, config.hidden_size)

        # Lattice structure embedding (for the first 256 tokens = Braille cells)
        self.lattice_embedding = nn.Embedding(256, config.hidden_size)

        # Initialize lattice embeddings with structure
        self._init_lattice_structure()

        # Learnable blend weight
        self.lattice_blend = nn.Parameter(torch.tensor(0.1))

    def _init_lattice_structure(self):
        """Initialize embeddings to reflect lattice structure."""
        with torch.no_grad():
            # Each Braille cell is an 8-bit vector; map each bit to a
            # learned direction in embedding space.

            # Create 8 basis vectors (one per dot)
            basis = torch.randn(8, self.config.hidden_size) * 0.1

            for i in range(256):
                # Embedding is the sum of basis vectors for raised dots
                emb = torch.zeros(self.config.hidden_size)
                for b in range(8):
                    if (i >> b) & 1:
                        emb += basis[b]

                self.lattice_embedding.weight[i] = emb

    def forward(self, token_ids: torch.Tensor, training: bool = True) -> torch.Tensor:
        # Standard embedding
        std_emb = self.embedding(token_ids)

        if self.config.use_lattice_embeddings:
            # For tokens < 256, blend in the lattice embedding
            mask = (token_ids < 256).float().unsqueeze(-1)
            safe_ids = token_ids.clamp(0, 255)
            lat_emb = self.lattice_embedding(safe_ids)

            # Blend: standard + lattice_blend * lattice (Braille tokens only)
            std_emb = std_emb + mask * self.lattice_blend * lat_emb

        # Apply embedding dropout during training to prevent overfitting
        if training and self.config.embedding_dropout > 0:
            std_emb = F.dropout(std_emb, p=self.config.embedding_dropout, training=True)

        return std_emb


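The structured init has a useful consequence: because each cell's embedding is the sum of the basis vectors of its raised dots, two cells differing in k dots differ by exactly k basis vectors, so embedding distance tracks Hamming distance at initialization. A pure-Python sketch under that assumption (illustrative names, tiny hidden size):

```python
import random

def init_lattice_embeddings(hidden=4, seed=0):
    """Build the 256-cell table: embedding(cell) = sum of basis[b] over raised dots b."""
    rng = random.Random(seed)
    basis = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(8)]
    table = []
    for cell in range(256):
        emb = [0.0] * hidden
        for b in range(8):
            if (cell >> b) & 1:
                emb = [e + v for e, v in zip(emb, basis[b])]
        table.append(emb)
    return table, basis
```

For example, cell 0b11 (dots 1 and 2) lands exactly at `basis[0] + basis[1]`, one basis vector away from cell 0b01.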
# =============================================================================
# Morphological Regularization
# =============================================================================

class MorphologicalRegularizer(nn.Module):
    """
    Regularization based on morphological operations.

    Encourages the model to learn representations that are
    consistent under morphological transformations (erosion, dilation).

    Ideal loss: ||f(erode(x)) - erode(f(x))||^2 + ||f(dilate(x)) - dilate(f(x))||^2

    This is a form of equivariance regularization.
    """

    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.config = config

        # Get structuring element
        se_map = {
            'six_dot': StructuringElement.six_dot(),
            'column_left': StructuringElement.column_left(),
            'column_right': StructuringElement.column_right(),
            'full': StructuringElement.full(),
        }
        self.se = se_map.get(config.structuring_element, StructuringElement.six_dot())
        self.se_value = self.se.cell.value

    def apply_morphology(self, token_ids: torch.Tensor,
                         op: str = 'erode') -> torch.Tensor:
        """Apply a morphological operation to token IDs."""
        result = token_ids.clone()
        mask = token_ids < 256  # Only apply to Braille tokens

        if op == 'erode':
            # Erosion: AND with the structuring element
            result[mask] = token_ids[mask] & self.se_value
        elif op == 'dilate':
            # Dilation: OR with the structuring element
            result[mask] = (token_ids[mask] | self.se_value) & 0xFF

        return result

    def compute_loss(self, embeddings: torch.Tensor,
                     token_ids: torch.Tensor,
                     embedding_layer: nn.Module) -> torch.Tensor:
        """
        Compute the morphological equivariance loss.

        We want: embed(morph(x)) ≈ morph_embed(embed(x))

        Since we cannot apply morphology directly to embeddings, we use a
        proxy: embeddings of morphologically related tokens should be similar,
        with eroded neighbors slightly closer than dilated ones.
        """
        if not self.config.use_morphological_regularization:
            return torch.tensor(0.0, device=embeddings.device)

        # Get eroded and dilated token IDs
        eroded_ids = self.apply_morphology(token_ids, 'erode')
        dilated_ids = self.apply_morphology(token_ids, 'dilate')

        # Get their embeddings
        eroded_emb = embedding_layer(eroded_ids)
        dilated_emb = embedding_layer(dilated_ids)

        dist_to_eroded = F.mse_loss(embeddings, eroded_emb)
        dist_to_dilated = F.mse_loss(embeddings, dilated_emb)

        # Always-on margin term: push the eroded neighbor to be closer than
        # the dilated one by a margin of 0.1 (squared for smooth gradients)
        margin_loss = (dist_to_eroded - dist_to_dilated + 0.1).pow(2)

        # Coherence term: embeddings should stay close to both morphological neighbors
        coherence_loss = dist_to_eroded + dist_to_dilated

        loss = margin_loss + 0.1 * coherence_loss

        return loss * self.config.morphological_weight


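On cell values the two operations are plain bit masks, which makes the lattice ordering easy to check: erosion can only clear dots and dilation can only raise them, so `erode(c) ⊑ c ⊑ dilate(c)` in the bitwise-subset order. A pure-Python sketch, assuming the "six_dot" structuring element corresponds to the lower six bits (0x3F) — an assumption mirroring the six-dot cell, since `StructuringElement.six_dot()` is defined elsewhere:

```python
SIX_DOT_SE = 0x3F  # assumed six-dot structuring element: dots 1-6 raised

def erode_cell(cell, se=SIX_DOT_SE):
    """Erosion on a single cell: AND with the structuring element."""
    return cell & se

def dilate_cell(cell, se=SIX_DOT_SE):
    """Dilation on a single cell: OR with the structuring element."""
    return (cell | se) & 0xFF

def is_subset(a, b):
    """True when every raised dot of a is also raised in b."""
    return a & b == a
```

This ordering (eroded below, dilated above) is exactly what the margin term in `compute_loss` asks the embedding space to reflect.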
# =============================================================================
# Modality Embedding
# =============================================================================

class ModalityEmbedding(nn.Module):
    """
    Embeddings for different modalities.

    Adds a learned embedding based on the detected modality
    of each token sequence.
    """

    # Modality header tokens (from v5)
    MODALITY_HEADERS = {
        'TEXT': (0xFF, 0x01),    # ⣿⠁
        'IMAGE': (0xFF, 0x03),   # ⣿⠃
        'AUDIO': (0xFF, 0x07),   # ⣿⠇
        'BINARY': (0xFF, 0x0F),  # ⣿⠏
        'VIDEO': (0xFF, 0x17),   # ⣿⠗
    }

    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.embedding = nn.Embedding(config.num_modalities, config.hidden_size)

    def detect_modality(self, token_ids: torch.Tensor) -> torch.Tensor:
        """Detect modality from the token sequence (simplified)."""
        # For now, return 0 (TEXT) for all sequences; real detection would
        # need the tokenizer to decode the header cells.
        return torch.zeros(token_ids.shape[0], dtype=torch.long, device=token_ids.device)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        modality_ids = self.detect_modality(token_ids)
        return self.embedding(modality_ids).unsqueeze(1)  # (B, 1, H)


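`detect_modality` is stubbed to TEXT, but the v5 header convention above already determines what a full implementation would do: read the first two cell values and map them to a modality index. A pure-Python sketch under the assumption that modalities are indexed TEXT=0 through VIDEO=4 (the index order is hypothetical, chosen only to match `num_modalities = 5`):

```python
# Header pairs from the v5 convention; the 0-4 index assignment is an assumption.
HEADER_TO_INDEX = {
    (0xFF, 0x01): 0,  # TEXT
    (0xFF, 0x03): 1,  # IMAGE
    (0xFF, 0x07): 2,  # AUDIO
    (0xFF, 0x0F): 3,  # BINARY
    (0xFF, 0x17): 4,  # VIDEO
}

def detect_modality_from_cells(cells):
    """Return the modality index for a cell sequence, defaulting to TEXT (0)."""
    if len(cells) >= 2:
        return HEADER_TO_INDEX.get((cells[0], cells[1]), 0)
    return 0
```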
# =============================================================================
# Full Model
# =============================================================================

class FeedForward(nn.Module):
    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
        self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x):
        return self.fc2(self.dropout(F.gelu(self.fc1(x))))


class LatticeTransformerBlock(nn.Module):
    """Transformer block with lattice-aware attention."""

    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.hidden_size)
        self.attn = LatticeAttention(config)
        self.ln2 = nn.LayerNorm(config.hidden_size)
        self.ff = FeedForward(config)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None,
                token_ids: torch.Tensor = None) -> torch.Tensor:
        x = x + self.dropout(self.attn(self.ln1(x), mask, token_ids))
        x = x + self.dropout(self.ff(self.ln2(x)))
        return x


class Braille256LatticeModel(nn.Module):
    """
    braille256-v6: Lattice-Aware Multimodal Braille Model

    Key innovations over v5:
    1. LatticeAttention: Hamming-based similarity in attention
    2. LatticeEmbedding: Structure-aware token embeddings
    3. MorphologicalRegularizer: Equivariance regularization
    4. ModalityEmbedding: Explicit modality awareness
    """

    def __init__(self, config: LatticeConfig):
        super().__init__()
        self.config = config

        # Lattice-aware embeddings
        self.token_embedding = LatticeEmbedding(config)
        self.position_embedding = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.modality_embedding = ModalityEmbedding(config)
        self.dropout = nn.Dropout(config.dropout)

        # Transformer layers with lattice attention
        self.layers = nn.ModuleList([
            LatticeTransformerBlock(config) for _ in range(config.num_layers)
        ])

        self.ln_f = nn.LayerNorm(config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

        # Morphological regularizer
        self.morph_regularizer = MorphologicalRegularizer(config)

        # Weight tying
        self.lm_head.weight = self.token_embedding.embedding.weight

        self.apply(self._init_weights)
        # apply() above re-initializes every nn.Embedding, which would wipe the
        # structured init of the lattice embedding -- restore it afterwards
        self.token_embedding._init_lattice_structure()

        # Log architecture
        total_params = sum(p.numel() for p in self.parameters())
        logger.info(f"Braille256-v6 Lattice Model: {total_params:,} parameters")
        logger.info(f"  Lattice attention: {config.use_lattice_attention}")
        logger.info(f"  Lattice embeddings: {config.use_lattice_embeddings}")
        logger.info(f"  Morphological regularization: {config.use_morphological_regularization}")

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)

    def forward(self, input_ids: torch.Tensor,
                labels: torch.Tensor = None) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]]:
        B, T = input_ids.shape

        # Embeddings
        positions = torch.arange(T, device=input_ids.device).unsqueeze(0)
        tok_emb = self.token_embedding(input_ids, training=self.training)
        pos_emb = self.position_embedding(positions)
        mod_emb = self.modality_embedding(input_ids)

        x = tok_emb + pos_emb + mod_emb
        x = self.dropout(x)

        # Causal mask, shape (1, 1, T, T) for broadcasting over batch and heads
        mask = torch.tril(torch.ones(T, T, device=input_ids.device)).unsqueeze(0).unsqueeze(0)

        # Transformer layers
        for layer in self.layers:
            x = layer(x, mask, input_ids)

        x = self.ln_f(x)
        logits = self.lm_head(x)

        # Compute losses
        lm_loss = None
        morph_loss = None

        if labels is not None:
            lm_loss = F.cross_entropy(
                logits.view(-1, self.config.vocab_size),
                labels.view(-1),
                ignore_index=-100
            )

            # Morphological regularization
            morph_loss = self.morph_regularizer.compute_loss(
                tok_emb, input_ids, self.token_embedding
            )

        return logits, lm_loss, morph_loss

    def generate(self, input_ids: torch.Tensor, max_length: int = 100,
                 temperature: float = 1.0, top_k: int = 50) -> torch.Tensor:
        self.eval()
        with torch.no_grad():
            for _ in range(max_length):
                if input_ids.shape[1] >= self.config.max_position_embeddings:
                    break

                logits, _, _ = self(input_ids)
                logits = logits[:, -1, :] / temperature

                if top_k > 0:
                    v, _ = torch.topk(logits, top_k)
                    logits[logits < v[:, [-1]]] = float('-inf')

                probs = F.softmax(logits, dim=-1)
                next_token = torch.multinomial(probs, num_samples=1)
                input_ids = torch.cat([input_ids, next_token], dim=1)

        return input_ids


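The top-k step inside `generate` masks every logit below the k-th largest to negative infinity before the softmax, so sampling is restricted to the k most likely tokens. A pure-Python sketch of that filter (illustrative helper, equivalent in spirit to the `torch.topk` masking above):

```python
def top_k_filter(logits, k):
    """Keep the k largest logits; set the rest to -inf (ties at the cutoff kept)."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float('-inf') for x in logits]
```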
# =============================================================================
# Dataset (same as v5)
# =============================================================================

class MultimodalBrailleDataset(Dataset):
    def __init__(self, corpus_path: str, tokenizer_path: str,
                 max_length: int = 512, max_tokens: int = 10_000_000):
        self.max_length = max_length

        self.sp = spm.SentencePieceProcessor()
        self.sp.load(tokenizer_path)

        logger.info(f"Loading corpus from {corpus_path}...")
        with open(corpus_path, 'r', encoding='utf-8') as f:
            text = f.read()

        if len(text) > max_tokens * 3:
            logger.info(f"Limiting corpus from {len(text):,} characters to ~{max_tokens:,} tokens' worth")
            text = text[:max_tokens * 3]

        logger.info(f"Tokenizing {len(text):,} characters...")
        self.tokens = self.sp.encode(text)
        if len(self.tokens) > max_tokens:
            self.tokens = self.tokens[:max_tokens]
        logger.info(f"Got {len(self.tokens):,} tokens")

        # Overlapping windows: each example starts half a window after the last
        self.examples = []
        stride = max_length // 2
        for i in range(0, len(self.tokens) - max_length, stride):
            self.examples.append(i)

        logger.info(f"Created {len(self.examples):,} training examples")

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        start = self.examples[idx]
        tokens = self.tokens[start:start + self.max_length + 1]

        input_ids = torch.tensor(tokens[:-1], dtype=torch.long)
        labels = torch.tensor(tokens[1:], dtype=torch.long)

        return input_ids, labels


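The chunking above slides a window of `max_length` tokens with stride `max_length // 2`, so consecutive examples overlap by half a window. A pure-Python sketch of the start-offset computation (illustrative name):

```python
def window_starts(num_tokens, max_length):
    """Start offsets for overlapping windows with stride = max_length // 2."""
    stride = max_length // 2
    return list(range(0, num_tokens - max_length, stride))
```

For a 1000-token corpus and 256-token windows, this yields starts 0, 128, 256, ..., each full window fitting inside the corpus.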
# =============================================================================
# Haptic Evaluation
# =============================================================================

class HapticEvaluator:
    """
    Evaluate model outputs for haptic/tactile quality.

    Metrics:
    1. Lattice coherence: how well outputs respect lattice structure
    2. Morphological stability: consistency under erosion/dilation
    3. Modality preservation: cross-modal semantic consistency
    """

    def __init__(self, config: LatticeConfig):
        self.config = config
        self.se = StructuringElement.six_dot()

    def lattice_coherence(self, token_ids: torch.Tensor) -> float:
        """
        Measure how well token sequences respect lattice structure.

        High coherence = adjacent tokens have low Hamming distance.
        """
        if token_ids.shape[-1] < 2:
            return 1.0

        total_dist = 0
        count = 0

        for i in range(token_ids.shape[-1] - 1):
            # Use the first batch item when the input is batched
            t1 = token_ids[..., i].item() if token_ids[..., i].numel() == 1 else token_ids[0, i].item()
            t2 = token_ids[..., i + 1].item() if token_ids[..., i + 1].numel() == 1 else token_ids[0, i + 1].item()

            if t1 < 256 and t2 < 256:
                # Hamming distance between adjacent cells
                total_dist += bin(t1 ^ t2).count('1')
                count += 1

        if count == 0:
            return 1.0

        # Normalize: average distance 0 = max coherence, 8 = min coherence
        avg_dist = total_dist / count
        return 1.0 - (avg_dist / 8.0)

    def morphological_stability(self, token_ids: torch.Tensor) -> float:
        """
        Measure stability under morphological operations.

        High stability = erosion and dilation don't change the sequence drastically.
        """
        if token_ids.numel() == 0:
            return 1.0

        original = token_ids.clone()
        mask = original < 256

        # Apply erosion
        eroded = original.clone()
        eroded[mask] = original[mask] & self.se.cell.value

        # Apply dilation
        dilated = original.clone()
        dilated[mask] = (original[mask] | self.se.cell.value) & 0xFF

        # Measure how much changed
        erode_change = (original[mask] != eroded[mask]).float().mean().item() if mask.any() else 0.0
        dilate_change = (original[mask] != dilated[mask]).float().mean().item() if mask.any() else 0.0

        # Stability = 1 - average change
        return 1.0 - (erode_change + dilate_change) / 2

    def evaluate(self, model: nn.Module, dataloader: DataLoader,
                 device: torch.device, num_samples: int = 100) -> Dict[str, float]:
        """Run the full haptic evaluation over up to num_samples batches."""
        model.eval()

        coherence_scores = []
        stability_scores = []

        with torch.no_grad():
            for i, (input_ids, _) in enumerate(dataloader):
                if i >= num_samples:
                    break

                input_ids = input_ids.to(device)

                # Generate some tokens from a short prompt
                generated = model.generate(input_ids[:, :10], max_length=50)

                coherence_scores.append(self.lattice_coherence(generated))
                stability_scores.append(self.morphological_stability(generated))

        return {
            'lattice_coherence': float(np.mean(coherence_scores)),
            'morphological_stability': float(np.mean(stability_scores)),
            'haptic_score': float(0.5 * np.mean(coherence_scores) + 0.5 * np.mean(stability_scores)),
        }


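The coherence metric is simple enough to restate without tensors: average the Hamming distance over adjacent Braille-cell pairs and map it to [0, 1], where 1 means identical neighbors and 0 means every neighbor is the bitwise complement. A pure-Python sketch (illustrative helper name):

```python
def lattice_coherence(cells):
    """1 - (mean adjacent Hamming distance) / 8, over Braille-cell pairs only."""
    pairs = [(a, b) for a, b in zip(cells, cells[1:]) if a < 256 and b < 256]
    if not pairs:
        return 1.0
    avg = sum(bin(a ^ b).count('1') for a, b in pairs) / len(pairs)
    return 1.0 - avg / 8.0
```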
# =============================================================================
# Training
# =============================================================================

def train(
    corpus_path: str,
    tokenizer_path: str,
    output_dir: str,
    max_steps: int = 10000,
    batch_size: int = 16,
    learning_rate: float = 3e-4,
    gradient_accumulation: int = 2,
    save_steps: int = 1000,
    eval_steps: int = 500,
    use_lattice_attention: bool = True,
    use_lattice_embeddings: bool = True,
    use_morphological_regularization: bool = True,
):
    """Train the lattice-aware model."""

    os.makedirs(output_dir, exist_ok=True)

    # Device
    if torch.backends.mps.is_available():
        device = torch.device("mps")
    elif torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    logger.info(f"Using device: {device}")

    # Load tokenizer
    sp = spm.SentencePieceProcessor()
    sp.load(tokenizer_path)
    vocab_size = sp.get_piece_size()

    # Config
    config = LatticeConfig(
        vocab_size=vocab_size,
        use_lattice_attention=use_lattice_attention,
        use_lattice_embeddings=use_lattice_embeddings,
        use_morphological_regularization=use_morphological_regularization,
    )

    # Save config
    with open(os.path.join(output_dir, "config.json"), 'w') as f:
        json.dump(config.to_dict(), f, indent=2)

    # Model
    model = Braille256LatticeModel(config)
    model.to(device)

    # Dataset
    dataset = MultimodalBrailleDataset(
        corpus_path, tokenizer_path,
        max_length=256, max_tokens=2_000_000
    )
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)

    # Evaluator
    evaluator = HapticEvaluator(config)

    # Optimizer with increased weight decay to preserve lattice structure
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.05)

    # LR scheduler: linear warmup, then cosine decay
    def lr_lambda(step):
        warmup_steps = 500
        if step < warmup_steps:
            return step / warmup_steps
        decay_steps = max_steps - warmup_steps
        # Clamp progress so the multiplier never goes negative past max_steps
        progress = min(1.0, (step - warmup_steps) / decay_steps)
        return 0.5 * (1 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    # Mixed precision: CUDA only; MPS AMP has issues with custom ops
    use_amp = device.type == 'cuda'
    scaler = torch.amp.GradScaler('cuda') if use_amp else None

    # torch.compile disabled for MPS (slow compilation overhead); enable only for CUDA
    compiled = False
    if device.type == 'cuda':
        try:
            model = torch.compile(model, mode="reduce-overhead")
            compiled = True
        except Exception as e:
            logger.warning(f"torch.compile not available: {e}")

    # Training loop
    print("\n" + "=" * 70)
    print("⣿ braille256-v6: Lattice-Aware Training ⣿")
    print("=" * 70)
    print(f"  Max steps: {max_steps}")
    print(f"  Batch size: {batch_size} x {gradient_accumulation} = {batch_size * gradient_accumulation}")
    print(f"  Learning rate: {learning_rate}")
    print(f"  Lattice attention: {use_lattice_attention}")
    print(f"  Lattice embeddings: {use_lattice_embeddings}")
    print(f"  Morphological regularization: {use_morphological_regularization}")
    print(f"  Mixed precision (AMP): {use_amp}")
    print(f"  torch.compile: {compiled}")
    print(f"  Output: {output_dir}")
    print("=" * 70 + "\n")

    model.train()
    step = 0
+
data_iter = iter(dataloader)
|
| 823 |
+
best_haptic_score = 0
|
| 824 |
+
|
| 825 |
+
pbar = tqdm(total=max_steps, desc="Training")
|
| 826 |
+
|
| 827 |
+
training_log = []
|
| 828 |
+
|
| 829 |
+
while step < max_steps:
|
| 830 |
+
optimizer.zero_grad()
|
| 831 |
+
total_lm_loss = 0
|
| 832 |
+
total_morph_loss = 0
|
| 833 |
+
|
| 834 |
+
# Staged morphological regularization: high early, decay later
|
| 835 |
+
# This locks in geometry early while allowing expressivity later
|
| 836 |
+
if step < 1500:
|
| 837 |
+
morph_weight_scale = 1.0 # Full strength: 0.05
|
| 838 |
+
elif step < 4000:
|
| 839 |
+
morph_weight_scale = 0.4 # Medium: 0.02
|
| 840 |
+
else:
|
| 841 |
+
morph_weight_scale = 0.1 # Low: 0.005
|
| 842 |
+
|
| 843 |
+
# Update the model's morph weight dynamically
|
| 844 |
+
model.morph_regularizer.config.morphological_weight = 0.05 * morph_weight_scale
|
| 845 |
+
|
| 846 |
+
for _ in range(gradient_accumulation):
|
| 847 |
+
try:
|
| 848 |
+
input_ids, labels = next(data_iter)
|
| 849 |
+
except StopIteration:
|
| 850 |
+
data_iter = iter(dataloader)
|
| 851 |
+
input_ids, labels = next(data_iter)
|
| 852 |
+
|
| 853 |
+
input_ids = input_ids.to(device)
|
| 854 |
+
labels = labels.to(device)
|
| 855 |
+
|
| 856 |
+
# Mixed precision forward pass
|
| 857 |
+
if use_amp:
|
| 858 |
+
with torch.amp.autocast(device.type):
|
| 859 |
+
_, lm_loss, morph_loss = model(input_ids, labels)
|
| 860 |
+
loss = lm_loss + morph_loss
|
| 861 |
+
loss = loss / gradient_accumulation
|
| 862 |
+
scaler.scale(loss).backward()
|
| 863 |
+
else:
|
| 864 |
+
_, lm_loss, morph_loss = model(input_ids, labels)
|
| 865 |
+
loss = lm_loss + morph_loss
|
| 866 |
+
loss = loss / gradient_accumulation
|
| 867 |
+
loss.backward()
|
| 868 |
+
|
| 869 |
+
total_lm_loss += lm_loss.item() / gradient_accumulation
|
| 870 |
+
total_morph_loss += morph_loss.item() / gradient_accumulation if morph_loss else 0
|
| 871 |
+
|
| 872 |
+
if use_amp:
|
| 873 |
+
scaler.unscale_(optimizer)
|
| 874 |
+
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
|
| 875 |
+
scaler.step(optimizer)
|
| 876 |
+
scaler.update()
|
| 877 |
+
else:
|
| 878 |
+
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
|
| 879 |
+
optimizer.step()
|
| 880 |
+
scheduler.step()
|
| 881 |
+
|
| 882 |
+
step += 1
|
| 883 |
+
|
| 884 |
+
pbar.set_postfix(
|
| 885 |
+
lm_loss=f"{total_lm_loss:.4f}",
|
| 886 |
+
morph=f"{total_morph_loss:.4f}",
|
| 887 |
+
lr=f"{scheduler.get_last_lr()[0]:.2e}"
|
| 888 |
+
)
|
| 889 |
+
pbar.update(1)
|
| 890 |
+
|
| 891 |
+
# Log
|
| 892 |
+
if step % 100 == 0:
|
| 893 |
+
training_log.append({
|
| 894 |
+
'step': step,
|
| 895 |
+
'lm_loss': total_lm_loss,
|
| 896 |
+
'morph_loss': total_morph_loss,
|
| 897 |
+
'lr': scheduler.get_last_lr()[0]
|
| 898 |
+
})
|
| 899 |
+
|
| 900 |
+
# Evaluate
|
| 901 |
+
if step % eval_steps == 0:
|
| 902 |
+
eval_results = evaluator.evaluate(model, dataloader, device, num_samples=20)
|
| 903 |
+
logger.info(f"\nStep {step} Haptic Eval: {eval_results}")
|
| 904 |
+
|
| 905 |
+
if eval_results['haptic_score'] > best_haptic_score:
|
| 906 |
+
best_haptic_score = eval_results['haptic_score']
|
| 907 |
+
# Save best model
|
| 908 |
+
best_dir = os.path.join(output_dir, "best")
|
| 909 |
+
os.makedirs(best_dir, exist_ok=True)
|
| 910 |
+
torch.save(model.state_dict(), os.path.join(best_dir, "pytorch_model.bin"))
|
| 911 |
+
logger.info(f"New best haptic score: {best_haptic_score:.4f}")
|
| 912 |
+
|
| 913 |
+
model.train()
|
| 914 |
+
|
| 915 |
+
# Save checkpoint
|
| 916 |
+
if step % save_steps == 0:
|
| 917 |
+
checkpoint_dir = os.path.join(output_dir, f"checkpoint-{step}")
|
| 918 |
+
os.makedirs(checkpoint_dir, exist_ok=True)
|
| 919 |
+
torch.save(model.state_dict(), os.path.join(checkpoint_dir, "pytorch_model.bin"))
|
| 920 |
+
with open(os.path.join(checkpoint_dir, "config.json"), 'w') as f:
|
| 921 |
+
json.dump(config.to_dict(), f, indent=2)
|
| 922 |
+
logger.info(f"Saved checkpoint at step {step}")
|
| 923 |
+
|
| 924 |
+
pbar.close()
|
| 925 |
+
|
| 926 |
+
# Save final model
|
| 927 |
+
print("\n" + "=" * 70)
|
| 928 |
+
print("Saving Final Model")
|
| 929 |
+
print("=" * 70)
|
| 930 |
+
|
| 931 |
+
final_dir = os.path.join(output_dir, "final")
|
| 932 |
+
os.makedirs(final_dir, exist_ok=True)
|
| 933 |
+
|
| 934 |
+
torch.save(model.state_dict(), os.path.join(final_dir, "pytorch_model.bin"))
|
| 935 |
+
with open(os.path.join(final_dir, "config.json"), 'w') as f:
|
| 936 |
+
json.dump(config.to_dict(), f, indent=2)
|
| 937 |
+
|
| 938 |
+
# Save training log
|
| 939 |
+
with open(os.path.join(output_dir, "training_log.json"), 'w') as f:
|
| 940 |
+
json.dump(training_log, f, indent=2)
|
| 941 |
+
|
| 942 |
+
# Copy tokenizer
|
| 943 |
+
import shutil
|
| 944 |
+
shutil.copy(tokenizer_path, os.path.join(final_dir, "tokenizer.model"))
|
| 945 |
+
|
| 946 |
+
# Final evaluation
|
| 947 |
+
final_eval = evaluator.evaluate(model, dataloader, device, num_samples=50)
|
| 948 |
+
print(f"\nFinal Haptic Evaluation:")
|
| 949 |
+
print(f" Lattice Coherence: {final_eval['lattice_coherence']:.4f}")
|
| 950 |
+
print(f" Morphological Stability: {final_eval['morphological_stability']:.4f}")
|
| 951 |
+
print(f" Haptic Score: {final_eval['haptic_score']:.4f}")
|
| 952 |
+
|
| 953 |
+
with open(os.path.join(output_dir, "final_eval.json"), 'w') as f:
|
| 954 |
+
json.dump(final_eval, f, indent=2)
|
| 955 |
+
|
| 956 |
+
print(f"\nModel saved to: {final_dir}")
|
| 957 |
+
print("\n" + "=" * 70)
|
| 958 |
+
print("⣿ Training Complete! ⣿")
|
| 959 |
+
print("=" * 70)
|
| 960 |
+
|
| 961 |
+
|
| 962 |
+
def main():
|
| 963 |
+
parser = argparse.ArgumentParser(description="Train braille256-v6 lattice-aware model")
|
| 964 |
+
parser.add_argument("--corpus", default="corpus/braille_multimodal_corpus.txt")
|
| 965 |
+
parser.add_argument("--tokenizer", default="tokenizers/braille_8dot_32k/braille_8dot_32k.model")
|
| 966 |
+
parser.add_argument("--output", default="models/braille256_v6_lattice")
|
| 967 |
+
parser.add_argument("--steps", type=int, default=10000)
|
| 968 |
+
parser.add_argument("--batch-size", type=int, default=16)
|
| 969 |
+
parser.add_argument("--lr", type=float, default=3e-4)
|
| 970 |
+
parser.add_argument("--no-lattice-attention", action="store_true")
|
| 971 |
+
parser.add_argument("--no-lattice-embeddings", action="store_true")
|
| 972 |
+
parser.add_argument("--no-morph-regularization", action="store_true")
|
| 973 |
+
|
| 974 |
+
args = parser.parse_args()
|
| 975 |
+
|
| 976 |
+
train(
|
| 977 |
+
corpus_path=args.corpus,
|
| 978 |
+
tokenizer_path=args.tokenizer,
|
| 979 |
+
output_dir=args.output,
|
| 980 |
+
max_steps=args.steps,
|
| 981 |
+
batch_size=args.batch_size,
|
| 982 |
+
learning_rate=args.lr,
|
| 983 |
+
use_lattice_attention=not args.no_lattice_attention,
|
| 984 |
+
use_lattice_embeddings=not args.no_lattice_embeddings,
|
| 985 |
+
use_morphological_regularization=not args.no_morph_regularization,
|
| 986 |
+
)
|
| 987 |
+
|
| 988 |
+
|
| 989 |
+
if __name__ == "__main__":
|
| 990 |
+
main()
|
training_log.json
ADDED
@@ -0,0 +1,602 @@
[
  {"step": 100, "lm_loss": 6.993148565292358, "morph_loss": 0.0004997336654923856, "lr": 5.9999999999999995e-05},
  {"step": 200, "lm_loss": 3.8235212564468384, "morph_loss": 0.0004995590425096452, "lr": 0.00011999999999999999},
  {"step": 300, "lm_loss": 2.6851329803466797, "morph_loss": 0.0004995536583010107, "lr": 0.00017999999999999998},
  {"step": 400, "lm_loss": 2.7323516607284546, "morph_loss": 0.0004988558066543192, "lr": 0.00023999999999999998},
  {"step": 500, "lm_loss": 2.6041159629821777, "morph_loss": 0.000500149471918121, "lr": 0.0003},
  {"step": 600, "lm_loss": 2.7856509685516357, "morph_loss": 0.0004991319729015231, "lr": 0.0002999179886011389},
  {"step": 700, "lm_loss": 1.9858170747756958, "morph_loss": 0.0004987868887837976, "lr": 0.00029967204408281613},
  {"step": 800, "lm_loss": 2.2885484099388123, "morph_loss": 0.0004994409391656518, "lr": 0.0002992624353817517},
  {"step": 900, "lm_loss": 1.7579542398452759, "morph_loss": 0.000499250425491482, "lr": 0.00029868961039904624},
  {"step": 1000, "lm_loss": 2.356551766395569, "morph_loss": 0.000498554261866957, "lr": 0.00029795419551040833},
  {"step": 1100, "lm_loss": 2.2041295170783997, "morph_loss": 0.0004986616258975118, "lr": 0.0002970569948812214},
  {"step": 1200, "lm_loss": 1.987478256225586, "morph_loss": 0.0004990812740288675, "lr": 0.0002959989895872009},
  {"step": 1300, "lm_loss": 1.7949504256248474, "morph_loss": 0.0004993200418539345, "lr": 0.0002947813365416023},
  {"step": 1400, "lm_loss": 2.577250361442566, "morph_loss": 0.0004996150964871049, "lr": 0.0002934053672301536},
  {"step": 1500, "lm_loss": 2.1883418560028076, "morph_loss": 0.000496969121741131, "lr": 0.00029187258625509513},
  {"step": 1600, "lm_loss": 1.70472252368927, "morph_loss": 0.0001988508302019909, "lr": 0.0002901846696899191},
  {"step": 1700, "lm_loss": 2.3121373057365417, "morph_loss": 0.0001993859259528108, "lr": 0.0002883434632466077},
  {"step": 1800, "lm_loss": 2.0045361518859863, "morph_loss": 0.00019962265650974587, "lr": 0.00028635098025737434},
  {"step": 1900, "lm_loss": 1.9210466742515564, "morph_loss": 0.0001999259038711898, "lr": 0.0002842093994731145},
  {"step": 2000, "lm_loss": 2.045823335647583, "morph_loss": 0.00019963263912359253, "lr": 0.00028192106268097334},
  {"step": 2100, "lm_loss": 2.363018274307251, "morph_loss": 0.00020041707466589287, "lr": 0.0002794884721436361},
  {"step": 2200, "lm_loss": 2.146875023841858, "morph_loss": 0.00019886076188413426, "lr": 0.0002769142878631403},
  {"step": 2300, "lm_loss": 1.7959808111190796, "morph_loss": 0.0001991742683458142, "lr": 0.000274201324672203},
  {"step": 2400, "lm_loss": 2.1792458295822144, "morph_loss": 0.00020018102077301592, "lr": 0.0002713525491562421},
  {"step": 2500, "lm_loss": 2.177172005176544, "morph_loss": 0.0001997477374970913, "lr": 0.00026837107640945905},
  {"step": 2600, "lm_loss": 1.8140788674354553, "morph_loss": 0.00019913741562049836, "lr": 0.00026526016662852886},
  {"step": 2700, "lm_loss": 2.1109378337860107, "morph_loss": 0.00019985359540442005, "lr": 0.0002620232215476231},
  {"step": 2800, "lm_loss": 1.8451208472251892, "morph_loss": 0.0002002850960707292, "lr": 0.00025866378071866334},
  {"step": 2900, "lm_loss": 1.4895858764648438, "morph_loss": 0.00019962178339483216, "lr": 0.00025518551764087326},
  {"step": 3000, "lm_loss": 1.5707910060882568, "morph_loss": 0.00019960849022027105, "lr": 0.00025159223574386114},
  {"step": 3100, "lm_loss": 1.5902798175811768, "morph_loss": 0.0002000557360588573, "lr": 0.00024788786422862526},
  {"step": 3200, "lm_loss": 1.8854023218154907, "morph_loss": 0.00019861374312313274, "lr": 0.00024407645377103054},
  {"step": 3300, "lm_loss": 1.6806678175926208, "morph_loss": 0.00020012820459669456, "lr": 0.00024016217209245374},
  {"step": 3400, "lm_loss": 1.827455759048462, "morph_loss": 0.00020046472491230816, "lr": 0.0002361492994024415},
  {"step": 3500, "lm_loss": 1.9594002962112427, "morph_loss": 0.00019966769468737766, "lr": 0.00023204222371836402},
  {"step": 3600, "lm_loss": 1.359905481338501, "morph_loss": 0.00019929300469812006, "lr": 0.00022784543606718227},
  {"step": 3700, "lm_loss": 1.9533899426460266, "morph_loss": 0.00020049385057063773, "lr": 0.0002235635255745762},
  {"step": 3800, "lm_loss": 1.4309832453727722, "morph_loss": 0.00019867864466505125, "lr": 0.00021920117444680317},
  {"step": 3900, "lm_loss": 1.2603670358657837, "morph_loss": 0.00020007100101793185, "lr": 0.0002147631528507739},
  {"step": 4000, "lm_loss": 1.359400749206543, "morph_loss": 0.00019884618814103305, "lr": 0.0002102543136979454},
  {"step": 4100, "lm_loss": 1.4845112562179565, "morph_loss": 4.9905273044714704e-05, "lr": 0.0002056795873377331},
  {"step": 4200, "lm_loss": 1.3849643468856812, "morph_loss": 4.9970141844823956e-05, "lr": 0.00020104397616624645},
  {"step": 4300, "lm_loss": 1.293235957622528, "morph_loss": 4.9802090870798565e-05, "lr": 0.0001963525491562421},
  {"step": 4400, "lm_loss": 1.8045696020126343, "morph_loss": 5.000879900762811e-05, "lr": 0.00019161043631427666},
  {"step": 4500, "lm_loss": 1.472623884677887, "morph_loss": 4.9775953812059015e-05, "lr": 0.00018682282307111987},
  {"step": 4600, "lm_loss": 1.4265462756156921, "morph_loss": 4.9740716349333525e-05, "lr": 0.00018199494461156203},
  {"step": 4700, "lm_loss": 1.3741839528083801, "morph_loss": 5.005610182706732e-05, "lr": 0.00017713208014981648},
  {"step": 4800, "lm_loss": 1.5047972798347473, "morph_loss": 4.9890166337718256e-05, "lr": 0.00017223954715677627},
  {"step": 4900, "lm_loss": 1.9711642265319824, "morph_loss": 5.0176731747342274e-05, "lr": 0.00016732269554543794},
  {"step": 5000, "lm_loss": 1.3522289395332336, "morph_loss": 5.020072603656445e-05, "lr": 0.00016238690182084986},
  {"step": 5100, "lm_loss": 1.7605299353599548, "morph_loss": 4.991166679246817e-05, "lr": 0.00015743756320098332},
  {"step": 5200, "lm_loss": 1.4882904291152954, "morph_loss": 4.9983715143753216e-05, "lr": 0.00015248009171495378},
  {"step": 5300, "lm_loss": 1.6343830823898315, "morph_loss": 5.002636680728756e-05, "lr": 0.00014751990828504622},
  {"step": 5400, "lm_loss": 1.1790239810943604, "morph_loss": 4.997515679860953e-05, "lr": 0.00014256243679901663},
  {"step": 5500, "lm_loss": 1.2620559334754944, "morph_loss": 4.979184268449899e-05, "lr": 0.00013761309817915014},
  {"step": 5600, "lm_loss": 1.2926940321922302, "morph_loss": 4.9720953029464e-05, "lr": 0.00013267730445456208},
  {"step": 5700, "lm_loss": 1.6810715198516846, "morph_loss": 5.0059263230650686e-05, "lr": 0.00012776045284322368},
  {"step": 5800, "lm_loss": 1.3960903882980347, "morph_loss": 5.0068707423633896e-05, "lr": 0.00012286791985018355},
  {"step": 5900, "lm_loss": 1.5417255759239197, "morph_loss": 5.0141115934820846e-05, "lr": 0.00011800505538843798},
  {"step": 6000, "lm_loss": 1.3895499110221863, "morph_loss": 4.9999320253846236e-05, "lr": 0.00011317717692888012},
  {"step": 6100, "lm_loss": 1.578350841999054, "morph_loss": 4.975642514182255e-05, "lr": 0.00010838956368572334},
  {"step": 6200, "lm_loss": 0.8763712048530579, "morph_loss": 4.9885256885318086e-05, "lr": 0.0001036474508437579},
  {"step": 6300, "lm_loss": 1.3805139064788818, "morph_loss": 4.985421219316777e-05, "lr": 9.895602383375353e-05},
  {"step": 6400, "lm_loss": 1.642943263053894, "morph_loss": 4.988547880202532e-05, "lr": 9.432041266226686e-05},
  {"step": 6500, "lm_loss": 1.2295689284801483, "morph_loss": 4.976921445631888e-05, "lr": 8.97456863020546e-05},
  {"step": 6600, "lm_loss": 0.9539550840854645, "morph_loss": 4.960754813509993e-05, "lr": 8.523684714922608e-05},
  {"step": 6700, "lm_loss": 1.4480910301208496, "morph_loss": 4.9842370572150685e-05, "lr": 8.079882555319683e-05},
  {"step": 6800, "lm_loss": 1.1316336393356323, "morph_loss": 4.944742067891639e-05, "lr": 7.643647442542382e-05},
  {"step": 6900, "lm_loss": 1.2974263429641724, "morph_loss": 4.9362375648343004e-05, "lr": 7.215456393281776e-05},
  {"step": 7000, "lm_loss": 1.8624339699745178, "morph_loss": 4.9819502237369306e-05, "lr": 6.795777628163599e-05},
  {"step": 7100, "lm_loss": 1.2204494774341583, "morph_loss": 5.013305417378433e-05, "lr": 6.385070059755846e-05},
  {"step": 7200, "lm_loss": 1.5136797428131104, "morph_loss": 4.9816295359050855e-05, "lr": 5.983782790754623e-05},
  {"step": 7300, "lm_loss": 1.3666653037071228, "morph_loss": 4.99475918331882e-05, "lr": 5.592354622896944e-05},
  {"step": 7400, "lm_loss": 0.8389511108398438, "morph_loss": 4.9609083362156525e-05, "lr": 5.211213577137469e-05},
  {"step": 7500, "lm_loss": 1.1075031757354736, "morph_loss": 4.941183760820422e-05, "lr": 4.840776425613885e-05},
  {"step": 7600, "lm_loss": 1.2579197883605957, "morph_loss": 4.980366429663263e-05, "lr": 4.481448235912671e-05},
  {"step": 7700, "lm_loss": 1.0307890474796295, "morph_loss": 4.9867769121192396e-05, "lr": 4.133621928133665e-05},
  {"step": 7800, "lm_loss": 1.0696255564689636, "morph_loss": 5.0295926484977826e-05, "lr": 3.797677845237696e-05},
  {"step": 7900, "lm_loss": 1.3926631212234497, "morph_loss": 5.014805537939537e-05, "lr": 3.473983337147118e-05},
  {"step": 8000, "lm_loss": 1.5005779266357422, "morph_loss": 5.018570300308056e-05, "lr": 3.162892359054098e-05},
  {"step": 8100, "lm_loss": 1.7105327248573303, "morph_loss": 5.010495260648895e-05, "lr": 2.8647450843757897e-05},
  {"step": 8200, "lm_loss": 1.229815423488617, "morph_loss": 4.9758424211177044e-05, "lr": 2.5798675327796993e-05},
  {"step": 8300, "lm_loss": 1.1335912346839905, "morph_loss": 4.941884253639728e-05, "lr": 2.3085712136859668e-05},
  {"step": 8400, "lm_loss": 1.0056449174880981, "morph_loss": 4.942774467053823e-05, "lr": 2.0511527856363895e-05},
  {"step": 8500, "lm_loss": 1.6661915183067322, "morph_loss": 4.997232463210821e-05, "lr": 1.8078937319026654e-05},
  {"step": 8600, "lm_loss": 1.1359021067619324, "morph_loss": 4.9868809583131224e-05, "lr": 1.579060052688548e-05},
  {"step": 8700, "lm_loss": 1.434115707874298, "morph_loss": 4.993028596800286e-05, "lr": 1.3649019742625623e-05},
  {"step": 8800, "lm_loss": 1.2552986145019531, "morph_loss": 4.9796975872595794e-05, "lr": 1.1656536753392287e-05},
  {"step": 8900, "lm_loss": 1.217362403869629, "morph_loss": 4.991148489352781e-05, "lr": 9.815330310080887e-06},
  {"step": 9000, "lm_loss": 1.7755168080329895, "morph_loss": 5.013133522879798e-05, "lr": 8.127413744904804e-06},
  {"step": 9100, "lm_loss": 1.3130499720573425, "morph_loss": 5.007252184441313e-05, "lr": 6.594632769846353e-06},
  {"step": 9200, "lm_loss": 1.1731150150299072, "morph_loss": 5.0212831411045045e-05, "lr": 5.218663458397715e-06},
  {"step": 9300, "lm_loss": 0.9502497613430023, "morph_loss": 4.9593836592976004e-05, "lr": 4.001010412799138e-06},
  {"step": 9400, "lm_loss": 1.3233891725540161, "morph_loss": 4.999847624276299e-05, "lr": 2.9430051187785962e-06},
  {"step": 9500, "lm_loss": 1.3283841013908386, "morph_loss": 5.0088236093870364e-05, "lr": 2.0458044895916513e-06},
  {"step": 9600, "lm_loss": 1.2733866572380066, "morph_loss": 4.998848271497991e-05, "lr": 1.3103896009537207e-06},
  {"step": 9700, "lm_loss": 1.1967694163322449, "morph_loss": 5.023612902732566e-05, "lr": 7.375646182482875e-07},
  {"step": 9800, "lm_loss": 1.0563868880271912, "morph_loss": 4.972490751242731e-05, "lr": 3.2795591718381975e-07},
  {"step": 9900, "lm_loss": 1.7780798077583313, "morph_loss": 4.9859330829349346e-05, "lr": 8.201139886109264e-08},
  {"step": 10000, "lm_loss": 1.231259286403656, "morph_loss": 4.956562952429522e-05, "lr": 0.0}
]
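A log of this shape (a JSON array of `{step, lm_loss, morph_loss, lr}` records, as written by `train()` above) is easy to summarize after a run. The helper below is a hypothetical sketch, not part of the repo; it inlines a two-entry excerpt of the log for illustration, but would normally be fed `json.load(open("training_log.json"))`.

```python
import json

def summarize(log):
    """Return (first, best, last) lm_loss values from a training log."""
    losses = [entry["lm_loss"] for entry in log]
    return losses[0], min(losses), losses[-1]

# Two-entry excerpt of the log above, inlined for illustration.
sample = json.loads("""
[
  {"step": 100, "lm_loss": 6.993148565292358, "morph_loss": 0.0004997336654923856, "lr": 5.9999999999999995e-05},
  {"step": 10000, "lm_loss": 1.231259286403656, "morph_loss": 4.956562952429522e-05, "lr": 0.0}
]
""")

first, best, last = summarize(sample)
print(f"lm_loss: {first:.3f} -> {last:.3f} (best {best:.3f})")
```

Over the full log, the same summary shows lm_loss falling from ~6.99 at step 100 to ~1.23 at step 10000, with the best logged value (~0.84) around step 7400.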