---
license: mit
tags:
- vision
- encoder
- multimodal
- self-supervised
- video
- execution
- symbolic
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- Nine1Eight/vil-canonical-glyph-system
---

# VIL Encoder v1.2 (GVL-P)

**VIL Encoder v1.2** is a glyphmatic vision encoder trained using **GVL-P (Glyphmatic Video-Language Pretraining) v1.2**.

This model learns **temporal execution structure** from canonical glyph sequences derived from text, code, binaries, and other data.

> ⚠️ This model does **not tokenize language**.
> All inputs are compiled into a **canonical glyph IR (base-111)**.

---

## Architecture

- **Vision Encoder:** GlyphVisionEncoder
- **Temporal Head:** TemporalGlyphTransformer
- **Embedding Dimension:** 768
- **Canon Size:** 111
- **Deterministic:** Yes

---

## Training (GVL-P v1.2)

Training is **fully self-supervised**:

1. Arbitrary input (text, code, binary)
2. Deterministic compilation → glyph indices
3. Sliding temporal windows
4. Next-step temporal consistency objective

No labels, captions, or annotations were used.

---

## Intended Use

- Execution-aware embeddings
- Vision–language research
- Glyph-based reasoning systems
- Multimodal IR experiments

This is **not** a language model.

---

## Limitations

- Requires canonical glyph compilation
- No text generation
- No decoding or execution

---

## Weights

File: `vil-encoder-v1.2.pt`

The checkpoint contains:

- `vision_encoder`
- `temporal_head`
- `embed_dim`
- `canon_size`
- `gvlp_version = 1.2`

---

## Relationship to VIL

Canonical dataset: https://huggingface.co/datasets/Nine1Eight/vil-canonical-glyph-system

---

## Author

Matthew Blake Ward (Nine1Eight)

Tulsa, Oklahoma, USA
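The four GVL-P steps listed under Training (GVL-P v1.2) can be illustrated with a short, hypothetical Python sketch. The card does not describe the canonical glyph compiler, so `compile_to_glyphs` here is a stand-in that maps each byte to two base-111 digits with `divmod(byte, 111)`. This keeps the output deterministic and invertible, but it is not the real VIL compilation. The window `length` and `stride` values are arbitrary placeholders, and pairing each window with the next one is only an assumption about what the next-step temporal consistency objective consumes. The actual loss is not specified in this card.

```python
from typing import Iterator, List, Tuple

CANON_SIZE = 111  # canon size stated in this model card


def compile_to_glyphs(data: bytes) -> List[int]:
    """Hypothetical stand-in for the canonical glyph compiler.

    Maps every byte (0-255) to two base-111 digits via divmod, so the result
    is deterministic and invertible. This is NOT the real VIL compiler.
    """
    glyphs: List[int] = []
    for byte in data:
        high, low = divmod(byte, CANON_SIZE)  # high is 0, 1, or 2
        glyphs.extend((high, low))
    return glyphs


def sliding_windows(glyphs: List[int], length: int, stride: int) -> Iterator[List[int]]:
    """Yield fixed-length windows over the glyph sequence (placeholder parameters)."""
    for start in range(0, len(glyphs) - length + 1, stride):
        yield glyphs[start:start + length]


def next_step_pairs(
    glyphs: List[int], length: int, stride: int
) -> List[Tuple[List[int], List[int]]]:
    """Pair each window with the window one step later.

    Assumption: a next-step objective relates window t to window t+1;
    how GVL-P actually scores these pairs is not documented here.
    """
    windows = list(sliding_windows(glyphs, length, stride))
    return list(zip(windows, windows[1:]))


if __name__ == "__main__":
    glyphs = compile_to_glyphs(b"print('hi')")
    assert all(0 <= g < CANON_SIZE for g in glyphs)
    pairs = next_step_pairs(glyphs, length=8, stride=4)
    print(len(glyphs), len(pairs))  # prints "22 3"
```

The 11-byte input becomes 22 glyph indices, which yield four windows of length 8 at stride 4 and therefore three consecutive-window pairs.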
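The Weights section lists the entries stored in `vil-encoder-v1.2.pt`. A minimal sketch for sanity-checking a loaded checkpoint against that list might look like the following. It assumes the file deserializes to a plain dict (for example via `torch.load(path, map_location="cpu")`). The card does not say whether `vision_encoder` and `temporal_head` hold `state_dict`s or full modules, so the helper only checks key presence and the scalar values the card documents. `check_vil_checkpoint` is a hypothetical name, not part of any released package.

```python
from typing import Any, Dict, List, Optional

# Keys and values documented in the "Weights" section of this card.
REQUIRED_KEYS = ("vision_encoder", "temporal_head", "embed_dim", "canon_size", "gvlp_version")
EXPECTED = {"embed_dim": 768, "canon_size": 111, "gvlp_version": 1.2}


def _as_float(value: Any) -> Optional[float]:
    """Convert ints, floats, numeric strings, or 0-d tensors to float; else None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_vil_checkpoint(ckpt: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a loaded checkpoint dict (empty if none).

    Hypothetical helper: it does not build GlyphVisionEncoder or
    TemporalGlyphTransformer; it only checks the documented metadata.
    """
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in ckpt]
    for key, expected in EXPECTED.items():
        if key in ckpt and _as_float(ckpt[key]) != float(expected):
            problems.append(f"{key}: expected {expected}, got {ckpt[key]!r}")
    return problems


# Usage sketch (requires PyTorch and the downloaded file):
#   import torch
#   ckpt = torch.load("vil-encoder-v1.2.pt", map_location="cpu")
#   print(check_vil_checkpoint(ckpt))  # an empty list means the metadata matches
```

Parsing values through `float` means a version stored as `1.2` or as `"1.2"` both pass; if the checkpoint stores these fields in another form, the comparison would need adjusting.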