---
license: mit
tags:
- vision
- encoder
- multimodal
- self-supervised
- video
- execution
- symbolic
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- Nine1Eight/vil-canonical-glyph-system
---
# VIL Encoder v1.2 (GVL-P)
VIL Encoder v1.2 is a glyphmatic vision encoder trained with GVL-P (Glyphmatic Video-Language Pretraining) v1.2. It learns temporal execution structure from canonical glyph sequences derived from text, code, binaries, and other data.
> ⚠️ **This model does not tokenize language.** All inputs are compiled into a canonical glyph IR (base-111).
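The actual VIL compiler is not published on this card, so the following is only a hedged sketch of what a deterministic "bytes → base-111 glyph indices" compilation step could look like: the input bytes are read as one big base-256 integer and re-encoded in base 111. The function name `compile_to_glyphs` is hypothetical; only the canon size of 111 comes from the card.

```python
CANON_SIZE = 111  # canon size stated on the card

def compile_to_glyphs(data: bytes) -> list[int]:
    """Hypothetical sketch: deterministically re-encode bytes as base-111 digits.

    Note: this toy mapping drops leading zero bytes; the real VIL
    compiler is not described here and may work very differently.
    """
    if not data:
        return []
    # Interpret the byte string as one large base-256 integer,
    # then peel off base-111 digits.
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, d = divmod(n, CANON_SIZE)
        digits.append(d)
    digits.reverse()
    return digits

glyphs = compile_to_glyphs(b"print('hello')")
```

Because the mapping is pure arithmetic on the input bytes, the same input always yields the same glyph sequence, matching the card's claim of deterministic compilation.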
## Architecture
- Vision Encoder: GlyphVisionEncoder
- Temporal Head: TemporalGlyphTransformer
- Embedding Dimension: 768
- Canon Size: 111
- Deterministic: Yes
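The hyperparameters above can be collected into a small config object. This is a hypothetical convenience class, not part of the released code; only the values 768 and 111 come from the card.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VILEncoderConfig:
    """Hypothetical config holding the hyperparameters stated on the card."""
    embed_dim: int = 768        # embedding dimension (from the card)
    canon_size: int = 111       # canonical glyph vocabulary size (from the card)
    deterministic: bool = True  # the card states the model is deterministic

cfg = VILEncoderConfig()
```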
## Training (GVL-P v1.2)
Training is fully self-supervised:
- Arbitrary input (text, code, binary)
- Deterministic compilation → glyph indices
- Sliding temporal windows
- Next-step temporal consistency objective
No labels, captions, or annotations were used.
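The sliding-window / next-step setup described above can be sketched in a few lines: slide a fixed-size window over a compiled glyph-index sequence and pair each window with the glyph that immediately follows it. The window size of 8 is an arbitrary illustration; the card does not state the real value or the exact loss.

```python
def next_step_pairs(glyphs: list[int], window: int = 8):
    """Yield (context_window, next_glyph) pairs for a next-step
    temporal-consistency objective. `window` is an assumed value."""
    for i in range(len(glyphs) - window):
        yield glyphs[i : i + window], glyphs[i + window]

# Toy glyph sequence of length 12 -> 12 - 8 = 4 training pairs.
pairs = list(next_step_pairs(list(range(12)), window=8))
# pairs[0] == ([0, 1, 2, 3, 4, 5, 6, 7], 8)
```

Because the pairs are derived purely from the input sequence itself, no labels, captions, or annotations are needed, consistent with the self-supervised setup above.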
## Intended Use
- Execution-aware embeddings
- Vision–language research
- Glyph-based reasoning systems
- Multimodal IR experiments
This is not a language model.
## Limitations
- Requires canonical glyph compilation
- No text generation
- No decoding or execution
## Weights

File: `vil-encoder-v1.2.pt`

The checkpoint contains:

- `vision_encoder`
- `temporal_head`
- `embed_dim`
- `canon_size`
- `gvlp_version` = 1.2
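A loaded checkpoint can be sanity-checked against the key list above. In practice the dict would come from `torch.load("vil-encoder-v1.2.pt")`; a stand-in dict is used here so the sketch runs without the weights file, and `check_checkpoint` is a hypothetical helper, not part of the release.

```python
# Keys taken from the checkpoint description on this card.
EXPECTED_KEYS = {"vision_encoder", "temporal_head",
                 "embed_dim", "canon_size", "gvlp_version"}

def check_checkpoint(ckpt: dict) -> None:
    """Hypothetical helper: verify the checkpoint dict has the listed keys."""
    missing = EXPECTED_KEYS - ckpt.keys()
    if missing:
        raise KeyError(f"checkpoint missing keys: {sorted(missing)}")

# Stand-in for torch.load("vil-encoder-v1.2.pt"); values are illustrative.
fake_ckpt = {
    "vision_encoder": {},  # state_dict in the real file
    "temporal_head": {},   # state_dict in the real file
    "embed_dim": 768,
    "canon_size": 111,
    "gvlp_version": 1.2,
}
check_checkpoint(fake_ckpt)
```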
## Relationship to VIL
Canonical dataset: https://huggingface.co/datasets/Nine1Eight/vil-canonical-glyph-system
## Author
Matthew Blake Ward (Nine1Eight)
Tulsa, Oklahoma, USA