---
license: mit
tags:
- vision
- encoder
- multimodal
- self-supervised
- video
- execution
- symbolic
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- Nine1Eight/vil-canonical-glyph-system
---
# VIL Encoder v1.2 (GVL-P)
**VIL Encoder v1.2** is a glyphmatic vision encoder trained using
**GVL-P (Glyphmatic Video-Language Pretraining) v1.2**.
This model learns **temporal execution structure** from canonical glyph
sequences derived from text, code, binaries, and other data.
> ⚠️ This model does **not tokenize language**.
> All inputs are compiled into a **canonical glyph IR (base-111)**.
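Since the card does not publish the compiler itself, here is a minimal sketch of what "deterministic compilation into a base-111 glyph IR" could look like. Only the canon size (111) and the determinism requirement come from this card; the byte-to-glyph mapping below (re-expressing the byte stream in base 111) is a hypothetical illustration, not the actual VIL compiler.

```python
# Hypothetical sketch: deterministically map arbitrary bytes to glyph
# indices in the canonical base-111 alphabet. The real mapping is
# defined by the VIL canonical glyph dataset; this only illustrates
# the contract: no tokenizer, same input -> same glyph sequence.

CANON_SIZE = 111  # size of the canonical glyph alphabet (from the card)

def compile_to_glyphs(data: bytes) -> list[int]:
    """Re-express a byte stream in base 111; every digit is a glyph index."""
    value = int.from_bytes(data, "big")
    if value == 0:
        return [0]
    glyphs = []
    while value > 0:
        glyphs.append(value % CANON_SIZE)
        value //= CANON_SIZE
    return glyphs[::-1]
```

The key property is that the compilation is a pure function of the input bytes, so text, code, and binaries all enter the encoder through the same glyph alphabet.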
---
## Architecture
- **Vision Encoder:** GlyphVisionEncoder
- **Temporal Head:** TemporalGlyphTransformer
- **Embedding Dimension:** 768
- **Canon Size:** 111
- **Deterministic:** Yes
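The two component names, the embedding dimension (768), and the canon size (111) come from this card; everything else in the following PyTorch skeleton (layer counts, head counts, the use of a plain `nn.TransformerEncoder`) is an assumption, sketched only to show how the pieces fit together.

```python
import torch
import torch.nn as nn

class GlyphVisionEncoder(nn.Module):
    """Embeds canonical glyph indices into 768-d vectors.
    Names and dims from the card; internals are hypothetical."""
    def __init__(self, canon_size: int = 111, embed_dim: int = 768):
        super().__init__()
        self.embed = nn.Embedding(canon_size, embed_dim)

    def forward(self, glyphs: torch.Tensor) -> torch.Tensor:
        # glyphs: (batch, seq) integer indices in [0, canon_size)
        return self.embed(glyphs)

class TemporalGlyphTransformer(nn.Module):
    """Temporal head mixing glyph embeddings across time
    (hypothetical internals)."""
    def __init__(self, embed_dim: int = 768, heads: int = 8, layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, embed_dim) -> same shape, temporally contextualized
        return self.encoder(x)
```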
---
## Training (GVL-P v1.2)
Training is **fully self-supervised**:
1. Arbitrary input (text, code, binary)
2. Deterministic compilation → glyph indices
3. Sliding temporal windows
4. Next-step temporal consistency objective
No labels, captions, or annotations were used.
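The four steps above can be sketched as a loss function. The window size and the choice of a cosine-based consistency loss are assumptions (the card only states "sliding temporal windows" and a "next-step temporal consistency objective"); `embed_fn` stands in for the encoder + temporal head.

```python
# Hypothetical sketch of the GVL-P self-supervised objective:
# embed consecutive sliding windows of a glyph sequence and pull
# each window's embedding toward the next window's embedding.
import torch
import torch.nn.functional as F

def sliding_windows(glyphs: torch.Tensor, window: int) -> torch.Tensor:
    """(seq,) -> (num_windows, window) via stride-1 sliding windows."""
    return glyphs.unfold(0, window, 1)

def next_step_consistency_loss(embed_fn, glyphs: torch.Tensor,
                               window: int = 8) -> torch.Tensor:
    """Cosine consistency between embeddings of windows t and t+1."""
    wins = sliding_windows(glyphs, window)   # (N, window) glyph indices
    emb = embed_fn(wins)                     # (N, d) window embeddings
    cur, nxt = emb[:-1], emb[1:]
    # Loss in [0, 2]; 0 when consecutive windows embed identically.
    return 1.0 - F.cosine_similarity(cur, nxt, dim=-1).mean()
```

Because the targets are just the next window of the same deterministic glyph stream, no labels, captions, or annotations are needed.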
---
## Intended Use
- Execution-aware embeddings
- Vision–language research
- Glyph-based reasoning systems
- Multimodal IR experiments
This is **not** a language model.
---
## Limitations
- Requires canonical glyph compilation
- No text generation
- No decoding or execution
---
## Weights
File: `vil-encoder-v1.2.pt`
Checkpoint contains:
- `vision_encoder`
- `temporal_head`
- `embed_dim`
- `canon_size`
- `gvlp_version = 1.2`
---
## Relationship to VIL
Canonical dataset:
https://huggingface.co/datasets/Nine1Eight/vil-canonical-glyph-system
---
## Author
Matthew Blake Ward (Nine1Eight)
Tulsa, Oklahoma, USA