Dataset Overview

The /training/ folder contains all datasets required to fine‑tune an LLM for the Glyphic Language.

Files

1. glyph_to_text.jsonl

Maps individual glyphs to:

  • primary meaning
  • synonyms
  • roles
  • categories
  • example usage

Used for dictionary grounding.
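
As a rough illustration of how such a dictionary file might be consumed, the sketch below parses JSONL text into a lookup table keyed by glyph. The field names (glyph, primary, synonyms, roles, categories, examples) and the placeholder tokens G1 and G2 are assumptions made for this example; the actual schema is not shown in this overview.

```python
import json

# Hypothetical record: field names and glyph tokens are assumptions,
# not the real schema of glyph_to_text.jsonl.
SAMPLE_JSONL = (
    '{"glyph": "G1", "primary": "water", "synonyms": ["liquid"], '
    '"roles": ["object"], "categories": ["element"], "examples": ["G1 G2"]}\n'
)

def load_glyph_dictionary(jsonl_text):
    """Parse JSONL text (one JSON object per line) into a dict keyed by glyph."""
    entries = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        entries[record["glyph"]] = record
    return entries

glyph_dictionary = load_glyph_dictionary(SAMPLE_JSONL)
print(glyph_dictionary["G1"]["primary"])
```

Keying by glyph keeps dictionary lookups constant-time, which is convenient when the same table is used to validate the other two datasets.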

2. text_to_glyph.jsonl

Maps natural language descriptions to canonical glyph sequences.

Used for translation training.
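
One possible way to turn a translation record into a fine-tuning example is sketched below. The field names input and output, the prompt wording, and the glyph tokens are all hypothetical; adjust them to whatever the file actually contains.

```python
import json

# Hypothetical record: "input" and "output" are assumed field names,
# and G3, G7, G1 are placeholder glyph tokens.
SAMPLE_LINE = '{"input": "a person drinks water", "output": "G3 G7 G1"}'

def to_training_pair(jsonl_line):
    """Convert one JSONL line into a prompt/completion pair."""
    record = json.loads(jsonl_line)
    return {
        # The instruction prefix is an illustrative choice, not a fixed format.
        "prompt": "Translate to Glyphic: " + record["input"].strip(),
        "completion": record["output"].strip(),
    }

pair = to_training_pair(SAMPLE_LINE)
print(pair)
```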

3. structured_meaning.jsonl

Maps glyph sequences to structured meaning dictionaries.

Used for scene interpretation and semantic grounding.
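
A structured-meaning record could be checked before training along the following lines. This is a minimal sketch: the field names glyphs and meaning, the space-separated token format, and the keys inside the meaning dictionary are assumptions for illustration only.

```python
import json

# Hypothetical record: field names and meaning keys are assumptions.
SAMPLE_LINE = (
    '{"glyphs": "G3 G7 G1", '
    '"meaning": {"actor": "person", "action": "drink", "object": "water"}}'
)

def parse_structured_record(jsonl_line):
    """Return (glyph_tokens, meaning) or raise ValueError on a malformed record."""
    record = json.loads(jsonl_line)
    glyphs = record.get("glyphs")
    meaning = record.get("meaning")
    if not isinstance(glyphs, str) or not glyphs.strip():
        raise ValueError("record needs a non-empty 'glyphs' string")
    if not isinstance(meaning, dict):
        raise ValueError("record needs a 'meaning' dictionary")
    # Assumes glyphs in a sequence are separated by whitespace.
    return glyphs.split(), meaning

tokens, meaning = parse_structured_record(SAMPLE_LINE)
print(tokens, meaning["action"])
```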


Dataset Philosophy

All datasets are designed to be:

  • deterministic
  • reversible
  • dictionary‑driven
  • syntax‑aligned
  • context‑aware
  • LLM‑friendly

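Each dataset is designed to teach a specific layer of the language: glyph_to_text.jsonl covers the dictionary, text_to_glyph.jsonl covers translation, and structured_meaning.jsonl covers semantic interpretation.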
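
The deterministic and reversible properties listed above are not defined further in this overview. Under one plausible reading (each text maps to exactly one glyph sequence, and each glyph sequence maps back to exactly one text), a consistency check over (text, glyphs) pairs might look like the sketch below. The function name and the placeholder glyph tokens are hypothetical.

```python
def check_deterministic_and_reversible(pairs):
    """Check (text, glyphs) pairs for one-to-one consistency.

    Returns a list of problem descriptions; an empty list means both
    checks pass under the interpretation assumed here.
    """
    forward = {}   # text -> first glyph sequence seen
    backward = {}  # glyph sequence -> first text seen
    problems = []
    for text, glyphs in pairs:
        if forward.setdefault(text, glyphs) != glyphs:
            problems.append(f"non-deterministic: {text!r} maps to multiple glyph sequences")
        if backward.setdefault(glyphs, text) != text:
            problems.append(f"non-reversible: {glyphs!r} maps to multiple texts")
    return problems

clean = check_deterministic_and_reversible([("water", "G1"), ("fire", "G2")])
dirty = check_deterministic_and_reversible([("water", "G1"), ("liquid", "G1")])
print(clean, dirty)
```

If synonyms are meant to share a canonical glyph sequence, the reversibility check would need to compare against a canonical text form instead of the raw string.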