# Dataset Overview

The /training/ folder contains all datasets required to fine‑tune an LLM for the Glyphic Language.

## Files

### 1. glyph_to_text.jsonl

Maps individual glyphs to:

- primary meaning
- synonyms
- roles
- categories
- example usage

Used for dictionary grounding.
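
As a rough illustration of how such a file might be read, the sketch below parses JSONL input line by line and indexes records by glyph. The schema is not documented here, so every field name (`glyph`, `primary_meaning`, `synonyms`, `roles`, `categories`, `example_usage`) and every value (`G1`, `G2`, and so on) is a hypothetical placeholder that only mirrors the list above.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical record: field names and values are placeholders,
# not the real schema of glyph_to_text.jsonl.
sample = (
    '{"glyph": "G1", "primary_meaning": "water", "synonyms": ["liquid"], '
    '"roles": ["object"], "categories": ["element"], "example_usage": "G1 G2"}'
)

records = list(iter_jsonl([sample, ""]))
# Index by glyph so a dictionary lookup is a single dict access.
glyph_index = {r["glyph"]: r for r in records}
```

Because an open file is itself an iterable of lines, the same function could be called as `iter_jsonl(f)` inside `with open("training/glyph_to_text.jsonl", encoding="utf-8") as f:`, assuming that path matches the actual layout.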
### 2. text_to_glyph.jsonl

Maps natural language descriptions to canonical glyph sequences.

Used for translation training.
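
One way a record like this could feed translation training is as a prompt/completion pair. The sketch below assumes hypothetical `text` and `glyphs` fields and an invented prompt template. None of these come from the dataset itself, and the glyph values are placeholders.

```python
import json


def to_training_pair(record: dict) -> dict:
    """Turn one text_to_glyph record into a prompt/completion pair.

    The "text" and "glyphs" keys and the prompt wording are assumptions.
    """
    prompt = f"Translate to Glyphic: {record['text']}"
    completion = " ".join(record["glyphs"])  # space-joined glyph sequence
    return {"prompt": prompt, "completion": completion}


# Hypothetical JSONL line with placeholder content.
line = '{"text": "placeholder description", "glyphs": ["G1", "G2"]}'
pair = to_training_pair(json.loads(line))
```

Joining the glyphs with a single space is only one option. Whatever separator the canonical sequences actually use should be kept unchanged.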
### 3. structured_meaning.jsonl

Maps glyph sequences to structured meaning dictionaries.

Used for scene interpretation and semantic grounding.
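
If the structured meaning is used as a training target, serializing it with sorted keys keeps the target string identical regardless of key order in the source record. The sketch below shows this under assumed field names (`glyphs`, `meaning`) and placeholder keys inside the meaning dictionary. The real structure may differ.

```python
import json


def canonical_target(record: dict) -> str:
    """Serialize the meaning dictionary with a stable key order."""
    meaning = record["meaning"]  # hypothetical field name
    if not isinstance(meaning, dict):
        raise ValueError("expected a meaning dictionary")
    # sort_keys gives the same string for the same content;
    # ensure_ascii=False keeps any non-ASCII glyphs readable.
    return json.dumps(meaning, sort_keys=True, ensure_ascii=False)


# Hypothetical record: keys and values are placeholders.
record = {
    "glyphs": ["G1", "G2"],
    "meaning": {"object": "placeholder", "action": "placeholder"},
}
target = canonical_target(record)
```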

---

## Dataset Philosophy

All datasets follow these principles:

- deterministic
- reversible
- dictionary‑driven
- syntax‑aligned
- context‑aware
- LLM‑friendly

Each dataset is designed to teach a specific layer of the language.
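
The first two principles can be spot-checked mechanically. The sketch below takes one possible reading of them: "deterministic" means no description maps to two different glyph sequences, and "reversible" means no glyph sequence maps back to two different descriptions. That reading is an assumption and may be stricter than intended. The `(text, glyphs)` pair shape is also hypothetical.

```python
from collections import defaultdict


def find_conflicts(pairs):
    """Return (non_deterministic, non_reversible) mappings.

    pairs: iterable of (text, glyph_list) tuples (assumed shape).
    """
    by_text = defaultdict(set)
    by_glyphs = defaultdict(set)
    for text, glyphs in pairs:
        key = tuple(glyphs)  # lists are unhashable, so use a tuple
        by_text[text].add(key)
        by_glyphs[key].add(text)
    # A text with more than one glyph sequence breaks determinism.
    non_deterministic = {t: g for t, g in by_text.items() if len(g) > 1}
    # A glyph sequence with more than one text breaks reversibility.
    non_reversible = {g: t for g, t in by_glyphs.items() if len(t) > 1}
    return non_deterministic, non_reversible
```

Two empty dictionaries mean the pairs satisfy both properties under this reading. Any entries returned point to the records that need review.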