# Dataset Generation Guide
This guide explains how to generate new training examples for the Glyphic Language.
---
# 1. Dictionary‑Driven Generation
All examples must be derived from:
- dictionary entries
- syntax rules
- BNF grammar
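No freeform glyph usage is allowed: every glyph must have a dictionary entry, and every sequence must follow the syntax rules and the BNF grammar.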
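One way to honor this rule is to derive candidate sequences directly from the dictionary by expanding a grammar production, so no undefined glyph can appear. The sketch below assumes a dictionary that maps each glyph to a single syntactic role and a simplified production `scene ::= actor action [object]`. The glyph IDs, role names, and production are hypothetical placeholders, not the project's actual dictionary or BNF. Full enumeration grows combinatorially, so for a large dictionary, sampling from the same expansion is likely more practical.

```python
from itertools import product

# Hypothetical dictionary entries: glyph ID -> syntactic role (placeholders only).
DICTIONARY = {
    "g_person": "actor",
    "g_bird": "actor",
    "g_walk": "action",
    "g_river": "object",
}


def derive_sequences(dictionary: dict[str, str]):
    """Yield every sequence allowed by the hypothetical production
    scene ::= actor action [object], using only dictionary glyphs."""
    by_role: dict[str, list[str]] = {}
    for glyph, role in sorted(dictionary.items()):
        by_role.setdefault(role, []).append(glyph)
    actors = by_role.get("actor", [])
    actions = by_role.get("action", [])
    objects = [None] + by_role.get("object", [])  # None = optional object omitted
    for actor, action, obj in product(actors, actions, objects):
        yield [actor, action] if obj is None else [actor, action, obj]
```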
---
# 2. Example Types
## 2.1 Atomic Examples
- Single glyph → meaning
- Meaning → single glyph
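A minimal sketch of generating atomic examples in both directions, assuming a dictionary that maps glyph IDs to a single meaning each. The entries, field names (`input`, `output`, `direction`), and direction labels are hypothetical. The duplicate check reflects the "No ambiguity" requirement: if two glyphs shared a meaning, the meaning → glyph pair would have two correct answers.

```python
# Hypothetical dictionary: glyph ID -> meaning (placeholder values only).
DICTIONARY = {
    "g_sun": "sun",
    "g_water": "water",
    "g_grief": "grief",
}


def atomic_examples(dictionary: dict[str, str]) -> list[dict[str, str]]:
    """Emit a glyph-to-meaning pair and a meaning-to-glyph pair for every entry."""
    meanings = list(dictionary.values())
    # The reverse direction is only well-defined if no two glyphs share a meaning.
    if len(meanings) != len(set(meanings)):
        raise ValueError("duplicate meanings make meaning-to-glyph pairs ambiguous")
    examples = []
    for glyph, meaning in sorted(dictionary.items()):
        examples.append({"input": glyph, "output": meaning, "direction": "glyph_to_meaning"})
        examples.append({"input": meaning, "output": glyph, "direction": "meaning_to_glyph"})
    return examples
```

With the three placeholder entries, this yields six examples, two per glyph.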
## 2.2 Scene Examples
Full sequences with:
- actor
- action
- object
- modifiers
- context
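A scene example can be modeled as a record with one field per role listed above, flattened into a glyph sequence and mapped to a structured meaning. This sketch assumes a fixed order of actor, action, object, modifiers, context, with only actor and action required. The glyph IDs, meanings, and that ordering are hypothetical; the real syntax rules may differ.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical glyph -> meaning entries (placeholders only).
DICTIONARY = {
    "g_child": "child",
    "g_carry": "carry",
    "g_lantern": "lantern",
    "g_quietly": "quietly",
    "g_forest": "forest",
}


@dataclass
class Scene:
    """One scene example; fields mirror the roles actor, action, object, modifiers, context."""
    actor: str
    action: str
    obj: Optional[str] = None
    modifiers: list[str] = field(default_factory=list)
    context: Optional[str] = None

    def to_sequence(self) -> list[str]:
        """Flatten to glyphs in the assumed order: actor, action, object, modifiers, context."""
        seq = [self.actor, self.action]
        if self.obj is not None:
            seq.append(self.obj)
        seq.extend(self.modifiers)
        if self.context is not None:
            seq.append(self.context)
        return seq

    def to_meaning(self) -> dict:
        """Structured meaning: each role mapped to its dictionary meaning.
        Raises KeyError if a glyph has no dictionary entry."""
        d = DICTIONARY
        return {
            "actor": d[self.actor],
            "action": d[self.action],
            "object": d[self.obj] if self.obj is not None else None,
            "modifiers": [d[m] for m in self.modifiers],
            "context": d[self.context] if self.context is not None else None,
        }
```

Raising on an unknown glyph, rather than silently skipping it, keeps hallucinated glyphs out of the dataset at construction time.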
## 2.3 Negative Examples
Invalid sequences for syntax training.
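Negative examples can be derived from valid ones by targeted perturbations. This sketch assumes the first two glyphs are the actor and action, and uses two perturbations (swapping them, dropping the action). Every candidate is re-checked with a syntax validator because a perturbation can accidentally yield another valid sequence. The `toy_is_valid` function, glyph IDs, and error labels are hypothetical stand-ins for the real grammar check.

```python
from typing import Callable


def negative_examples(valid: list[str], is_valid: Callable[[list[str]], bool]) -> list[dict]:
    """Derive labeled invalid variants of a valid glyph sequence."""
    candidates = []
    if len(valid) >= 2:
        # Swap the first two glyphs (assumed to be actor and action) to break ordering.
        candidates.append(([valid[1], valid[0], *valid[2:]], "invalid_order"))
        # Drop the second glyph (assumed to be the action) to leave a role missing.
        candidates.append(([valid[0], *valid[2:]], "missing_role"))
    # Keep only candidates the grammar actually rejects: a perturbation can
    # accidentally produce another valid sequence.
    return [
        {"sequence": seq, "label": "invalid", "error": error}
        for seq, error in candidates
        if not is_valid(seq)
    ]


# Toy stand-in for the real syntax check (hypothetical): a valid sequence
# must start with the actor "g_person" followed by the action "g_walk".
def toy_is_valid(seq: list[str]) -> bool:
    return seq[:2] == ["g_person", "g_walk"]


negatives = negative_examples(["g_person", "g_walk", "g_river"], toy_is_valid)
```

Recording the error type alongside each negative makes it possible to balance the dataset across kinds of syntax violations.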
## 2.4 Symbolic Examples
Mythic, emotional, sensory, or social scenes.
---
# 3. Generation Process
1. Select glyph(s) from the dictionary
2. Build a valid sequence using the syntax rules
3. Generate the structured meaning
4. Generate a natural-language description
5. Add the example to the appropriate dataset file
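Steps 3 to 5 can be sketched as follows, assuming JSON Lines output with one file per example type. The file names, record fields, and the toy English template in `describe` are hypothetical. The template also assumes action meanings are stored already conjugated (for example "carries"); real descriptions will likely need richer templates or human review.

```python
import json
from pathlib import Path

# Hypothetical mapping from example type to dataset file; real file names may differ.
DATASET_FILES = {
    "atomic": "atomic.jsonl",
    "scene": "scenes.jsonl",
    "negative": "negative.jsonl",
    "symbolic": "symbolic.jsonl",
}


def describe(meaning: dict) -> str:
    """Step 4: render a structured meaning as an English sentence (toy template)."""
    words = ["The", meaning["actor"], meaning["action"]]
    if meaning.get("object"):
        words += ["the", meaning["object"]]
    words += meaning.get("modifiers", [])
    sentence = " ".join(words)
    if meaning.get("context"):
        sentence += f", in the context of {meaning['context']}"
    return sentence + "."


def add_example(out_dir: Path, example_type: str, sequence: list[str], meaning: dict) -> Path:
    """Steps 3-5: bundle sequence, meaning, and description, then append one JSON line."""
    record = {
        "type": example_type,
        "glyphs": sequence,
        "meaning": meaning,
        "text": describe(meaning),
    }
    path = out_dir / DATASET_FILES[example_type]  # KeyError for an unknown type
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```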
---
# 4. Quality Requirements
- No ambiguity
- No hallucinated glyphs
- No missing roles
- No invalid ordering
- No duplicate examples
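Most of these requirements can be checked mechanically before an example is written. The sketch below assumes each glyph has one role, roles must appear in a fixed order, and actor and action are required; the dictionary, `ROLE_RANK`, and `REQUIRED_ROLES` are hypothetical placeholders for the real dictionary and syntax rules. "No ambiguity" is not covered here: detecting it would need, for instance, a parser that reports when a sequence has more than one valid parse.

```python
# Hypothetical dictionary: glyph ID -> role (placeholders only).
DICTIONARY = {
    "g_person": "actor",
    "g_walk": "action",
    "g_river": "object",
    "g_slowly": "modifier",
    "g_night": "context",
}
# Assumed syntax: roles appear in this order; actor and action are required.
ROLE_RANK = {"actor": 0, "action": 1, "object": 2, "modifier": 3, "context": 4}
REQUIRED_ROLES = {"actor", "action"}


def quality_errors(sequence: list[str], seen: set[tuple[str, ...]]) -> list[str]:
    """Return quality violations for one example (empty list = passes).
    Adds the sequence to `seen` so later duplicates are caught."""
    errors = []
    unknown = [g for g in sequence if g not in DICTIONARY]
    if unknown:
        errors.append(f"hallucinated glyphs: {unknown}")
    roles = [DICTIONARY[g] for g in sequence if g in DICTIONARY]
    missing = REQUIRED_ROLES - set(roles)
    if missing:
        errors.append(f"missing roles: {sorted(missing)}")
    ranks = [ROLE_RANK[r] for r in roles]
    if ranks != sorted(ranks):
        errors.append("invalid ordering")
    key = tuple(sequence)
    if key in seen:
        errors.append("duplicate example")
    seen.add(key)
    return errors
```

Sharing one `seen` set across a whole generation run catches duplicates across all dataset files, not just within one batch.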