Dataset Format Specification

All datasets in this folder use JSON Lines (.jsonl) format.

Each line is a standalone training example:

{"input": "...", "output": "..."}


1. glyph_to_text.jsonl

Format

{ "input": "๐ŸŒฑ", "output": { "id": "glyph.object.nature.sprout", "primary": "sprout", "synomic": ["growth", "seedling", "new life"], "roles": ["object", "symbol"] }


2. text_to_glyph.jsonl

Format

{ "input": "a new beginning, growth, seedling", "output": "๐ŸŒฑ" }


3. structured_meaning.jsonl

Format

{ "input": "๐Ÿ‘คโœ๏ธ๐Ÿ“„๐ŸŒ™", "output": { "actor": {"id": "glyph.actor.person"}, "action": {"id": "glyph.action.write"}, "object": {"id": "glyph.object.document.page"}, "context": { "time": [{"id": "context.time.night"}] }


Validation Rules

  • All glyphs must exist in the dictionary.
  • All sequences must obey syntax rules.
  • All structured meaning must be canonical.
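
The first rule can be sketched as a simple membership check. The dictionary format and the syntax rules are not described in this file, so `KNOWN_GLYPHS` below is a hypothetical stand-in set, the second sample record is invented purely to trigger a failure, and the check covers only text_to_glyph records whose output is a single glyph. Splitting multi-glyph sequences is left out because it would need proper emoji segmentation.

```python
import json

# Hypothetical stand-in for the glyph dictionary; the real one lives elsewhere.
KNOWN_GLYPHS = {"🌱"}


def find_unknown_glyphs(jsonl_lines, known_glyphs):
    """Return (line_number, glyph) pairs whose output glyph is not in the dictionary."""
    problems = []
    for line_number, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        glyph = record["output"]
        if glyph not in known_glyphs:
            problems.append((line_number, glyph))
    return problems


lines = [
    '{"input": "a new beginning, growth, seedling", "output": "🌱"}',
    '{"input": "fire", "output": "🔥"}',  # made-up record, not from the dataset
]
problems = find_unknown_glyphs(lines, KNOWN_GLYPHS)
```

Returning line numbers alongside the offending glyph keeps the report easy to trace back to the source file.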