task_categories:
- text-generation
language:
- en
tags:
- code
- mermaid
- syntax
- diagram
- repair
pretty_name: Mermaid AI Syntax
size_categories:
- 100K<n<1M
authors:
- Gabriel Lars Sabadin
- Darshan Jain
- Mermaid Chart AI Team
Mermaid Syntax Dataset
Dataset Summary
The Mermaid Syntax Dataset provides training and evaluation data for syntax understanding, validation, repair, and semantic titling of Mermaid.js diagrams.
It supports three primary tasks:
- Repair – Generate minimal diffs or patched diagrams that compile successfully.
- Titling – Propose a short, human-friendly title, optionally with a one-sentence summary, based on content and context (instead of “Untitled Diagram”).
- Generation – Create a new valid Mermaid diagram from a user instruction and optional diagram type.
Note: Validation is performed by the Mermaid parser before any model call. Parser diagnostics are exposed in the dataset as compiler_errors (an array of strings) so the model can see what failed and propose targeted repairs.
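The validate-first flow described in the note can be sketched as follows. This is a minimal illustration, not the pipeline used to build the dataset: `build_request` and `toy_parse` are hypothetical names, `toy_parse` is a stand-in for the real Mermaid parser (which is not called here), and routing valid diagrams to TITLE is an assumption made for the example.

```python
from typing import Callable, List


def toy_parse(diagram: str) -> List[str]:
    """Toy stand-in for the Mermaid parser (hypothetical).

    It only flags ' -> ' arrows and returns diagnostics as strings.
    """
    errors = []
    for number, line in enumerate(diagram.splitlines(), start=1):
        if " -> " in line:
            errors.append(f"MISSING_ARROW at line {number}")
    return errors


def build_request(diagram: str, parse: Callable[[str], List[str]]) -> dict:
    """Run validation first, then pick the task the model is asked to perform."""
    errors = parse(diagram)
    if errors:
        # Invalid diagram: pass the diagnostics along so the repair can be targeted.
        return {
            "task": "REPAIR",
            "input": {"diagram": diagram, "compiler_errors": errors},
        }
    # Valid diagram: no diagnostics to attach (assumed here to go to TITLE).
    return {"task": "TITLE", "input": {"diagram": diagram}}
```

In a real setup, `parse` would wrap whatever Mermaid validation is available in the environment; the only property the sketch relies on is that it returns a list of diagnostic strings, empty when the diagram is valid.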
Supported Tasks and Benchmarks
- Text Generation
  - REPAIR: Given an invalid diagram and parser diagnostics (compiler_errors), generate a corrected diagram (or a minimal patch).
  - TITLE: Given a valid diagram, generate a short, human-friendly title (optionally with a one-sentence summary).
  - GENERATE: Given a natural language instruction and optional diagram type, generate a new valid diagram (diagram_content) plus optional title and summary.
Task Categories
text-generation
Languages
- English (en)
All error messages, titles, and instructions are in English. Future multilingual expansions may include localized error messages.
Dataset Structure
Input Schema
{
"task": "REPAIR|TITLE|GENERATE",
"input": {
"diagram": "string (for REPAIR|TITLE)",
"instruction": "string (for GENERATE)",
"context": "optional string",
"diagram_type": "optional string",
"compiler_errors": ["string (for REPAIR)"]
}
}
compiler_errors is an optional array of strings produced by the Mermaid parser (e.g., "MISSING_ARROW at line 7", "UNTERMINATED_BLOCK: 'gantt' missing 'end'"). Include it for REPAIR samples; omit it for TITLE and GENERATE samples.
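The per-task field rules above can be checked mechanically. The following is a minimal sketch, assuming the rules are enforced strictly (the card only says to include or omit fields); `validate_input` is a hypothetical helper, not part of the dataset tooling.

```python
VALID_TASKS = {"REPAIR", "TITLE", "GENERATE"}


def validate_input(record: dict) -> list:
    """Return a list of problems in a record's task/input fields (empty if none)."""
    task = record.get("task")
    if task not in VALID_TASKS:
        return [f"unknown task: {task!r}"]

    problems = []
    inp = record.get("input", {})

    # REPAIR and TITLE operate on an existing diagram.
    if task in ("REPAIR", "TITLE") and not isinstance(inp.get("diagram"), str):
        problems.append("'diagram' must be a string for REPAIR and TITLE")

    # GENERATE starts from a natural language instruction.
    if task == "GENERATE" and not isinstance(inp.get("instruction"), str):
        problems.append("'instruction' must be a string for GENERATE")

    errors = inp.get("compiler_errors")
    if task == "REPAIR":
        # Assumption: treat the diagnostics as required for REPAIR samples.
        if not isinstance(errors, list) or not all(isinstance(e, str) for e in errors):
            problems.append("'compiler_errors' must be a list of strings for REPAIR")
    elif errors is not None:
        problems.append("'compiler_errors' should be omitted for TITLE and GENERATE")

    return problems
```

A record passes when the returned list is empty; otherwise each entry names one violated rule.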
Output Schema
{
"result": {
"compiler_errors": ["string"], // optional echo of parser diagnostics
"patch": [ // optional for REPAIR tasks
{
"op": "replace|insert|delete",
"range": {"startLine": 1, "startCol": 5, "endLine": 1, "endCol": 10},
"text": "new content"
}
],
"repaired_diagram": "string or null", // for REPAIR
"diagram_content": "string or null", // for GENERATE
"title": "string or null", // for TITLE and GENERATE
"summary": "string or null" // optional one-sentence description
}
}
- compiler_errors: optional echo of parser diagnostics to provide context for the model.
- patch: optional list of minimal edit operations for REPAIR tasks.
- repaired_diagram: the corrected diagram (full text), used in REPAIR tasks.
- diagram_content: the newly generated diagram, used in GENERATE tasks.
- title: a short, human-friendly title, used in TITLE and GENERATE tasks.
- summary: an optional one-sentence description or summary, used in TITLE and GENERATE tasks.
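To make the patch format concrete, here is a minimal sketch of applying a patch list to a diagram. The card does not state the range convention, so the sketch assumes 1-indexed lines and columns with an exclusive end column, which is the reading under which the REPAIR sample line in this card yields its repaired_diagram. `apply_patch` is a hypothetical helper; for "insert" it assumes the text goes at the range start and the range end is ignored.

```python
def _offset(lines, line, col):
    """Convert a 1-indexed (line, col) position to a character offset."""
    return sum(len(l) for l in lines[: line - 1]) + (col - 1)


def apply_patch(diagram: str, patch: list) -> str:
    """Apply replace/insert/delete operations to a diagram string."""
    lines = diagram.splitlines(keepends=True)
    edits = []
    for op in patch:
        r = op["range"]
        start = _offset(lines, r["startLine"], r["startCol"])
        end = _offset(lines, r["endLine"], r["endCol"])
        kind = op["op"]
        if kind == "replace":
            text = op.get("text", "")
        elif kind == "insert":
            end = start  # assumption: insert at the range start, remove nothing
            text = op.get("text", "")
        elif kind == "delete":
            text = ""
        else:
            raise ValueError(f"unknown op: {kind!r}")
        edits.append((start, end, text))

    # Apply from the end of the document backwards so earlier offsets stay valid.
    for start, end, text in sorted(edits, key=lambda e: e[0], reverse=True):
        diagram = diagram[:start] + text + diagram[end:]
    return diagram
```

All ranges are interpreted against the original diagram, which is why the edits are applied back to front. Overlapping ranges are not handled in this sketch.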
Examples
Example REPAIR
{
"task": "REPAIR",
"input": {
"diagram": "flowchart TD\nA -> B",
"compiler_errors": ["MISSING_ARROW at line 2"]
},
"result": {
"compiler_errors": ["MISSING_ARROW at line 2"],
"patch": [
{
"op": "replace",
"range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4},
"text": "--"
}
],
"repaired_diagram": "flowchart TD\nA --> B",
"title": null,
"summary": null
}
}
Example TITLE
{
"task": "TITLE",
"input": {
"diagram": "sequenceDiagram\nAlice->>Bob: Hello Bob!"
},
"result": {
"compiler_errors": [],
"patch": [],
"repaired_diagram": null,
"title": "Alice greets Bob",
"summary": "A simple sequence diagram showing Alice sending a greeting message to Bob."
}
}
Example GENERATE
{
"task": "GENERATE",
"input": {
"instruction": "Create a flowchart for the checkout process",
"diagram_type": "flowchart"
},
"result": {
"compiler_errors": [],
"patch": [],
"diagram_content": "flowchart TD\nStart --> Cart\nCart --> Payment\nPayment --> Confirmation",
"title": "Checkout Flow",
"summary": "A flowchart showing the steps from start to order confirmation in an e-commerce checkout process."
}
}
Sample Data
A sample.jsonl example is included for each task type. Each line is a JSON object that follows the schema above.
REPAIR Sample
{"task": "REPAIR", "input": {"diagram": "flowchart TD\nA -> B", "diagram_type": "flowchart", "compiler_errors": ["MISSING_ARROW at line 2: use '-->' instead of '->'"]}, "result": {"compiler_errors": ["MISSING_ARROW at line 2: use '-->' instead of '->'"], "patch": [{"op": "replace", "range": {"startLine": 2, "startCol": 3, "endLine": 2, "endCol": 4}, "text": "--"}], "repaired_diagram": "flowchart TD\nA --> B", "diagram_content": null, "title": null, "summary": null}}
TITLE Sample
{"task": "TITLE", "input": {"diagram": "sequenceDiagram\nAlice->>Bob: Hello Bob!", "diagram_type": "sequence"}, "result": {"compiler_errors": [], "patch": [], "repaired_diagram": null, "diagram_content": null, "title": "Alice greets Bob", "summary": "A simple sequence diagram showing Alice sending a greeting message to Bob."}}
GENERATE Sample
{"task": "GENERATE", "input": {"instruction": "Create a flowchart for the checkout process", "diagram_type": "flowchart"}, "result": {"compiler_errors": [], "patch": [], "repaired_diagram": null, "diagram_content": "flowchart TD\nStart --> Cart\nCart --> Payment\nPayment --> Confirmation", "title": "Checkout Flow", "summary": "A flowchart showing the steps from start to order confirmation in an e-commerce checkout process."}}
Additional syntax-focused training samples have been generated from the Mermaid documentation and are available as JSONL files:
- data/syntax_repair_samples.jsonl – contains REPAIR task samples with broken diagrams and their fixes.
- data/syntax_title_samples.jsonl – contains TITLE task samples with valid diagrams, titles, and summaries.
- data/syntax_generate_samples.jsonl – contains GENERATE task samples with instructions and generated diagrams.
- data/syntax_all_samples.jsonl – combined file with all tasks.
These files can be used to train models specifically on Mermaid syntax understanding, repair, and generation.
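As a starting point for working with these files, the sketch below reads a JSONL file with the Python standard library and counts samples per task. `load_samples` and `count_by_task` are hypothetical helper names, and the sketch assumes every line carries a "task" key as in the schema.

```python
import json
from collections import Counter
from pathlib import Path


def load_samples(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    samples = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                samples.append(json.loads(line))
    return samples


def count_by_task(samples):
    """Count how many samples belong to each task (REPAIR, TITLE, GENERATE)."""
    return Counter(sample["task"] for sample in samples)
```

For example, `count_by_task(load_samples("data/syntax_all_samples.jsonl"))` would give the task distribution of the combined file, assuming the file is present at that relative path.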