n8n Workflow Training Datasets

This directory contains training datasets for fine-tuning Large Language Models (LLMs) to generate n8n workflows from natural language descriptions.

Dataset Format

Each dataset file is a JSON array containing training examples in a conversational format:

[
  {
    "messages": [
      {
        "role": "user",
        "content": "When a new email arrives in Gmail, save the attachment to Google Drive."
      },
      {
        "role": "assistant",
        "content": "{\"name\": \"Email to Drive\", \"nodes\": [...], \"connections\": {...}, \"active\": false}"
      }
    ]
  }
]

Structure

  • role: "user" - Natural language description of the workflow to create
  • role: "assistant" - JSON representation of the complete n8n workflow

The assistant's response contains a valid n8n workflow definition with:

  • name: Workflow name
  • nodes: Array of node definitions (triggers, actions, transformations)
  • connections: Object defining how nodes are connected
  • active: Boolean indicating whether the workflow is active (usually false for templates)
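
Because the assistant content is itself a JSON string, reading a workflow takes two decoding steps: one for the dataset file and one for the content field. The following is a minimal sketch of the second step; the function name parse_example is hypothetical, and the sample uses an empty nodes list and connections object as stand-ins for the elided values shown above.

```python
import json


def parse_example(example):
    """Return (prompt, workflow_dict) for one training example."""
    by_role = {m["role"]: m["content"] for m in example["messages"]}
    prompt = by_role["user"]
    # The assistant content is a JSON string, so it needs its own decode.
    workflow = json.loads(by_role["assistant"])
    return prompt, workflow


# Illustrative sample; nodes and connections are empty placeholders.
sample = {
    "messages": [
        {
            "role": "user",
            "content": "When a new email arrives in Gmail, save the attachment to Google Drive.",
        },
        {
            "role": "assistant",
            "content": json.dumps(
                {"name": "Email to Drive", "nodes": [], "connections": {}, "active": False}
            ),
        },
    ]
}

prompt, workflow = parse_example(sample)
print(workflow["name"], sorted(workflow))
```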

Dataset Files

dataset_001.json

  • Size: 2.5 MB
  • Examples: 3,061 workflow examples
  • Status: ✅ Valid JSON
  • Focus: Common workflow patterns (Gmail, Slack, Google Sheets, Trello, Airtable, Notion, etc.)

dataset_002.json

  • Size: 4.9 MB
  • Status: ⚠️ JSON parsing errors detected
  • Note: May require cleaning before use

dataset_003.json

  • Size: 14.0 MB
  • Status: ⚠️ JSON parsing errors detected
  • Note: May require cleaning before use
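
To see where a file stops parsing, the position reported by Python's json.JSONDecodeError is usually enough to start cleaning. This is a rough sketch, not a repair tool: the helper name find_json_error and the 40-character context window are arbitrary choices, and it only reports the first error the parser hits.

```python
import json


def find_json_error(text, window=40):
    """Return None if text parses as JSON, else details of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(err.pos - window, 0)
        return {
            "message": err.msg,
            "line": err.lineno,
            "column": err.colno,
            # A slice of the raw text around the failure point.
            "context": text[start:err.pos + window],
        }
    return None
```

For example, reading dataset_002.json with open(path, encoding="utf-8") and passing the text to find_json_error would give the line and column of the first problem, which can then be inspected by hand.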

Common Workflow Patterns

Based on dataset_001.json analysis, the most common patterns include:

  1. Email Automation

    • Gmail → Google Drive (save attachments)
    • Gmail → Slack (notifications)
    • Gmail → Airtable (create records)
  2. Spreadsheet Integration

    • Google Sheets → Slack (new row notifications)
    • Google Sheets → Gmail (alerts)
    • Airtable → Google Sheets (sync data)
  3. Project Management

    • Trello → Slack (card updates)
    • Trello → Google Calendar (deadline tracking)
    • GitHub → Trello (issue tracking)
  4. Notification Workflows

    • Slack reactions → Airtable (logging)
    • Calendar events → Email reminders
    • Notion updates → Slack posts

Usage for LLM Training

Fine-tuning Format

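These datasets use the same chat-style messages structure as OpenAI's fine-tuning format and similar training pipelines. The files are JSON arrays, so pipelines that expect JSONL (one example per line) need a conversion step first. Each example teaches the model to: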

  1. Parse natural language workflow requests
  2. Identify required n8n nodes
  3. Configure node parameters
  4. Establish proper connections between nodes
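
Fine-tuning pipelines that take JSONL expect one example object per line rather than a single JSON array. A minimal conversion sketch follows; the function names and the output path are hypothetical, and whether a given pipeline accepts the result unchanged is an assumption to check against that pipeline's documentation.

```python
import json


def to_jsonl(examples):
    """Serialize a list of {"messages": [...]} examples, one object per line."""
    # json.dumps escapes newlines inside strings, so each example stays on one line.
    return "".join(json.dumps(example) + "\n" for example in examples)


def convert_file(src_path, dst_path):
    """Read a JSON-array dataset file and write it back out as JSONL."""
    with open(src_path, encoding="utf-8") as src:
        examples = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_jsonl(examples))
    return len(examples)
```

A call such as convert_file("dataset_001.json", "dataset_001.jsonl") would return the number of examples written.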

Recommended Preprocessing

Before using these datasets:

  1. Validate JSON: Verify all files parse correctly
  2. Deduplicate: Remove duplicate examples (some duplicates exist)
  3. Filter: Optionally filter by specific integrations or complexity
  4. Balance: Ensure diverse node types are represented
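
Steps 2 to 4 can be sketched with the standard library alone. The helper names below are hypothetical, duplicates are detected only when two examples are exactly equal, and node_type_counts assumes each node object carries a type field, as n8n's exported workflow JSON normally does; nodes without one are counted as "unknown".

```python
import json
from collections import Counter


def deduplicate(examples):
    """Drop exact duplicates while keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for example in examples:
        key = json.dumps(example, sort_keys=True)  # canonical form for comparison
        if key not in seen:
            seen.add(key)
            unique.append(example)
    return unique


def mentions(example, keyword):
    """True if any message content contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return any(keyword in m["content"].lower() for m in example["messages"])


def node_type_counts(examples):
    """Count node types across all assistant workflows that parse as JSON objects."""
    counts = Counter()
    for example in examples:
        for message in example["messages"]:
            if message["role"] != "assistant":
                continue
            try:
                workflow = json.loads(message["content"])
            except json.JSONDecodeError:
                continue  # skip workflows that do not parse
            if not isinstance(workflow, dict):
                continue
            for node in workflow.get("nodes", []):
                node_type = node.get("type", "unknown") if isinstance(node, dict) else "unknown"
                counts[node_type] += 1
    return counts
```

Filtering by integration is then a list comprehension, for example [e for e in deduplicate(examples) if mentions(e, "slack")], and node_type_counts(...).most_common() gives a quick view of how balanced the node types are.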

Example Use Cases

  • Fine-tune GPT models to generate n8n workflows
  • Train models to suggest workflow improvements
  • Create workflow completion assistants
  • Build n8n-specific code generation tools

Integration with n8n-mcp

This repository complements the n8n-mcp server by providing:

  • Static training data for model fine-tuning
  • Example workflows for reference
  • Pattern library for common automations

While n8n-mcp provides real-time workflow execution and API access, these datasets enable LLMs to learn n8n's workflow generation patterns.

Contributing

When adding new examples:

  1. Follow the existing JSON structure
  2. Ensure workflow JSON is valid n8n format
  3. Use descriptive, natural language in user messages
  4. Test workflows before adding to datasets
  5. Avoid duplicates
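
A structural check can catch the most common mistakes before a new example is added. The sketch below is an assumption about what "the existing JSON structure" means in practice: exactly one user message followed by one assistant message, with the four top-level workflow keys described in this README. It does not verify that n8n would accept the workflow, so testing in n8n is still needed. The name check_example is hypothetical.

```python
import json

# Top-level workflow keys and the Python types they should decode to.
REQUIRED_KEYS = {"name": str, "nodes": list, "connections": dict, "active": bool}


def check_example(example):
    """Return a list of problems found in one training example (empty if none)."""
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list):
        return ["'messages' must be a list"]
    roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
    if roles != ["user", "assistant"]:
        return [f"expected roles ['user', 'assistant'], got {roles}"]

    problems = []
    user_text, assistant_text = (m.get("content") for m in messages)
    if not isinstance(user_text, str) or not user_text.strip():
        problems.append("user content must be a non-empty string")
    try:
        workflow = json.loads(assistant_text)
    except (TypeError, json.JSONDecodeError):
        return problems + ["assistant content is not valid JSON"]
    if not isinstance(workflow, dict):
        return problems + ["workflow must be a JSON object"]
    for key, expected in REQUIRED_KEYS.items():
        if key not in workflow:
            problems.append(f"workflow is missing '{key}'")
        elif not isinstance(workflow[key], expected):
            problems.append(f"'{key}' should be {expected.__name__}")
    return problems
```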

Known Issues

  • Duplicate entries exist in dataset_001.json (minimal impact on training)
  • dataset_002.json and dataset_003.json have JSON formatting errors
  • Some workflows include placeholder values (e.g., {{SHEET_ID}}, {{API_KEY}}); these are intentional for template-style workflows
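
To list the template placeholders in a workflow, a regular expression over the raw assistant content is one option. This sketch assumes placeholders are always upper-case identifiers in double braces, as in the two examples above; that convention is inferred from those examples, not confirmed for the whole dataset. The pattern deliberately skips n8n expressions such as {{ $json.id }}, which use the same braces.

```python
import re

# Matches {{NAME}} where NAME is an upper-case identifier (assumed convention).
PLACEHOLDER = re.compile(r"\{\{([A-Z][A-Z0-9_]*)\}\}")


def find_placeholders(text):
    """Return the sorted, de-duplicated placeholder names found in text."""
    return sorted(set(PLACEHOLDER.findall(text)))
```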

Tools

See /scripts/analyze_datasets.py for dataset analysis and statistics tools.