n8n Workflow Training Datasets
This directory contains training datasets for fine-tuning Large Language Models (LLMs) to generate n8n workflows from natural language descriptions.
Dataset Format
Each dataset file is a JSON array containing training examples in a conversational format:
[
{
"messages": [
{
"role": "user",
"content": "When a new email arrives in Gmail, save the attachment to Google Drive."
},
{
"role": "assistant",
"content": "{\"name\": \"Email to Drive\", \"nodes\": [...], \"connections\": {...}, \"active\": false}"
}
]
}
]
Structure
- role: "user" - Natural language description of the workflow to create
- role: "assistant" - JSON representation of the complete n8n workflow
The assistant's response contains a valid n8n workflow definition with:
- name: Workflow name
- nodes: Array of node definitions (triggers, actions, transformations)
- connections: Object defining how nodes are connected
- active: Boolean indicating whether the workflow is active (usually false for templates)
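The structure described above can be checked mechanically. The following is a minimal sketch, not part of this repository: the function name check_example and the REQUIRED_KEYS set are illustrative, and it assumes the assistant content is a JSON-encoded string, as in the sample at the top of this section.

```python
import json

# The four top-level workflow keys listed in this README.
REQUIRED_KEYS = {"name", "nodes", "connections", "active"}


def check_example(example):
    """Return a list of problems found in one training example (empty if none)."""
    if not isinstance(example, dict):
        return ["example is not a JSON object"]

    messages = example.get("messages", [])
    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if roles != ["user", "assistant"]:
        return [f"unexpected roles: {roles}"]

    # The assistant message is expected to hold the workflow as a JSON string.
    try:
        workflow = json.loads(messages[1]["content"])
    except (json.JSONDecodeError, TypeError, KeyError) as exc:
        return [f"assistant content is not valid JSON: {exc}"]

    if not isinstance(workflow, dict):
        return ["workflow is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - workflow.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not isinstance(workflow.get("nodes", []), list):
        problems.append("nodes is not a list")
    if not isinstance(workflow.get("connections", {}), dict):
        problems.append("connections is not an object")
    return problems
```

An empty list means the example has the expected shape. It does not prove that n8n will import or run the workflow.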
Dataset Files
dataset_001.json
- Size: 2.5 MB
- Examples: 3,061 workflow examples
- Status: ✅ Valid JSON
- Focus: Common workflow patterns (Gmail, Slack, Google Sheets, Trello, Airtable, Notion, etc.)
dataset_002.json
- Size: 4.9 MB
- Status: ⚠️ JSON parsing errors detected
- Note: May require cleaning before use
dataset_003.json
- Size: 14.0 MB
- Status: ⚠️ JSON parsing errors detected
- Note: May require cleaning before use
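To see where a file stops parsing, Python's json module reports the line and column of the first error. The sketch below is one possible check, not the repository's own tooling; parse_status is a hypothetical helper name, and the file names are the ones listed above, assumed to sit in the current directory.

```python
import json
from pathlib import Path


def parse_status(path):
    """Return (ok, detail) for a JSON file; on failure, detail locates the first error."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError as exc:
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    except UnicodeDecodeError as exc:
        return False, f"not valid UTF-8: {exc.reason}"
    count = len(data) if isinstance(data, list) else 1
    return True, f"{count} top-level items"


if __name__ == "__main__":
    for name in ("dataset_001.json", "dataset_002.json", "dataset_003.json"):
        if Path(name).exists():
            ok, detail = parse_status(name)
            print(f"{name}: {'OK' if ok else 'ERROR'} ({detail})")
```

Only the first error is reported, so a file with several problems has to be fixed and re-checked repeatedly.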
Common Workflow Patterns
Based on dataset_001.json analysis, the most common patterns include:
Email Automation
- Gmail → Google Drive (save attachments)
- Gmail → Slack (notifications)
- Gmail → Airtable (create records)
Spreadsheet Integration
- Google Sheets → Slack (new row notifications)
- Google Sheets → Gmail (alerts)
- Airtable → Google Sheets (sync data)
Project Management
- Trello → Slack (card updates)
- Trello → Google Calendar (deadline tracking)
- GitHub → Trello (issue tracking)
Notification Workflows
- Slack reactions → Airtable (logging)
- Calendar events → Email reminders
- Notion updates → Slack posts
Usage for LLM Training
Fine-tuning Format
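These datasets use the chat-style messages structure of OpenAI's fine-tuning format and similar training pipelines. Because each file is a single JSON array, pipelines that expect JSONL (one example per line) need a conversion step first. Each example teaches the model to: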
- Parse natural language workflow requests
- Identify required n8n nodes
- Configure node parameters
- Establish proper connections between nodes
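OpenAI's fine-tuning endpoint expects a JSONL file with one messages object per line, while the files here are JSON arrays. A minimal conversion sketch follows; to_jsonl and the output file name are illustrative, not part of this repository.

```python
import json


def to_jsonl(src_path, dst_path):
    """Rewrite a JSON-array dataset as JSONL, one training example per line.

    Returns the number of examples written.
    """
    with open(src_path, encoding="utf-8") as src:
        examples = json.load(src)
    if not isinstance(examples, list):
        raise ValueError("expected a top-level JSON array of examples")

    with open(dst_path, "w", encoding="utf-8") as dst:
        for example in examples:
            # json.dumps emits no raw newlines, so each example stays on one line.
            dst.write(json.dumps(example, ensure_ascii=False) + "\n")
    return len(examples)
```

For example, to_jsonl("dataset_001.json", "dataset_001.jsonl") would produce a file suitable for upload, assuming the source file parses cleanly.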
Recommended Preprocessing
Before using these datasets:
- Validate JSON: Verify all files parse correctly
- Deduplicate: Remove duplicate examples (some duplicates exist)
- Filter: Optionally filter by specific integrations or complexity
- Balance: Ensure diverse node types are represented
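The deduplicate, filter, and balance steps could look roughly like the sketch below. All function names are hypothetical. It treats two examples as duplicates only when prompt and response match exactly, and it assumes each workflow node carries a "type" string such as "n8n-nodes-base.slack"; adjust both choices to the actual data.

```python
import json
from collections import Counter


def example_key(example):
    """Duplicate-detection key: the exact (role, content) pairs of an example."""
    return tuple((m.get("role"), m.get("content")) for m in example["messages"])


def deduplicate(examples):
    """Keep the first occurrence of each exact duplicate, preserving order."""
    seen = set()
    unique = []
    for example in examples:
        key = example_key(example)
        if key not in seen:
            seen.add(key)
            unique.append(example)
    return unique


def node_types(example):
    """Node type strings from the assistant workflow, or [] if it cannot be parsed."""
    try:
        workflow = json.loads(example["messages"][-1]["content"])
        # Assumption: each node is an object with a "type" field.
        return [str(node.get("type", "")) for node in workflow.get("nodes", [])]
    except (json.JSONDecodeError, AttributeError, TypeError, KeyError, IndexError):
        return []


def filter_by_integration(examples, keyword):
    """Keep examples whose workflow uses a node type containing keyword."""
    keyword = keyword.lower()
    return [ex for ex in examples
            if any(keyword in t.lower() for t in node_types(ex))]


def node_type_counts(examples):
    """Count node types across examples, to spot under-represented integrations."""
    return Counter(t for ex in examples for t in node_types(ex))
```

Calling node_type_counts(examples).most_common(20) gives a quick view of which integrations dominate before deciding how to rebalance.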
Example Use Cases
- Fine-tune GPT models to generate n8n workflows
- Train models to suggest workflow improvements
- Create workflow completion assistants
- Build n8n-specific code generation tools
Integration with n8n-mcp
This repository complements the n8n-mcp server by providing:
- Static training data for model fine-tuning
- Example workflows for reference
- Pattern library for common automations
While n8n-mcp provides real-time workflow execution and API access, these datasets enable LLMs to learn n8n's workflow generation patterns.
Contributing
When adding new examples:
- Follow the existing JSON structure
- Ensure workflow JSON is valid n8n format
- Use descriptive, natural language in user messages
- Test workflows before adding to datasets
- Avoid duplicates
Known Issues
- Duplicate entries exist in dataset_001.json (minimal impact on training)
- dataset_002.json and dataset_003.json have JSON formatting errors
- Some placeholder values (e.g., {{SHEET_ID}}, {{API_KEY}}) are included; these are intentional for template-style workflows
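To list which placeholders a dataset contains, a regular expression over the assistant messages is usually enough. This sketch assumes placeholders are upper-case names in double braces, like the two examples above; the pattern and the function name placeholder_counts are illustrative. n8n expressions such as {{ $json.id }} are deliberately not matched.

```python
import re
from collections import Counter

# Assumption: template placeholders look like {{UPPER_SNAKE_CASE}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z][A-Z0-9_]*)\s*\}\}")


def placeholder_counts(examples):
    """Count template placeholders found in assistant messages."""
    counts = Counter()
    for example in examples:
        for message in example.get("messages", []):
            if message.get("role") == "assistant":
                counts.update(PLACEHOLDER.findall(message.get("content") or ""))
    return counts
```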
Tools
See /scripts/analyze_datasets.py for dataset analysis and statistics tools.