# n8n Workflow Training Datasets

This directory contains training datasets for fine-tuning Large Language Models (LLMs) to generate n8n workflows from natural language descriptions.

## Dataset Format

Each dataset file is a JSON array containing training examples in a conversational format:

```json
[
  {
    "messages": [
      {
        "role": "user",
        "content": "When a new email arrives in Gmail, save the attachment to Google Drive."
      },
      {
        "role": "assistant",
        "content": "{\"name\": \"Email to Drive\", \"nodes\": [...], \"connections\": {...}, \"active\": false}"
      }
    ]
  }
]
```

### Structure

- **`role: "user"`** - Natural language description of the workflow to create
- **`role: "assistant"`** - JSON representation of the complete n8n workflow

The assistant's response contains a valid n8n workflow definition with:
- `name`: Workflow name
- `nodes`: Array of node definitions (triggers, actions, transformations)
- `connections`: Object defining how nodes are connected
- `active`: Boolean indicating whether the workflow is active (usually `false` for templates)
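
The structure above can be checked mechanically. The sketch below is a minimal example, not the repository's own validator: `validate_example` and `REQUIRED_WORKFLOW_KEYS` are hypothetical names, and it assumes each example holds exactly one user message followed by one assistant message whose `content` is the workflow serialized as a JSON string. It needs Python 3.9+.

```python
import json
from typing import Any

# Top-level keys this README lists for every assistant workflow.
REQUIRED_WORKFLOW_KEYS = ("name", "nodes", "connections", "active")


def validate_example(example: dict[str, Any]) -> list[str]:
    """Return a list of problems in one training example; an empty list means it passed."""
    messages = example.get("messages")
    if not isinstance(messages, list):
        return ["'messages' is missing or not a list"]

    problems: list[str] = []
    # Assumption: exactly one user turn followed by one assistant turn.
    roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
    if roles != ["user", "assistant"]:
        problems.append(f"expected roles ['user', 'assistant'], got {roles}")

    assistant = next(
        (m for m in messages if isinstance(m, dict) and m.get("role") == "assistant"), None
    )
    if assistant is None:
        return problems

    content = assistant.get("content")
    if not isinstance(content, str):
        problems.append("assistant content is not a string")
        return problems

    # The workflow is stored as a JSON *string*, so it needs a second decode.
    try:
        workflow = json.loads(content)
    except json.JSONDecodeError as err:
        problems.append(f"assistant content is not valid JSON: {err.msg}")
        return problems

    if not isinstance(workflow, dict):
        problems.append("workflow is not a JSON object")
        return problems

    missing = [key for key in REQUIRED_WORKFLOW_KEYS if key not in workflow]
    if missing:
        problems.append(f"workflow is missing keys: {missing}")
    if not isinstance(workflow.get("nodes", []), list):
        problems.append("'nodes' is not a list")
    if not isinstance(workflow.get("connections", {}), dict):
        problems.append("'connections' is not an object")
    if not isinstance(workflow.get("active", False), bool):
        problems.append("'active' is not a boolean")
    return problems
```

Returning a list of messages instead of raising on the first problem makes it easy to report every issue in an example at once.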

## Dataset Files

### dataset_001.json
- **Size**: 2.5 MB
- **Examples**: 3,061 workflow examples
- **Status**: ✅ Valid JSON
- **Focus**: Common workflow patterns (Gmail, Slack, Google Sheets, Trello, Airtable, Notion, etc.)

### dataset_002.json
- **Size**: 4.9 MB
- **Status**: ⚠️ JSON parsing errors detected
- **Note**: May require cleaning before use

### dataset_003.json
- **Size**: 14.0 MB
- **Status**: ⚠️ JSON parsing errors detected
- **Note**: May require cleaning before use

## Common Workflow Patterns

Based on dataset_001.json analysis, the most common patterns include:

1. **Email Automation**
   - Gmail → Google Drive (save attachments)
   - Gmail → Slack (notifications)
   - Gmail → Airtable (create records)

2. **Spreadsheet Integration**
   - Google Sheets → Slack (new row notifications)
   - Google Sheets → Gmail (alerts)
   - Airtable → Google Sheets (sync data)

3. **Project Management**
   - Trello → Slack (card updates)
   - Trello → Google Calendar (deadline tracking)
   - GitHub → Trello (issue tracking)

4. **Notification Workflows**
   - Slack reactions → Airtable (logging)
   - Calendar events → Email reminders
   - Notion updates → Slack posts
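
A tally like the one behind these patterns could be approximated by counting node types across workflows. This is a hedged sketch, not necessarily how `/scripts/analyze_datasets.py` works: `count_node_types` is a hypothetical name, and it assumes each node object carries a `type` string such as `n8n-nodes-base.gmail`, as in workflows exported from n8n.

```python
import json
from collections import Counter
from typing import Any, Iterable


def count_node_types(examples: Iterable[dict[str, Any]]) -> Counter:
    """Tally node `type` values across the assistant workflows in a list of examples."""
    counts: Counter = Counter()
    for example in examples:
        for message in example.get("messages", []):
            if not isinstance(message, dict) or message.get("role") != "assistant":
                continue
            try:
                workflow = json.loads(message.get("content", ""))
            except (json.JSONDecodeError, TypeError):
                continue  # skip malformed examples instead of aborting the whole tally
            if not isinstance(workflow, dict):
                continue
            for node in workflow.get("nodes", []):
                # Assumption: every node has a string `type` like "n8n-nodes-base.slack".
                if isinstance(node, dict) and isinstance(node.get("type"), str):
                    counts[node["type"]] += 1
    return counts
```

For example, `count_node_types(json.load(f)).most_common(10)` on an open dataset_001.json lists the ten most frequent node types.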

## Usage for LLM Training

### Fine-tuning Format

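Each example uses the `messages` chat structure from OpenAI's fine-tuning format, which many other training pipelines also accept. OpenAI's fine-tuning API expects JSONL (one example per line), so the JSON arrays need converting before upload. Each example teaches the model to: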

1. Parse natural language workflow requests
2. Identify required n8n nodes
3. Configure node parameters
4. Establish proper connections between nodes

### Recommended Preprocessing

Before using these datasets:

1. **Validate JSON**: Verify all files parse correctly
2. **Deduplicate**: Remove duplicate examples (some duplicates exist)
3. **Filter**: Optionally filter by specific integrations or complexity
4. **Balance**: Ensure diverse node types are represented

### Example Use Cases

- Fine-tune GPT models to generate n8n workflows
- Train models to suggest workflow improvements
- Create workflow completion assistants
- Build n8n-specific code generation tools

## Integration with n8n-mcp

This repository complements the [n8n-mcp](https://github.com/yourusername/n8n-mcp) server by providing:

- **Static training data** for model fine-tuning
- **Example workflows** for reference
- **Pattern library** for common automations

While n8n-mcp provides real-time workflow execution and API access, these datasets enable LLMs to learn n8n's workflow generation patterns.

## Contributing

When adding new examples:

1. Follow the existing JSON structure
2. Ensure workflow JSON is valid n8n format
3. Use descriptive, natural language in user messages
4. Test workflows before adding to datasets
5. Avoid duplicates
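
One check that catches a common class of invalid workflows is making sure every connection points at an existing node. The sketch assumes the connection layout n8n uses in exported workflows, where `connections` is keyed by source node name and each target is referenced by its `name`; `dangling_connections` is a hypothetical helper and does not validate node parameters.

```python
from typing import Any


def dangling_connections(workflow: dict[str, Any]) -> list[str]:
    """List connections whose source or target does not name a node in `nodes`."""
    names = {n.get("name") for n in workflow.get("nodes", []) if isinstance(n, dict)}
    problems: list[str] = []
    for source, outputs in workflow.get("connections", {}).items():
        if source not in names:
            problems.append(f"unknown source node {source!r}")
        # Assumed shape: {"main": [[{"node": ..., "type": "main", "index": 0}, ...], ...]},
        # i.e. one inner list of links per output of the source node.
        for branches in outputs.values():
            for branch in branches:
                for link in branch or []:
                    target = link.get("node")
                    if target not in names:
                        problems.append(f"{source!r} -> unknown node {target!r}")
    return problems
```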

## Known Issues

- Duplicate entries exist in dataset_001.json (minimal impact on training)
- dataset_002.json and dataset_003.json have JSON formatting errors
- Some placeholder values (e.g., `{{SHEET_ID}}`, `{{API_KEY}}`) are included; these are intentional for template-style workflows
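
To audit which template placeholders appear in the workflows, a regular expression is enough. This sketch assumes placeholders are upper-case identifiers in double braces, like `{{SHEET_ID}}`. Because n8n expressions also use double braces (for example `{{ $json.email }}`), a looser pattern would report false positives. Function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: template placeholders are upper-case identifiers such as {{SHEET_ID}}.
# n8n expressions also use double braces (e.g. {{ $json.email }}), so a plain
# "{{...}}" pattern would flag legitimate expressions too.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z][A-Z0-9_]*)\s*\}\}")


def find_placeholders(text: str) -> set[str]:
    """Return the distinct placeholder names that appear in `text`."""
    return set(PLACEHOLDER.findall(text))


def placeholder_counts(workflow_strings: Iterable[str]) -> Counter:
    """Count how many workflows mention each placeholder (at most once per workflow)."""
    counts: Counter = Counter()
    for text in workflow_strings:
        counts.update(find_placeholders(text))
    return counts
```

The inputs would be the assistant `content` strings, which can be scanned directly without decoding the workflow JSON.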

## Tools

See `/scripts/analyze_datasets.py` for dataset analysis and statistics tools.