---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- n8n
- workflow-automation
- code-generation
- sft
- json
- low-code
- automation
pretty_name: n8n Workflows SFT Dataset
size_categories:
- 1K<n<10K
---
# n8n Workflows SFT Dataset
A curated dataset of [n8n](https://n8n.io/) workflow examples paired with natural language descriptions, designed for supervised fine-tuning (SFT) of code generation models.
## Dataset Description
This dataset contains instruction-workflow pairs where each example consists of:
- A natural language description of an automation task
- The corresponding valid n8n workflow JSON configuration
The dataset is specifically formatted for training models to generate n8n workflows from user prompts.
| Property | Value |
|----------|-------|
| **Format** | JSON |
| **Size** | 1K-10K examples |
| **Language** | English |
| **License** | Apache 2.0 |
## Dataset Structure
### Data Fields
```json
{
  "instruction": "string - Natural language description of the desired workflow",
  "output": "string - Valid n8n workflow JSON configuration, serialized as a string"
}
```
### Example
```json
{
  "instruction": "Create a workflow that triggers on a webhook, filters incoming data based on a status field, and sends a notification to Slack",
  "output": "{\"name\":\"Webhook to Slack\",\"nodes\":[{\"parameters\":{\"path\":\"status-webhook\"},\"name\":\"Webhook\",\"type\":\"n8n-nodes-base.webhook\",\"typeVersion\":1,\"position\":[250,300]},{\"parameters\":{\"conditions\":{\"string\":[{\"value1\":\"={{$json[\\\"status\\\"]}}\",\"value2\":\"active\"}]}},\"name\":\"Filter\",\"type\":\"n8n-nodes-base.filter\",\"typeVersion\":1,\"position\":[450,300]},{\"parameters\":{\"channel\":\"#notifications\",\"text\":\"New active status received\"},\"name\":\"Slack\",\"type\":\"n8n-nodes-base.slack\",\"typeVersion\":1,\"position\":[650,300]}],\"connections\":{\"Webhook\":{\"main\":[[{\"node\":\"Filter\",\"type\":\"main\",\"index\":0}]]},\"Filter\":{\"main\":[[{\"node\":\"Slack\",\"type\":\"main\",\"index\":0}]]}}}"
}
```
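Because `output` holds the workflow as a JSON-encoded string rather than a nested object, it has to be decoded before it can be inspected. The following is a minimal sketch of that step; the inline `record` is a shortened, hypothetical record shaped like the example above, and the helper name `summarize_workflow` is an assumption, not part of the dataset.

```python
import json

# Shortened, hypothetical record shaped like the example above.
record = {
    "instruction": "Trigger on a webhook and notify Slack",
    "output": json.dumps({
        "name": "Webhook to Slack",
        "nodes": [
            {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
            {"name": "Slack", "type": "n8n-nodes-base.slack"},
        ],
        "connections": {
            "Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}
        },
    }),
}


def summarize_workflow(example):
    """Decode the `output` string and return (workflow name, list of node types)."""
    workflow = json.loads(example["output"])
    node_types = [node["type"] for node in workflow.get("nodes", [])]
    return workflow.get("name"), node_types


name, node_types = summarize_workflow(record)
print(name, node_types)
```

The same helper should work on a row loaded from the dataset, provided the row follows the two-field layout described above.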
## Usage
### Loading with 🤗 Datasets
```python
from datasets import load_dataset
dataset = load_dataset("eclaude/n8n-workflows-sft")
# Access training data
print(dataset["train"][0])
```
### Loading with Pandas
```python
import pandas as pd
df = pd.read_json("hf://datasets/eclaude/n8n-workflows-sft/data.json")
print(df.head())
```
### Preparing for SFT Training
```python
from datasets import load_dataset
dataset = load_dataset("eclaude/n8n-workflows-sft")
def format_for_chat(example):
    """Format examples for chat-style fine-tuning."""
    return {
        "messages": [
            {
                "role": "system",
                "content": "You are an n8n workflow expert. Generate valid n8n workflow JSON configurations based on user requirements."
            },
            {
                "role": "user",
                "content": example["instruction"]
            },
            {
                "role": "assistant",
                "content": example["output"]
            }
        ]
    }
# Applied to every split; the original columns are kept alongside "messages".
formatted_dataset = dataset.map(format_for_chat)
```
### Training with TRL
```python
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Qwen/Qwen2.5-Coder-3B-Instruct"
dataset = load_dataset("eclaude/n8n-workflows-sft")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
def formatting_func(example):
    # Build a ChatML-style prompt; the string body stays flush left on purpose.
    return f"""<|im_start|>system
You are an n8n workflow expert. Generate valid n8n workflow JSON configurations.<|im_end|>
<|im_start|>user
{example['instruction']}<|im_end|>
<|im_start|>assistant
{example['output']}<|im_end|>"""
training_args = SFTConfig(
    output_dir="./n8n-sft-model",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)
# Note: recent TRL releases take `processing_class` instead of `tokenizer` and expect
# the sequence-length limit on SFTConfig; adjust the last two arguments to your version.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    formatting_func=formatting_func,
    tokenizer=tokenizer,
    max_seq_length=2048,
)
trainer.train()
```
## Covered n8n Nodes
The dataset includes workflows featuring common n8n integrations:
| Category | Nodes |
|----------|-------|
| **Triggers** | Webhook, Schedule, Manual |
| **Core** | HTTP Request, Code, Function, Set, Filter, Switch, Merge |
| **Communication** | Slack, Discord, Email, Telegram |
| **Data** | PostgreSQL, MySQL, MongoDB, Airtable, Google Sheets |
| **Dev Tools** | GitHub, GitLab, Jira |
| **Storage** | AWS S3, Google Drive, Dropbox |
| **CRM** | HubSpot, Salesforce |
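To see how these categories are distributed in practice, one option is to tally the `type` field of every node across the records. The sketch below does this for a small list of hypothetical inline records; `count_node_types` is an illustrative helper, and no counts for the real dataset are implied.

```python
import json
from collections import Counter


def count_node_types(examples):
    """Count how often each node type appears across a list of records."""
    counts = Counter()
    for example in examples:
        try:
            workflow = json.loads(example["output"])
        except (json.JSONDecodeError, TypeError):
            continue  # skip records whose output is not a valid JSON string
        if not isinstance(workflow, dict):
            continue  # skip records whose top-level value is not an object
        for node in workflow.get("nodes", []):
            counts[node.get("type", "unknown")] += 1
    return counts


# Hypothetical records, shaped like the dataset rows.
records = [
    {"output": json.dumps({"nodes": [
        {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
        {"name": "Slack", "type": "n8n-nodes-base.slack"},
    ]})},
    {"output": json.dumps({"nodes": [
        {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
        {"name": "Set", "type": "n8n-nodes-base.set"},
    ]})},
    {"output": "not valid json"},
]

counts = count_node_types(records)
for node_type, n in counts.most_common():
    print(node_type, n)
```

Passing `dataset["train"]` in place of `records` should give the tally for the real data, since iterating a split yields one dictionary per row.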
## Intended Uses
### Primary Use
- Fine-tuning language models for n8n workflow generation
- Training code assistants specialized in automation
### Out-of-Scope Use
- Direct production deployment without validation
- Training models for other automation platforms (Zapier, Make, etc.)
## Limitations
- **Node Coverage**: Not all 400+ n8n nodes are represented equally
- **Complexity**: Most workflows are simple to medium complexity (2-8 nodes)
- **Validation**: Workflows are structurally valid but may require credential configuration
- **Version**: Based on n8n workflow schema as of late 2024; may need updates for future n8n versions
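Since generated workflows should not go to production unchecked, a lightweight structural check can catch the most obvious problems before a workflow is imported. The sketch below is not n8n's own validation; it only assumes the `nodes` and `connections` layout visible in the example record, and `check_workflow` is a hypothetical helper name.

```python
import json


def check_workflow(raw):
    """Return a list of structural problems found in a workflow JSON string."""
    try:
        workflow = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(workflow, dict):
        return ["top-level value is not an object"]

    nodes = workflow.get("nodes")
    if not isinstance(nodes, list) or not nodes:
        return ["missing or empty 'nodes' list"]

    problems = []
    names = set()
    for i, node in enumerate(nodes):
        if not isinstance(node, dict):
            problems.append(f"node {i} is not an object")
            continue
        for key in ("name", "type"):
            if key not in node:
                problems.append(f"node {i} lacks '{key}'")
        names.add(node.get("name"))

    # Every connection must start and end at a declared node name.
    for source, outputs in workflow.get("connections", {}).items():
        if source not in names:
            problems.append(f"connection source '{source}' is not a node")
        for branch in outputs.get("main", []):
            for link in branch or []:
                if link.get("node") not in names:
                    problems.append(f"connection target '{link.get('node')}' is not a node")
    return problems


good = json.dumps({
    "nodes": [{"name": "Webhook", "type": "n8n-nodes-base.webhook"},
              {"name": "Slack", "type": "n8n-nodes-base.slack"}],
    "connections": {"Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}},
})
bad = json.dumps({
    "nodes": [{"name": "Webhook", "type": "n8n-nodes-base.webhook"}],
    "connections": {"Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}},
})
print(check_workflow(good))
print(check_workflow(bad))
```

A check like this says nothing about credentials, node parameters, or schema version, so importing the workflow into a test n8n instance remains the more reliable test.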
## Dataset Creation
### Source Data
Workflows were collected and curated from:
- Public n8n workflow templates
- Community-shared automations
- Synthetically generated examples with manual validation
### Curation Process
1. Collection of raw workflow JSON files
2. Extraction and normalization of workflow structure
3. Generation of natural language descriptions
4. Manual review for quality and accuracy
5. Deduplication and filtering
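The card does not say how deduplication was carried out. As one possible approach, exact duplicates can be detected by hashing a canonical serialization of each workflow, so that differences in key order or whitespace do not matter. The helper names below are hypothetical, and near-duplicates (for example, the same workflow with different node positions) would not be caught.

```python
import hashlib
import json


def workflow_fingerprint(raw):
    """Hash a canonical form of the workflow so formatting differences are ignored."""
    workflow = json.loads(raw)
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(examples):
    """Keep the first record for each distinct workflow fingerprint."""
    seen = set()
    unique = []
    for example in examples:
        fingerprint = workflow_fingerprint(example["output"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(example)
    return unique


# Hypothetical records: the first two differ only in key order and whitespace.
records = [
    {"instruction": "a", "output": '{"name": "W", "nodes": []}'},
    {"instruction": "b", "output": '{"nodes":[],"name":"W"}'},
    {"instruction": "c", "output": '{"name": "Other", "nodes": []}'},
]
unique = deduplicate(records)
print(len(unique))
```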
## Models Trained on This Dataset
- [eclaude/qwen-coder-3b-n8n-sft](https://huggingface.co/eclaude/qwen-coder-3b-n8n-sft)
## Citation
```bibtex
@dataset{n8n_workflows_sft_2025,
author = {eclaude},
title = {n8n Workflows SFT Dataset},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/eclaude/n8n-workflows-sft}
}
```
## Contact
For questions, suggestions, or contributions, open a discussion on this repository or contact via [Hugging Face](https://huggingface.co/eclaude).