+---
+license: apache-2.0
+task_categories:
+  - text-generation
+language:
+  - en
+tags:
+  - n8n
+  - workflow-automation
+  - code-generation
+  - sft
+  - json
+  - low-code
+  - automation
+pretty_name: n8n Workflows SFT Dataset
+size_categories:
+  - 1K<n<10K
+---
+# n8n Workflows SFT Dataset
+A curated dataset of [n8n](https://n8n.io/) workflow examples paired with natural language descriptions, designed for supervised fine-tuning (SFT) of code generation models.
+## Dataset Description
+This dataset contains instruction-workflow pairs where each example consists of:
+- A natural language description of an automation task
+- The corresponding valid n8n workflow JSON configuration
+
+The dataset is specifically formatted for training models to generate n8n workflows from user prompts.
+
+| Property | Value |
+|----------|-------|
+| **Format** | JSON |
+| **Size** | 1K-10K examples |
+| **Language** | English |
+| **License** | Apache 2.0 |
+## Dataset Structure
+### Data Fields
+```json
+{
+  "instruction": "string - Natural language description of the desired workflow",
+  "output": "string - Valid n8n workflow JSON configuration"
+}
+```
+### Example
+```json
+{
+  "instruction": "Create a workflow that triggers on a webhook, filters incoming data based on a status field, and sends a notification to Slack",
+  "output": "{\"name\":\"Webhook to Slack\",\"nodes\":[{\"parameters\":{\"path\":\"status-webhook\"},\"name\":\"Webhook\",\"type\":\"n8n-nodes-base.webhook\",\"typeVersion\":1,\"position\":[250,300]},{\"parameters\":{\"conditions\":{\"string\":[{\"value1\":\"={{$json[\\\"status\\\"]}}\",\"value2\":\"active\"}]}},\"name\":\"Filter\",\"type\":\"n8n-nodes-base.filter\",\"typeVersion\":1,\"position\":[450,300]},{\"parameters\":{\"channel\":\"#notifications\",\"text\":\"New active status received\"},\"name\":\"Slack\",\"type\":\"n8n-nodes-base.slack\",\"typeVersion\":1,\"position\":[650,300]}],\"connections\":{\"Webhook\":{\"main\":[[{\"node\":\"Filter\",\"type\":\"main\",\"index\":0}]]},\"Filter\":{\"main\":[[{\"node\":\"Slack\",\"type\":\"main\",\"index\":0}]]}}}"
+}
+```
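Because `output` stores the workflow as a JSON-encoded string rather than a nested object, it needs a second decode before its nodes and connections can be inspected. The following is a minimal sketch that uses a shortened stand-in record (not an actual dataset row) with the same two fields:

```python
import json

# Shortened stand-in record; real rows have the same two string fields.
record = {
    "instruction": "Create a workflow that triggers on a webhook and notifies Slack",
    "output": json.dumps({
        "name": "Webhook to Slack",
        "nodes": [
            {"name": "Webhook", "type": "n8n-nodes-base.webhook",
             "typeVersion": 1, "position": [250, 300],
             "parameters": {"path": "status-webhook"}},
            {"name": "Slack", "type": "n8n-nodes-base.slack",
             "typeVersion": 1, "position": [450, 300],
             "parameters": {"channel": "#notifications"}},
        ],
        "connections": {
            "Webhook": {"main": [[{"node": "Slack", "type": "main", "index": 0}]]}
        },
    }),
}

# Decode the workflow string into a regular Python dict.
workflow = json.loads(record["output"])

# Collect node names and the (source, target) pairs described by "connections".
node_names = [node["name"] for node in workflow["nodes"]]
edges = [
    (source, target["node"])
    for source, outputs in workflow["connections"].items()
    for branch in outputs["main"]
    for target in branch
]

print(node_names)
print(edges)
```

The same `json.loads` call should apply to any row once the dataset is loaded, assuming every `output` value is valid JSON.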
+## Usage
+### Loading with 🤗 Datasets
+```python
+from datasets import load_dataset
+dataset = load_dataset("eclaude/n8n-workflows-sft")
+# Access training data
+print(dataset["train"][0])
+```
+### Loading with Pandas
+```python
+import pandas as pd
+
+# The hf:// protocol requires the huggingface_hub package to be installed.
+df = pd.read_json("hf://datasets/eclaude/n8n-workflows-sft/data.json")
+print(df.head())
+```
+### Preparing for SFT Training
+```python
+from datasets import load_dataset
+dataset = load_dataset("eclaude/n8n-workflows-sft")
+def format_for_chat(example):
+    """Format examples for chat-style fine-tuning."""
+    return {
+        "messages": [
+            {
+                "role": "system",
+                "content": "You are an n8n workflow expert. Generate valid n8n workflow JSON configurations based on user requirements."
+            },
+            {
+                "role": "user",
+                "content": example["instruction"]
+            },
+            {
+                "role": "assistant",
+                "content": example["output"]
+            }
+        ]
+    }
+formatted_dataset = dataset.map(format_for_chat)
+```
+### Training with TRL
+```python
+from datasets import load_dataset
+from trl import SFTTrainer, SFTConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "Qwen/Qwen2.5-Coder-3B-Instruct"
+dataset = load_dataset("eclaude/n8n-workflows-sft")
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
+def formatting_func(example):
+    return f"""<|im_start|>system
+You are an n8n workflow expert. Generate valid n8n workflow JSON configurations.<|im_end|>
+<|im_start|>user
+{example['instruction']}<|im_end|>
+<|im_start|>assistant
+{example['output']}<|im_end|>"""
+training_args = SFTConfig(
+    output_dir="./n8n-sft-model",
+    per_device_train_batch_size=4,
+    gradient_accumulation_steps=4,
+    num_train_epochs=3,
+    learning_rate=2e-5,
+    bf16=True,
+    logging_steps=10,
+    save_strategy="epoch",
+    # Sequence length is an SFTConfig setting; recent TRL releases name it max_length.
+    max_seq_length=2048,
+)
+trainer = SFTTrainer(
+    model=model,
+    args=training_args,
+    train_dataset=dataset["train"],
+    formatting_func=formatting_func,
+    # Recent TRL releases name this argument processing_class.
+    tokenizer=tokenizer,
+)
+trainer.train()
+```
+## Covered n8n Nodes
+The dataset includes workflows featuring common n8n integrations:
+| Category | Nodes |
+|----------|-------|
+| **Triggers** | Webhook, Schedule, Manual |
+| **Core** | HTTP Request, Code, Function, Set, Filter, Switch, Merge |
+| **Communication** | Slack, Discord, Email, Telegram |
+| **Data** | PostgreSQL, MySQL, MongoDB, Airtable, Google Sheets |
+| **Dev Tools** | GitHub, GitLab, Jira |
+| **Storage** | AWS S3, Google Drive, Dropbox |
+| **CRM** | HubSpot, Salesforce |
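To check how often each node type actually occurs, the `type` field of every node can be tallied. The sketch below is illustrative only: `count_node_types` is a hypothetical helper, and the two sample records are stand-ins rather than real dataset rows.

```python
import json
from collections import Counter


def count_node_types(records):
    """Count how often each n8n node type appears across workflow records."""
    counts = Counter()
    for record in records:
        workflow = json.loads(record["output"])
        counts.update(node["type"] for node in workflow.get("nodes", []))
    return counts


# Two tiny stand-in records with the dataset's field names.
sample = [
    {
        "instruction": "Forward webhook events to Slack",
        "output": json.dumps({"nodes": [
            {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
            {"name": "Slack", "type": "n8n-nodes-base.slack"},
        ]}),
    },
    {
        "instruction": "Receive a webhook",
        "output": json.dumps({"nodes": [
            {"name": "Webhook", "type": "n8n-nodes-base.webhook"},
        ]}),
    },
]

type_counts = count_node_types(sample)
for node_type, n in type_counts.most_common():
    print(node_type, n)
```

Since iterating over a loaded split yields one dict per row, passing `dataset["train"]` in place of `sample` should give the real distribution.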
+## Intended Uses
+### Primary Use
+- Fine-tuning language models for n8n workflow generation
+- Training code assistants specialized in automation
+### Out-of-Scope Use
+- Direct production deployment without validation
+- Training models for other automation platforms (Zapier, Make, etc.)
+## Limitations
+- **Node Coverage**: Not all 400+ n8n nodes are represented equally
+- **Complexity**: Most workflows are simple to medium complexity (2-8 nodes)
+- **Validation**: Workflows are structurally valid, but nodes that connect to external services need credentials configured before they will run
+- **Version**: Based on n8n workflow schema as of late 2024; may need updates for future n8n versions
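Since generated or downloaded workflows may still need checking before use, a lightweight structural check can catch the most common problems. This is a sketch under assumptions, not n8n's own validator: `check_workflow` is a hypothetical helper that only verifies that the JSON parses, that each node has a `name` and `type`, and that every connection refers to an existing node.

```python
import json


def check_workflow(output: str) -> list[str]:
    """Return a list of structural problems found in a workflow JSON string."""
    try:
        workflow = json.loads(output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(workflow, dict):
        return ["top-level value is not an object"]

    nodes = workflow.get("nodes")
    if not isinstance(nodes, list) or not nodes:
        return ["missing or empty 'nodes' list"]

    problems = []
    for node in nodes:
        if not isinstance(node, dict) or "name" not in node or "type" not in node:
            problems.append("node without 'name' or 'type'")
    names = {node.get("name") for node in nodes if isinstance(node, dict)}

    # Every connection source and target must be a known node name.
    for source, outputs in workflow.get("connections", {}).items():
        if source not in names:
            problems.append(f"connection from unknown node '{source}'")
        for branches in outputs.values():
            for branch in branches:
                for target in branch or []:
                    if target.get("node") not in names:
                        problems.append(
                            f"connection to unknown node '{target.get('node')}'"
                        )
    return problems


good = json.dumps({
    "nodes": [{"name": "A", "type": "t"}, {"name": "B", "type": "t"}],
    "connections": {"A": {"main": [[{"node": "B", "type": "main", "index": 0}]]}},
})
bad = json.dumps({
    "nodes": [{"name": "A", "type": "t"}],
    "connections": {"A": {"main": [[{"node": "Missing", "type": "main", "index": 0}]]}},
})

print(check_workflow(good))
print(check_workflow(bad))
```

A check like this says nothing about whether node parameters or credentials are correct; importing the workflow into an n8n instance remains the real test.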
+## Dataset Creation
+### Source Data
+Workflows were collected and curated from:
+- Public n8n workflow templates
+- Community-shared automations
+- Synthetically generated examples with manual validation
+### Curation Process
+1. Collection of raw workflow JSON files
+2. Extraction and normalization of workflow structure
+3. Generation of natural language descriptions
+4. Manual review for quality and accuracy
+5. Deduplication and filtering
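The card does not state how deduplication was performed. One plausible approach, shown purely as an illustration, is to fingerprint each workflow after re-serializing it in a canonical form, so that key order and whitespace differences do not hide exact duplicates. The helper names `workflow_fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib
import json


def workflow_fingerprint(output: str) -> str:
    """Hash a workflow after re-serializing it with sorted keys and no extra whitespace."""
    canonical = json.dumps(json.loads(output), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record seen for each distinct workflow fingerprint."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = workflow_fingerprint(record["output"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Stand-in records: the first two hold the same workflow with different key order.
records = [
    {"instruction": "Notify Slack", "output": '{"name": "A", "nodes": []}'},
    {"instruction": "Send a Slack alert", "output": '{"nodes":[],"name":"A"}'},
    {"instruction": "Post to Discord", "output": '{"name": "B", "nodes": []}'},
]

unique = deduplicate(records)
print(len(unique))
```

This only removes exact structural duplicates; near-duplicates that differ in node positions or names would need a looser comparison.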
+## Models Trained on This Dataset
+- [eclaude/qwen-coder-3b-n8n-sft](https://huggingface.co/eclaude/qwen-coder-3b-n8n-sft)
+## Citation
+```bibtex
+@dataset{n8n_workflows_sft_2025,
+  author = {eclaude},
+  title = {n8n Workflows SFT Dataset},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/datasets/eclaude/n8n-workflows-sft}
+}
+```
+## Contact
+For questions, suggestions, or contributions, open a discussion on this repository or contact the author via [Hugging Face](https://huggingface.co/eclaude).