# n8n-Toolkit: Qwen3-VL Multimodal Training Dataset

A comprehensive multimodal training dataset for fine-tuning Qwen3-VL models on business automation, no-code tooling, and workflow expertise.

## 📊 Dataset Stats

| Metric | Count |
|--------|-------|
| **Total Examples** | 8,521 |
| **Vision Examples** | 2,391 |
| **Text Examples** | 6,130 |
| **n8n Workflow Screenshots** | 2,274 |
| **GoHighLevel Screenshots** | 58 |
| **Odoo Screenshots** | 170 |

## 🏗️ Schema

The dataset uses a flat schema for compatibility with the Hugging Face `datasets` library:

| Column | Type | Description |
|--------|------|-------------|
| `instruction` | string | User's question or prompt |
| `response` | string | Assistant's detailed answer |
| `image_path` | string | Path to image (empty for text-only) |
| `has_image` | bool | True for vision examples |
| `id` | string | Unique example identifier |
| `category` | string | Topic category |
| `source` | string | Data source |
| `platform` | string | Platform (n8n, GHL, Odoo) |
| `complexity` | string | Difficulty level |
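
As a quick sanity check on the columns above, the following is a minimal sketch of a row validator. The `SCHEMA` mapping and the `schema_problems` helper are hypothetical names introduced here for illustration, and the rule that a vision row should carry a non-empty `image_path` is an assumption read off the table rather than a documented guarantee.

```python
# Column names and Python types mirroring the schema table.
SCHEMA = {
    "instruction": str,
    "response": str,
    "image_path": str,
    "has_image": bool,
    "id": str,
    "category": str,
    "source": str,
    "platform": str,
    "complexity": str,
}


def schema_problems(row):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for column, expected in SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(f"{column}: expected {expected.__name__}")
    # Assumed consistency rule: a vision example should carry a path.
    if row.get("has_image") and not row.get("image_path"):
        problems.append("has_image is True but image_path is empty")
    return problems
```

Running `schema_problems` over a handful of rows before training is a cheap way to catch a renamed or missing column early.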

## 🚀 Quick Start

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("DavidrPatton/n8n-Toolkit", data_files={'train': 'train.jsonl'})

print(f"Total examples: {len(dataset['train'])}")
# Total examples: 8521
```
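
Once the data is loaded, the `has_image` column separates vision rows from text-only rows. The sketch below counts both groups. `modality_counts` is a hypothetical helper, not part of the dataset or of `datasets`, and it works on any iterable of row dictionaries.

```python
from collections import Counter


def modality_counts(rows):
    """Count vision vs. text-only rows using the has_image column."""
    counts = Counter("vision" if row["has_image"] else "text" for row in rows)
    # Always report both keys, even when one group is empty.
    return {"vision": counts["vision"], "text": counts["text"]}
```

Calling `modality_counts(dataset["train"])` should report 2,391 vision and 6,130 text rows if the stats table is accurate. To keep only the vision rows, `dataset["train"].filter(lambda row: row["has_image"])` does the same split with the `datasets` API.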

## 🔄 Convert to Qwen3-VL Format

For fine-tuning with Unsloth, convert each flat row to the chat-style `messages` format:

```python
from PIL import Image


def to_qwen3_vl_format(row, screenshots_dir):
    """Convert flat row to Qwen3-VL messages format."""
    user_content = [{"type": "text", "text": row['instruction']}]

    # Add image if present
    if row['has_image'] and row['image_path']:
        img_path = f"{screenshots_dir}/{row['image_path']}"
        img = Image.open(img_path)
        user_content.insert(0, {"type": "image", "image": img})

    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": [{"type": "text", "text": row['response']}]}
        ]
    }
```
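
Because the conversion opens each image from disk, a missing screenshot would raise an error partway through. A minimal pre-flight check is sketched below. `missing_images` is a hypothetical helper, and it assumes, as the conversion function does, that `image_path` is relative to `screenshots_dir`.

```python
from pathlib import Path


def missing_images(rows, screenshots_dir):
    """Return image_path values whose files are absent under screenshots_dir."""
    root = Path(screenshots_dir)
    return [
        row["image_path"]
        for row in rows
        # Only vision rows with a non-empty path are expected to have a file.
        if row["has_image"] and row["image_path"]
        and not (root / row["image_path"]).is_file()
    ]
```

An empty result from `missing_images(dataset["train"], "screenshots")` suggests every vision row can be converted. Otherwise the returned paths show which files to fetch or which rows to skip.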

## 📁 Repository Structure

```
├── train.jsonl              # Main training data (8,521 examples)
├── screenshots/
│   ├── n8n-workflows/       # 2,274 n8n workflow screenshots
│   ├── gohighlevel/         # 58 GoHighLevel screenshots
│   └── odoo/                # 170 Odoo ERP screenshots
└── json_workflows/          # 323 n8n workflow JSON templates
```

## 🎯 Topics Covered

- **n8n Workflow Automation** - Triggers, nodes, integrations, debugging
- **GoHighLevel CRM** - Marketing automation, funnels, campaigns
- **Odoo ERP** - CRM, Sales, Inventory, Accounting modules
- **AI/LLM Integration** - OpenAI, Anthropic, embeddings, RAG
- **Full-Stack Development** - JavaScript, React, CSS, APIs

## 📜 License

Apache 2.0

## 🔗 Links

- [GitHub Repository](https://github.com/David2024patton/n8n-docs-datasets)
- [Qwen3-VL Documentation](https://huggingface.co/Qwen)