# n8n-Toolkit: Qwen3-VL Multimodal Training Dataset
A comprehensive multimodal training dataset for fine-tuning Qwen3-VL models on business automation, no-code tooling, and workflow expertise.
## 📊 Dataset Stats
| Metric | Count |
|--------|-------|
| **Total Examples** | 8,521 |
| **Vision Examples** | 2,391 |
| **Text Examples** | 6,130 |
| **n8n Workflow Screenshots** | 2,274 |
| **GoHighLevel Screenshots** | 58 |
| **Odoo Screenshots** | 170 |
## 🏗️ Schema
Flat schema optimized for HuggingFace compatibility:
| Column | Type | Description |
|--------|------|-------------|
| `instruction` | string | User's question or prompt |
| `response` | string | Assistant's detailed answer |
| `image_path` | string | Path to image (empty for text-only) |
| `has_image` | bool | True for vision examples |
| `id` | string | Unique example identifier |
| `category` | string | Topic category |
| `source` | string | Data source |
| `platform` | string | Platform (n8n, GHL, Odoo) |
| `complexity` | string | Difficulty level |
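
The schema above can be checked row by row before training. The following is a minimal sketch, assuming each line of `train.jsonl` is a JSON object with exactly these columns; `validate_row` and `validate_jsonl_line` are hypothetical helper names, and the rule that vision rows need a non-empty `image_path` is an assumption drawn from the column descriptions, not a documented guarantee.

```python
import json

# Column names and Python types, as listed in the schema table.
EXPECTED_COLUMNS = {
    "instruction": str,
    "response": str,
    "image_path": str,
    "has_image": bool,
    "id": str,
    "category": str,
    "source": str,
    "platform": str,
    "complexity": str,
}


def validate_row(row):
    """Return a list of problems found in one row (empty list if none)."""
    problems = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    # Assumed consistency rule: a vision row should carry a non-empty image_path.
    if row.get("has_image") is True and not row.get("image_path"):
        problems.append("has_image is true but image_path is empty")
    return problems


def validate_jsonl_line(line):
    """Parse one JSONL line and validate the resulting row."""
    return validate_row(json.loads(line))
```

Running `validate_jsonl_line` over every line of the file and collecting non-empty results gives a quick list of rows that may need attention.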
## 🚀 Quick Start
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("DavidrPatton/n8n-Toolkit", data_files={'train': 'train.jsonl'})
print(f"Total examples: {len(dataset['train'])}")
# Total examples: 8521
```
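
Once the data is loaded, it can be useful to compare it against the Dataset Stats table. The sketch below counts vision vs. text rows and rows per platform for any iterable of dict rows; `summarize` is a hypothetical helper, and it relies only on the `has_image` and `platform` columns from the schema.

```python
from collections import Counter


def summarize(rows):
    """Count vision vs. text rows and rows per platform."""
    modality = Counter()
    platforms = Counter()
    for row in rows:
        # has_image is the boolean flag from the schema table.
        modality["vision" if row["has_image"] else "text"] += 1
        platforms[row["platform"]] += 1
    return modality, platforms
```

Iterating a `datasets.Dataset` yields dict rows, so `summarize(dataset['train'])` should work directly. If the stats on this card hold, the modality counts would be expected to come out at 2,391 vision and 6,130 text rows.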
## 🔄 Convert to Qwen3-VL Format
For fine-tuning with Unsloth, transform each flat row into the messages format:
```python
from PIL import Image


def to_qwen3_vl_format(row, screenshots_dir):
    """Convert flat row to Qwen3-VL messages format."""
    user_content = [{"type": "text", "text": row['instruction']}]

    # Add image if present (placed before the text in the user turn)
    if row['has_image'] and row['image_path']:
        img_path = f"{screenshots_dir}/{row['image_path']}"
        img = Image.open(img_path)
        user_content.insert(0, {"type": "image", "image": img})

    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": [{"type": "text", "text": row['response']}]}
        ]
    }
```
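
Because the conversion function opens each image with `Image.open`, a missing file would raise partway through a run. A minimal sketch of a pre-flight check follows, assuming `image_path` is relative to `screenshots_dir` in the same way the conversion function joins them; `find_missing_images` is a hypothetical helper, not part of the dataset tooling.

```python
from pathlib import Path


def find_missing_images(rows, screenshots_dir):
    """Return ids of vision rows whose image file is not found on disk."""
    base = Path(screenshots_dir)
    missing = []
    for row in rows:
        # Mirror the condition used by the conversion function.
        if row["has_image"] and row["image_path"]:
            if not (base / row["image_path"]).is_file():
                missing.append(row["id"])
    return missing
```

An empty result suggests every vision row can be converted without a file-not-found error. Any returned ids can be filtered out or investigated before training.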
## 📁 Repository Structure
```
├── train.jsonl              # Main training data (8,521 examples)
├── screenshots/
│   ├── n8n-workflows/       # 2,274 n8n workflow screenshots
│   ├── gohighlevel/         # 58 GoHighLevel screenshots
│   └── odoo/                # 170 Odoo ERP screenshots
└── json_workflows/          # 323 n8n workflow JSON templates
```
## 🎯 Topics Covered
- **n8n Workflow Automation** - Triggers, nodes, integrations, debugging
- **GoHighLevel CRM** - Marketing automation, funnels, campaigns
- **Odoo ERP** - CRM, Sales, Inventory, Accounting modules
- **AI/LLM Integration** - OpenAI, Anthropic, embeddings, RAG
- **Full-Stack Development** - JavaScript, React, CSS, APIs
## 📜 License
Apache 2.0
## 🔗 Links
- [GitHub Repository](https://github.com/David2024patton/n8n-docs-datasets)
- [Qwen3-VL Documentation](https://huggingface.co/Qwen)