	# n8n-Toolkit: Qwen3-VL Multimodal Training Dataset

A multimodal dataset for fine-tuning Qwen3-VL models on business automation, no-code tooling, and workflow design. It combines text-only instruction/response examples with screenshot-based examples from n8n, GoHighLevel, and Odoo.

	## 📊 Dataset Stats

| Metric | Count |
|--------|-------|
| Total Examples | 8,521 |
| Vision Examples | 2,391 |
| Text Examples | 6,130 |
| n8n Workflow Screenshots | 2,274 |
| GoHighLevel Screenshots | 58 |
| Odoo Screenshots | 170 |

	## 🏗️ Schema

Each example is a single flat row (no nested fields), so the file loads directly with the Hugging Face `datasets` library:

| Column | Type | Description |
|--------|------|-------------|
| `instruction` | string | User's question or prompt |
| `response` | string | Assistant's detailed answer |
| `image_path` | string | Path to image (empty for text-only) |
| `has_image` | bool | True for vision examples |
| `id` | string | Unique example identifier |
| `category` | string | Topic category |
| `source` | string | Data source |
| `platform` | string | Platform (n8n, GHL, Odoo) |
| `complexity` | string | Difficulty level |
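
To make the schema concrete, here is a minimal sketch that checks one parsed JSONL row against the columns listed above. The `validate_row` helper and the sample row are hypothetical (every value in the sample is invented), and the rule that `has_image` should agree with a non-empty `image_path` is an assumption inferred from the column descriptions rather than something the dataset guarantees.

```python
import json

# Column names and Python types taken from the schema table above.
SCHEMA = {
    "instruction": str,
    "response": str,
    "image_path": str,
    "has_image": bool,
    "id": str,
    "category": str,
    "source": str,
    "platform": str,
    "complexity": str,
}


def validate_row(row):
    """Return a list of problems found in one row (an empty list means valid)."""
    problems = []
    for column, expected_type in SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")
    # Assumption: vision rows carry a path, text-only rows leave it empty.
    if not problems and row["has_image"] != bool(row["image_path"]):
        problems.append("has_image does not match image_path")
    return problems


# Hypothetical row: every value here is invented for illustration.
sample_line = json.dumps({
    "instruction": "What does this workflow do?",
    "response": "It receives a webhook call and posts the result to Slack.",
    "image_path": "n8n-workflows/example.png",
    "has_image": True,
    "id": "example-0001",
    "category": "workflow-explanation",
    "source": "illustrative",
    "platform": "n8n",
    "complexity": "beginner",
})

print(validate_row(json.loads(sample_line)))
```

Running a check like this over `train.jsonl` before training is a cheap way to catch rows that would otherwise fail later inside the data collator.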

	## 🚀 Quick Start

	```python
	from datasets import load_dataset

	# Load the dataset
	dataset = load_dataset("DavidrPatton/n8n-Toolkit", data_files={'train': 'train.jsonl'})

	print(f"Total examples: {len(dataset['train'])}")
	# Total examples: 8521
	```
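
Once the rows are loaded, a quick per-platform and per-modality tally helps confirm the split matches the stats table. The sketch below is one possible way to do that; `summarize` is a hypothetical helper, and the three stand-in rows are invented so the block runs without downloading anything. The exact strings stored in `platform` are an assumption based on the schema description.

```python
from collections import Counter


def summarize(rows):
    """Count examples per platform and per modality in a single pass."""
    by_platform = Counter()
    by_modality = Counter()
    for row in rows:
        by_platform[row["platform"]] += 1
        by_modality["vision" if row["has_image"] else "text"] += 1
    return by_platform, by_modality


# Hypothetical stand-in rows; with the real data, pass dataset["train"] instead.
rows = [
    {"platform": "n8n", "has_image": True},
    {"platform": "n8n", "has_image": False},
    {"platform": "Odoo", "has_image": True},
]

by_platform, by_modality = summarize(rows)
print(dict(by_platform))  # {'n8n': 2, 'Odoo': 1}
print(dict(by_modality))  # {'vision': 2, 'text': 1}
```

Because the function only iterates over dictionaries, it should work unchanged on a loaded split, which yields one dict per row when iterated.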

	## 🔄 Convert to Qwen3-VL Format

For fine-tuning with Unsloth, convert each flat row into the chat-style `messages` format:

```python
from PIL import Image

def to_qwen3_vl_format(row, screenshots_dir):
    """Convert flat row to Qwen3-VL messages format."""
    user_content = [{"type": "text", "text": row['instruction']}]

    # Add image if present
    if row['has_image'] and row['image_path']:
        img_path = f"{screenshots_dir}/{row['image_path']}"
        img = Image.open(img_path)
        user_content.insert(0, {"type": "image", "image": img})

    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": [{"type": "text", "text": row['response']}]}
        ]
    }
```

	## 📁 Repository Structure

	```
├── train.jsonl              # Main training data (8,521 examples)
├── screenshots/
│   ├── n8n-workflows/       # 2,274 n8n workflow screenshots
│   ├── gohighlevel/         # 58 GoHighLevel screenshots
│   └── odoo/                # 170 Odoo ERP screenshots
└── json_workflows/          # 323 n8n workflow JSON templates
```
	```

	## 🎯 Topics Covered

	- n8n Workflow Automation - Triggers, nodes, integrations, debugging
	- GoHighLevel CRM - Marketing automation, funnels, campaigns
	- Odoo ERP - CRM, Sales, Inventory, Accounting modules
	- AI/LLM Integration - OpenAI, Anthropic, embeddings, RAG
	- Full-Stack Development - JavaScript, React, CSS, APIs

	## 📜 License

	Apache 2.0

	## 🔗 Links

	- [GitHub Repository](https://github.com/David2024patton/n8n-docs-datasets)
	- [Qwen3-VL Documentation](https://huggingface.co/Qwen)