DanielPFlorian committed
Commit b5eff20 · verified · 1 Parent(s): ffe0987

Update README.md

Files changed (1): README.md (+275 −275)
---
tags:
- gguf
- comfyui
- workflow-generation
- qwen
- text-generation
- sentence-transformers
library_name: gguf
base_model: Qwen/Qwen2.5-14B
license: gpl-3.0
language:
- en
pipeline_tag: text-generation
---

# ComfyUI-WorkflowGenerator Models

This repository contains the quantized GGUF models and the supporting embedding model required for [ComfyUI-WorkflowGenerator](https://github.com/danielpflorian/ComfyUI-WorkflowGenerator), a custom node implementation that generates ComfyUI workflows from natural language descriptions.

## Models Included

### 1. Workflow Generator Model (Required)

- **File**: `workflow-generator-q8_0.gguf`
- **Tokenizer**: `workflow-generator/` directory
- **Purpose**: Generates workflow diagrams from natural language instructions
- **Base Model**: Qwen2.5-14B
- **Training**: Fine-tuned from Qwen2.5-14B using LLaMA-Factory (see the [original ComfyGPT repository](https://github.com/comfygpt/comfygpt/tree/main))
- **Status**: **Required** - This model is always needed

### 2. Embedding Model (Required)

- **Directory**: `paraphrase-multilingual-MiniLM-L12-v2/`
- **Purpose**: Semantic search for node name matching and validation
- **Base Model**: [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Status**: **Required** - Always needed for semantic search in NodeValidator

### 3. Node Validator Model (Optional)

- **File**: `Qwen2.5-7B-Instruct-q8_0.gguf`
- **Tokenizer**: `Qwen2.5-7B-Instruct/` directory
- **Purpose**: Refines and corrects node names in workflow diagrams (LLM refinement mode)
- **Base Model**: Qwen2.5-7B-Instruct (base model, not fine-tuned)
- **Status**: **Optional** - Only needed if using LLM refinement (`use_llm_refinement=True`)

## Model Training Information

### Workflow Generator Model

The `workflow-generator-q8_0.gguf` model was trained following the [ComfyGPT research](https://github.com/comfygpt/comfygpt/tree/main) methodology:

- **Original Model Source**: [xiatianzs/resources](https://huggingface.co/xiatianzs/resources/tree/main) - Original fine-tuned model from the ComfyGPT research team
- **Base Model**: [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) from HuggingFace
- **Training Method**: Full fine-tuning (Supervised Fine-Tuning / SFT)
- **Training Framework**: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- **Training Dataset**: `FlowDataset.json` - Contains instruction-input-output pairs where:
  - Instruction: "Based on the description I provided, generate a JSON example of the required ComfyUi workflow."
  - Input: Natural language workflow descriptions
  - Output: JSON diagrams (list of edges representing workflow connections)
- **Training Hyperparameters**:
  - Learning rate: 1.0e-5
  - Epochs: 3.0
  - Batch size: 1 per device (gradient accumulation: 4 steps)
  - LR scheduler: Cosine with 0.1 warmup ratio
  - Precision: bf16
  - Cutoff length: 8,192 tokens (training cutoff; the model architecture supports up to 131,072 tokens)
  - DeepSpeed: ZeRO-3 optimization
- **Quantization**: q8_0 (8-bit quantization for efficient inference)
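A record in the instruction-input-output shape described above might look like the following (the `input` and `output` values here are invented for illustration; see `FlowDataset.json` in the ComfyGPT repository for the actual records):

```json
{
  "instruction": "Based on the description I provided, generate a JSON example of the required ComfyUi workflow.",
  "input": "Generate an image from a text prompt using a basic txt2img pipeline.",
  "output": "[[\"CheckpointLoaderSimple\", \"CLIPTextEncode\"], [\"CLIPTextEncode\", \"KSampler\"], [\"KSampler\", \"VAEDecode\"], [\"VAEDecode\", \"SaveImage\"]]"
}
```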

For more details on the training process, see the [original ComfyGPT repository](https://github.com/comfygpt/comfygpt/tree/main) and [training configuration](https://github.com/comfygpt/comfygpt/tree/main/train/sft).

### Embedding Model

The `paraphrase-multilingual-MiniLM-L12-v2` model is a SentenceTransformer model used for semantic search in the NodeValidator. It encodes node names into embeddings and finds the most similar nodes when correcting invalid node names in workflow diagrams.

- **Original Model**: [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Type**: SentenceTransformer (HuggingFace format)
- **Size**: ~420 MB
- **Dimensions**: 384 (embedding vector size)
- **Use Case**: Semantic similarity search for node name matching
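The matching step can be illustrated with plain cosine similarity. This is a minimal sketch using toy 3-dimensional vectors and hypothetical helper names; the real NodeValidator works with the 384-dimensional embeddings produced by the SentenceTransformer model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def closest_node(query_vec, node_embeddings):
    # Return the known node name whose embedding is most similar to the query.
    return max(node_embeddings,
               key=lambda name: cosine_similarity(query_vec, node_embeddings[name]))

# Toy embeddings standing in for the real 384-dim vectors of valid node names.
node_embeddings = {
    "KSampler":  [0.9, 0.1, 0.0],
    "VAEDecode": [0.1, 0.9, 0.1],
    "SaveImage": [0.0, 0.2, 0.9],
}

# Embedding of an invalid node name taken from a generated diagram.
invalid_vec = [0.85, 0.15, 0.05]
print(closest_node(invalid_vec, node_embeddings))  # KSampler
```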

### Node Validator Model

The `Qwen2.5-7B-Instruct-q8_0.gguf` model is the base [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model (not fine-tuned), used for its built-in instruction-following capabilities to select the best node from semantic search candidates.

## Quick Download

Download all models at once:

```bash
huggingface-cli download DanielPFlorian/comfyui-workflowgenerator-models \
  --local-dir ./ComfyUI/models/LLM/
```

Or download specific models:

```bash
# Download only the required models (workflow-generator + embedding model)
huggingface-cli download DanielPFlorian/comfyui-workflowgenerator-models \
  --include "workflow-generator-q8_0.gguf" "workflow-generator/*" "paraphrase-multilingual-MiniLM-L12-v2/*" \
  --local-dir ./ComfyUI/models/LLM/
```

## Installation and Setup

### Step 1: Download Models

Download the models to your ComfyUI models directory:

```bash
# Navigate to ComfyUI directory
cd /path/to/ComfyUI

# Download all models
huggingface-cli download DanielPFlorian/comfyui-workflowgenerator-models \
  --local-dir ./models/LLM/
```

### Step 2: Organize Files in LLM Directory

After downloading, organize the files in `ComfyUI/models/LLM/` as follows:

```
ComfyUI/models/LLM/
├── workflow-generator-q8_0.gguf           # Main model (required)
├── workflow-generator/                    # Main tokenizer (required)
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── vocab.json
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── added_tokens.json
│   ├── config.json
│   ├── generation_config.json
│   └── model.safetensors.index.json
├── paraphrase-multilingual-MiniLM-L12-v2/ # Embedding model (required)
│   ├── config.json
│   ├── model.safetensors
│   ├── modules.json
│   ├── sentence_bert_config.json
│   ├── config_sentence_transformers.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── sentencepiece.bpe.model
│   ├── unigram.json
│   └── 1_Pooling/
│       └── config.json
├── Qwen2.5-7B-Instruct-q8_0.gguf          # NodeValidator model (optional)
└── Qwen2.5-7B-Instruct/                   # NodeValidator tokenizer (optional)
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── vocab.json
    ├── merges.txt
    ├── config.json
    └── generation_config.json
```

**Important**: The tokenizer directory name must match the model name (without the `.gguf` extension and quantization suffix). The auto-detection code looks for:
- `workflow-generator-q8_0.gguf` → `workflow-generator/` tokenizer
- `Qwen2.5-7B-Instruct-q8_0.gguf` → `Qwen2.5-7B-Instruct/` tokenizer
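That naming rule can be sketched as a small helper. This is illustrative only; the function name and the exact suffix pattern are assumptions, not the actual auto-detection code:

```python
import re
from pathlib import Path

def tokenizer_dir_for(gguf_filename: str) -> str:
    # Strip the .gguf extension, then a trailing quantization
    # suffix such as -q8_0 or -q4_k_m.
    stem = Path(gguf_filename).stem
    return re.sub(r"-q\d+(_[a-z0-9]+)*$", "", stem, flags=re.IGNORECASE)

print(tokenizer_dir_for("workflow-generator-q8_0.gguf"))   # workflow-generator
print(tokenizer_dir_for("Qwen2.5-7B-Instruct-q8_0.gguf"))  # Qwen2.5-7B-Instruct
```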

## Usage

### Required Models

- **workflow-generator-q8_0.gguf** + **workflow-generator/** tokenizer - Always needed
- **paraphrase-multilingual-MiniLM-L12-v2/** - Always needed for semantic search

### Optional Models

- **Qwen2.5-7B-Instruct-q8_0.gguf** + **Qwen2.5-7B-Instruct/** tokenizer - Only needed if using LLM refinement (`use_llm_refinement=True`)

### Model Usage in ComfyUI-WorkflowGenerator

1. **WorkflowGenerator Node**: Uses `workflow-generator-q8_0.gguf` to generate workflow diagrams from natural language
2. **NodeValidator Node**:
   - Uses `paraphrase-multilingual-MiniLM-L12-v2` for semantic search (always)
   - Uses `Qwen2.5-7B-Instruct-q8_0.gguf` for LLM refinement (optional, when `use_llm_refinement=True`)
3. **WorkflowBuilder Node**: No models needed (deterministic code)
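Since the generator emits a workflow diagram as a list of edges, the validation step amounts to checking every referenced node name against the set of real ComfyUI nodes. A minimal sketch (the edge tuples and node registry here are invented for illustration, not the actual data format):

```python
def find_unknown_nodes(edges, known_nodes):
    # Collect every node name referenced by an edge that is not
    # a known ComfyUI node; these are the names to correct.
    referenced = {name for edge in edges for name in edge}
    return sorted(referenced - known_nodes)

known_nodes = {"CheckpointLoaderSimple", "CLIPTextEncode",
               "KSampler", "VAEDecode", "SaveImage"}
edges = [
    ("CheckpointLoaderSimple", "CLIPTextEncode"),
    ("CLIPTextEncode", "KSampler"),
    ("KSampler", "VAEdecode"),  # typo the NodeValidator would correct
]
print(find_unknown_nodes(edges, known_nodes))  # ['VAEdecode']
```

Names flagged this way are what the semantic search (and, optionally, the LLM refinement pass) resolves to valid node names.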

## Model Specifications

### Workflow Generator Model

- **Format**: GGUF (q8_0 quantization)
- **Base**: Qwen2.5-14B
- **Size**: ~8-9 GB (quantized)
- **Context Window**: 131,072 tokens (128K) - The model architecture supports up to 131K tokens, though training used an 8,192-token cutoff
- **Quantization**: q8_0 (8-bit, good balance of quality and size)

### Embedding Model

- **Format**: SentenceTransformer (HuggingFace format)
- **Base**: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- **Size**: ~420 MB
- **Dimensions**: 384 (embedding vector size)

### Node Validator Model

- **Format**: GGUF (q8_0 quantization)
- **Base**: Qwen2.5-7B-Instruct
- **Size**: ~4-5 GB (quantized)
- **Context Window**: 32,768 tokens
- **Quantization**: q8_0 (8-bit, good balance of quality and size)

## System Requirements

- **VRAM**:
  - Minimum: 8 GB (for workflow-generator with CPU offloading)
  - Recommended: 16+ GB (for both models on GPU)
- **RAM**: 16+ GB recommended
- **Storage**: ~15-16 GB for all models, tokenizers, and the embedding model

## Performance Tips

1. **Use GGUF models**: Smaller size and better VRAM efficiency than HuggingFace models
2. **GPU Layers**: Use the "auto" setting for optimal GPU layer allocation
3. **LLM Refinement**: Only enable if you need higher accuracy (slower but more accurate)
4. **Semantic Search Only**: Faster execution, deterministic results (recommended for most use cases)

## Troubleshooting

### Model Not Found

- Verify models are in the `ComfyUI/models/LLM/` directory
- Check that the tokenizer directory name matches the model name (without `.gguf` and quantization suffix)
- Restart ComfyUI after moving files

### Tokenizer Not Found

- Ensure the tokenizer directory exists with the correct name
- Verify tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `vocab.json`) are present
- Check that the directory structure matches the expected format

### Out of Memory

- Reduce `n_gpu_layers` (try "auto" or a lower number)
- Use a smaller quantization (q4_0 instead of q8_0); note that only q8_0 files are provided here, so you will need to re-quantize the model yourself
- Set `device_preference` to "cpu" for some operations

## Related Resources

- **ComfyUI-WorkflowGenerator**: [GitHub Repository](https://github.com/danielpflorian/ComfyUI-WorkflowGenerator)
- **Original ComfyGPT Research**: [GitHub Repository](https://github.com/comfygpt/comfygpt)
- **Research Paper**: [arXiv:2503.17671](https://arxiv.org/abs/2503.17671)
- **Project Website**: [https://comfygpt.github.io/](https://comfygpt.github.io/)

## Citation

If you use these models in your research, please cite the original ComfyGPT paper:

```bibtex
@article{huang2025comfygpt,
  title={ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation},
  author={Huang, Oucheng and Ma, Yuhang and Zhao, Zeng and Wu, Mingrui and Ji, Jiayi and Zhang, Rongsheng and Hu, Zhipeng and Sun, Xiaoshuai and Ji, Rongrong},
  journal={arXiv preprint arXiv:2503.17671},
  year={2025}
}
```

## License

These models are provided for use with ComfyUI-WorkflowGenerator. Please refer to:
- the original ComfyGPT repository for model training details and licensing
- the Qwen2.5 model licenses on HuggingFace
- the ComfyUI-WorkflowGenerator repository for usage terms

## Support

For issues, questions, or contributions:
- **Issues**: [ComfyUI-WorkflowGenerator Issues](https://github.com/danielpflorian/ComfyUI-WorkflowGenerator/issues)
- **Documentation**: [ComfyUI-WorkflowGenerator Wiki](https://github.com/danielpflorian/ComfyUI-WorkflowGenerator/wiki)