```python
import os
import importlib.util
from comfy.cli_args import args, PerformanceFeature
import subprocess


# PyTorch can't be used to get the GPU names, because the CUDA allocator
# setting has to be in place before torch is imported for the first time.
def get_gpu_names():
    if os.name == 'nt':
        import ctypes

        # Define the necessary C structures and types
        class DISPLAY_DEVICEA(ctypes.Structure):
            _fields_ = [
                ('cb', ctypes.c_ulong),
                ('DeviceName', ctypes.c_char * 32),
                ('DeviceString', ctypes.c_char * 128),
                ('StateFlags', ctypes.c_ulong),
                ('DeviceID', ctypes.c_char * 128),
                ('DeviceKey', ctypes.c_char * 128)
            ]

        # Load user32.dll
        user32 = ctypes.windll.user32

        # Call EnumDisplayDevicesA for each device index until it returns 0
        def enum_display_devices():
            device_info = DISPLAY_DEVICEA()
            device_info.cb = ctypes.sizeof(device_info)
            device_index = 0
            gpu_names = set()

            while user32.EnumDisplayDevicesA(None, device_index, ctypes.byref(device_info), 0):
                device_index += 1
                gpu_names.add(device_info.DeviceString.decode('utf-8'))
            return gpu_names

        return enum_display_devices()
    else:
        # Each GPU line of `nvidia-smi -L` looks like "GPU 0: <name> (UUID: ...)";
        # keep everything before " (UUID".
        gpu_names = set()
        out = subprocess.check_output(['nvidia-smi', '-L'])
        for line in out.split(b'\n'):
            if len(line) > 0:
                gpu_names.add(line.decode('utf-8').split(' (UUID')[0])
        return gpu_names


# NVIDIA GPUs for which cudaMallocAsync is not enabled by default
# (matched as substrings of the reported device names).
blacklist = {"GeForce GTX TITAN X", "GeForce GTX 980", "GeForce GTX 970", "GeForce GTX 960", "GeForce GTX 950", "GeForce 945M",
             "GeForce 940M", "GeForce 930M", "GeForce 920M", "GeForce 910M", "GeForce GTX 750", "GeForce GTX 745", "Quadro K620",
             "Quadro K1200", "Quadro K2200", "Quadro M500", "Quadro M520", "Quadro M600", "Quadro M620", "Quadro M1000",
             "Quadro M1200", "Quadro M2000", "Quadro M2200", "Quadro M3000", "Quadro M4000", "Quadro M5000", "Quadro M5500", "Quadro M6000",
             "GeForce MX110", "GeForce MX130", "GeForce 830M", "GeForce 840M", "GeForce GTX 850M", "GeForce GTX 860M",
             "GeForce GTX 1650", "GeForce GTX 1630", "Tesla M4", "Tesla M6", "Tesla M10", "Tesla M40", "Tesla M60"
             }


def cuda_malloc_supported():
    try:
        names = get_gpu_names()
    except Exception:
        # The names could not be read (e.g. nvidia-smi is missing): assume supported.
        names = set()
    for x in names:
        if "NVIDIA" in x:
            for b in blacklist:
                if b in x:
                    return False
    return True


# Read torch's version from torch/version.py without importing torch itself.
# If torch is not installed, find_spec returns None and the error is ignored.
version = ""
try:
    torch_spec = importlib.util.find_spec("torch")
    for folder in torch_spec.submodule_search_locations:
        ver_file = os.path.join(folder, "version.py")
        if os.path.isfile(ver_file):
            spec = importlib.util.spec_from_file_location("torch_version_import", ver_file)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            version = module.__version__
except Exception:
    pass

if not args.cuda_malloc:
    try:
        # Enable by default only for CUDA builds of torch 2.0 and later
        if int(version[0]) >= 2 and "+cu" in version:
            if PerformanceFeature.AutoTune not in args.fast:  # AutoTune has issues with cuda malloc
                args.cuda_malloc = cuda_malloc_supported()
    except Exception:
        pass

if args.disable_cuda_malloc:
    args.cuda_malloc = False

if args.cuda_malloc:
    # Append the async backend to any allocator options the user already set
    env_var = os.environ.get('PYTORCH_CUDA_ALLOC_CONF', None)
    if env_var is None:
        env_var = "backend:cudaMallocAsync"
    else:
        env_var += ",backend:cudaMallocAsync"

    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = env_var


def get_torch_version_noimport():
    return str(version)
```
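On non-Windows systems, `get_gpu_names` runs `nvidia-smi -L` and keeps the text before ` (UUID` on each non-empty line. The parsing step can be pulled out as a pure function, which makes it testable without a GPU. The name `parse_nvidia_smi_list` and the sample bytes below are illustrative assumptions and are not part of ComfyUI.

```python
from typing import Set


def parse_nvidia_smi_list(out: bytes) -> Set[str]:
    """Hypothetical helper: GPU names from `nvidia-smi -L` output.

    Applies the same rule as get_gpu_names: split on newlines, skip empty
    lines, and keep everything before " (UUID".
    """
    names: Set[str] = set()
    for line in out.split(b"\n"):
        if line:
            names.add(line.decode("utf-8").split(" (UUID")[0])
    return names


# Illustrative bytes in the `nvidia-smi -L` line format (UUIDs shortened).
sample = (
    b"GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-1234)\n"
    b"GPU 1: NVIDIA GeForce GTX 970 (UUID: GPU-5678)\n"
)
print(sorted(parse_nvidia_smi_list(sample)))
# ['GPU 0: NVIDIA GeForce RTX 3090', 'GPU 1: NVIDIA GeForce GTX 970']
```

The extracted names keep their `GPU n: ` prefix. That does no harm here, because the blacklist check only tests for substrings.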
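`cuda_malloc_supported` returns False as soon as a reported name contains "NVIDIA" and also contains any `blacklist` entry as a substring. A name without "NVIDIA" counts as supported, and so does an empty set of names (for example when `nvidia-smi` is unavailable). The sketch below restates that rule with explicit inputs so it can be checked directly. `is_cuda_malloc_supported` and the two-entry `demo_blacklist` are hypothetical stand-ins for the module's function and its full `blacklist` set.

```python
from typing import Iterable


def is_cuda_malloc_supported(names: Iterable[str], blacklist: Iterable[str]) -> bool:
    """Hypothetical restatement of cuda_malloc_supported with explicit inputs."""
    blocked = list(blacklist)  # materialize once; it is scanned for every name
    for name in names:
        # Only names mentioning NVIDIA are compared against the blacklist.
        if "NVIDIA" in name and any(entry in name for entry in blocked):
            return False
    return True


demo_blacklist = {"GeForce GTX 970", "GeForce GTX 1650"}

print(is_cuda_malloc_supported({"GPU 0: NVIDIA GeForce RTX 3090"}, demo_blacklist))  # True
print(is_cuda_malloc_supported({"GPU 0: NVIDIA GeForce GTX 970"}, demo_blacklist))   # False
# Substring matching also catches suffixed variants of a listed model:
print(is_cuda_malloc_supported({"NVIDIA GeForce GTX 1650 Ti"}, demo_blacklist))      # False
# Without "NVIDIA" in the name, the blacklist is never consulted:
print(is_cuda_malloc_supported({"GeForce GTX 970"}, demo_blacklist))                 # True
print(is_cuda_malloc_supported(set(), demo_blacklist))                               # True
```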
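To get torch's `__version__`, the module locates the package with `importlib.util.find_spec` and executes only its `version.py`, so torch itself is never imported. According to the source comment, this matters because the allocator setting must be in place before the first import. The same technique is wrapped below as a reusable function. `read_version_noimport`, `make_demo_package`, and the throwaway `demo_torchlike` package are hypothetical. The sketch assumes, as the module does, that the package ships a `version.py` defining `__version__`. It is meant for top-level package names, because `find_spec` imports the parent package when given a dotted name.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def read_version_noimport(package: str) -> str:
    """Return <package>/version.py's __version__ without importing <package>.

    Returns "" when the package, or its version.py, cannot be found.
    """
    spec = importlib.util.find_spec(package)  # does not run a top-level __init__.py
    if spec is None or spec.submodule_search_locations is None:
        return ""
    for folder in spec.submodule_search_locations:
        ver_file = os.path.join(folder, "version.py")
        if os.path.isfile(ver_file):
            ver_spec = importlib.util.spec_from_file_location(f"{package}_version_import", ver_file)
            module = importlib.util.module_from_spec(ver_spec)
            ver_spec.loader.exec_module(module)  # executes version.py only
            return str(getattr(module, "__version__", ""))
    return ""


def make_demo_package(root: str, name: str, version: str) -> None:
    """Hypothetical helper: build <root>/<name>/ whose __init__.py fails on import."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("raise RuntimeError('package was imported')\n")
    with open(os.path.join(pkg_dir, "version.py"), "w") as f:
        f.write(f"__version__ = {version!r}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(tmp, "demo_torchlike", "2.3.1+cu121")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the freshly written package visible
        print(read_version_noimport("demo_torchlike"))  # 2.3.1+cu121
        sys.path.remove(tmp)
```

The demo package's `__init__.py` raises on import. A successful read is therefore evidence that only `version.py` ran.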
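The module's enable/disable logic for `args.cuda_malloc` is easier to follow as a pure function of its inputs. The sketch below restates it under hypothetical names (`resolve_cuda_malloc`, `torch_major`), using plain booleans in place of `args`. It keeps the original precedence. An explicit `cuda_malloc` request skips the automatic check. Otherwise the feature defaults on only for CUDA (`+cu`) builds of torch 2 or later with AutoTune off, and only if the GPU check passes. `disable_cuda_malloc` always wins. The GPU check is passed as a callable so that, as in the original, it runs only when the other conditions allow. The sketch differs deliberately in one respect. The original reads `int(version[0])`, the first character only, while `torch_major` parses the whole component before the first dot.

```python
from typing import Callable, Optional


def torch_major(version: str) -> Optional[int]:
    """Major version from strings like '2.3.1+cu121'; None if it isn't numeric."""
    try:
        return int(version.split(".", 1)[0])
    except ValueError:
        return None


def resolve_cuda_malloc(
    version: str,
    cuda_malloc: bool,
    disable_cuda_malloc: bool,
    autotune: bool,
    gpu_supported: Callable[[], bool],
) -> bool:
    """Hypothetical pure-function restatement of the module's flag logic."""
    if not cuda_malloc:
        major = torch_major(version)
        # Default-on only for CUDA builds ("+cu") of torch >= 2 with AutoTune off.
        if major is not None and major >= 2 and "+cu" in version and not autotune:
            cuda_malloc = gpu_supported()  # evaluated lazily, like cuda_malloc_supported()
    if disable_cuda_malloc:
        cuda_malloc = False  # the disable flag always wins
    return cuda_malloc


calls = []


def fake_gpu_check() -> bool:
    calls.append(1)  # record that the (normally expensive) check ran
    return True


print(resolve_cuda_malloc("2.3.1+cu121", False, False, False, fake_gpu_check))  # True
print(resolve_cuda_malloc("2.3.1+cpu", False, False, False, fake_gpu_check))    # False
print(len(calls))  # 1: the GPU check was skipped for the CPU build
```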
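When `args.cuda_malloc` ends up true, the module appends `backend:cudaMallocAsync` to `PYTORCH_CUDA_ALLOC_CONF`. That variable holds a comma-separated list of `key:value` options, so appending keeps any settings the user already made. Below is a minimal sketch of that merge. It works on any mutable mapping, so it can be tried on a plain dict instead of `os.environ`. The helper name `add_alloc_option` is hypothetical, and it departs from the original in two ways: it treats an empty variable as unset, and it does not add the option a second time if it is already present.

```python
from typing import MutableMapping

ALLOC_VAR = "PYTORCH_CUDA_ALLOC_CONF"


def add_alloc_option(env: MutableMapping[str, str], option: str = "backend:cudaMallocAsync") -> str:
    """Hypothetical helper: append `option` to the comma-separated config in `env`.

    Per the module's own comment, when `env` is os.environ this has to run
    before torch is first imported.
    """
    current = env.get(ALLOC_VAR, "")
    parts = [p for p in current.split(",") if p]  # drop empty entries
    if option not in parts:
        parts.append(option)
    value = ",".join(parts)
    env[ALLOC_VAR] = value
    return value


demo_env = {ALLOC_VAR: "max_split_size_mb:128"}
print(add_alloc_option(demo_env))  # max_split_size_mb:128,backend:cudaMallocAsync
print(add_alloc_option(demo_env))  # same value again: the option is not duplicated
print(add_alloc_option({}))        # backend:cudaMallocAsync
```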