Instructions to use saik0s/comfy_backup with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use saik0s/comfy_backup with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="saik0s/comfy_backup",
	filename="ComfyUI/models/text_encoders/gemma-3-12b-it-q2_k.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use saik0s/comfy_backup with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf saik0s/comfy_backup:Q4_K_S
# Run inference directly in the terminal:
llama cli -hf saik0s/comfy_backup:Q4_K_S

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf saik0s/comfy_backup:Q4_K_S
# Run inference directly in the terminal:
llama cli -hf saik0s/comfy_backup:Q4_K_S

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf saik0s/comfy_backup:Q4_K_S
# Run inference directly in the terminal:
./llama-cli -hf saik0s/comfy_backup:Q4_K_S

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf saik0s/comfy_backup:Q4_K_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf saik0s/comfy_backup:Q4_K_S

Use Docker

docker model run hf.co/saik0s/comfy_backup:Q4_K_S

LM Studio
Jan
Ollama
How to use saik0s/comfy_backup with Ollama:
```
ollama run hf.co/saik0s/comfy_backup:Q4_K_S
```

Unsloth Studio

How to use saik0s/comfy_backup with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for saik0s/comfy_backup to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for saik0s/comfy_backup to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for saik0s/comfy_backup to start chatting

How to use saik0s/comfy_backup with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf saik0s/comfy_backup:Q4_K_S

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "saik0s/comfy_backup:Q4_K_S"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use saik0s/comfy_backup with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf saik0s/comfy_backup:Q4_K_S

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default saik0s/comfy_backup:Q4_K_S

Run Hermes

hermes

Atomic Chat new

OpenClaw new

How to use saik0s/comfy_backup with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf saik0s/comfy_backup:Q4_K_S

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "saik0s/comfy_backup:Q4_K_S" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use saik0s/comfy_backup with Docker Model Runner:
```
docker model run hf.co/saik0s/comfy_backup:Q4_K_S
```

Lemonade

How to use saik0s/comfy_backup with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull saik0s/comfy_backup:Q4_K_S

Run and chat with the model

lemonade run user.comfy_backup-Q4_K_S

List all available models

lemonade list

comfy_backup / comfy /memory_management.py

saik0s

Add files using upload-large-folder tool

47223ae verified 20 days ago

Raw

History Blame Contribute Delete

7.03 kB

	import math
	import ctypes
	import dataclasses
	import torch
	from typing import NamedTuple

	import comfy_aimdo.host_buffer
	from comfy.quant_ops import QuantizedTensor


	class TensorFileSlice(NamedTuple):
	file_ref: object
	lock: object
	offset: int
	size: int


	def read_tensor_file_slice_into(tensor, destination, stream=None, destination2=None):

	if isinstance(tensor, QuantizedTensor):
	if not read_tensor_file_slice_into(tensor._qdata,
	destination._qdata if destination is not None else None, stream=stream,
	destination2=(destination2._qdata if destination2 is not None else None)):
	return False

	if destination is not None:
	dst_orig_dtype = destination._params.orig_dtype
	destination._params.copy_from(tensor._params, non_blocking=False)
	destination._params = dataclasses.replace(destination._params, orig_dtype=dst_orig_dtype)
	if destination2 is not None:
	dst_orig_dtype = destination2._params.orig_dtype
	destination2._params.copy_from(destination._params if destination is not None else tensor._params, non_blocking=True)
	destination2._params = dataclasses.replace(destination2._params, orig_dtype=dst_orig_dtype)
	return True

	info = getattr(tensor.untyped_storage(), "_comfy_tensor_file_slice", None)
	if info is None:
	return False

	if destination is not None and destination.device.type != "cpu" and destination2 is None:
	destination2 = destination
	destination = None

	file_obj = info.file_ref
	if (file_obj is None
	or (destination is None and destination2 is None)
	or (destination is not None and (destination.device.type != "cpu" or destination.numel() * destination.element_size() < info.size))
	or (destination2 is not None and (destination2.device.type == "cpu" or destination2.numel() * destination2.element_size() < info.size))
	or tensor.numel() * tensor.element_size() != info.size
	or tensor.storage_offset() != 0
	or not tensor.is_contiguous()):
	return False

	if info.size == 0:
	return True

	if destination is None:
	stream_ptr = getattr(stream, "cuda_stream", 0) if stream is not None else 0
	comfy_aimdo.host_buffer.read_file_to_device(file_obj, info.offset, info.size,
	stream_ptr, destination2.data_ptr(),
	destination2.device.index,
	mark_cold=False)
	return True

	hostbuf = getattr(destination.untyped_storage(), "_comfy_hostbuf", None)
	if hostbuf is not None:
	stream_ptr = getattr(stream, "cuda_stream", 0) if stream is not None else 0
	device_ptr = destination2.data_ptr() if destination2 is not None else 0
	with info.lock:
	hostbuf.read_file_slice(file_obj, info.offset, info.size,
	offset=destination.data_ptr() - hostbuf.get_raw_address(),
	stream=stream_ptr,
	device_ptr=device_ptr,
	device=None if destination2 is None else destination2.device.index)
	return True

	if not hasattr(file_obj, "seek") or not hasattr(file_obj, "readinto"):
	return False

	buf_type = ctypes.c_ubyte * info.size
	view = memoryview(buf_type.from_address(destination.data_ptr()))

	try:
	with info.lock:
	file_obj.seek(info.offset)
	done = 0
	while done < info.size:
	try:
	n = file_obj.readinto(view[done:])
	except OSError:
	return False
	if n <= 0:
	return False
	done += n
	return True
	finally:
	view.release()

	class TensorGeometry(NamedTuple):
	shape: any
	dtype: torch.dtype

	def element_size(self):
	info = torch.finfo(self.dtype) if self.dtype.is_floating_point else torch.iinfo(self.dtype)
	return info.bits // 8

	def numel(self):
	return math.prod(self.shape)

	def tensors_to_geometries(tensors, dtype=None):
	geometries = []
	for t in tensors:
	if t is None or isinstance(t, QuantizedTensor):
	geometries.append(t)
	continue
	tdtype = t.dtype
	if hasattr(t, "_model_dtype"):
	tdtype = t._model_dtype
	if dtype is not None:
	tdtype = dtype
	geometries.append(TensorGeometry(shape=t.shape, dtype=tdtype))
	return geometries

	def vram_aligned_size(tensor):
	if isinstance(tensor, list):
	return sum([vram_aligned_size(t) for t in tensor])

	if isinstance(tensor, QuantizedTensor):
	inner_tensors, _ = tensor.__tensor_flatten__()
	return vram_aligned_size([ getattr(tensor, attr) for attr in inner_tensors ])

	if tensor is None:
	return 0

	size = tensor.numel() * tensor.element_size()
	aligment_req = 1024
	return (size + aligment_req - 1) // aligment_req * aligment_req

	def interpret_gathered_like(tensors, gathered):
	offset = 0
	dest_views = []

	if gathered.dim() != 1 or gathered.element_size() != 1:
	raise ValueError(f"Buffer must be 1D and single-byte (got {gathered.dim()}D {gathered.dtype})")

	for tensor in tensors:

	if tensor is None:
	dest_views.append(None)
	continue

	if isinstance(tensor, QuantizedTensor):
	inner_tensors, qt_ctx = tensor.__tensor_flatten__()
	templates = { attr: getattr(tensor, attr) for attr in inner_tensors }
	else:
	templates = { "data": tensor }

	actuals = {}
	for attr, template in templates.items():
	size = template.numel() * template.element_size()
	if offset + size > gathered.numel():
	raise ValueError(f"Buffer too small: needs {offset + size} bytes, but only has {gathered.numel()}. ")
	actuals[attr] = gathered[offset:offset+size].view(dtype=template.dtype).view(template.shape)
	offset += vram_aligned_size(template)

	if isinstance(tensor, QuantizedTensor):
	dest_views.append(QuantizedTensor.__tensor_unflatten__(actuals, qt_ctx, 0, 0))
	else:
	dest_views.append(actuals["data"])

	return dest_views

	aimdo_enabled = False

	extra_ram_release_callback = None
	RAM_CACHE_HEADROOM = 0

	def set_ram_cache_release_state(callback, headroom):
	global extra_ram_release_callback
	global RAM_CACHE_HEADROOM
	extra_ram_release_callback = callback
	RAM_CACHE_HEADROOM = max(0, int(headroom))

	def extra_ram_release(target, free_active=False):
	if extra_ram_release_callback is None:
	return 0
	return extra_ram_release_callback(target, free_active=free_active)