Instructions for using KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with libraries, notebooks, and local apps.

Libraries

How to use KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with llama-cpp-python:

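```
# !pip install llama-cpp-python huggingface-hub
# Llama.from_pretrained downloads the file with huggingface-hub.
# Caution: this is a generic text-completion snippet. According to
# llama.cpp/PATCHSET.json in this repository, support for the model's
# architecture (a T5Gemma2 reranker) is added by patches to llama.cpp,
# so a stock llama-cpp-python build may not be able to load it.

from llama_cpp import Llama

# Download the Q8_0 GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF",
    filename="kalm-reranker-v1-large-q8_0.gguf",
)

# Text completion: up to 512 new tokens; echo=True includes the prompt in the output.
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```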

Notebooks
Google Colab
Kaggle

llama.cpp

How to use KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with llama.cpp:

Install (macOS, Linux)

```
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
llama cli -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```
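Because this model is a reranker, a client would typically send a query plus candidate documents rather than a chat prompt. The sketch below assumes a server that can load this model (per PATCHSET.json, that requires the patched runtime, so a stock build may not work) and that exposes the rerank route upstream `llama-server` offers when started with `--reranking` (`/v1/rerank`, responding with `results` entries that carry `index` and `relevance_score`). Flag and route names can vary between versions, so check `llama-server --help`; the URL, port, and helper names here are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical deployment URL: llama-server's default port is 8080, and the
# rerank route is only enabled when the server is started with --reranking.
RERANK_URL = "http://localhost:8080/v1/rerank"


def build_rerank_payload(query: str, documents: list[str], top_n: int | None = None) -> dict:
    """Build a query/documents request body for a llama-server style rerank endpoint."""
    payload: dict = {"query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n  # ask the server to return only the best N results
    return payload


def order_by_relevance(response: dict, documents: list[str]) -> list[tuple[str, float]]:
    """Map each result's "index" back to its document, sorted by descending "relevance_score"."""
    ranked = [
        (documents[item["index"]], float(item["relevance_score"]))
        for item in response.get("results", [])
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def rerank(query: str, documents: list[str], url: str = RERANK_URL,
           timeout: float = 30.0) -> list[tuple[str, float]]:
    """POST the request to a running server and return (document, score) pairs, best first."""
    body = json.dumps(build_rerank_payload(query, documents)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return order_by_relevance(json.load(resp), documents)
```

With a compatible server running, `rerank("What is a reranker?", ["doc A", "doc B"])` would return the documents paired with their scores, highest first. Sorting on the client keeps the result order independent of whether the server already sorts its response.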

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
llama cli -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```
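The `llama.cpp/PATCHSET.json` file published in this repository describes the patched runtime as an upstream commit (`upstream_commit`) plus seven patches applied in order, ending at `final_fork_commit` with tree `final_tree`. The sketch below shows one way a source build could reproduce that fork with `git am`. It assumes the patch files have been downloaded into a local directory (`patch_dir`, a hypothetical location) and that the manifest's `executable` (`llama-kalm-reranker`) is also the CMake target name, which the manifest does not state. The checkout name and helper functions are likewise hypothetical.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path


def load_manifest(path: str | Path) -> dict:
    """Read a PATCHSET.json-style manifest."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fork_rebuild_commands(manifest: dict, patch_dir: str | Path,
                          checkout: str = "llama.cpp-kalm") -> list[list[str]]:
    """Return argv lists that clone upstream, apply the listed patches in order, and build."""
    # Absolute paths, because `git -C <checkout>` resolves relative paths inside the checkout.
    patch_paths = [str((Path(patch_dir) / p["file"]).resolve()) for p in manifest["patches"]]
    build_dir = str(Path(checkout) / "build")
    return [
        ["git", "clone", manifest["upstream_repository"], checkout],
        ["git", "-C", checkout, "checkout", manifest["upstream_commit"]],
        # `git am` turns each format-patch file into a commit (needs user.name/user.email set).
        ["git", "-C", checkout, "am", *patch_paths],
        ["cmake", "-S", checkout, "-B", build_dir],
        # Assumption: the manifest's "executable" doubles as the CMake target name.
        ["cmake", "--build", build_dir, "-j", "--target", manifest["executable"]],
    ]


def tree_matches(manifest: dict, checkout: str = "llama.cpp-kalm") -> bool:
    """Compare the patched checkout's tree hash with the manifest's final_tree."""
    # Commit hashes will generally differ after `git am` (new committer dates), but a
    # tree hash depends only on file contents, so it should match if every patch applied.
    result = subprocess.run(
        ["git", "-C", checkout, "rev-parse", "HEAD^{tree}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip() == manifest["final_tree"]


def run(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

A possible flow: load the manifest, run the first three commands (clone, checkout, `git am`), confirm `tree_matches(manifest)` before spending time on compilation, then run the two CMake commands. Checking the tree hash rather than the commit hash is the design choice that makes the comparison meaningful after patches are re-applied locally.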

Use Docker

```
docker model run hf.co/KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```

LM Studio
Jan
Ollama
How to use KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with Ollama:
```
ollama run hf.co/KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```

Unsloth Studio

How to use KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF to start chatting
```

Using Hugging Face Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF to start chatting
```

Atomic Chat
Docker Model Runner
How to use KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with Docker Model Runner:
```
docker model run hf.co/KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```

Lemonade

How to use KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull KaLM-Embedding/KaLM-Reranker-V1-Large-Q8_0-GGUF:Q8_0
```

Run and chat with the model

```
lemonade run user.KaLM-Reranker-V1-Large-Q8_0-GGUF-Q8_0
```

List all available models

```
lemonade list
```

llama.cpp/PATCHSET.json

Commit 3445619 by cosyy (verified, about 15 hours ago): Publish documentation, examples, checksums, and patched llama.cpp runtime


Size: 2.15 kB

```
{
  "schema_version": 1,
  "upstream_repository": "https://github.com/ggml-org/llama.cpp",
  "upstream_commit": "277a105dc8f8643dab54331926a9830860a03292",
  "final_fork_commit": "8c099e4eb6c79e5d2587c8205ee9971564c740cc",
  "final_tree": "253695d8b0ca0723742c0109806a831a968cdffd",
  "executable": "llama-kalm-reranker",
  "patch_count": 7,
  "patches": [
    {
      "file": "0001-convert-add-T5Gemma2-GGUF-support.patch",
      "commit": "fbf7115aa7319a38bfd1c962d07ebe38956f613e",
      "size_bytes": 18302,
      "sha256": "3f30434975197738f6433036c5b734f8c8f2b84c5187781fd364507543fca032"
    },
    {
      "file": "0002-model-add-T5Gemma2-Stage-3-loader.patch",
      "commit": "3c53f96b63308547be8ac91a39a1c900c75c9fb5",
      "size_bytes": 32774,
      "sha256": "07787fa25d3db0fb3d86a11a3cd6c40c082e98724e8273c278b2a58a87c8aeba"
    },
    {
      "file": "0003-model-implement-T5Gemma2-reranker-forward.patch",
      "commit": "ec203e11edaed544a9e4ae24e54d97d612e6ebb1",
      "size_bytes": 59449,
      "sha256": "1a3457a842ec7a22581a49ce29b2b8005c7c7d9f17938bbf12be8a32d6127b09"
    },
    {
      "file": "0004-examples-add-T5Gemma2-reranker-scoring.patch",
      "commit": "7701ab09d6cd234a6e5b906b80802c609a9b62b8",
      "size_bytes": 24632,
      "sha256": "74f0ccf745bd013e8a3f9bf7882a1f2e4d7e83631172908894afa797071a3685"
    },
    {
      "file": "0005-model-preserve-T5Gemma2-BF16-semantics-for-quantized-weights.patch",
      "commit": "b8371b906a79d67fd6e2becbe21f81f3bb528962",
      "size_bytes": 1667,
      "sha256": "2ca6009df0d1d5b079b381f81a0b0cdbab5a2efacd036e21f12522fef3020a61"
    },
    {
      "file": "0006-examples-rename-T5Gemma2-reranker-CLI-to-KaLM-reranker.patch",
      "commit": "df645884260b1c327ecdd57cdc9840ee225eafd6",
      "size_bytes": 5181,
      "sha256": "9dd0474fe3a2a4740b14bb8215f796d66b569563c33a5c0db9e8e7f8a346162e"
    },
    {
      "file": "0007-model-support-KaLM-reranker-Small-and-Large.patch",
      "commit": "8c099e4eb6c79e5d2587c8205ee9971564c740cc",
      "size_bytes": 7509,
      "sha256": "a835e6040fca6e05d98c4671e4c59101eb6c8097cdc07ede1c1458341bf89e95"
    }
  ]
}
```
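The manifest records a `size_bytes` and `sha256` for every patch, plus a `patch_count` and the `final_fork_commit` (which equals the commit of the last patch, 0007). A minimal verification sketch follows, assuming the `.patch` files have been downloaded into one local directory; where they live in this repository is not stated in the manifest, so the directory argument and the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def check_manifest(manifest: dict) -> list[str]:
    """Return internal-consistency problems in a PATCHSET.json-style manifest."""
    problems = []
    patches = manifest.get("patches", [])
    if manifest.get("patch_count") != len(patches):
        problems.append(f"patch_count={manifest.get('patch_count')} but {len(patches)} patches listed")
    # The last patch's commit is expected to be the final fork commit.
    if patches and patches[-1].get("commit") != manifest.get("final_fork_commit"):
        problems.append("last patch commit differs from final_fork_commit")
    return problems


def verify_patch_files(manifest: dict, patch_dir: str | Path) -> list[str]:
    """Compare each listed patch file's byte size and SHA-256 digest with the manifest."""
    problems = []
    for entry in manifest["patches"]:
        path = Path(patch_dir) / entry["file"]
        if not path.is_file():
            problems.append(f"missing: {entry['file']}")
            continue
        data = path.read_bytes()  # patches are small (at most ~60 kB here)
        if len(data) != entry["size_bytes"]:
            problems.append(f"size mismatch: {entry['file']} ({len(data)} != {entry['size_bytes']})")
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            problems.append(f"sha256 mismatch: {entry['file']}")
    return problems


def main(manifest_path: str = "llama.cpp/PATCHSET.json", patch_dir: str | None = None) -> int:
    """Print every problem found; return 0 if the manifest and files check out, else 1."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: without an explicit patch_dir, look next to the manifest.
    directory = Path(patch_dir) if patch_dir else path.parent
    issues = check_manifest(manifest) + verify_patch_files(manifest, directory)
    for issue in issues:
        print(issue)
    return 1 if issues else 0
```

Comparing the size first gives a readable message for truncated downloads, while the SHA-256 comparison catches any content change; both are reported so a single run lists every mismatch.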