Instructions to use kofdai/nullai-knowledge-system with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use kofdai/nullai-knowledge-system with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kofdai/nullai-knowledge-system")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kofdai/nullai-knowledge-system")
model = AutoModelForCausalLM.from_pretrained("kofdai/nullai-knowledge-system")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use kofdai/nullai-knowledge-system with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="kofdai/nullai-knowledge-system",
	filename="phi-4-q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Notebooks

Google Colab
Kaggle

Local Apps

llama.cpp

How to use kofdai/nullai-knowledge-system with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kofdai/nullai-knowledge-system:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf kofdai/nullai-knowledge-system:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kofdai/nullai-knowledge-system:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf kofdai/nullai-knowledge-system:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf kofdai/nullai-knowledge-system:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf kofdai/nullai-knowledge-system:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf kofdai/nullai-knowledge-system:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf kofdai/nullai-knowledge-system:Q4_K_M

Use Docker

docker model run hf.co/kofdai/nullai-knowledge-system:Q4_K_M

LM Studio
Jan

vLLM

How to use kofdai/nullai-knowledge-system with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kofdai/nullai-knowledge-system"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kofdai/nullai-knowledge-system",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/kofdai/nullai-knowledge-system:Q4_K_M

SGLang

How to use kofdai/nullai-knowledge-system with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kofdai/nullai-knowledge-system" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kofdai/nullai-knowledge-system",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kofdai/nullai-knowledge-system" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kofdai/nullai-knowledge-system",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use kofdai/nullai-knowledge-system with Ollama:
```
ollama run hf.co/kofdai/nullai-knowledge-system:Q4_K_M
```

Unsloth Studio

How to use kofdai/nullai-knowledge-system with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kofdai/nullai-knowledge-system to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kofdai/nullai-knowledge-system to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kofdai/nullai-knowledge-system to start chatting

Docker Model Runner
How to use kofdai/nullai-knowledge-system with Docker Model Runner:
```
docker model run hf.co/kofdai/nullai-knowledge-system:Q4_K_M
```

Lemonade

How to use kofdai/nullai-knowledge-system with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull kofdai/nullai-knowledge-system:Q4_K_M

Run and chat with the model

lemonade run user.nullai-knowledge-system-Q4_K_M

List all available models

lemonade list

kofdai committed on Dec 16, 2025

Commit a5ac882 (verified) · 1 Parent(s): ad74a4b

Upload iath_encoder.py with huggingface_hub

Files changed (1)

iath_encoder.py ADDED +168 -0

	@@ -0,0 +1,168 @@

+import struct
+import zstandard as zstd
+from datetime import datetime
+import json  # serializes the index section in encode_batch
+class IathEncoder:
+    """
+    Encodes Knowledge Tile objects into .iath-compatible compressed binary.
+    """
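+    def _encode_reviewer_reference(self, reviewer: dict) -> bytes:
+        """
+        Encodes reviewer information.
+        For now this is a placeholder that returns the reviewer ID as a fixed-length field.
+        In the future it should return an index that references the Verifier Dictionary.
+        """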
+        reviewer_id = reviewer.get("reviewer_id", "unknown").encode('utf-8')
+        return struct.pack("<36s", reviewer_id[:36]) # UUID string length
+    def _encode_string(self, s: str) -> bytes:
+        """Encodes a string as NUL-terminated UTF-8."""
+        return s.encode('utf-8') + b'\0'
+    def _encode_metadata(self, metadata: dict) -> bytes:
+        """Serializes the metadata to binary."""
+        kid = self._encode_string(metadata["knowledge_id"])
+        topic = self._encode_string(metadata["topic"])
+        created_at_iso = metadata.get("created_at", datetime.now().isoformat())
+        created_at = created_at_iso.encode('ascii')[:27] # ISO format with Z
+        return kid + topic + created_at
+    def _encode_coordinates(self, coordinates: dict) -> bytes:
+        """Serializes the coordinates to binary (six 32-bit floats)."""
+        medical_space = coordinates["medical_space"]
+        meta_space = coordinates["meta_space"]
+        return struct.pack(
+            "<ffffff",
+            float(medical_space[0]), float(medical_space[1]), float(medical_space[2]),
+            float(meta_space[0]), float(meta_space[1]), float(meta_space[2])
+        )
+    def _encode_content(self, content: dict) -> bytes:
+        """Serializes the content (text) to binary."""
+        thinking = content["thinking_process"].encode('utf-8')
+        response = content["final_response"].encode('utf-8')
+        # Concatenate the parts, each prefixed with its byte length
+        result = struct.pack("<I", len(thinking)) + thinking
+        result += struct.pack("<I", len(response)) + response
+        return result
+    def _encode_verification(self, verification: dict) -> bytes:
+        """Serializes the verification history to binary."""
+        status_map = {
+            "pending_review": 0, "partial_verified": 1,
+            "verified": 2, "expert_confirmed": 3
+        }
+        status_code = status_map.get(verification.get("status", "pending_review"), 0)
+        initial_certainty = int(verification.get("initial_certainty", 0))
+        reviewer_count = len(verification.get("reviewers", []))
+        result = struct.pack("<BBI", status_code, initial_certainty, reviewer_count)
+        for reviewer in verification.get("reviewers", []):
+            result += self._encode_reviewer_reference(reviewer)
+        return result
+    def encode_tile(self, tile: dict) -> bytes:
+        """
+        Encodes a single Knowledge Tile and compresses it with zstd.
+        Args:
+            tile (dict): Knowledge Tile object.
+        Returns:
+            bytes: Compressed binary data.
+        """
+        # Encode each section
+        metadata_bin = self._encode_metadata(tile["metadata"])
+        coord_bin = self._encode_coordinates(tile["coordinates"])
+        content_bin = self._encode_content(tile["content"])
+        verification_bin = self._encode_verification(tile["verification"])
+        # NOTE: reasoning_path, source, history, etc. are omitted for now; only the main parts are implemented
+        # Concatenate the sections, each prefixed with its length
+        uncompressed = b"".join([
+            struct.pack("<I", len(metadata_bin)), metadata_bin,
+            struct.pack("<I", len(coord_bin)), coord_bin,
+            struct.pack("<I", len(content_bin)), content_bin,
+            struct.pack("<I", len(verification_bin)), verification_bin,
+        ])
+        # Compress with zstd
+        cctx = zstd.ZstdCompressor(level=19)
+        compressed = cctx.compress(uncompressed)
+        return compressed
+    def encode_batch(self, tiles: list[dict], domain_code: int = 1) -> bytes:
+        """
+        Takes multiple knowledge tiles and produces the binary for a complete .iath database file.
+        Args:
+            tiles (list[dict]): List of knowledge tile dicts to encode.
+            domain_code (int): Domain code written to the header (1: medical, 2: legal, etc.).
+        Returns:
+            bytes: Binary content of the complete .iath file.
+        """
+        print(f"--- Starting batch encode of {len(tiles)} tiles (domain code: {domain_code}) ---")
+        index = []
+        data_chunks = []
+        current_offset = 0
+        # 1. Encode each tile individually and build the data chunks and index
+        for tile in tiles:
+            tile_id = tile.get("metadata", {}).get("knowledge_id")
+            if not tile_id:
+                print("Warning: skipping a tile without a knowledge_id.")
+                continue
+            compressed_data = self.encode_tile(tile)
+            data_length = len(compressed_data)
+            index.append({"id": tile_id, "offset": current_offset, "length": data_length})
+            data_chunks.append(compressed_data)
+            current_offset += data_length
+        print("  - Finished encoding all tiles individually.")
+        # 2. Serialize the index section
+        index_binary = json.dumps(index, ensure_ascii=False).encode('utf-8')
+        print(f"  - Index built (size: {len(index_binary)} bytes)")
+        # 3. Concatenate the data section
+        data_section = b"".join(data_chunks)
+        # 4. Build the header
+        header_size = 64
+        index_offset = header_size
+        data_offset = index_offset + len(index_binary)
+        checksum = b'\0' * 32
+        header = struct.pack(
+            "<4sIBB32sQQ6x",
+            b'ILMA',      # Magic number
+            1,           # Version
+            domain_code, # Domain code, taken from the argument
+            1,           # Compression Type (0x01=zstd)
+            checksum,
+            index_offset,
+            data_offset
+        )
+        print("  - Header built.")
+        # 5. Concatenate all sections
+        full_db_content = header + index_binary + data_section
+        print("--- Batch encode complete ---")
+        return full_db_content
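
The file layout that `encode_batch` writes may be easier to follow from the reading side. The following is a minimal sketch using only the standard library, not part of the commit. It assumes the layout shown in the diff: a 64-byte little-endian header packed with `<4sIBB32sQQ6x`, then the JSON index, then the data section, with each index `offset` counted from the start of the data section. The names `IathHeader` and `parse_iath` are hypothetical. Tile payloads are returned still compressed, since decompressing them would require the third-party `zstandard` package.

```python
import json
import struct
from typing import NamedTuple

HEADER_FORMAT = "<4sIBB32sQQ6x"  # same format string encode_batch uses
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64 bytes, no alignment padding


class IathHeader(NamedTuple):
    """Hypothetical container for the seven header fields."""
    magic: bytes
    version: int
    domain_code: int
    compression_type: int
    checksum: bytes
    index_offset: int
    data_offset: int


def parse_iath(blob: bytes):
    """Split an .iath blob into (header, index, {knowledge_id: compressed bytes})."""
    header = IathHeader(*struct.unpack_from(HEADER_FORMAT, blob, 0))
    if header.magic != b"ILMA":
        raise ValueError("not an .iath file: bad magic number")
    # The index is the JSON text stored between index_offset and data_offset.
    raw_index = blob[header.index_offset:header.data_offset]
    index = json.loads(raw_index.decode("utf-8"))
    tiles = {}
    for entry in index:
        # Index offsets are relative to the start of the data section.
        start = header.data_offset + entry["offset"]
        tiles[entry["id"]] = blob[start:start + entry["length"]]
    return header, index, tiles


# Hand-built demo blob; the chunks stand in for zstd-compressed tiles.
chunks = [b"tile-one", b"tile-2"]
demo_index = [
    {"id": "k1", "offset": 0, "length": len(chunks[0])},
    {"id": "k2", "offset": len(chunks[0]), "length": len(chunks[1])},
]
index_bin = json.dumps(demo_index).encode("utf-8")
demo_header = struct.pack(
    HEADER_FORMAT, b"ILMA", 1, 1, 1, b"\0" * 32,
    HEADER_SIZE, HEADER_SIZE + len(index_bin),
)
blob = demo_header + index_bin + b"".join(chunks)

parsed_header, parsed_index, tiles = parse_iath(blob)
print(parsed_header.domain_code, sorted(tiles))
```

Because the index stores offsets relative to the data section rather than the file start, a reader has to add `data_offset` from the header before slicing, which is what the loop above does.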