Prometech Computer Sciences Corp committed on
Commit 2514f4f · verified · 1 Parent(s): 85eb6c7

Update prettybird_brain.py

Files changed (1)
  1. prettybird_brain.py +234 -109
prettybird_brain.py CHANGED
@@ -1,117 +1,242 @@
- import torch
  import json
  import re
- import os
- import transformers
- from transformers import LogitsProcessor, LogitsProcessorList, AutoModelForCausalLM, AutoTokenizer
-
- CONTROLLED_REASONING_CORE = "You are a helpful assistant with a Controlled Reasoning Core. Please reason step by step."
-
- class InterventionLogitsProcessor(LogitsProcessor):
-     def __init__(self, boost_token_id, boost_value=2.0):
-         self.boost_token_id = boost_token_id
-         self.boost_value = boost_value
-
-     def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
-         scores[:, self.boost_token_id] += self.boost_value
-         return scores
-
- class prettybird_bce_basic_brain_mini:
-     def __init__(self, model_path="qwen_merged", device="cuda" if torch.cuda.is_available() else "cpu"):
-         self.device = device
-         print(f"Transformers version: {transformers.__version__}")
-
-         local_path = model_path
-         if not os.path.exists(local_path):
-             if os.path.exists(os.path.join("llama.cpp", model_path)):
-                 local_path = os.path.join("llama.cpp", model_path)
-             elif os.path.exists("/content/qwen_merged"):
-                 local_path = "/content/qwen_merged"
-
-         final_path = local_path if os.path.exists(local_path) else "Qwen/Qwen2.5-Math-1.5B-Instruct"
-         print(f"Loading model from {final_path}...")

          try:
-             # Attempt Native Load (trust_remote_code=False) first
-             self.tokenizer = AutoTokenizer.from_pretrained(final_path, trust_remote_code=False)
-             self.model = AutoModelForCausalLM.from_pretrained(
-                 final_path,
-                 device_map=device,
-                 trust_remote_code=False,
-                 torch_dtype=torch.float16
-             )
-             print("Loaded natively.")
-         except Exception as e:
-             print(f"Native load failed: {e}. Trying remote code...")
-             try:
-                 self.tokenizer = AutoTokenizer.from_pretrained(final_path, trust_remote_code=True)
-                 self.model = AutoModelForCausalLM.from_pretrained(
-                     final_path,
-                     device_map=device,
-                     trust_remote_code=True,
-                     torch_dtype=torch.float16
-                 )
-                 print("Loaded with remote code.")
-             except Exception as e2:
-                 raise RuntimeError(f"Failed to load model: {e2}")
-
-         if self.tokenizer.pad_token is None:
-             self.tokenizer.pad_token = self.tokenizer.eos_token
-
-     def math_reward(self, response):
          score = 0.0
-         if re.search(r"\\boxed\{.*?\}", response):
-             score += 1.0
-         if len(response) > 50:
              score += 0.5
          return score

-     def parameter_editing(self, layer_idx=0, noise_scale=1e-5):
-         print(f"Editing parameters in layer {layer_idx}...")
-         try:
-             with torch.no_grad():
-                 if hasattr(self.model, 'model'):
-                     layers = self.model.model.layers
-                 else:
-                     layers = self.model.layers
-                 weights = layers[layer_idx].self_attn.q_proj.weight
-                 noise = torch.randn_like(weights) * noise_scale
-                 weights.add_(noise)
-             print("Parameter editing complete.")
-         except Exception as e:
-             print(f"Error editing params: {e}")
-
-     def run_tool(self, tool_name, query):
-         if tool_name == "calculator":
-             try:
-                 clean_query = re.sub(r"[^0-9+\-*/(). ]", "", query)
-                 if not clean_query.strip(): return "Invalid"
-                 return str(eval(clean_query))
-             except:
-                 return "Error"
-         return "Unknown"
-
-     def generate_response(self, query, use_tool=False, use_intervention=False):
-         input_text = query
-         if use_tool or "calculate" in query.lower():
-             match = re.search(r"([\d\.\s\+\-\*\/\(\)]+)", query)
-             if match and len(match.group(1).strip()) > 3:
-                 res = self.run_tool("calculator", match.group(1))
-                 input_text += f"\nTool Result: {res}"
-
-         messages = [{"role": "system", "content": CONTROLLED_REASONING_CORE}, {"role": "user", "content": input_text}]
-         inputs = self.tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(self.device)
-
-         logits_processor = LogitsProcessorList()
-         if use_intervention:
-             print("Applying intervention...")
-             ids = self.tokenizer.encode("Therefore", add_special_tokens=False)
-             if ids:
-                 logits_processor.append(InterventionLogitsProcessor(ids[0], 5.0))
-
-         outputs = self.model.generate(inputs, max_new_tokens=100, logits_processor=logits_processor, do_sample=True)
-         response = self.tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
-
-         print(f"Response: {response}")
-         print(f"Reward: {self.math_reward(response)}")
-         return response
+ #!/usr/bin/env python3
+ # -*- coding: utf-8 -*-
+
+ """
+ PrettyBird Skull Engine
+ - GGUF = mathematical optimization brain (skull)
+ - Bodies = interchangeable (text/image/audio/video/3D adapters)
+ - Single-file, backend-clean, optimizer-compatible
+ """
+
  import json
  import re
+ import ast
+ import numpy as np
+ from dataclasses import dataclass
+ from typing import Any, Dict, List, Optional, Tuple
+
+ from llama_cpp import Llama
+
+
+ # ============================================================
+ # 1) SYSTEM PROMPT (FINAL – locked)
+ # ============================================================
+ SYSTEM_PROMPT = """You are a controlled reasoning core operating as a mathematical optimization brain.
+
+ You are NOT an autonomous agent.
+ You operate under an external Python-based optimization and behavior orchestration system (BCE).
+
+ Hard rules:
+ - Output MUST be valid JSON.
+ - Output MUST contain ONLY JSON.
+ - Do NOT reveal chain-of-thought.
+ - Use double quotes only.
+ - Keep structure deterministic across revisions.
+
+ If information is missing, list it in "needs".
+
+ JSON CONTRACT:
+ {
+   "version": "1.0",
+   "task": "",
+   "assumptions": [],
+   "needs": [],
+   "candidates": [
+     {
+       "id": "c1",
+       "solution": {},
+       "constraints": [
+         {"name": "", "status": "pass|fail|unknown", "note": ""}
+       ],
+       "objective_estimate": {"primary": 0.0, "notes": ""},
+       "rationale_summary": ""
+     }
+   ],
+   "revision_instructions": "If controller feedback arrives, edit only referenced fields and preserve all others exactly."
+ }
+ """
+
+
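For orientation, a reply that satisfies this contract is parsed by Skull._parse_json (defined below) into a plain dict such as the one that follows. The values are illustrative only, anticipating the makespan demo at the end of the file; they are not actual model output.

example_reply = {
    "version": "1.0",
    "task": "optimization_request",
    "assumptions": ["job durations are unit-free"],
    "needs": [],
    "candidates": [
        {
            "id": "c1",
            "solution": {"machine_1": [3, 4], "machine_2": [0, 1, 2]},  # job indices, an illustrative assignment
            "constraints": [
                {"name": "all_jobs_assigned", "status": "pass", "note": "5 of 5 jobs placed"}
            ],
            "objective_estimate": {"primary": 10.0, "notes": "makespan = max(6+4, 3+5+2)"},
            "rationale_summary": "Balance total processing time across the two machines."
        }
    ],
    "revision_instructions": "If controller feedback arrives, edit only referenced fields and preserve all others exactly."
}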
+ # ============================================================
+ # 2) Safe mini-tool (optional, math support)
+ # ============================================================
+ _ALLOWED_AST = {
+     ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
+     ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.Mod,
+     ast.USub, ast.UAdd,
+ }

+ def safe_calc(expr: str) -> Optional[float]:
+     if not re.fullmatch(r"[0-9\.\s\+\-\*\/\(\)]+", expr):
+         return None
+     try:
+         tree = ast.parse(expr, mode="eval")
+         for n in ast.walk(tree):
+             if type(n) not in _ALLOWED_AST:
+                 return None
+         return float(eval(compile(tree, "<calc>", "eval"), {"__builtins__": {}}))
+     except Exception:
+         return None
+
+
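A quick sanity check of safe_calc as defined above: purely arithmetic strings are evaluated, while anything containing other characters, or parsing to a node outside the whitelist, is rejected with None. These checks are illustrative and not part of the committed file.

assert safe_calc("3 + 5 * 2") == 13.0          # plain arithmetic
assert safe_calc("(6 + 4) / 2") == 5.0         # parentheses and division
assert safe_calc("2 ** 8") == 256.0            # ast.Pow is whitelisted
assert safe_calc("__import__('os')") is None   # letters and quotes fail the character filter
assert safe_calc("()") is None                 # parses to ast.Tuple, rejected by the AST whitelist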
+ # ============================================================
+ # 3) Skull (GGUF Math Brain)
+ # ============================================================
+ @dataclass
+ class Skull:
+     gguf_path: str
+     n_ctx: int = 8192
+     n_gpu_layers: int = 0
+     chat_format: str = "chatml"
+     verbose: bool = False
+
+     def __post_init__(self):
+         self.llm = Llama(
+             model_path=self.gguf_path,
+             n_ctx=self.n_ctx,
+             n_gpu_layers=self.n_gpu_layers,
+             chat_format=self.chat_format,
+             verbose=self.verbose,
+         )
+
+     def _parse_json(self, text: str) -> Dict[str, Any]:
+         t = text.strip()
          try:
+             return json.loads(t)
+         except json.JSONDecodeError:
+             s, e = t.find("{"), t.rfind("}")
+             if s != -1 and e != -1 and e > s:
+                 return json.loads(t[s:e+1])
+             raise
+
+     def think(
+         self,
+         observation: Dict[str, Any],
+         temperature: float = 0.2,
+         top_p: float = 0.9,
+         max_tokens: int = 512,
+     ) -> Dict[str, Any]:
+
+         messages = [
+             {"role": "system", "content": SYSTEM_PROMPT},
+             {"role": "user", "content": json.dumps(observation, ensure_ascii=False)},
+         ]
+
+         resp = self.llm.create_chat_completion(
+             messages=messages,
+             temperature=temperature,
+             top_p=top_p,
+             max_tokens=max_tokens,
+             response_format={"type": "json_object"},
+         )
+
+         content = resp["choices"][0]["message"]["content"]
+         return self._parse_json(content)
+
+
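The skull can also be exercised on its own, without a body or orchestrator: the observation is a plain dict and the return value is the parsed contract dict. A minimal sketch, assuming the same GGUF file used by the demo below; the task string is an illustrative placeholder.

skull = Skull(gguf_path="prettybird_bce_basic_brain_mini_q4_k_m.gguf", n_gpu_layers=0)
reply = skull.think(
    {"task": "optimization_request", "body": "text",
     "input": "Minimize 3x + 2y subject to x + y >= 4, x >= 0, y >= 0."},
    temperature=0.2,
    max_tokens=512,
)
print(reply.get("candidates", []))  # a list shaped like the JSON contract above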
+ # ============================================================
+ # 4) Objective + Constraint (C)
+ # ============================================================
+ class ObjectiveEngine:
+     """
+     Deterministic layer that re-evaluates the GGUF output.
+     """
+
+     def score(self, result: Dict[str, Any]) -> float:
          score = 0.0
+
+         # valid JSON already guaranteed
+         cands = result.get("candidates", [])
+         if not cands:
+             return -1e9
+
+         c = cands[0]
+
+         # constraint satisfaction
+         for con in c.get("constraints", []):
+             if con.get("status") == "pass":
+                 score += 1.0
+             elif con.get("status") == "fail":
+                 score -= 2.0
+
+         # model's own estimate
+         oe = c.get("objective_estimate", {})
+         if isinstance(oe.get("primary"), (int, float)):
+             score += float(oe["primary"])
+
+         # small structure bonus
+         if isinstance(c.get("solution"), dict):
              score += 0.5
+
          return score

+
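To make the scoring concrete: a single candidate with one passing and one failing constraint, a numeric objective_estimate.primary of 10.0, and a dict-valued solution scores 1.0 - 2.0 + 10.0 + 0.5 = 9.5, while a reply with no candidates scores -1e9. A worked check using the engine exactly as defined above; the sample values are illustrative.

engine = ObjectiveEngine()
sample = {
    "candidates": [{
        "solution": {"machine_1": [3, 4], "machine_2": [0, 1, 2]},
        "constraints": [
            {"name": "all_jobs_assigned", "status": "pass"},
            {"name": "precedence_respected", "status": "fail"},
        ],
        "objective_estimate": {"primary": 10.0},
    }]
}
print(engine.score(sample))              # 1.0 - 2.0 + 10.0 + 0.5 = 9.5
print(engine.score({"candidates": []}))  # -1e9, nothing to evaluate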
+ # ============================================================
+ # 5) Body (example: text body)
+ # ============================================================
+ class TextBody:
+     def observe(self, text: str) -> Dict[str, Any]:
+         # Later, image/audio/video/3D bodies will provide the same interface
+         return {
+             "task": "optimization_request",
+             "body": "text",
+             "input": text,
+         }
+
+
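Because a body only needs to expose observe() returning an observation dict, other modalities can be slotted in without touching the skull or the orchestrator. A minimal sketch of a hypothetical tabular body; the class name and the "table" tag are illustrative, not part of the committed file, and the typing names come from the imports at the top.

class TableBody:
    # Hypothetical adapter: wraps rows of structured data into the same observation shape.
    def observe(self, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
        return {
            "task": "optimization_request",
            "body": "table",
            "input": rows,
        }

# brain = BrainSystem(skull, TableBody())  # would plug into the orchestrator below unchanged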
+ # ============================================================
+ # 6) Orchestrator (brain loop)
+ # ============================================================
+ class BrainSystem:
+     def __init__(self, skull: Skull, body: Any):
+         self.skull = skull
+         self.body = body
+         self.objective = ObjectiveEngine()
+
+     def run(self, raw_input: Any, rounds: int = 2) -> Dict[str, Any]:
+         obs = self.body.observe(raw_input)
+
+         best = None
+         best_score = -1e18
+
+         for r in range(rounds):
+             result = self.skull.think(obs)
+             score = self.objective.score(result)
+
+             if score > best_score:
+                 best = result
+                 best_score = score
+
+             # revision loop (lightweight)
+             if result.get("needs"):
+                 obs["_feedback"] = {
+                     "issue": "missing_data",
+                     "needs": result["needs"],
+                 }
+
+         return {
+             "best_score": best_score,
+             "decision": best,
+         }
+
+
+ # ============================================================
+ # 7) Demo
+ # ============================================================
+ if __name__ == "__main__":
+     skull = Skull(
+         gguf_path="prettybird_bce_basic_brain_mini_q4_k_m.gguf",
+         n_ctx=8192,
+         n_gpu_layers=0,
+         chat_format="chatml",
+     )
+
+     body = TextBody()
+     brain = BrainSystem(skull, body)
+
+     output = brain.run(
+         "Assign 5 jobs to 2 machines and minimize the makespan. Durations: [3,5,2,6,4].",
+         rounds=2,
+     )
+
+     print(json.dumps(output, ensure_ascii=False, indent=2))
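With a valid GGUF file in place, the demo prints a two-key JSON object: "best_score" (the ObjectiveEngine score of the best round) and "decision" (the full contract reply from that round). If llama-cpp-python is installed with GPU support, layers can be offloaded by raising n_gpu_layers; a sketch assuming a GPU-enabled build:

skull = Skull(
    gguf_path="prettybird_bce_basic_brain_mini_q4_k_m.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,      # offload all layers, assuming a GPU-enabled llama-cpp-python build
    chat_format="chatml",
)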