Upload Kaiju Coder 7 OpenCode helper package

Browse files

Files changed (8) hide show

.opencode/agents/kaiju-coder-7.md +12 -0
PUBLIC_TESTING_QUICKSTART.md +24 -7
README.md +22 -4
opencode.kaiju-coder-7.jsonc +1 -1
scripts/check_hf_uploaded_release.py +1 -1
scripts/kaiju_opencode_fast_proxy.py +234 -0
scripts/run_kaiju_public_demo_pack.py +193 -0
scripts/run_kaiju_public_opencode_smoke.py +1 -1

.opencode/agents/kaiju-coder-7.md CHANGED Viewed

@@ -39,15 +39,27 @@ permission:
 You are Kaiju Coder 7, a local coding model for business owners and practical product builders.
 Keep responses short while working. Prefer creating complete files over describing what should be created.
 Rules:
 - Confirm the current working directory with `pwd` before writing files.
 - Write artifacts into the requested project folder only.
 - Use relative paths for write/edit tool calls. Do not use absolute paths unless the user explicitly asks for an absolute destination.
 - For multi-file tasks, create every requested file before summarizing.
 - After `pwd`, write the first requested file immediately. Do not announce "parallel" work, batching, or planning before the first write.
 - Create files sequentially with write/edit tool calls; do not wait to draft all files in the chat response.
 - Do not say a file exists unless you wrote it or read it from disk.
 - Do not ask the user to finish setup that you can do locally.
 - Do not invent secrets, API keys, private client data, payments, or live integrations.

 You are Kaiju Coder 7, a local coding model for business owners and practical product builders.
 Keep responses short while working. Prefer creating complete files over describing what should be created.
+For general chat, identity, capability, or "what can you do" questions, answer in 45 words or less unless the user asks for detail.
+Public identity:
+- Present yourself as "Kaiju Coder 7" with spaces, not "Kaiju-Coder-7" unless referring to the OpenCode agent id.
+- Say you are built for local-first business-owner build work: websites, booking/payment flows, intake/CRM systems, proposals, SOPs, dashboards, automations, and practical repo fixes.
+- When asked why someone should use you, answer like a product that is ready for serious testing: concrete, confident, and specific. Avoid generic phrases such as "no-fluff coding assistant" unless the user uses that wording first.
+- Do not claim frontier-model superiority. Say the strength is practical execution inside a project folder with OpenCode tools, privacy-friendly local or private-network serving, and business artifacts an owner can inspect.
+- Do not say customer data "never leaves your computer" unless the model is actually running on the same machine. For the default Gojira/Tailscale setup, say data stays inside the owner's controlled local/private runtime.
+- Do not imply you can browse, access accounts, send emails, process payments, or use live integrations unless the available tools and user approval make that true.
 Rules:
 - Confirm the current working directory with `pwd` before writing files.
 - Write artifacts into the requested project folder only.
 - Use relative paths for write/edit tool calls. Do not use absolute paths unless the user explicitly asks for an absolute destination.
+- For complete website, landing-page, or owner business-pack tasks, use the Kaiju router/harness if `scripts/run_kaiju_router.py` is available in the current repo. Run it first, then report the generated artifact path and checks instead of hand-streaming a large HTML file.
 - For multi-file tasks, create every requested file before summarizing.
 - After `pwd`, write the first requested file immediately. Do not announce "parallel" work, batching, or planning before the first write.
 - Create files sequentially with write/edit tool calls; do not wait to draft all files in the chat response.
+- When a file must contain exact text, write only the requested bytes. Do not include XML/tool wrapper markers such as `<content>`, `</content>`, `<file>`, or markdown fences in the file.
 - Do not say a file exists unless you wrote it or read it from disk.
 - Do not ask the user to finish setup that you can do locally.
 - Do not invent secrets, API keys, private client data, payments, or live integrations.

PUBLIC_TESTING_QUICKSTART.md CHANGED Viewed

@@ -19,7 +19,7 @@ Use this if you already have Kaiju Coder 7 served at an OpenAI-compatible
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
-python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
 ```
 Then run OpenCode inside the project you want to edit:
@@ -65,23 +65,31 @@ the server to expose:
 ```text
 model id: kaiju-coder-7
-base URL: http://127.0.0.1:18083/v1
 context: 16384
 ```
 Then install the OpenCode helper with:
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
-python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18083/v1
 ```
 ### Path 3: Runtime-Quantized Local Candidate
 Use this only if you are comfortable with advanced serving setups. The current
-working quantized option is a runtime bitsandbytes recipe, not a separate
-persisted quantized weights repo.
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
@@ -115,9 +123,12 @@ Expected result:
 - Public model id: `kaiju-coder-7`
 - OpenCode context: `16384`
 - Output cap for public testing: `2500`
 - Current reliable product path: model plus deterministic business-owner
-  harness plus verifier
-- Raw multi-file OpenCode generation: still too slow for broad paid API claims
 - Paid API: not public until launch preflight passes
 ## What Not To Claim Yet
@@ -134,15 +145,21 @@ Do claim:
 - Kaiju Coder 7 has a working local/OpenCode release candidate
 - the current tested OpenCode default is 16k context
 - the helper package includes a lean agent and compaction loop guard
 - the paid API scaffold has tests and a launch preflight, but is not yet public
 - the packaged public smoke verifies a fresh OpenCode one-file write before
   public claims are refreshed
 ## Current Blockers Before Public Release
 - Hugging Face repo creation still requires a write-capable token or namespace.
 - Full merged model upload has not completed; the merged folder must first have
   the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
 - Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
   secret verification, Stripe webhook staging evidence, staging traffic, latency
   evidence, and rollback proof.

 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
+python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
 ```
 Then run OpenCode inside the project you want to edit:
 ```text
 model id: kaiju-coder-7
+base URL: http://127.0.0.1:18084/v1
 context: 16384
 ```
+For the fastest OpenCode behavior, run the bundled fast proxy in a separate
+terminal and point OpenCode at the proxy:
+```bash
+KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
+python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
+```
 Then install the OpenCode helper with:
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-opencode
 cd kaiju-coder-7-opencode
+python3 scripts/install_kaiju_opencode_profile.py --base-url http://127.0.0.1:18181/v1
 ```
 ### Path 3: Runtime-Quantized Local Candidate
 Use this only if you are comfortable with advanced serving setups. The current
+working quantized option is a runtime bitsandbytes recipe. A Q8_0 GGUF artifact
+has been converted, but it is still a candidate until runtime smoke passes.
 ```bash
 git clone https://huggingface.co/RMDWLLC/kaiju-coder-7-quantized-runtime
 - Public model id: `kaiju-coder-7`
 - OpenCode context: `16384`
 - Output cap for public testing: `2500`
+- Fast OpenCode path: vLLM bitsandbytes runtime behind the Kaiju fast proxy
 - Current reliable product path: model plus deterministic business-owner
+  harness/router plus verifier
+- Raw multi-file OpenCode generation: still too slow for broad paid claims;
+  useful for testing, but paid API claims should favor harnessed product
+  workflows until broader latency gates pass
 - Paid API: not public until launch preflight passes
 ## What Not To Claim Yet
 - Kaiju Coder 7 has a working local/OpenCode release candidate
 - the current tested OpenCode default is 16k context
 - the helper package includes a lean agent and compaction loop guard
+- the fast proxy keeps OpenCode tool calls intact while forcing bounded,
+  non-thinking generation
 - the paid API scaffold has tests and a launch preflight, but is not yet public
 - the packaged public smoke verifies a fresh OpenCode one-file write before
   public claims are refreshed
+- a GGUF Q8_0 candidate exists, but is not public quantized-weights release
+  evidence until runtime smoke passes
 ## Current Blockers Before Public Release
 - Hugging Face repo creation still requires a write-capable token or namespace.
 - Full merged model upload has not completed; the merged folder must first have
   the metadata packet synced by `prepare_hf_merged_model_metadata.sh`.
+- The GGUF Q8_0 candidate still needs a runtime smoke before public
+  quantized-weights upload.
 - Public paid API launch needs real Cloudflare D1/KV/R2 bindings, Wrangler
   secret verification, Stripe webhook staging evidence, staging traffic, latency
   evidence, and rollback proof.

README.md CHANGED Viewed

@@ -25,7 +25,7 @@ the absolute path where you copied `kaiju-no-autocontinue.mjs`:
       "npm": "@ai-sdk/openai-compatible",
       "name": "Kaiju Coder",
       "options": {
-        "baseURL": "http://100.109.109.14:18083/v1",
         "apiKey": "not-needed",
         "timeout": 900000,
         "chunkTimeout": 120000
@@ -95,9 +95,27 @@ file/output facts into the summary.
 - Model id: `kaiju-coder-7`
 - Endpoint shape: OpenAI-compatible `/v1/chat/completions`
-- Current Gojira-B restored default: 16,384 context
-- Tested high-context target: 32,768 context
-- Serving path: merged full model through SGLang
 - OpenCode guard: lean agent plus scoped no-autocontinue plugin
 - Product caveat: raw generation is useful but slow; paid workflows should use
   deterministic harnesses and verifiers until broader raw-model gates pass.

       "npm": "@ai-sdk/openai-compatible",
       "name": "Kaiju Coder",
       "options": {
+        "baseURL": "http://127.0.0.1:18181/v1",
         "apiKey": "not-needed",
         "timeout": 900000,
         "chunkTimeout": 120000
 - Model id: `kaiju-coder-7`
 - Endpoint shape: OpenAI-compatible `/v1/chat/completions`
+- Fast OpenCode base URL: `http://127.0.0.1:18181/v1`
+- Fast proxy upstream for Richard's current setup: vLLM bitsandbytes on Gojira-B port `18084`
+- Current tested context: 16,384
+- Tested high-context target: 32,768, but not the current fast default
+- Serving path for speed testing: merged full model through vLLM runtime bitsandbytes
 - OpenCode guard: lean agent plus scoped no-autocontinue plugin
 - Product caveat: raw generation is useful but slow; paid workflows should use
   deterministic harnesses and verifiers until broader raw-model gates pass.
+## Fast Proxy
+The helper bundle includes `scripts/kaiju_opencode_fast_proxy.py`. It preserves
+OpenCode tool-call streaming while forcing the fast model settings Kaiju needs:
+`thinking=false`, model id `kaiju-coder-7`, and bounded output budgets.
+Run it in one terminal, then point OpenCode at `http://127.0.0.1:18181/v1`:
+```bash
+KAIJU_OPENAI_BASE_URL=http://127.0.0.1:18084/v1 \
+python3 scripts/kaiju_opencode_fast_proxy.py --host 127.0.0.1 --port 18181
+```
+If your vLLM server is remote, set `KAIJU_OPENAI_BASE_URL` to that remote
+OpenAI-compatible `/v1` endpoint instead.

opencode.kaiju-coder-7.jsonc CHANGED Viewed

@@ -5,7 +5,7 @@
       "npm": "@ai-sdk/openai-compatible",
       "name": "Kaiju Coder",
       "options": {
-        "baseURL": "http://100.109.109.14:18083/v1",
         "apiKey": "not-needed",
         "timeout": 900000,
         "chunkTimeout": 120000

       "npm": "@ai-sdk/openai-compatible",
       "name": "Kaiju Coder",
       "options": {
+        "baseURL": "http://127.0.0.1:18181/v1",
         "apiKey": "not-needed",
         "timeout": 900000,
         "chunkTimeout": 120000

scripts/check_hf_uploaded_release.py CHANGED Viewed

@@ -24,7 +24,7 @@ from typing import Any
 MODEL_ID = "kaiju-coder-7"
 DEFAULT_NAMESPACE = "RMDWLLC"
-DEFAULT_BASE_URL = "http://100.109.109.14:18083/v1"
 @dataclass(frozen=True)

 MODEL_ID = "kaiju-coder-7"
 DEFAULT_NAMESPACE = "RMDWLLC"
+DEFAULT_BASE_URL = "http://127.0.0.1:18181/v1"
 @dataclass(frozen=True)

scripts/kaiju_opencode_fast_proxy.py ADDED Viewed

	@@ -0,0 +1,234 @@

+#!/usr/bin/env python3
+"""Tool-safe OpenAI-compatible fast proxy for Kaiju Coder 7 OpenCode.
+The normal Gojira gateway is product/API oriented and aggregates content. OpenCode
+needs raw tool-call chunks preserved, so this proxy only patches serving knobs
+and then passes upstream responses through unchanged.
+"""
+from __future__ import annotations
+import argparse
+import json
+import os
+import time
+import urllib.error
+import urllib.request
+from http import HTTPStatus
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+from typing import Any
+DEFAULT_HOST = "127.0.0.1"
+DEFAULT_PORT = int(os.environ.get("KAIJU_OPENCODE_FAST_PROXY_PORT", "18181"))
+UPSTREAM_BASE_URL = os.environ.get("KAIJU_OPENAI_BASE_URL", "http://100.109.109.14:18084/v1")
+DEFAULT_MODEL = os.environ.get("KAIJU_DEFAULT_MODEL", "kaiju-coder-7")
+API_KEY = os.environ.get("KAIJU_OPENAI_API_KEY", "")
+NORMAL_MAX_TOKENS = int(os.environ.get("KAIJU_NORMAL_MAX_TOKENS", "384"))
+WORK_MAX_TOKENS = int(os.environ.get("KAIJU_WORK_MAX_TOKENS", "1536"))
+ARTIFACT_MAX_TOKENS = int(os.environ.get("KAIJU_ARTIFACT_MAX_TOKENS", "4096"))
+MAX_REQUEST_BYTES = int(os.environ.get("KAIJU_MAX_REQUEST_BYTES", "2097152"))
+def normalize_messages(messages: Any) -> list[dict[str, Any]]:
+    if not isinstance(messages, list):
+        return []
+    return [message for message in messages if isinstance(message, dict)]
+def message_text(messages: list[dict[str, Any]]) -> str:
+    parts: list[str] = []
+    for message in messages:
+        content = message.get("content", "")
+        if isinstance(content, str):
+            parts.append(content)
+        else:
+            parts.append(json.dumps(content, ensure_ascii=False))
+    return "\n".join(parts).lower()
+def classify_job(messages: list[dict[str, Any]]) -> str:
+    text = message_text(messages)
+    artifact_terms = (
+        "complete html",
+        "html file",
+        "one-file website",
+        "landing page",
+        "build a website",
+        "make a website",
+        "full file",
+    )
+    work_terms = (
+        "create ",
+        "write ",
+        "edit ",
+        "implement",
+        "debug",
+        "fix",
+        "refactor",
+        "test",
+        "repo",
+        "file",
+    )
+    if any(term in text for term in artifact_terms):
+        return "artifact"
+    if any(term in text for term in work_terms):
+        return "work"
+    return "normal"
+def target_tokens(job_class: str) -> int:
+    if job_class == "artifact":
+        return ARTIFACT_MAX_TOKENS
+    if job_class == "work":
+        return WORK_MAX_TOKENS
+    return NORMAL_MAX_TOKENS
+def patch_chat_payload(body: dict[str, Any]) -> dict[str, Any]:
+    patched = dict(body)
+    patched["model"] = DEFAULT_MODEL
+    messages = normalize_messages(patched.get("messages"))
+    job_class = classify_job(messages)
+    patched["max_tokens"] = target_tokens(job_class)
+    patched["chat_template_kwargs"] = {
+        **(patched.get("chat_template_kwargs") if isinstance(patched.get("chat_template_kwargs"), dict) else {}),
+        "enable_thinking": False,
+        "thinking": False,
+    }
+    return patched
+class Handler(BaseHTTPRequestHandler):
+    server_version = "KaijuOpenCodeFastProxy/0.1"
+    def log_message(self, fmt: str, *args: Any) -> None:
+        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {self.address_string()} - {fmt % args}", flush=True)
+    def _json(self, status: int, payload: dict[str, Any]) -> None:
+        data = json.dumps(payload).encode("utf-8")
+        self.send_response(status)
+        self.send_header("content-type", "application/json; charset=utf-8")
+        self.send_header("cache-control", "no-store")
+        self.send_header("content-length", str(len(data)))
+        self.end_headers()
+        self.wfile.write(data)
+    def _read_json(self) -> dict[str, Any]:
+        length = int(self.headers.get("content-length", "0"))
+        if length > MAX_REQUEST_BYTES:
+            raise ValueError("request body too large")
+        raw = self.rfile.read(length)
+        if not raw:
+            return {}
+        value = json.loads(raw.decode("utf-8"))
+        if not isinstance(value, dict):
+            raise ValueError("request body must be a JSON object")
+        return value
+    def do_GET(self) -> None:  # noqa: N802 - BaseHTTPRequestHandler API.
+        if self.path == "/health":
+            self._json(
+                HTTPStatus.OK,
+                {
+                    "ok": True,
+                    "model": DEFAULT_MODEL,
+                    "upstream": UPSTREAM_BASE_URL,
+                    "normal_max_tokens": NORMAL_MAX_TOKENS,
+                    "work_max_tokens": WORK_MAX_TOKENS,
+                    "artifact_max_tokens": ARTIFACT_MAX_TOKENS,
+                },
+            )
+            return
+        if self.path == "/v1/models":
+            self._forward_get("/models")
+            return
+        self._json(HTTPStatus.NOT_FOUND, {"error": {"message": "Not found", "type": "not_found"}})
+    def do_POST(self) -> None:  # noqa: N802 - BaseHTTPRequestHandler API.
+        if self.path != "/v1/chat/completions":
+            self._json(HTTPStatus.NOT_FOUND, {"error": {"message": "Not found", "type": "not_found"}})
+            return
+        try:
+            body = patch_chat_payload(self._read_json())
+        except Exception as error:  # noqa: BLE001 - return request parse failures.
+            self._json(HTTPStatus.BAD_REQUEST, {"error": {"message": str(error), "type": "bad_request"}})
+            return
+        self._forward_post("/chat/completions", body)
+    def _headers(self) -> dict[str, str]:
+        headers = {"content-type": "application/json"}
+        if API_KEY:
+            headers["authorization"] = f"Bearer {API_KEY}"
+        return headers
+    def _forward_get(self, suffix: str) -> None:
+        request = urllib.request.Request(
+            f"{UPSTREAM_BASE_URL.rstrip('/')}{suffix}",
+            headers=self._headers(),
+            method="GET",
+        )
+        try:
+            with urllib.request.urlopen(request, timeout=30) as upstream:
+                data = upstream.read()
+                self.send_response(upstream.status)
+                self.send_header("content-type", upstream.headers.get("content-type", "application/json"))
+                self.send_header("cache-control", "no-store")
+                self.send_header("content-length", str(len(data)))
+                self.end_headers()
+                self.wfile.write(data)
+        except urllib.error.HTTPError as error:
+            self._json(error.code, {"error": {"message": error.read().decode("utf-8", errors="replace")[:500]}})
+        except Exception as error:  # noqa: BLE001 - proxy health should surface upstream failures.
+            self._json(HTTPStatus.BAD_GATEWAY, {"error": {"message": str(error), "type": "upstream_error"}})
+    def _forward_post(self, suffix: str, body: dict[str, Any]) -> None:
+        data = json.dumps(body).encode("utf-8")
+        request = urllib.request.Request(
+            f"{UPSTREAM_BASE_URL.rstrip('/')}{suffix}",
+            data=data,
+            headers=self._headers(),
+            method="POST",
+        )
+        try:
+            timeout = 1200 if classify_job(normalize_messages(body.get("messages"))) == "artifact" else 600
+            with urllib.request.urlopen(request, timeout=timeout) as upstream:
+                content_type = upstream.headers.get("content-type", "application/json")
+                if body.get("stream") is True:
+                    self.send_response(upstream.status)
+                    self.send_header("content-type", content_type)
+                    self.send_header("cache-control", "no-store, no-transform")
+                    self.send_header("connection", "close")
+                    self.end_headers()
+                    for chunk in upstream:
+                        self.wfile.write(chunk)
+                        self.wfile.flush()
+                    return
+                response = upstream.read()
+                self.send_response(upstream.status)
+                self.send_header("content-type", content_type)
+                self.send_header("cache-control", "no-store")
+                self.send_header("content-length", str(len(response)))
+                self.end_headers()
+                self.wfile.write(response)
+        except urllib.error.HTTPError as error:
+            detail = error.read().decode("utf-8", errors="replace")[:500]
+            self._json(error.code, {"error": {"message": detail, "type": "upstream_error"}})
+        except Exception as error:  # noqa: BLE001 - proxy should report upstream failures.
+            self._json(HTTPStatus.BAD_GATEWAY, {"error": {"message": str(error), "type": "upstream_error"}})
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--host", default=DEFAULT_HOST)
+    parser.add_argument("--port", type=int, default=DEFAULT_PORT)
+    args = parser.parse_args()
+    server = ThreadingHTTPServer((args.host, args.port), Handler)
+    print(f"Kaiju OpenCode fast proxy listening on http://{args.host}:{args.port}", flush=True)
+    print(f"Upstream: {UPSTREAM_BASE_URL}", flush=True)
+    server.serve_forever()
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())

scripts/run_kaiju_public_demo_pack.py ADDED Viewed

	@@ -0,0 +1,193 @@

+#!/usr/bin/env python3
+"""Run the public Kaiju Coder 7 business-owner demo pack.
+The demo pack exercises the release path customers will actually use: a compact
+model-planned prompt where useful, deterministic harness rendering, and static
+verification before any public claim is refreshed.
+"""
+from __future__ import annotations
+import argparse
+import datetime as dt
+import json
+import sys
+import time
+from dataclasses import asdict, dataclass
+from pathlib import Path
+ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(ROOT))
+from kaiju_harness.router import result_to_json, run_task
+@dataclass
+class DemoTask:
+    task_id: str
+    kind: str
+    prompt: str
+@dataclass
+class DemoResult:
+    task_id: str
+    kind: str
+    seconds: float
+    task_type: str
+    artifact_type: str
+    artifact_path: str | None
+    project_dir: str | None
+    changed_files: int
+    errors: list[str]
+DEMO_TASKS = [
+    DemoTask(
+        task_id="service-website",
+        kind="website",
+        prompt=(
+            "Build a premium one-page website for Harborline Bookkeeping in "
+            "Savannah. Include trust-focused copy, clear services, pricing "
+            "signals, FAQ, and the CTA Book a Cleanup Call."
+        ),
+    ),
+    DemoTask(
+        task_id="owner-ai-company-pack",
+        kind="business_suite",
+        prompt=(
+            "Build the owner-ready AI company operating pack for Harborline "
+            "Bookkeeping with launch kit, connector pack, intake CRM, "
+            "reporting agent, lead generator, sales closer, ROI dashboard, "
+            "operator training, and teach-once Workshop handoff."
+        ),
+    ),
+    DemoTask(
+        task_id="stripe-safety-plan",
+        kind="business_document",
+        prompt=(
+            "Write a practical Stripe checkout and webhook safety plan for a "
+            "local service business selling paid AI setup calls. Include key "
+            "states, failure handling, refund/debit rules, and launch checks."
+        ),
+    ),
+    DemoTask(
+        task_id="csv-parser",
+        kind="coding",
+        prompt=(
+            "Write a safe Node.js CSV parser utility for business-owner lead "
+            "imports. Include validation rules, typed output shape, example "
+            "usage, and a small test plan."
+        ),
+    ),
+]
+def utc_stamp() -> str:
+    return dt.datetime.now(dt.UTC).strftime("%Y%m%dT%H%M%SZ")
+def write_summary(run_dir: Path, results: list[DemoResult], manifests: list[dict]) -> None:
+    payload = {
+        "product": "Kaiju Coder 7",
+        "model_id": "kaiju-coder-7",
+        "created_at": utc_stamp(),
+        "summary": {
+            "tasks": len(results),
+            "passed": sum(1 for result in results if not result.errors),
+            "failed": sum(1 for result in results if result.errors),
+            "total_seconds": round(sum(result.seconds for result in results), 3),
+        },
+        "results": [asdict(result) for result in results],
+        "manifests": manifests,
+    }
+    (run_dir / "results.json").write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
+    lines = [
+        "# Kaiju Coder 7 Public Demo Pack",
+        "",
+        f"- Run dir: `{run_dir}`",
+        f"- Tasks: `{payload['summary']['tasks']}`",
+        f"- Passed: `{payload['summary']['passed']}`",
+        f"- Failed: `{payload['summary']['failed']}`",
+        f"- Total seconds: `{payload['summary']['total_seconds']}`",
+        "",
+        "| Task | Kind | Result | Seconds | Changed files | Artifact |",
+        "|---|---|---:|---:|---:|---|",
+    ]
+    for result in results:
+        status = "pass" if not result.errors else "fail"
+        artifact = result.artifact_path or result.project_dir or ""
+        lines.append(
+            f"| `{result.task_id}` | `{result.kind}` | {status} | "
+            f"{result.seconds:.2f} | {result.changed_files} | `{artifact}` |"
+        )
+    (run_dir / "summary.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--out-dir", type=Path, default=ROOT / "runs/public-demo-pack")
+    parser.add_argument("--openai-base-url", default="http://127.0.0.1:18181/v1")
+    parser.add_argument("--model", default="kaiju-coder-7")
+    parser.add_argument("--api-key-env", default="KAIJU_EVAL_API_KEY")
+    parser.add_argument("--planner-timeout", type=int, default=120)
+    args = parser.parse_args()
+    run_dir = args.out_dir / utc_stamp()
+    run_dir.mkdir(parents=True, exist_ok=True)
+    results: list[DemoResult] = []
+    manifests: list[dict] = []
+    for task in DEMO_TASKS:
+        started = time.time()
+        task_dir = run_dir / task.task_id
+        try:
+            result = run_task(
+                task.prompt,
+                task_dir,
+                kind=task.kind,
+                openai_base_url=args.openai_base_url,
+                model=args.model,
+                api_key_env=args.api_key_env,
+                planner_timeout=args.planner_timeout,
+            )
+            seconds = time.time() - started
+            manifests.append(json.loads(result_to_json(result)))
+            results.append(
+                DemoResult(
+                    task_id=task.task_id,
+                    kind=task.kind,
+                    seconds=round(seconds, 3),
+                    task_type=result.task_type,
+                    artifact_type=result.artifact_type,
+                    artifact_path=str(result.artifact_path) if result.artifact_path else None,
+                    project_dir=str(result.project_dir) if result.project_dir else None,
+                    changed_files=len(result.changed_files),
+                    errors=result.errors,
+                )
+            )
+        except Exception as exc:
+            seconds = time.time() - started
+            results.append(
+                DemoResult(
+                    task_id=task.task_id,
+                    kind=task.kind,
+                    seconds=round(seconds, 3),
+                    task_type=task.kind,
+                    artifact_type="error",
+                    artifact_path=None,
+                    project_dir=None,
+                    changed_files=0,
+                    errors=[str(exc)],
+                )
+            )
+    write_summary(run_dir, results, manifests)
+    failed = [result for result in results if result.errors]
+    print(f"Demo summary: {run_dir / 'summary.md'}")
+    return 1 if failed else 0
+if __name__ == "__main__":
+    raise SystemExit(main())

scripts/run_kaiju_public_opencode_smoke.py CHANGED Viewed

@@ -29,7 +29,7 @@ AGENT = "kaiju-coder-7"
 MODEL_ID = "kaiju-coder-7"
 EXPECTED_TEXT = "Kaiju Coder 7 public OpenCode smoke ok"
 DEFAULT_RUNS_DIR = ROOT / "runs/public-opencode-smoke"
-DEFAULT_BASE_URL = "http://100.109.109.14:18083/v1"
 @dataclass

 MODEL_ID = "kaiju-coder-7"
 EXPECTED_TEXT = "Kaiju Coder 7 public OpenCode smoke ok"
 DEFAULT_RUNS_DIR = ROOT / "runs/public-opencode-smoke"
+DEFAULT_BASE_URL = "http://127.0.0.1:18181/v1"
 @dataclass